The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
will still attempt to collapse into a Anon THP. This violates the rule we have
agreed upon: never means never.
Another rule for madvise, referring to David's suggestion: “allowing for collapsing
in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
To address this issue, should check whether the Anon THP configuration is disabled
in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
In summary, the current strategy is:
1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
(global THP settings are not enabled), it means mTHP of that orders are prohibited
from being used, then madvise_collapse() is forbidden for that orders.
2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
(global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
orders are still prohibited from being used, thus madvise_collapse() is not allowed
for that orders.
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/huge_mm.h | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..199ddc9f04a1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long orders)
{
/* Optimization to check if required orders are enabled early. */
- if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
- unsigned long mask = READ_ONCE(huge_anon_orders_always);
+ if (vma_is_anonymous(vma)) {
+ unsigned long always = READ_ONCE(huge_anon_orders_always);
+ unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
+ unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
+ unsigned long mask = always | madvise;
+
+ /*
+ * If the system-wide THP/mTHP sysfs settings are disabled,
+ * then we should never allow hugepages.
+ */
+ if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
+ return 0;
+
+ if (!(tva_flags & TVA_ENFORCE_SYSFS))
+ goto skip;
+ mask = always;
if (vm_flags & VM_HUGEPAGE)
- mask |= READ_ONCE(huge_anon_orders_madvise);
+ mask |= madvise;
if (hugepage_global_always() ||
((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
- mask |= READ_ONCE(huge_anon_orders_inherit);
+ mask |= inherit;
orders &= mask;
if (!orders)
return 0;
}
+skip:
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
}
--
2.43.5
On 05.06.25 10:00, Baolin Wang wrote:
> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a Anon THP. This violates the rule we have
> agreed upon: never means never.
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> To address this issue, should check whether the Anon THP configuration is disabled
> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>
> In summary, the current strategy is:
>
> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> (global THP settings are not enabled), it means mTHP of that orders are prohibited
> from being used, then madvise_collapse() is forbidden for that orders.
>
> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> orders are still prohibited from being used, thus madvise_collapse() is not allowed
> for that orders.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..199ddc9f04a1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> + if (vma_is_anonymous(vma)) {
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + unsigned long mask = always | madvise;
> +
> + /*
> + * If the system-wide THP/mTHP sysfs settings are disabled,
> + * then we should never allow hugepages.
> + */> + if (!(mask & orders) && !(hugepage_global_enabled() &&
(inherit & orders)))
> + return 0;
I'm still trying to digest that. Isn't there a way for us to work with
the orders,
essentially masking off all orders that are forbidden globally. Similar
to below, if !orders, then return 0?
/* Orders disabled directly. */
orders &= ~TODO;
/* Orders disabled by inheriting from the global toggle. */
if (!hugepage_global_enabled())
orders &= ~READ_ONCE(huge_anon_orders_inherit);
TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
add a simple helper for that
huge_anon_orders_never
--
Cheers,
David / dhildenb
On 2025/6/11 20:34, David Hildenbrand wrote:
> On 05.06.25 10:00, Baolin Wang wrote:
>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>> which
>> means that even though we have disabled the Anon THP configuration,
>> MADV_COLLAPSE
>> will still attempt to collapse into a Anon THP. This violates the rule
>> we have
>> agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing
>> for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, should check whether the Anon THP configuration
>> is disabled
>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>> set.
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == false
>> (global THP settings are not enabled), it means mTHP of that orders
>> are prohibited
>> from being used, then madvise_collapse() is forbidden for that orders.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == true
>> (global THP settings are enabled), and inherit & orders == 0, it means
>> mTHP of that
>> orders are still prohibited from being used, thus madvise_collapse()
>> is not allowed
>> for that orders.
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..199ddc9f04a1 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> + if (vma_is_anonymous(vma)) {
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + unsigned long mask = always | madvise;
>> +
>> + /*
>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>> + * then we should never allow hugepages.
> > + */> + if (!(mask & orders) &&
> !(hugepage_global_enabled() && (inherit & orders)))
>> + return 0;
>
> I'm still trying to digest that. Isn't there a way for us to work with
> the orders,
> essentially masking off all orders that are forbidden globally. Similar
> to below, if !orders, then return 0?
> /* Orders disabled directly. */
> orders &= ~TODO;
> /* Orders disabled by inheriting from the global toggle. */
> if (!hugepage_global_enabled())
> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>
> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
> add a simple helper for that
>
> huge_anon_orders_never
I followed Lorenzo's suggestion to simplify the logic. Does that look
more readable?
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..3087ac7631e0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
vm_area_struct *vma,
unsigned long tva_flags,
unsigned long orders);
+/* Strictly mask requested anonymous orders according to sysfs settings. */
+static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
+ unsigned long tva_flags, unsigned long
orders)
+{
+ unsigned long always = READ_ONCE(huge_anon_orders_always);
+ unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
+ unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
+ bool inherit_enabled = hugepage_global_enabled();
+ bool has_madvise = vm_flags & VM_HUGEPAGE;
+ unsigned long mask = always | madvise;
+
+ mask = always | madvise;
+ if (inherit_enabled)
+ mask |= inherit;
+
+ /* All set to/inherit NEVER - never means never globally, abort. */
+ if (!(mask & orders))
+ return 0;
+
+ /*
+ * Otherwise, we only enforce sysfs settings if asked. In addition,
+ * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
+ * is not set, we don't bother checking whether the VMA has
VM_HUGEPAGE
+ * set.
+ */
+ if (!(tva_flags & TVA_ENFORCE_SYSFS))
+ return orders;
+
+ mask = always;
+ if (has_madvise)
+ mask |= madvise;
+ if (hugepage_global_always() || (has_madvise && inherit_enabled))
+ mask |= inherit;
+
+ return orders & mask;
+}
+
/**
* thp_vma_allowable_orders - determine hugepage orders that are
allowed for vma
* @vma: the vm area to check
@@ -287,19 +324,8 @@ unsigned long thp_vma_allowable_orders(struct
vm_area_struct *vma,
unsigned long orders)
{
/* Optimization to check if required orders are enabled early. */
- if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
- unsigned long mask = READ_ONCE(huge_anon_orders_always);
-
- if (vm_flags & VM_HUGEPAGE)
- mask |= READ_ONCE(huge_anon_orders_madvise);
- if (hugepage_global_always() ||
- ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
- mask |= READ_ONCE(huge_anon_orders_inherit);
-
- orders &= mask;
- if (!orders)
- return 0;
- }
+ if (vma_is_anonymous(vma))
+ orders = __thp_mask_anon_orders(vm_flags, tva_flags,
orders);
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags,
orders);
}
On 12.06.25 09:51, Baolin Wang wrote:
>
>
> On 2025/6/11 20:34, David Hildenbrand wrote:
>> On 05.06.25 10:00, Baolin Wang wrote:
>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>>> which
>>> means that even though we have disabled the Anon THP configuration,
>>> MADV_COLLAPSE
>>> will still attempt to collapse into a Anon THP. This violates the rule
>>> we have
>>> agreed upon: never means never.
>>>
>>> Another rule for madvise, referring to David's suggestion: “allowing
>>> for collapsing
>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>
>>> To address this issue, should check whether the Anon THP configuration
>>> is disabled
>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>>> set.
>>>
>>> In summary, the current strategy is:
>>>
>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>> hugepage_global_enabled() == false
>>> (global THP settings are not enabled), it means mTHP of that orders
>>> are prohibited
>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>
>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>> hugepage_global_enabled() == true
>>> (global THP settings are enabled), and inherit & orders == 0, it means
>>> mTHP of that
>>> orders are still prohibited from being used, thus madvise_collapse()
>>> is not allowed
>>> for that orders.
>>>
>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2f190c90192d..199ddc9f04a1 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>> vm_area_struct *vma,
>>> unsigned long orders)
>>> {
>>> /* Optimization to check if required orders are enabled early. */
>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>> + if (vma_is_anonymous(vma)) {
>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>> + unsigned long mask = always | madvise;
>>> +
>>> + /*
>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>> + * then we should never allow hugepages.
>> > + */> + if (!(mask & orders) &&
>> !(hugepage_global_enabled() && (inherit & orders)))
>>> + return 0;
>>
>> I'm still trying to digest that. Isn't there a way for us to work with
>> the orders,
>> essentially masking off all orders that are forbidden globally. Similar
>> to below, if !orders, then return 0?
>> /* Orders disabled directly. */
>> orders &= ~TODO;
>> /* Orders disabled by inheriting from the global toggle. */
>> if (!hugepage_global_enabled())
>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>
>> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
>> add a simple helper for that
>>
>> huge_anon_orders_never
>
> I followed Lorenzo's suggestion to simplify the logic. Does that look
> more readable?
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..3087ac7631e0 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long tva_flags,
> unsigned long orders);
>
> +/* Strictly mask requested anonymous orders according to sysfs settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> + unsigned long tva_flags, unsigned long
> orders)
> +{
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + bool inherit_enabled = hugepage_global_enabled();
> + bool has_madvise = vm_flags & VM_HUGEPAGE;
> + unsigned long mask = always | madvise;
> +
> + mask = always | madvise;
> + if (inherit_enabled)
> + mask |= inherit;
> +
> + /* All set to/inherit NEVER - never means never globally, abort. */
> + if (!(mask & orders))
> + return 0;
Still confusing. I am not sure if we would properly catch when someone
specifies e.g., 2M and 1M, while we only have 2M disabled.
I would rewrite the function to only ever substract from "orders".
...
/* Disallow orders that are set to NEVER directly ... */
order &= (always | madvise | inherit);
/* ... or through inheritance. */
if (inherit_enabled)
orders &= ~inherit;
/*
* Otherwise, we only enforce sysfs settings if asked. In addition,
* if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
* is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
* set.
*/
if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
> +
> + /*
> + * Otherwise, we only enforce sysfs settings if asked. In addition,
> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> + * is not set, we don't bother checking whether the VMA has
> VM_HUGEPAGE
> + * set.
> + */
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + return orders;
> +
> + mask = always;
> + if (has_madvise)
> + mask |= madvise;
> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
> + mask |= inherit;
Similarly, this can maybe become (not 100% sure if I got it right, the
condition above is confusing)
if (!has_madvise) {
/*
* Without VM_HUGEPAGE, only allow orders that are set to
* ALWAYS directly ...
*/
order &= (always | inherit);
/* ... or through inheritance. */
if (!hugepage_global_always())
orders &= ~inherit;
}
--
Cheers,
David / dhildenb
On Thu, Jun 12, 2025 at 10:51:17AM +0200, David Hildenbrand wrote:
> On 12.06.25 09:51, Baolin Wang wrote:
> >
> >
> > On 2025/6/11 20:34, David Hildenbrand wrote:
> > > On 05.06.25 10:00, Baolin Wang wrote:
> > > > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
> > > > which
> > > > means that even though we have disabled the Anon THP configuration,
> > > > MADV_COLLAPSE
> > > > will still attempt to collapse into a Anon THP. This violates the rule
> > > > we have
> > > > agreed upon: never means never.
> > > >
> > > > Another rule for madvise, referring to David's suggestion: “allowing
> > > > for collapsing
> > > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> > > >
> > > > To address this issue, should check whether the Anon THP configuration
> > > > is disabled
> > > > in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
> > > > set.
> > > >
> > > > In summary, the current strategy is:
> > > >
> > > > 1. If always & orders == 0, and madvise & orders == 0, and
> > > > hugepage_global_enabled() == false
> > > > (global THP settings are not enabled), it means mTHP of that orders
> > > > are prohibited
> > > > from being used, then madvise_collapse() is forbidden for that orders.
> > > >
> > > > 2. If always & orders == 0, and madvise & orders == 0, and
> > > > hugepage_global_enabled() == true
> > > > (global THP settings are enabled), and inherit & orders == 0, it means
> > > > mTHP of that
> > > > orders are still prohibited from being used, thus madvise_collapse()
> > > > is not allowed
> > > > for that orders.
> > > >
> > > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > ---
> > > > include/linux/huge_mm.h | 23 +++++++++++++++++++----
> > > > 1 file changed, 19 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index 2f190c90192d..199ddc9f04a1 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
> > > > vm_area_struct *vma,
> > > > unsigned long orders)
> > > > {
> > > > /* Optimization to check if required orders are enabled early. */
> > > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > > + if (vma_is_anonymous(vma)) {
> > > > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > > + unsigned long mask = always | madvise;
> > > > +
> > > > + /*
> > > > + * If the system-wide THP/mTHP sysfs settings are disabled,
> > > > + * then we should never allow hugepages.
> > > > + */> + if (!(mask & orders) &&
> > > !(hugepage_global_enabled() && (inherit & orders)))
> > > > + return 0;
> > >
> > > I'm still trying to digest that. Isn't there a way for us to work with
> > > the orders,
> > > essentially masking off all orders that are forbidden globally. Similar
> > > to below, if !orders, then return 0?
> > > /* Orders disabled directly. */
> > > orders &= ~TODO;
> > > /* Orders disabled by inheriting from the global toggle. */
> > > if (!hugepage_global_enabled())
> > > orders &= ~READ_ONCE(huge_anon_orders_inherit);
> > >
> > > TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
> > > add a simple helper for that
> > >
> > > huge_anon_orders_never
> >
> > I followed Lorenzo's suggestion to simplify the logic. Does that look
> > more readable?
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 2f190c90192d..3087ac7631e0 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
> > vm_area_struct *vma,
> > unsigned long tva_flags,
> > unsigned long orders);
> >
> > +/* Strictly mask requested anonymous orders according to sysfs settings. */
> > +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > + unsigned long tva_flags, unsigned long
> > orders)
> > +{
> > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > + bool inherit_enabled = hugepage_global_enabled();
> > + bool has_madvise = vm_flags & VM_HUGEPAGE;
> > + unsigned long mask = always | madvise;
> > +
> > + mask = always | madvise;
> > + if (inherit_enabled)
> > + mask |= inherit;
> > +
> > + /* All set to/inherit NEVER - never means never globally, abort. */
> > + if (!(mask & orders))
> > + return 0;
>
> Still confusing. I am not sure if we would properly catch when someone
> specifies e.g., 2M and 1M, while we only have 2M disabled.
I did wonder if we should call 'orders' something like 'requested_orders'
or something.
This check is always against the input orders which we might conceivably
want.
For instance in madvise_collapse():
if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
return -EINVAL;
I don't think, if it's only possible for PMD order collapse, and that is
disabled, but mTHP 64 KB let's say is enabled, it'd be fine for
MADV_COLLAPSE to succeed at the PMD order.
>
>
> I would rewrite the function to only ever substract from "orders".
Hm.
>
> ...
>
> /* Disallow orders that are set to NEVER directly ... */
> order &= (always | madvise | inherit);
^s
I think you mean (always | madvise) here.
>
> /* ... or through inheritance. */
> if (inherit_enabled)
> orders &= ~inherit;
order & (inherit & ~inherit) is going to always be zero :)
So this should be
orders &= inherit.
The problem is, when you come to the next stage where you are done checking
the 'are we in a NEVER situation regardless of TVA_ENFORCE_SYSFS' you've
now corrupted orders.
Because you've included inherit even if !(tva_flags & TVA_ENFORCE_SYSFS).
And there's no way to recover that information.
>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
I don't think this is much delta to what we have now.
I do wonder if we should return mask & orders here, actually, to account
for the fact that, in theory, orders could set > PMD for
!TVA_ENFORCE_SYSFS) (not currently the case).
>
> > +
> > + /*
> > + * Otherwise, we only enforce sysfs settings if asked. In addition,
> > + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > + * is not set, we don't bother checking whether the VMA has
> > VM_HUGEPAGE
> > + * set.
> > + */
> > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > + return orders;
> > +
> > + mask = always;
> > + if (has_madvise)
> > + mask |= madvise;
> > + if (hugepage_global_always() || (has_madvise && inherit_enabled))
> > + mask |= inherit;
>
> Similarly, this can maybe become (not 100% sure if I got it right, the
> condition above is confusing)
>
> if (!has_madvise) {
> /*
> * Without VM_HUGEPAGE, only allow orders that are set to
> * ALWAYS directly ...
> */
> order &= (always | inherit);
Obviously orders is corrupted at this point so this won't work, but I'm not
sure this is right?
If no madvise, only then obey always/inherit? Hm?
> /* ... or through inheritance. */
> if (!hugepage_global_always())
> orders &= ~inherit;
I'm not sure about this ~inherit again, that means we ignore inherit no?
> }
And we need a branch for madvise too no?
I think all of this is a _clear_ example of what a mess all this is.
I think we need a deeper refactor, but I think my suggested changes make at
least what we have here less horrid to get through.
I think maybe a better refactoring that's in the spirit of this is:
if (has_madvise) {
mask |= madvise;
if (inherit_enabled)
mask |= inherit;
} else if (hugepage_global_always()) {
mask |= inherit;
}
What do you think?
>
> --
> Cheers,
>
> David / dhildenb
>
On 12.06.25 15:07, Lorenzo Stoakes wrote:
> On Thu, Jun 12, 2025 at 10:51:17AM +0200, David Hildenbrand wrote:
>> On 12.06.25 09:51, Baolin Wang wrote:
>>>
>>>
>>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>>>>> which
>>>>> means that even though we have disabled the Anon THP configuration,
>>>>> MADV_COLLAPSE
>>>>> will still attempt to collapse into a Anon THP. This violates the rule
>>>>> we have
>>>>> agreed upon: never means never.
>>>>>
>>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>>> for collapsing
>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>>
>>>>> To address this issue, should check whether the Anon THP configuration
>>>>> is disabled
>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>>>>> set.
>>>>>
>>>>> In summary, the current strategy is:
>>>>>
>>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>>> hugepage_global_enabled() == false
>>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>>> are prohibited
>>>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>>>
>>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>>> hugepage_global_enabled() == true
>>>>> (global THP settings are enabled), and inherit & orders == 0, it means
>>>>> mTHP of that
>>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>>> is not allowed
>>>>> for that orders.
>>>>>
>>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> ---
>>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>>> --- a/include/linux/huge_mm.h
>>>>> +++ b/include/linux/huge_mm.h
>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>>> vm_area_struct *vma,
>>>>> unsigned long orders)
>>>>> {
>>>>> /* Optimization to check if required orders are enabled early. */
>>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>>> + if (vma_is_anonymous(vma)) {
>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>>> + unsigned long mask = always | madvise;
>>>>> +
>>>>> + /*
>>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>>> + * then we should never allow hugepages.
>>>> > + */> + if (!(mask & orders) &&
>>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>>> + return 0;
>>>>
>>>> I'm still trying to digest that. Isn't there a way for us to work with
>>>> the orders,
>>>> essentially masking off all orders that are forbidden globally. Similar
>>>> to below, if !orders, then return 0?
>>>> /* Orders disabled directly. */
>>>> orders &= ~TODO;
>>>> /* Orders disabled by inheriting from the global toggle. */
>>>> if (!hugepage_global_enabled())
>>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>>
>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
>>>> add a simple helper for that
>>>>
>>>> huge_anon_orders_never
>>>
>>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>>> more readable?
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2f190c90192d..3087ac7631e0 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>>> vm_area_struct *vma,
>>> unsigned long tva_flags,
>>> unsigned long orders);
>>>
>>> +/* Strictly mask requested anonymous orders according to sysfs settings. */
>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>>> + unsigned long tva_flags, unsigned long
>>> orders)
>>> +{
>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>> + bool inherit_enabled = hugepage_global_enabled();
>>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>>> + unsigned long mask = always | madvise;
>>> +
>>> + mask = always | madvise;
>>> + if (inherit_enabled)
>>> + mask |= inherit;
>>> +
>>> + /* All set to/inherit NEVER - never means never globally, abort. */
>>> + if (!(mask & orders))
>>> + return 0;
>>
>> Still confusing. I am not sure if we would properly catch when someone
>> specifies e.g., 2M and 1M, while we only have 2M disabled.
>
> I did wonder if we should call 'orders' something like 'requested_orders'
> or something.
>
> This check is always against the input orders which we might conceivably
> want.
>
> For instance in madvise_collapse():
>
> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
> return -EINVAL;
>
> I don't think, if it's only possible for PMD order collapse, and that is
> disabled, but mTHP 64 KB let's say is enabled, it'd be fine for
> MADV_COLLAPSE to succeed at the PMD order.
>
>
>>
>>
>> I would rewrite the function to only ever substract from "orders".
>
> Hm.
>
>>
>> ...
>>
>> /* Disallow orders that are set to NEVER directly ... */
>> order &= (always | madvise | inherit);
> ^s
>
> I think you mean (always | madvise) here.
>
>>
>> /* ... or through inheritance. */
>> if (inherit_enabled)
>> orders &= ~inherit;
>
> order & (inherit & ~inherit) is going to always be zero :)
>
> So this should be
>
> orders &= inherit.
>
> The problem is, when you come to the next stage where you are done checking
> the 'are we in a NEVER situation regardless of TVA_ENFORCE_SYSFS' you've
> now corrupted orders.
>
> Because you've included inherit even if !(tva_flags & TVA_ENFORCE_SYSFS).
>
> And there's no way to recover that information.
>
>>
>> /*
>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>> * set.
>> */
>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
>> return orders;
>
> I don't think this is much delta to what we have now.
>
> I do wonder if we should return mask & orders here, actually, to account
> for the fact that, in theory, orders could set > PMD for
> !TVA_ENFORCE_SYSFS) (not currently the case).
>
>>
>>> +
>>> + /*
>>> + * Otherwise, we only enforce sysfs settings if asked. In addition,
>>> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>> + * is not set, we don't bother checking whether the VMA has
>>> VM_HUGEPAGE
>>> + * set.
>>> + */
>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>> + return orders;
>>> +
>>> + mask = always;
>>> + if (has_madvise)
>>> + mask |= madvise;
>>> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
>>> + mask |= inherit;
>>
>> Similarly, this can maybe become (not 100% sure if I got it right, the
>> condition above is confusing)
>>
>> if (!has_madvise) {
>> /*
>> * Without VM_HUGEPAGE, only allow orders that are set to
>> * ALWAYS directly ...
>> */
>> order &= (always | inherit);
>
> Obviously orders is corrupted at this point so this won't work, but I'm not
> sure this is right?
>
> If no madvise, only then obey always/inherit? Hm?
>
>
>> /* ... or through inheritance. */
>> if (!hugepage_global_always())
>> orders &= ~inherit;
>
> I'm not sure about this ~inherit again, that means we ignore inherit no?
>
>> }
>
> And we need a branch for madvise too no?
>
> I think all of this is a _clear_ example of what a mess all this is.
>
> I think we need a deeper refactor, but I think my suggested changes make at
> least what we have here less horrid to get through.
>
> I think maybe a better refactoring that's in the spirit of this is:
>
> if (has_madvise) {
> mask |= madvise;
> if (inherit_enabled)
> mask |= inherit;
> } else if (hugepage_global_always()) {
> mask |= inherit;
> }
>
> What do you think?
The masks are just disgusting :(
While you were writing that reply, I just sent out something else. Not
sure if that makes sense.
I'm having a hard time figuring out what we even want here in the
existing code.
--
Cheers,
David / dhildenb
On Thu, Jun 12, 2025 at 03:13:19PM +0200, David Hildenbrand wrote: [snip] > > The masks are just disgusting :( Yeah I was shocked when I saw this implementation it's bloody terrible. It just is. > > While you were writing that reply, I just sent out something else. Not sure > if that makes sense. Replied there :>) > > I'm having a hard time figuring out what we even want here in the existing > code. I'm confused as to why this has been done in this way in general. I feel a churny refactoring coming on... ;) > > -- > Cheers, > > David / dhildenb >
On 2025/6/12 16:51, David Hildenbrand wrote:
> On 12.06.25 09:51, Baolin Wang wrote:
>>
>>
>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>>>> which
>>>> means that even though we have disabled the Anon THP configuration,
>>>> MADV_COLLAPSE
>>>> will still attempt to collapse into a Anon THP. This violates the rule
>>>> we have
>>>> agreed upon: never means never.
>>>>
>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>> for collapsing
>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>
>>>> To address this issue, should check whether the Anon THP configuration
>>>> is disabled
>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>>>> set.
>>>>
>>>> In summary, the current strategy is:
>>>>
>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>> hugepage_global_enabled() == false
>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>> are prohibited
>>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>>
>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>> hugepage_global_enabled() == true
>>>> (global THP settings are enabled), and inherit & orders == 0, it means
>>>> mTHP of that
>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>> is not allowed
>>>> for that orders.
>>>>
>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>> vm_area_struct *vma,
>>>> unsigned long orders)
>>>> {
>>>> /* Optimization to check if required orders are enabled
>>>> early. */
>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>> + if (vma_is_anonymous(vma)) {
>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>> + unsigned long mask = always | madvise;
>>>> +
>>>> + /*
>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>> + * then we should never allow hugepages.
>>> > + */> + if (!(mask & orders) &&
>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>> + return 0;
>>>
>>> I'm still trying to digest that. Isn't there a way for us to work with
>>> the orders,
>>> essentially masking off all orders that are forbidden globally. Similar
>>> to below, if !orders, then return 0?
>>> /* Orders disabled directly. */
>>> orders &= ~TODO;
>>> /* Orders disabled by inheriting from the global toggle. */
>>> if (!hugepage_global_enabled())
>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>
>>> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
>>> add a simple helper for that
>>>
>>> huge_anon_orders_never
>>
>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>> more readable?
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..3087ac7631e0 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long tva_flags,
>> unsigned long orders);
>>
>> +/* Strictly mask requested anonymous orders according to sysfs
>> settings. */
>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>> vm_flags,
>> + unsigned long tva_flags, unsigned long
>> orders)
>> +{
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + bool inherit_enabled = hugepage_global_enabled();
>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>> + unsigned long mask = always | madvise;
>> +
>> + mask = always | madvise;
>> + if (inherit_enabled)
>> + mask |= inherit;
>> +
>> + /* All set to/inherit NEVER - never means never globally,
>> abort. */
>> + if (!(mask & orders))
>> + return 0;
>
> Still confusing. I am not sure if we would properly catch when someone
> specifies e.g., 2M and 1M, while we only have 2M disabled.
IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
> I would rewrite the function to only ever substract from "orders".
>
> ...
>
> /* Disallow orders that are set to NEVER directly ... */
> order &= (always | madvise | inherit);
>
> /* ... or through inheritance. */
> if (inherit_enabled)
> orders &= ~inherit;
Sorry, I didn't get you here.
If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
madvise are 0, and inherit_enabled = true. Then orders will be 0 with
your logic. But we should allow order 9, right?
>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
>> +
>> + /*
>> + * Otherwise, we only enforce sysfs settings if asked. In
>> addition,
>> + * if the user sets a sysfs mode of madvise and if
>> TVA_ENFORCE_SYSFS
>> + * is not set, we don't bother checking whether the VMA has
>> VM_HUGEPAGE
>> + * set.
>> + */
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + return orders;
>> +
>> + mask = always;
>> + if (has_madvise)
>> + mask |= madvise;
>> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
>> + mask |= inherit;
>
> Similarly, this can maybe become (not 100% sure if I got it right, the
> condition above is confusing)
IMO, this is the original logic.
> if (!has_madvise) {
> /*
> * Without VM_HUGEPAGE, only allow orders that are set to
> * ALWAYS directly ...
> */
> order &= (always | inherit);
> /* ... or through inheritance. */
> if (!hugepage_global_always())
> orders &= ~inherit;
Ditto. I can not understand this too.
> }
>
On 12.06.25 14:45, Baolin Wang wrote:
>
>
> On 2025/6/12 16:51, David Hildenbrand wrote:
>> On 12.06.25 09:51, Baolin Wang wrote:
>>>
>>>
>>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>>>>> which
>>>>> means that even though we have disabled the Anon THP configuration,
>>>>> MADV_COLLAPSE
>>>>> will still attempt to collapse into a Anon THP. This violates the rule
>>>>> we have
>>>>> agreed upon: never means never.
>>>>>
>>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>>> for collapsing
>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>>
>>>>> To address this issue, should check whether the Anon THP configuration
>>>>> is disabled
>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>>>>> set.
>>>>>
>>>>> In summary, the current strategy is:
>>>>>
>>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>>> hugepage_global_enabled() == false
>>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>>> are prohibited
>>>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>>>
>>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>>> hugepage_global_enabled() == true
>>>>> (global THP settings are enabled), and inherit & orders == 0, it means
>>>>> mTHP of that
>>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>>> is not allowed
>>>>> for that orders.
>>>>>
>>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> ---
>>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>>> --- a/include/linux/huge_mm.h
>>>>> +++ b/include/linux/huge_mm.h
>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>>> vm_area_struct *vma,
>>>>> unsigned long orders)
>>>>> {
>>>>> /* Optimization to check if required orders are enabled
>>>>> early. */
>>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>>> + if (vma_is_anonymous(vma)) {
>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>>> + unsigned long mask = always | madvise;
>>>>> +
>>>>> + /*
>>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>>> + * then we should never allow hugepages.
>>>> > + */> + if (!(mask & orders) &&
>>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>>> + return 0;
>>>>
>>>> I'm still trying to digest that. Isn't there a way for us to work with
>>>> the orders,
>>>> essentially masking off all orders that are forbidden globally. Similar
>>>> to below, if !orders, then return 0?
>>>> /* Orders disabled directly. */
>>>> orders &= ~TODO;
>>>> /* Orders disabled by inheriting from the global toggle. */
>>>> if (!hugepage_global_enabled())
>>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>>
>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
>>>> add a simple helper for that
>>>>
>>>> huge_anon_orders_never
>>>
>>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>>> more readable?
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2f190c90192d..3087ac7631e0 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>>> vm_area_struct *vma,
>>> unsigned long tva_flags,
>>> unsigned long orders);
>>>
>>> +/* Strictly mask requested anonymous orders according to sysfs
>>> settings. */
>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>>> vm_flags,
>>> + unsigned long tva_flags, unsigned long
>>> orders)
>>> +{
>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>> + bool inherit_enabled = hugepage_global_enabled();
>>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>>> + unsigned long mask = always | madvise;
>>> +
>>> + mask = always | madvise;
>>> + if (inherit_enabled)
>>> + mask |= inherit;
>>> +
>>> + /* All set to/inherit NEVER - never means never globally,
>>> abort. */
>>> + if (!(mask & orders))
>>> + return 0;
>>
>> Still confusing. I am not sure if we would properly catch when someone
>> specifies e.g., 2M and 1M, while we only have 2M disabled.
>
> IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
>
>> I would rewrite the function to only ever substract from "orders".
>>
>> ...
>>
>> /* Disallow orders that are set to NEVER directly ... */
>> order &= (always | madvise | inherit);
>>
>> /* ... or through inheritance. */
>> if (inherit_enabled)
>> orders &= ~inherit;
>
> Sorry, I didn't get you here.
>
> If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
> madvise are 0, and inherit_enabled = true. Then orders will be 0 with
> your logic. But we should allow order 9, right?
Yeah, all confusing, because the temporary variables don't help.
if (!inherit_enabled)
or simply
if (!hugepage_global_enabled();)
Let me try again below.
>
>>
>> /*
>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>> * set.
>> */
>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
>> return orders;
>>
>>> +
>>> + /*
>>> + * Otherwise, we only enforce sysfs settings if asked. In
>>> addition,
>>> + * if the user sets a sysfs mode of madvise and if
>>> TVA_ENFORCE_SYSFS
>>> + * is not set, we don't bother checking whether the VMA has
>>> VM_HUGEPAGE
>>> + * set.
>>> + */
>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>> + return orders;
>>> +
>>> + mask = always;
>>> + if (has_madvise)
>>> + mask |= madvise;
>>> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
>>> + mask |= inherit;
>>
>> Similarly, this can maybe become (not 100% sure if I got it right, the
>> condition above is confusing)
>
> IMO, this is the original logic.
Yeah, and it's absolutely confusing stuff.
Let me try again by only clearing flags. Maybe this would be clearer?
(and correct? still confused why the latter part is so complicated in existing
code)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 8b8f353cc7b81..66fdfe06e4996 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long tva_flags,
unsigned long orders);
+/* Strictly mask requested anonymous orders according to sysfs settings. */
+static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
+ unsigned long tva_flags, unsigned long orders)
+{
+ const unsigned long always = READ_ONCE(huge_anon_orders_always);
+ const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
+ const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
+ const unsigned long never = ~(always | madvise | inherit);
+
+ /* Disallow orders that are set to NEVER directly ... */
+ orders &= ~never;
+
+ /* ... or through inheritance (global == NEVER). */
+ if (!hugepage_global_enabled())
+ orders &= ~inherit;
+
+ /*
+ * Otherwise, we only enforce sysfs settings if asked. In addition,
+ * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
+ * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
+ * set.
+ */
+ if (!(tva_flags & TVA_ENFORCE_SYSFS))
+ return orders;
+
+ if (!(vm_flags & VM_HUGEPAGE)) {
+ /* Disallow orders that are set to MADVISE directly ... */
+ orders &= ~madvise;
+
+ /* ... or through inheritance (global == MADVISE). */
+ if (!hugepage_global_always())
+ orders &= ~inherit;
+ }
+ return orders;
+}
+
/**
* thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
* @vma: the vm area to check
@@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long orders)
{
/* Optimization to check if required orders are enabled early. */
- if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
- unsigned long mask = READ_ONCE(huge_anon_orders_always);
-
- if (vm_flags & VM_HUGEPAGE)
- mask |= READ_ONCE(huge_anon_orders_madvise);
- if (hugepage_global_always() ||
- ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
- mask |= READ_ONCE(huge_anon_orders_inherit);
-
- orders &= mask;
+ if (vma_is_anonymous(vma)) {
+ orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
if (!orders)
return 0;
}
--
Cheers,
David / dhildenb
On Thu, Jun 12, 2025 at 03:05:22PM +0200, David Hildenbrand wrote:
> On 12.06.25 14:45, Baolin Wang wrote:
> >
> >
> > On 2025/6/12 16:51, David Hildenbrand wrote:
> > > On 12.06.25 09:51, Baolin Wang wrote:
> > > >
> > > >
> > > > On 2025/6/11 20:34, David Hildenbrand wrote:
> > > > > On 05.06.25 10:00, Baolin Wang wrote:
> > > > > > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
> > > > > > which
> > > > > > means that even though we have disabled the Anon THP configuration,
> > > > > > MADV_COLLAPSE
> > > > > > will still attempt to collapse into a Anon THP. This violates the rule
> > > > > > we have
> > > > > > agreed upon: never means never.
> > > > > >
> > > > > > Another rule for madvise, referring to David's suggestion: “allowing
> > > > > > for collapsing
> > > > > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> > > > > >
> > > > > > To address this issue, should check whether the Anon THP configuration
> > > > > > is disabled
> > > > > > in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
> > > > > > set.
> > > > > >
> > > > > > In summary, the current strategy is:
> > > > > >
> > > > > > 1. If always & orders == 0, and madvise & orders == 0, and
> > > > > > hugepage_global_enabled() == false
> > > > > > (global THP settings are not enabled), it means mTHP of that orders
> > > > > > are prohibited
> > > > > > from being used, then madvise_collapse() is forbidden for that orders.
> > > > > >
> > > > > > 2. If always & orders == 0, and madvise & orders == 0, and
> > > > > > hugepage_global_enabled() == true
> > > > > > (global THP settings are enabled), and inherit & orders == 0, it means
> > > > > > mTHP of that
> > > > > > orders are still prohibited from being used, thus madvise_collapse()
> > > > > > is not allowed
> > > > > > for that orders.
> > > > > >
> > > > > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > > > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > > > ---
> > > > > > include/linux/huge_mm.h | 23 +++++++++++++++++++----
> > > > > > 1 file changed, 19 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > > index 2f190c90192d..199ddc9f04a1 100644
> > > > > > --- a/include/linux/huge_mm.h
> > > > > > +++ b/include/linux/huge_mm.h
> > > > > > @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
> > > > > > vm_area_struct *vma,
> > > > > > unsigned long orders)
> > > > > > {
> > > > > > /* Optimization to check if required orders are enabled
> > > > > > early. */
> > > > > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > > > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > > > > + if (vma_is_anonymous(vma)) {
> > > > > > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > > > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > > > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > > > > + unsigned long mask = always | madvise;
> > > > > > +
> > > > > > + /*
> > > > > > + * If the system-wide THP/mTHP sysfs settings are disabled,
> > > > > > + * then we should never allow hugepages.
> > > > > > + */> + if (!(mask & orders) &&
> > > > > !(hugepage_global_enabled() && (inherit & orders)))
> > > > > > + return 0;
> > > > >
> > > > > I'm still trying to digest that. Isn't there a way for us to work with
> > > > > the orders,
> > > > > essentially masking off all orders that are forbidden globally. Similar
> > > > > to below, if !orders, then return 0?
> > > > > /* Orders disabled directly. */
> > > > > orders &= ~TODO;
> > > > > /* Orders disabled by inheriting from the global toggle. */
> > > > > if (!hugepage_global_enabled())
> > > > > orders &= ~READ_ONCE(huge_anon_orders_inherit);
> > > > >
> > > > > TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
> > > > > add a simple helper for that
> > > > >
> > > > > huge_anon_orders_never
> > > >
> > > > I followed Lorenzo's suggestion to simplify the logic. Does that look
> > > > more readable?
> > > >
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index 2f190c90192d..3087ac7631e0 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
> > > > vm_area_struct *vma,
> > > > unsigned long tva_flags,
> > > > unsigned long orders);
> > > >
> > > > +/* Strictly mask requested anonymous orders according to sysfs
> > > > settings. */
> > > > +static inline unsigned long __thp_mask_anon_orders(unsigned long
> > > > vm_flags,
> > > > + unsigned long tva_flags, unsigned long
> > > > orders)
> > > > +{
> > > > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > > + bool inherit_enabled = hugepage_global_enabled();
> > > > + bool has_madvise = vm_flags & VM_HUGEPAGE;
> > > > + unsigned long mask = always | madvise;
> > > > +
> > > > + mask = always | madvise;
> > > > + if (inherit_enabled)
> > > > + mask |= inherit;
> > > > +
> > > > + /* All set to/inherit NEVER - never means never globally,
> > > > abort. */
> > > > + if (!(mask & orders))
> > > > + return 0;
> > >
> > > Still confusing. I am not sure if we would properly catch when someone
> > > specifies e.g., 2M and 1M, while we only have 2M disabled.
> >
> > IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
> >
> > > I would rewrite the function to only ever substract from "orders".
> > >
> > > ...
> > >
> > > /* Disallow orders that are set to NEVER directly ... */
> > > order &= (always | madvise | inherit);
> > >
> > > /* ... or through inheritance. */
> > > if (inherit_enabled)
> > > orders &= ~inherit;
> >
> > Sorry, I didn't get you here.
> >
> > If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
> > madvise are 0, and inherit_enabled = true. Then orders will be 0 with
> > your logic. But we should allow order 9, right?
>
> Yeah, all confusing, because the temporary variables don't help.
>
> if (!inherit_enabled)
>
> or simply
>
> if (!hugepage_global_enabled();)
>
> Let me try again below.
>
> >
> > >
> > > /*
> > > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > * set.
> > > */
> > > if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
> > > return orders;
> > >
> > > > +
> > > > + /*
> > > > + * Otherwise, we only enforce sysfs settings if asked. In
> > > > addition,
> > > > + * if the user sets a sysfs mode of madvise and if
> > > > TVA_ENFORCE_SYSFS
> > > > + * is not set, we don't bother checking whether the VMA has
> > > > VM_HUGEPAGE
> > > > + * set.
> > > > + */
> > > > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > > + return orders;
> > > > +
> > > > + mask = always;
> > > > + if (has_madvise)
> > > > + mask |= madvise;
> > > > + if (hugepage_global_always() || (has_madvise && inherit_enabled))
> > > > + mask |= inherit;
> > >
> > > Similarly, this can maybe become (not 100% sure if I got it right, the
> > > condition above is confusing)
> >
> > IMO, this is the original logic.
>
> Yeah, and it's absolutely confusing stuff.
>
> Let me try again by only clearing flags. Maybe this would be clearer?
> (and correct? still confused why the latter part is so complicated in existing
> code)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 8b8f353cc7b81..66fdfe06e4996 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long tva_flags,
> unsigned long orders);
> +/* Strictly mask requested anonymous orders according to sysfs settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> + unsigned long tva_flags, unsigned long orders)
> +{
> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
> + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + const unsigned long never = ~(always | madvise | inherit);
> +
> + /* Disallow orders that are set to NEVER directly ... */
> + orders &= ~never;
> +
> + /* ... or through inheritance (global == NEVER). */
> + if (!hugepage_global_enabled())
> + orders &= ~inherit;
> +
> + /*
> + * Otherwise, we only enforce sysfs settings if asked. In addition,
> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> + * set.
> + */
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + return orders;
This implicitly does a & mask as per suggested previous version, which I think
is correct but worth pointing out.
> +
> + if (!(vm_flags & VM_HUGEPAGE)) {
Don't love this sort of mega negation here. I read this as _does_ have huge
page...
> + /* Disallow orders that are set to MADVISE directly ... */
> + orders &= ~madvise;
> +
> + /* ... or through inheritance (global == MADVISE). */
> + if (!hugepage_global_always())
> + orders &= ~inherit;
I hate this implicit 'not hugepage global always so this means either never or
madvise and since we cleared orders for never this means madvise' mental
gymnastics required here.
Yeah I feel this is a bridge too far, we're getting into double negation and I
think that's more confusiong.
> + }
I propose a compromise as I rather like your 'exclude never' negation bit.
So:
/* Strictly mask requested anonymous orders according to sysfs settings. */
static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
unsigned long tva_flags, unsigned long orders)
{
const unsigned long always = READ_ONCE(huge_anon_orders_always);
const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
const unsigned long never = ~(always | madvise | inherit);
const bool inherit_enabled = hugepage_global_enabled();
/* Disallow orders that are set to NEVER directly ... */
orders &= ~never;
/* ... or through inheritance (global == NEVER). */
if (!inherit_enabled)
orders &= ~inherit;
/*
* Otherwise, we only enforce sysfs settings if asked. In addition,
* if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
* is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
* set.
*/
if (!(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
if (hugepage_global_always())
return orders & (always | inherit);
/* We already excluded never inherit above. */
if (vm_flags & VM_HUGEPAGE)
return orders & (always | madvise | inherit);
return orders & always;
}
What do you think?
> + return orders;
> +}
> +
> /**
> * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> * @vma: the vm area to check
> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> -
> - if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> - if (hugepage_global_always() ||
> - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> -
> - orders &= mask;
> + if (vma_is_anonymous(vma)) {
> + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> if (!orders)
> return 0;
I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
0 case almost immediately so there's no need to do this, it just makes the code
noisier.
I mean we _could_ keep it but I think it's better not to for cleanliness, what
do you think?
> }
>
>
> --
> Cheers,
>
> David / dhildenb
>
>> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>> unsigned long tva_flags,
>> unsigned long orders);
>> +/* Strictly mask requested anonymous orders according to sysfs settings. */
>> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>> + unsigned long tva_flags, unsigned long orders)
>> +{
>> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + const unsigned long never = ~(always | madvise | inherit);
>> +
>> + /* Disallow orders that are set to NEVER directly ... */
>> + orders &= ~never;
>> +
>> + /* ... or through inheritance (global == NEVER). */
>> + if (!hugepage_global_enabled())
>> + orders &= ~inherit;
>> +
>> + /*
>> + * Otherwise, we only enforce sysfs settings if asked. In addition,
>> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>> + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>> + * set.
>> + */
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + return orders;
>
> This implicitly does a & mask as per suggested previous version, which I think
> is correct but worth pointing out.
Yes.
>
>> +
>> + if (!(vm_flags & VM_HUGEPAGE)) {
>
> Don't love this sort of mega negation here. I read this as _does_ have huge
> page...
Well, it's very common to do that, but not objecting to something that
is clearer ;)
I assume you spotted the
if (!(tva_flags & TVA_ENFORCE_SYSFS))
:P
if (vm_flags & VM_HUGEPAGE)
return orders;
Would have been easier.
>
>> + /* Disallow orders that are set to MADVISE directly ... */
>> + orders &= ~madvise;
>> +
>> + /* ... or through inheritance (global == MADVISE). */
>> + if (!hugepage_global_always())
>> + orders &= ~inherit;
>
> I hate this implicit 'not hugepage global always so this means either never or
> madvise and since we cleared orders for never this means madvise' mental
> gymnastics required here.
>
> Yeah I feel this is a bridge too far, we're getting into double negation and I
> think that's more confusiong.
Same here ... I think we should just have hugepage_global_madvise(). :)
>
>
>> + }
>
> I propose a compromise as I rather like your 'exclude never' negation bit.
>
> So:
>
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> const unsigned long always = READ_ONCE(huge_anon_orders_always);
> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> const unsigned long never = ~(always | madvise | inherit);
> const bool inherit_enabled = hugepage_global_enabled();
Can we just have hugepage_global_never/disabled() to use instead?
>
> /* Disallow orders that are set to NEVER directly ... */
> orders &= ~never;
>
> /* ... or through inheritance (global == NEVER). */
> if (!inherit_enabled)
> orders &= ~inherit;>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> if (hugepage_global_always())
> return orders & (always | inherit);
>
> /* We already excluded never inherit above. */
> if (vm_flags & VM_HUGEPAGE)
> return orders & (always | madvise | inherit);
>
> return orders & always;
> }
>
> What do you think?
With the fixup, it would work for me. No magical "mask" variables :D
> >
>> + return orders;
>> +}
>> +
>> /**
>> * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
>> * @vma: the vm area to check
>> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> -
>> - if (vm_flags & VM_HUGEPAGE)
>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>> - if (hugepage_global_always() ||
>> - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>> -
>> - orders &= mask;
>> + if (vma_is_anonymous(vma)) {
>> + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>> if (!orders)
>> return 0;
>
> I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> 0 case almost immediately so there's no need to do this, it just makes the code
> noisier.
The reason we added it in the first place was to not do the (expensive)
function call.
--
Cheers,
David / dhildenb
On Thu, Jun 12, 2025 at 04:09:27PM +0200, David Hildenbrand wrote:
>
>
> > > @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > unsigned long tva_flags,
> > > unsigned long orders);
> > > +/* Strictly mask requested anonymous orders according to sysfs settings. */
> > > +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > + unsigned long tva_flags, unsigned long orders)
> > > +{
> > > + const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > + const unsigned long never = ~(always | madvise | inherit);
> > > +
> > > + /* Disallow orders that are set to NEVER directly ... */
> > > + orders &= ~never;
> > > +
> > > + /* ... or through inheritance (global == NEVER). */
> > > + if (!hugepage_global_enabled())
> > > + orders &= ~inherit;
> > > +
> > > + /*
> > > + * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > + * set.
> > > + */
> > > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > + return orders;
> >
> > This implicitly does a & mask as per suggested previous version, which I think
> > is correct but worth pointing out.
>
> Yes.
>
> >
> > > +
> > > + if (!(vm_flags & VM_HUGEPAGE)) {
> >
> > Don't love this sort of mega negation here. I read this as _does_ have huge
> > page...
>
> Well, it's very common to do that, but not objecting to something that is
> clearer ;)
>
> I assume you spotted the
>
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>
> :P
Lol yeah I know I know, I just think I guess in this case because you're
negating elsewhere it makes it harder...
>
> if (vm_flags & VM_HUGEPAGE)
> return orders;
>
>
> Would have been easier.
>
> >
> > > + /* Disallow orders that are set to MADVISE directly ... */
> > > + orders &= ~madvise;
> > > +
> > > + /* ... or through inheritance (global == MADVISE). */
> > > + if (!hugepage_global_always())
> > > + orders &= ~inherit;
> >
> > I hate this implicit 'not hugepage global always so this means either never or
> > madvise and since we cleared orders for never this means madvise' mental
> > gymnastics required here.
> >
> > Yeah I feel this is a bridge too far, we're getting into double negation and I
> > think that's more confusiong.
>
>
> Same here ... I think we should just have hugepage_global_madvise(). :)
Ideally in future not have these stupid globals all over the place and rework
this whole damn thing...
>
> >
> >
> > > + }
> >
> > I propose a compromise as I rather like your 'exclude never' negation bit.
> >
> > So:
> >
> > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > unsigned long tva_flags, unsigned long orders)
> > {
> > const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > const unsigned long never = ~(always | madvise | inherit);
> > const bool inherit_enabled = hugepage_global_enabled();
>
> Can we just have hugepage_global_never/disabled() to use instead?
This would be nice!
Could be a follow up... though again would be nice to somehow do away with all
this crap altogether.
>
> >
> > /* Disallow orders that are set to NEVER directly ... */
> > orders &= ~never;
> >
> > /* ... or through inheritance (global == NEVER). */
> > if (!inherit_enabled)
> > orders &= ~inherit;>
> > /*
> > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > * set.
> > */
> > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > return orders;
> >
> > if (hugepage_global_always())
> > return orders & (always | inherit);
> >
> > /* We already excluded never inherit above. */
> > if (vm_flags & VM_HUGEPAGE)
> > return orders & (always | madvise | inherit);
> >
> > return orders & always;
> > }
> >
> > What do you think?
>
> With the fixup, it would work for me. No magical "mask" variables :D
Thanks!
>
> > >
> > > + return orders;
> > > +}
> > > +
> > > /**
> > > * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> > > * @vma: the vm area to check
> > > @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > unsigned long orders)
> > > {
> > > /* Optimization to check if required orders are enabled early. */
> > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > -
> > > - if (vm_flags & VM_HUGEPAGE)
> > > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > > - if (hugepage_global_always() ||
> > > - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > > -
> > > - orders &= mask;
> > > + if (vma_is_anonymous(vma)) {
> > > + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> > > if (!orders)
> > > return 0;
> >
> > I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> > 0 case almost immediately so there's no need to do this, it just makes the code
> > noisier.
>
> The reason we added it in the first place was to not do the (expensive)
> function call.
Ack point taken!
>
>
> --
> Cheers,
>
> David / dhildenb
>
For convenience, I enclose the fixed version + tweaked the inherit local bool to
be inherit_never to be clearer:
/* Strictly mask requested anonymous orders according to sysfs settings. */
static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
unsigned long tva_flags, unsigned long orders)
{
const unsigned long always = READ_ONCE(huge_anon_orders_always);
const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
const unsigned long never = ~(always | madvise | inherit);
const bool inherit_never = !hugepage_global_enabled();
/* Disallow orders that are set to NEVER directly ... */
orders &= ~never;
/* ... or through inheritance (global == NEVER). */
if (inherit_never)
orders &= ~inherit;
/*
* Otherwise, we only enforce sysfs settings if asked. In addition,
* if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
* is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
* set.
*/
if (!(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
/* We already excluded never inherit above. */
if (vm_flags & VM_HUGEPAGE)
return orders & (always | madvise | inherit);
if (hugepage_global_always())
return orders & (always | inherit);
return orders & always;
}
On 2025/6/12 22:49, Lorenzo Stoakes wrote:
> On Thu, Jun 12, 2025 at 04:09:27PM +0200, David Hildenbrand wrote:
>>
>>
>>>> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>> unsigned long tva_flags,
>>>> unsigned long orders);
>>>> +/* Strictly mask requested anonymous orders according to sysfs settings. */
>>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>>>> + unsigned long tva_flags, unsigned long orders)
>>>> +{
>>>> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>> + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>> + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>> + const unsigned long never = ~(always | madvise | inherit);
>>>> +
>>>> + /* Disallow orders that are set to NEVER directly ... */
>>>> + orders &= ~never;
>>>> +
>>>> + /* ... or through inheritance (global == NEVER). */
>>>> + if (!hugepage_global_enabled())
>>>> + orders &= ~inherit;
>>>> +
>>>> + /*
>>>> + * Otherwise, we only enforce sysfs settings if asked. In addition,
>>>> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>>> + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>>>> + * set.
>>>> + */
>>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>>> + return orders;
>>>
>>> This implicitly does a & mask as per suggested previous version, which I think
>>> is correct but worth pointing out.
>>
>> Yes.
>>
>>>
>>>> +
>>>> + if (!(vm_flags & VM_HUGEPAGE)) {
>>>
>>> Don't love this sort of mega negation here. I read this as _does_ have huge
>>> page...
>>
>> Well, it's very common to do that, but not objecting to something that is
>> clearer ;)
>>
>> I assume you spotted the
>>
>> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>
>> :P
>
> Lol yeah I know I know, I just think I guess in this case because you're
> negating elsewhere it makes it harder...
>
>>
>> if (vm_flags & VM_HUGEPAGE)
>> return orders;
>>
>>
>> Would have been easier.
>>
>>>
>>>> + /* Disallow orders that are set to MADVISE directly ... */
>>>> + orders &= ~madvise;
>>>> +
>>>> + /* ... or through inheritance (global == MADVISE). */
>>>> + if (!hugepage_global_always())
>>>> + orders &= ~inherit;
>>>
>>> I hate this implicit 'not hugepage global always so this means either never or
>>> madvise and since we cleared orders for never this means madvise' mental
>>> gymnastics required here.
>>>
>>> Yeah I feel this is a bridge too far, we're getting into double negation and I
>>> think that's more confusiong.
>>
>>
>> Same here ... I think we should just have hugepage_global_madvise(). :)
>
> Ideally in future not have these stupid globals all over the place and rework
> this whole damn thing...
>
>>
>>>
>>>
>>>> + }
>>>
>>> I propose a compromise as I rather like your 'exclude never' negation bit.
>>>
>>> So:
>>>
>>> /* Strictly mask requested anonymous orders according to sysfs settings. */
>>> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>>> unsigned long tva_flags, unsigned long orders)
>>> {
>>> const unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
>>> const unsigned long never = ~(always | madvise | inherit);
>>> const bool inherit_enabled = hugepage_global_enabled();
>>
>> Can we just have hugepage_global_never/disabled() to use instead?
>
> This would be nice!
>
> Could be a follow up... though again would be nice to somehow do away with all
> this crap altogether.
>
>>
>>>
>>> /* Disallow orders that are set to NEVER directly ... */
>>> orders &= ~never;
>>>
>>> /* ... or through inheritance (global == NEVER). */
>>> if (!inherit_enabled)
>>> orders &= ~inherit;>
>>> /*
>>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>>> * set.
>>> */
>>> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>> return orders;
>>>
>>> if (hugepage_global_always())
>>> return orders & (always | inherit);
>>>
>>> /* We already excluded never inherit above. */
>>> if (vm_flags & VM_HUGEPAGE)
>>> return orders & (always | madvise | inherit);
>>>
>>> return orders & always;
>>> }
>>>
>>> What do you think?
>>
>> With the fixup, it would work for me. No magical "mask" variables :D
>
> Thanks!
Fair enough. You both prefer the 'exclude never' logic, and I will
follow that logic.
>>>> + return orders;
>>>> +}
>>>> +
>>>> /**
>>>> * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
>>>> * @vma: the vm area to check
>>>> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>> unsigned long orders)
>>>> {
>>>> /* Optimization to check if required orders are enabled early. */
>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>> -
>>>> - if (vm_flags & VM_HUGEPAGE)
>>>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>>>> - if (hugepage_global_always() ||
>>>> - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>>>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>>>> -
>>>> - orders &= mask;
>>>> + if (vma_is_anonymous(vma)) {
>>>> + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>>>> if (!orders)
>>>> return 0;
>>>
>>> I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
>>> 0 case almost immediately so there's no need to do this, it just makes the code
>>> noisier.
>>
>> The reason we added it in the first place was to not do the (expensive)
>> function call.
>
> Ack point taken!
>
>>
>>
>> --
>> Cheers,
>>
>> David / dhildenb
>>
>
> For convenience, I enclose the fixed version + tweaked the inherit local bool to
> be inherit_never to be clearer:
>
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> const unsigned long always = READ_ONCE(huge_anon_orders_always);
> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> const unsigned long never = ~(always | madvise | inherit);
> const bool inherit_never = !hugepage_global_enabled();
>
> /* Disallow orders that are set to NEVER directly ... */
> orders &= ~never;
>
> /* ... or through inheritance (global == NEVER). */
> if (inherit_never)
> orders &= ~inherit;
>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> /* We already excluded never inherit above. */
> if (vm_flags & VM_HUGEPAGE)
> return orders & (always | madvise | inherit);
>
> if (hugepage_global_always())
> return orders & (always | inherit);
>
> return orders & always;
> }
Thanks Lorenzo. Let me follow this logic and do some testing.
On Fri, Jun 13, 2025 at 10:07:20AM +0800, Baolin Wang wrote:
>
>
> On 2025/6/12 22:49, Lorenzo Stoakes wrote:
> > On Thu, Jun 12, 2025 at 04:09:27PM +0200, David Hildenbrand wrote:
> > >
> > >
> > > > > @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > > > unsigned long tva_flags,
> > > > > unsigned long orders);
> > > > > +/* Strictly mask requested anonymous orders according to sysfs settings. */
> > > > > +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > > > + unsigned long tva_flags, unsigned long orders)
> > > > > +{
> > > > > + const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > > + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > > + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > > > + const unsigned long never = ~(always | madvise | inherit);
> > > > > +
> > > > > + /* Disallow orders that are set to NEVER directly ... */
> > > > > + orders &= ~never;
> > > > > +
> > > > > + /* ... or through inheritance (global == NEVER). */
> > > > > + if (!hugepage_global_enabled())
> > > > > + orders &= ~inherit;
> > > > > +
> > > > > + /*
> > > > > + * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > > > + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > > > + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > > > + * set.
> > > > > + */
> > > > > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > > > + return orders;
> > > >
> > > > This implicitly does a & mask as per suggested previous version, which I think
> > > > is correct but worth pointing out.
> > >
> > > Yes.
> > >
> > > >
> > > > > +
> > > > > + if (!(vm_flags & VM_HUGEPAGE)) {
> > > >
> > > > Don't love this sort of mega negation here. I read this as _does_ have huge
> > > > page...
> > >
> > > Well, it's very common to do that, but not objecting to something that is
> > > clearer ;)
> > >
> > > I assume you spotted the
> > >
> > > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > >
> > > :P
> >
> > Lol yeah I know I know, I just think I guess in this case because you're
> > negating elsewhere it makes it harder...
> >
> > >
> > > if (vm_flags & VM_HUGEPAGE)
> > > return orders;
> > >
> > >
> > > Would have been easier.
> > >
> > > >
> > > > > + /* Disallow orders that are set to MADVISE directly ... */
> > > > > + orders &= ~madvise;
> > > > > +
> > > > > + /* ... or through inheritance (global == MADVISE). */
> > > > > + if (!hugepage_global_always())
> > > > > + orders &= ~inherit;
> > > >
> > > > I hate this implicit 'not hugepage global always so this means either never or
> > > > madvise and since we cleared orders for never this means madvise' mental
> > > > gymnastics required here.
> > > >
> > > > Yeah I feel this is a bridge too far, we're getting into double negation and I
> > > > think that's more confusiong.
> > >
> > >
> > > Same here ... I think we should just have hugepage_global_madvise(). :)
> >
> > Ideally in future not have these stupid globals all over the place and rework
> > this whole damn thing...
> >
> > >
> > > >
> > > >
> > > > > + }
> > > >
> > > > I propose a compromise as I rather like your 'exclude never' negation bit.
> > > >
> > > > So:
> > > >
> > > > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > > > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > > unsigned long tva_flags, unsigned long orders)
> > > > {
> > > > const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > > > const unsigned long never = ~(always | madvise | inherit);
> > > > const bool inherit_enabled = hugepage_global_enabled();
> > >
> > > Can we just have hugepage_global_never/disabled() to use instead?
> >
> > This would be nice!
> >
> > Could be a follow up... though again would be nice to somehow do away with all
> > this crap altogether.
> >
> > >
> > > >
> > > > /* Disallow orders that are set to NEVER directly ... */
> > > > orders &= ~never;
> > > >
> > > > /* ... or through inheritance (global == NEVER). */
> > > > if (!inherit_enabled)
> > > > orders &= ~inherit;>
> > > > /*
> > > > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > > * set.
> > > > */
> > > > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > > return orders;
> > > >
> > > > if (hugepage_global_always())
> > > > return orders & (always | inherit);
> > > >
> > > > /* We already excluded never inherit above. */
> > > > if (vm_flags & VM_HUGEPAGE)
> > > > return orders & (always | madvise | inherit);
> > > >
> > > > return orders & always;
> > > > }
> > > >
> > > > What do you think?
> > >
> > > With the fixup, it would work for me. No magical "mask" variables :D
> >
> > Thanks!
>
> Fair enough. You both prefer the 'exclude never' logic, and I will follow
> that logic.
Appreciate it, sorry to be a pain and thanks a lot!
>
> > > > > + return orders;
> > > > > +}
> > > > > +
> > > > > /**
> > > > > * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> > > > > * @vma: the vm area to check
> > > > > @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > > > unsigned long orders)
> > > > > {
> > > > > /* Optimization to check if required orders are enabled early. */
> > > > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > > > -
> > > > > - if (vm_flags & VM_HUGEPAGE)
> > > > > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > > > > - if (hugepage_global_always() ||
> > > > > - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > > > > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > > > > -
> > > > > - orders &= mask;
> > > > > + if (vma_is_anonymous(vma)) {
> > > > > + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> > > > > if (!orders)
> > > > > return 0;
> > > >
> > > > I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> > > > 0 case almost immediately so there's no need to do this, it just makes the code
> > > > noisier.
> > >
> > > The reason we added it in the first place was to not do the (expensive)
> > > function call.
> >
> > Ack point taken!
> >
> > >
> > >
> > > --
> > > Cheers,
> > >
> > > David / dhildenb
> > >
> >
> > For convenience, I enclose the fixed version + tweaked the inherit local bool to
> > be inherit_never to be clearer:
> >
> > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > unsigned long tva_flags, unsigned long orders)
> > {
> > const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > const unsigned long never = ~(always | madvise | inherit);
> > const bool inherit_never = !hugepage_global_enabled();
> >
> > /* Disallow orders that are set to NEVER directly ... */
> > orders &= ~never;
> >
> > /* ... or through inheritance (global == NEVER). */
> > if (inherit_never)
> > orders &= ~inherit;
> >
> > /*
> > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > * set.
> > */
> > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > return orders;
> >
> > /* We already excluded never inherit above. */
> > if (vm_flags & VM_HUGEPAGE)
> > return orders & (always | madvise | inherit);
> >
> > if (hugepage_global_always())
> > return orders & (always | inherit);
> >
> > return orders & always;
> > }
>
> Thanks Lorenzo. Let me follow this logic and do some testing.
Thanks, much appreciated!
On Thu, Jun 12, 2025 at 02:27:06PM +0100, Lorenzo Stoakes wrote:
[snip]
> I propose a compromise as I rather like your 'exclude never' negation bit.
>
> So:
>
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> const unsigned long always = READ_ONCE(huge_anon_orders_always);
> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> const unsigned long never = ~(always | madvise | inherit);
> const bool inherit_enabled = hugepage_global_enabled();
>
> /* Disallow orders that are set to NEVER directly ... */
> orders &= ~never;
>
> /* ... or through inheritance (global == NEVER). */
> if (!inherit_enabled)
> orders &= ~inherit;
>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> if (hugepage_global_always())
> return orders & (always | inherit);
>
> /* We already excluded never inherit above. */
> if (vm_flags & VM_HUGEPAGE)
> return orders & (always | madvise | inherit);
Of course... I immediately made a mistake... swap these two statements around. I
thought it'd be 'neater' to do the first one first, but of course it means
madvise (rather than inherit) orders don't get selected.
This WHOLE THING needs refactoring.
>
> return orders & always;
> }
>
> What do you think?
>
>
> > + return orders;
> > +}
> > +
> > /**
> > * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> > * @vma: the vm area to check
> > @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > unsigned long orders)
> > {
> > /* Optimization to check if required orders are enabled early. */
> > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > -
> > - if (vm_flags & VM_HUGEPAGE)
> > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > - if (hugepage_global_always() ||
> > - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > -
> > - orders &= mask;
> > + if (vma_is_anonymous(vma)) {
> > + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> > if (!orders)
> > return 0;
>
> I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> 0 case almost immediately so there's no need to do this, it just makes the code
> noisier.
>
> I mean we _could_ keep it but I think it's better not to for cleanliness, what
> do you think?
>
> > }
> >
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
>
On 2025/6/12 21:29, Lorenzo Stoakes wrote:
> On Thu, Jun 12, 2025 at 02:27:06PM +0100, Lorenzo Stoakes wrote:
> [snip]
>
>> I propose a compromise as I rather like your 'exclude never' negation bit.
>>
>> So:
>>
>> /* Strictly mask requested anonymous orders according to sysfs settings. */
>> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>> unsigned long tva_flags, unsigned long orders)
>> {
>> const unsigned long always = READ_ONCE(huge_anon_orders_always);
>> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
>> const unsigned long never = ~(always | madvise | inherit);
>> const bool inherit_enabled = hugepage_global_enabled();
>>
>> /* Disallow orders that are set to NEVER directly ... */
>> orders &= ~never;
>>
>> /* ... or through inheritance (global == NEVER). */
>> if (!inherit_enabled)
>> orders &= ~inherit;
>>
>> /*
>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>> * set.
>> */
>> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> return orders;
>>
>> if (hugepage_global_always())
>> return orders & (always | inherit);
>>
>> /* We already excluded never inherit above. */
>> if (vm_flags & VM_HUGEPAGE)
>> return orders & (always | madvise | inherit);
>
> Of course... I immediately made a mistake... swap these two statements around. I
> thought it'd be 'neater' to do the first one first, but of course it means
> madvise (rather than inherit) orders don't get selected.
>
> This WHOLE THING needs refactoring.
Personally, I think the 'exclude never' logic becomes more complicated.
I made a simpler change without adding a new helper. What do you think?
static inline
unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
unsigned long orders)
{
/* Optimization to check if required orders are enabled early. */
if (vma_is_anonymous(vma)) {
unsigned long mask = READ_ONCE(huge_anon_orders_always);
bool huge_enforce = !(tva_flags & TVA_ENFORCE_SYSFS);
bool has_madvise = vm_flags & VM_HUGEPAGE;
/*
* if the user sets a sysfs mode of madvise and if
TVA_ENFORCE_SYSFS
* is not set, we don't bother checking whether the VMA
has VM_HUGEPAGE
* set.
*/
if (huge_enforce || has_madvise)
mask |= READ_ONCE(huge_anon_orders_madvise);
if (hugepage_global_always() ||
((has_madvise || huge_enforce) &&
hugepage_global_enabled()))
mask |= READ_ONCE(huge_anon_orders_inherit);
orders &= mask;
if (!orders)
return 0;
}
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags,
orders);
}
>
>>
>> return orders & always;
>> }
>>
>> What do you think?
>>
>>
>>> + return orders;
>>> +}
>>> +
>>> /**
>>> * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
>>> * @vma: the vm area to check
>>> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>>> unsigned long orders)
>>> {
>>> /* Optimization to check if required orders are enabled early. */
>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>> -
>>> - if (vm_flags & VM_HUGEPAGE)
>>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>>> - if (hugepage_global_always() ||
>>> - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>>> -
>>> - orders &= mask;
>>> + if (vma_is_anonymous(vma)) {
>>> + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>>> if (!orders)
>>> return 0;
>>
>> I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
>> 0 case almost immediately so there's no need to do this, it just makes the code
>> noisier.
>>
>> I mean we _could_ keep it but I think it's better not to for cleanliness, what
>> do you think?
>>
>>> }
>>>
>>>
>>> --
>>> Cheers,
>>>
>>> David / dhildenb
>>>
>>
On Thu, Jun 12, 2025 at 10:13:40PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/12 21:29, Lorenzo Stoakes wrote:
> > On Thu, Jun 12, 2025 at 02:27:06PM +0100, Lorenzo Stoakes wrote:
> > [snip]
> >
> > > I propose a compromise as I rather like your 'exclude never' negation bit.
> > >
> > > So:
> > >
> > > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > unsigned long tva_flags, unsigned long orders)
> > > {
> > > const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > > const unsigned long never = ~(always | madvise | inherit);
> > > const bool inherit_enabled = hugepage_global_enabled();
> > >
> > > /* Disallow orders that are set to NEVER directly ... */
> > > orders &= ~never;
> > >
> > > /* ... or through inheritance (global == NEVER). */
> > > if (!inherit_enabled)
> > > orders &= ~inherit;
> > >
> > > /*
> > > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > * set.
> > > */
> > > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > return orders;
> > >
> > > if (hugepage_global_always())
> > > return orders & (always | inherit);
> > >
> > > /* We already excluded never inherit above. */
> > > if (vm_flags & VM_HUGEPAGE)
> > > return orders & (always | madvise | inherit);
> >
> > Of course... I immediately made a mistake... swap these two statements around. I
> > thought it'd be 'neater' to do the first one first, but of course it means
> > madvise (rather than inherit) orders don't get selected.
> >
> > This WHOLE THING needs refactoring.
>
> Personally, I think the 'exclude never' logic becomes more complicated. I
> made a simpler change without adding a new helper. What do you think?
>
> static inline
> unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> if (vma_is_anonymous(vma)) {
I hate the level of indentation here. There's really no reason not to have this
as a helper as this just solves this problem and any sane compiler will inline.
> unsigned long mask = READ_ONCE(huge_anon_orders_always);
> bool huge_enforce = !(tva_flags & TVA_ENFORCE_SYSFS);
Huge enforce is when we don't have enforce flag set? This is super confusing.
> bool has_madvise = vm_flags & VM_HUGEPAGE;
>
> /*
> * if the user sets a sysfs mode of madvise and if
> TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has
> VM_HUGEPAGE
> * set.
> */
> if (huge_enforce || has_madvise)
> mask |= READ_ONCE(huge_anon_orders_madvise);
I find this more confusing, honestly.
I far prefer having the never checks up front, and I prefer David's 'explicitly
deal with never through negations' approach.
I also think adding these READ_ONCE()'s here adds a ton of noise.
> if (hugepage_global_always() ||
> ((has_madvise || huge_enforce) &&
> hugepage_global_enabled()))
This combination of conditions is just horribly confusing. And why are you
giving an explanation above but not for this one...
This is just combining a ton of logic in a confusing way.
> mask |= READ_ONCE(huge_anon_orders_inherit);
>
> orders &= mask;
> if (!orders)
> return 0;
> }
>
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
Thanks for the suggestion but I do prefer the proposed compromise solution.
>
> >
> > >
> > > return orders & always;
> > > }
> > >
> > > What do you think?
> > >
> > >
> > > > + return orders;
> > > > +}
> > > > +
> > > > /**
> > > > * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> > > > * @vma: the vm area to check
> > > > @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > > unsigned long orders)
> > > > {
> > > > /* Optimization to check if required orders are enabled early. */
> > > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > > -
> > > > - if (vm_flags & VM_HUGEPAGE)
> > > > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > > > - if (hugepage_global_always() ||
> > > > - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > > > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > > > -
> > > > - orders &= mask;
> > > > + if (vma_is_anonymous(vma)) {
> > > > + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> > > > if (!orders)
> > > > return 0;
> > >
> > > I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> > > 0 case almost immediately so there's no need to do this, it just makes the code
> > > noisier.
> > >
> > > I mean we _could_ keep it but I think it's better not to for cleanliness, what
> > > do you think?
> > >
> > > > }
> > > >
> > > >
> > > > --
> > > > Cheers,
> > > >
> > > > David / dhildenb
> > > >
> > >
On 12.06.25 16:13, Baolin Wang wrote:
>
>
> On 2025/6/12 21:29, Lorenzo Stoakes wrote:
>> On Thu, Jun 12, 2025 at 02:27:06PM +0100, Lorenzo Stoakes wrote:
>> [snip]
>>
>>> I propose a compromise as I rather like your 'exclude never' negation bit.
>>>
>>> So:
>>>
>>> /* Strictly mask requested anonymous orders according to sysfs settings. */
>>> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>>> unsigned long tva_flags, unsigned long orders)
>>> {
>>> const unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
>>> const unsigned long never = ~(always | madvise | inherit);
>>> const bool inherit_enabled = hugepage_global_enabled();
>>>
>>> /* Disallow orders that are set to NEVER directly ... */
>>> orders &= ~never;
>>>
>>> /* ... or through inheritance (global == NEVER). */
>>> if (!inherit_enabled)
>>> orders &= ~inherit;
>>>
>>> /*
>>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>>> * set.
>>> */
>>> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>> return orders;
>>>
>>> if (hugepage_global_always())
>>> return orders & (always | inherit);
>>>
>>> /* We already excluded never inherit above. */
>>> if (vm_flags & VM_HUGEPAGE)
>>> return orders & (always | madvise | inherit);
>>
>> Of course... I immediately made a mistake... swap these two statements around. I
>> thought it'd be 'neater' to do the first one first, but of course it means
>> madvise (rather than inherit) orders don't get selected.
>>
>> This WHOLE THING needs refactoring.
>
> Personally, I think the 'exclude never' logic becomes more complicated.
> I made a simpler change without adding a new helper. What do you think?
>
> static inline
> unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> if (vma_is_anonymous(vma)) {
> unsigned long mask = READ_ONCE(huge_anon_orders_always);
> bool huge_enforce = !(tva_flags & TVA_ENFORCE_SYSFS);
> bool has_madvise = vm_flags & VM_HUGEPAGE;
>
> /*
> * if the user sets a sysfs mode of madvise and if
> TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA
> has VM_HUGEPAGE
> * set.
> */
> if (huge_enforce || has_madvise)
> mask |= READ_ONCE(huge_anon_orders_madvise);
> if (hugepage_global_always() ||
> ((has_madvise || huge_enforce) &&
> hugepage_global_enabled()))
OMG is that ugly.
--
Cheers,
David / dhildenb
On 2025/6/12 21:05, David Hildenbrand wrote:
> On 12.06.25 14:45, Baolin Wang wrote:
>>
>>
>> On 2025/6/12 16:51, David Hildenbrand wrote:
>>> On 12.06.25 09:51, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>>>>>> settings,
>>>>>> which
>>>>>> means that even though we have disabled the Anon THP configuration,
>>>>>> MADV_COLLAPSE
>>>>>> will still attempt to collapse into a Anon THP. This violates the
>>>>>> rule
>>>>>> we have
>>>>>> agreed upon: never means never.
>>>>>>
>>>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>>>> for collapsing
>>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>>>
>>>>>> To address this issue, should check whether the Anon THP
>>>>>> configuration
>>>>>> is disabled
>>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS
>>>>>> flag is
>>>>>> set.
>>>>>>
>>>>>> In summary, the current strategy is:
>>>>>>
>>>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>>>> hugepage_global_enabled() == false
>>>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>>>> are prohibited
>>>>>> from being used, then madvise_collapse() is forbidden for that
>>>>>> orders.
>>>>>>
>>>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>>>> hugepage_global_enabled() == true
>>>>>> (global THP settings are enabled), and inherit & orders == 0, it
>>>>>> means
>>>>>> mTHP of that
>>>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>>>> is not allowed
>>>>>> for that orders.
>>>>>>
>>>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>> ---
>>>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>>>> --- a/include/linux/huge_mm.h
>>>>>> +++ b/include/linux/huge_mm.h
>>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>>>> vm_area_struct *vma,
>>>>>> unsigned long orders)
>>>>>> {
>>>>>> /* Optimization to check if required orders are enabled
>>>>>> early. */
>>>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>>>> + if (vma_is_anonymous(vma)) {
>>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>>>> + unsigned long mask = always | madvise;
>>>>>> +
>>>>>> + /*
>>>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>>>> + * then we should never allow hugepages.
>>>>> > + */> + if (!(mask & orders) &&
>>>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>>>> + return 0;
>>>>>
>>>>> I'm still trying to digest that. Isn't there a way for us to work with
>>>>> the orders,
>>>>> essentially masking off all orders that are forbidden globally.
>>>>> Similar
>>>>> to below, if !orders, then return 0?
>>>>> /* Orders disabled directly. */
>>>>> orders &= ~TODO;
>>>>> /* Orders disabled by inheriting from the global toggle. */
>>>>> if (!hugepage_global_enabled())
>>>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>>>
>>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit.
>>>>> Could
>>>>> add a simple helper for that
>>>>>
>>>>> huge_anon_orders_never
>>>>
>>>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>>>> more readable?
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 2f190c90192d..3087ac7631e0 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>>>> vm_area_struct *vma,
>>>> unsigned long tva_flags,
>>>> unsigned long orders);
>>>>
>>>> +/* Strictly mask requested anonymous orders according to sysfs
>>>> settings. */
>>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>>>> vm_flags,
>>>> + unsigned long tva_flags, unsigned long
>>>> orders)
>>>> +{
>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>> + bool inherit_enabled = hugepage_global_enabled();
>>>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>>>> + unsigned long mask = always | madvise;
>>>> +
>>>> + mask = always | madvise;
>>>> + if (inherit_enabled)
>>>> + mask |= inherit;
>>>> +
>>>> + /* All set to/inherit NEVER - never means never globally,
>>>> abort. */
>>>> + if (!(mask & orders))
>>>> + return 0;
>>>
>>> Still confusing. I am not sure if we would properly catch when someone
>>> specifies e.g., 2M and 1M, while we only have 2M disabled.
>>
>> IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
>>
>>> I would rewrite the function to only ever substract from "orders".
>>>
>>> ...
>>>
>>> /* Disallow orders that are set to NEVER directly ... */
>>> order &= (always | madvise | inherit);
>>>
>>> /* ... or through inheritance. */
>>> if (inherit_enabled)
>>> orders &= ~inherit;
>>
>> Sorry, I didn't get you here.
>>
>> If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
>> madvise are 0, and inherit_enabled = true. Then orders will be 0 with
>> your logic. But we should allow order 9, right?
>
> Yeah, all confusing, because the temporary variables don't help.
>
> if (!inherit_enabled)
>
> or simply
>
> if (!hugepage_global_enabled();)
>
> Let me try again below.
>
>>
>>>
>>> /*
>>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>> * is not set, we don't bother checking whether the VMA has
>>> VM_HUGEPAGE
>>> * set.
>>> */
>>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
>>> return orders;
>>>
>>>> +
>>>> + /*
>>>> + * Otherwise, we only enforce sysfs settings if asked. In
>>>> addition,
>>>> + * if the user sets a sysfs mode of madvise and if
>>>> TVA_ENFORCE_SYSFS
>>>> + * is not set, we don't bother checking whether the VMA has
>>>> VM_HUGEPAGE
>>>> + * set.
>>>> + */
>>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>>> + return orders;
>>>> +
>>>> + mask = always;
>>>> + if (has_madvise)
>>>> + mask |= madvise;
>>>> + if (hugepage_global_always() || (has_madvise &&
>>>> inherit_enabled))
>>>> + mask |= inherit;
>>>
>>> Similarly, this can maybe become (not 100% sure if I got it right, the
>>> condition above is confusing)
>>
>> IMO, this is the original logic.
>
> Yeah, and it's absolutely confusing stuff.
>
> Let me try again by only clearing flags. Maybe this would be clearer?
> (and correct? still confused why the latter part is so complicated in
> existing
> code)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 8b8f353cc7b81..66fdfe06e4996 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long tva_flags,
> unsigned long orders);
>
> +/* Strictly mask requested anonymous orders according to sysfs
> settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> + unsigned long tva_flags, unsigned long orders)
> +{
> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
> + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + const unsigned long never = ~(always | madvise | inherit);
> +
> + /* Disallow orders that are set to NEVER directly ... */
> + orders &= ~never;
> +
> + /* ... or through inheritance (global == NEVER). */
> + if (!hugepage_global_enabled())
> + orders &= ~inherit;
> +
> + /*
> + * Otherwise, we only enforce sysfs settings if asked. In addition,
> + * if the user sets a sysfs mode of madvise and if
> TVA_ENFORCE_SYSFS
> + * is not set, we don't bother checking whether the VMA has
> VM_HUGEPAGE
> + * set.
> + */
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + return orders;
> +
> + if (!(vm_flags & VM_HUGEPAGE)) {
> + /* Disallow orders that are set to MADVISE directly ... */
> + orders &= ~madvise;
> +
> + /* ... or through inheritance (global == MADVISE). */
> + if (!hugepage_global_always())
> + orders &= ~inherit;
Seems we can drop this 'inherit' check, cause if
!hugepage_global_enabled() == true, which always means
!hugepage_global_always() is true.
> + }
> + return orders;
> +}
Otherwise look good to me.
> +
> /**
> * thp_vma_allowable_orders - determine hugepage orders that are
> allowed for vma
> * @vma: the vm area to check
> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> -
> - if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> - if (hugepage_global_always() ||
> - ((vm_flags & VM_HUGEPAGE) &&
> hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> -
> - orders &= mask;
> + if (vma_is_anonymous(vma)) {
> + orders = __thp_mask_anon_orders(vm_flags, tva_flags,
> orders);
> if (!orders)
> return 0;
> }
>
>
On 2025/6/12 21:25, Baolin Wang wrote:
>
>
> On 2025/6/12 21:05, David Hildenbrand wrote:
>> On 12.06.25 14:45, Baolin Wang wrote:
>>>
>>>
>>> On 2025/6/12 16:51, David Hildenbrand wrote:
>>>> On 12.06.25 09:51, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>>>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>>>>>>> settings,
>>>>>>> which
>>>>>>> means that even though we have disabled the Anon THP configuration,
>>>>>>> MADV_COLLAPSE
>>>>>>> will still attempt to collapse into a Anon THP. This violates the
>>>>>>> rule
>>>>>>> we have
>>>>>>> agreed upon: never means never.
>>>>>>>
>>>>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>>>>> for collapsing
>>>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>>>>
>>>>>>> To address this issue, should check whether the Anon THP
>>>>>>> configuration
>>>>>>> is disabled
>>>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS
>>>>>>> flag is
>>>>>>> set.
>>>>>>>
>>>>>>> In summary, the current strategy is:
>>>>>>>
>>>>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>>>>> hugepage_global_enabled() == false
>>>>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>>>>> are prohibited
>>>>>>> from being used, then madvise_collapse() is forbidden for that
>>>>>>> orders.
>>>>>>>
>>>>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>>>>> hugepage_global_enabled() == true
>>>>>>> (global THP settings are enabled), and inherit & orders == 0, it
>>>>>>> means
>>>>>>> mTHP of that
>>>>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>>>>> is not allowed
>>>>>>> for that orders.
>>>>>>>
>>>>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>>> ---
>>>>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>>>>> --- a/include/linux/huge_mm.h
>>>>>>> +++ b/include/linux/huge_mm.h
>>>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>>>>> vm_area_struct *vma,
>>>>>>> unsigned long orders)
>>>>>>> {
>>>>>>> /* Optimization to check if required orders are enabled
>>>>>>> early. */
>>>>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>>>>> + if (vma_is_anonymous(vma)) {
>>>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>>>> + unsigned long madvise =
>>>>>>> READ_ONCE(huge_anon_orders_madvise);
>>>>>>> + unsigned long inherit =
>>>>>>> READ_ONCE(huge_anon_orders_inherit);
>>>>>>> + unsigned long mask = always | madvise;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>>>>> + * then we should never allow hugepages.
>>>>>> > + */> + if (!(mask & orders) &&
>>>>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>>>>> + return 0;
>>>>>>
>>>>>> I'm still trying to digest that. Isn't there a way for us to work
>>>>>> with
>>>>>> the orders,
>>>>>> essentially masking off all orders that are forbidden globally.
>>>>>> Similar
>>>>>> to below, if !orders, then return 0?
>>>>>> /* Orders disabled directly. */
>>>>>> orders &= ~TODO;
>>>>>> /* Orders disabled by inheriting from the global toggle. */
>>>>>> if (!hugepage_global_enabled())
>>>>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>>>>
>>>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit.
>>>>>> Could
>>>>>> add a simple helper for that
>>>>>>
>>>>>> huge_anon_orders_never
>>>>>
>>>>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>>>>> more readable?
>>>>>
>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>> index 2f190c90192d..3087ac7631e0 100644
>>>>> --- a/include/linux/huge_mm.h
>>>>> +++ b/include/linux/huge_mm.h
>>>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>>>>> vm_area_struct *vma,
>>>>> unsigned long tva_flags,
>>>>> unsigned long orders);
>>>>>
>>>>> +/* Strictly mask requested anonymous orders according to sysfs
>>>>> settings. */
>>>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>>>>> vm_flags,
>>>>> + unsigned long tva_flags, unsigned long
>>>>> orders)
>>>>> +{
>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>>> + bool inherit_enabled = hugepage_global_enabled();
>>>>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>>>>> + unsigned long mask = always | madvise;
>>>>> +
>>>>> + mask = always | madvise;
>>>>> + if (inherit_enabled)
>>>>> + mask |= inherit;
>>>>> +
>>>>> + /* All set to/inherit NEVER - never means never globally,
>>>>> abort. */
>>>>> + if (!(mask & orders))
>>>>> + return 0;
>>>>
>>>> Still confusing. I am not sure if we would properly catch when someone
>>>> specifies e.g., 2M and 1M, while we only have 2M disabled.
>>>
>>> IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
>>>
>>>> I would rewrite the function to only ever substract from "orders".
>>>>
>>>> ...
>>>>
>>>> /* Disallow orders that are set to NEVER directly ... */
>>>> order &= (always | madvise | inherit);
>>>>
>>>> /* ... or through inheritance. */
>>>> if (inherit_enabled)
>>>> orders &= ~inherit;
>>>
>>> Sorry, I didn't get you here.
>>>
>>> If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
>>> madvise are 0, and inherit_enabled = true. Then orders will be 0 with
>>> your logic. But we should allow order 9, right?
>>
>> Yeah, all confusing, because the temporary variables don't help.
>>
>> if (!inherit_enabled)
>>
>> or simply
>>
>> if (!hugepage_global_enabled();)
>>
>> Let me try again below.
>>
>>>
>>>>
>>>> /*
>>>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>>>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>>> * is not set, we don't bother checking whether the VMA has
>>>> VM_HUGEPAGE
>>>> * set.
>>>> */
>>>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
>>>> return orders;
>>>>
>>>>> +
>>>>> + /*
>>>>> + * Otherwise, we only enforce sysfs settings if asked. In
>>>>> addition,
>>>>> + * if the user sets a sysfs mode of madvise and if
>>>>> TVA_ENFORCE_SYSFS
>>>>> + * is not set, we don't bother checking whether the VMA has
>>>>> VM_HUGEPAGE
>>>>> + * set.
>>>>> + */
>>>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>>>> + return orders;
>>>>> +
>>>>> + mask = always;
>>>>> + if (has_madvise)
>>>>> + mask |= madvise;
>>>>> + if (hugepage_global_always() || (has_madvise &&
>>>>> inherit_enabled))
>>>>> + mask |= inherit;
>>>>
>>>> Similarly, this can maybe become (not 100% sure if I got it right, the
>>>> condition above is confusing)
>>>
>>> IMO, this is the original logic.
>>
>> Yeah, and it's absolutely confusing stuff.
>>
>> Let me try again by only clearing flags. Maybe this would be clearer?
>> (and correct? still confused why the latter part is so complicated in
>> existing
>> code)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 8b8f353cc7b81..66fdfe06e4996 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long tva_flags,
>> unsigned long orders);
>>
>> +/* Strictly mask requested anonymous orders according to sysfs
>> settings. */
>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>> vm_flags,
>> + unsigned long tva_flags, unsigned long orders)
>> +{
>> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + const unsigned long madvise =
>> READ_ONCE(huge_anon_orders_madvise);
>> + const unsigned long inherit =
>> READ_ONCE(huge_anon_orders_inherit);
>> + const unsigned long never = ~(always | madvise | inherit);
>> +
>> + /* Disallow orders that are set to NEVER directly ... */
>> + orders &= ~never;
>> +
>> + /* ... or through inheritance (global == NEVER). */
>> + if (!hugepage_global_enabled())
>> + orders &= ~inherit;
>> +
>> + /*
>> + * Otherwise, we only enforce sysfs settings if asked. In
>> addition,
>> + * if the user sets a sysfs mode of madvise and if
>> TVA_ENFORCE_SYSFS
>> + * is not set, we don't bother checking whether the VMA has
>> VM_HUGEPAGE
>> + * set.
>> + */
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + return orders;
>> +
>> + if (!(vm_flags & VM_HUGEPAGE)) {
>> + /* Disallow orders that are set to MADVISE directly
>> ... */
>> + orders &= ~madvise;
>> +
>> + /* ... or through inheritance (global == MADVISE). */
>> + if (!hugepage_global_always())
>> + orders &= ~inherit;
>
> Seems we can drop this 'inherit' check, cause if
> !hugepage_global_enabled() == true, which always means
> !hugepage_global_always() is true.
Please ignore this incorrect comment. Logic is confusing :)
On 12/06/25 1:21 pm, Baolin Wang wrote:
>
>
> On 2025/6/11 20:34, David Hildenbrand wrote:
>> On 05.06.25 10:00, Baolin Wang wrote:
>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>>> settings, which
>>> means that even though we have disabled the Anon THP configuration,
>>> MADV_COLLAPSE
>>> will still attempt to collapse into a Anon THP. This violates the
>>> rule we have
>>> agreed upon: never means never.
>>>
>>> Another rule for madvise, referring to David's suggestion: “allowing
>>> for collapsing
>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>
>>> To address this issue, should check whether the Anon THP
>>> configuration is disabled
>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag
>>> is set.
>>>
>>> In summary, the current strategy is:
>>>
>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>> hugepage_global_enabled() == false
>>> (global THP settings are not enabled), it means mTHP of that orders
>>> are prohibited
>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>
>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>> hugepage_global_enabled() == true
>>> (global THP settings are enabled), and inherit & orders == 0, it
>>> means mTHP of that
>>> orders are still prohibited from being used, thus madvise_collapse()
>>> is not allowed
>>> for that orders.
>>>
>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2f190c90192d..199ddc9f04a1 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>> vm_area_struct *vma,
>>> unsigned long orders)
>>> {
>>> /* Optimization to check if required orders are enabled early. */
>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>> + if (vma_is_anonymous(vma)) {
>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>> + unsigned long mask = always | madvise;
>>> +
>>> + /*
>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>> + * then we should never allow hugepages.
>> > + */> + if (!(mask & orders) &&
>> !(hugepage_global_enabled() && (inherit & orders)))
>>> + return 0;
>>
>> I'm still trying to digest that. Isn't there a way for us to work
>> with the orders,
>> essentially masking off all orders that are forbidden globally.
>> Similar to below, if !orders, then return 0?
>> /* Orders disabled directly. */
>> orders &= ~TODO;
>> /* Orders disabled by inheriting from the global toggle. */
>> if (!hugepage_global_enabled())
>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>
>> TODO is probably a -1ULL and then clearing always/madvise/inherit.
>> Could add a simple helper for that
>>
>> huge_anon_orders_never
>
> I followed Lorenzo's suggestion to simplify the logic. Does that look
> more readable?
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..3087ac7631e0 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long tva_flags,
> unsigned long orders);
>
> +/* Strictly mask requested anonymous orders according to sysfs
> settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long
> vm_flags,
> + unsigned long tva_flags, unsigned long
> orders)
> +{
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + bool inherit_enabled = hugepage_global_enabled();
> + bool has_madvise = vm_flags & VM_HUGEPAGE;
Let us name this honour_madvise to indicate that this VMA must honour
the madvise setting.
> + unsigned long mask = always | madvise;
> +
> + mask = always | madvise;
> + if (inherit_enabled)
> + mask |= inherit;
> +
> + /* All set to/inherit NEVER - never means never globally,
> abort. */
> + if (!(mask & orders))
> + return 0;
> +
> + /*
> + * Otherwise, we only enforce sysfs settings if asked. In
> addition,
> + * if the user sets a sysfs mode of madvise and if
> TVA_ENFORCE_SYSFS
> + * is not set, we don't bother checking whether the VMA has
> VM_HUGEPAGE
> + * set.
> + */
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + return orders;
> +
> + mask = always;
> + if (has_madvise)
> + mask |= madvise;
> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
> + mask |= inherit;
> +
> + return orders & mask;
> +}
> +
> /**
> * thp_vma_allowable_orders - determine hugepage orders that are
> allowed for vma
> * @vma: the vm area to check
> @@ -287,19 +324,8 @@ unsigned long thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> -
> - if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> - if (hugepage_global_always() ||
> - ((vm_flags & VM_HUGEPAGE) &&
> hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> -
> - orders &= mask;
> - if (!orders)
> - return 0;
> - }
> + if (vma_is_anonymous(vma))
> + orders = __thp_mask_anon_orders(vm_flags, tva_flags,
> orders);
>
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags,
> orders);
> }
Overall this is very readable!
On 12.06.25 10:46, Dev Jain wrote:
>
> On 12/06/25 1:21 pm, Baolin Wang wrote:
>>
>>
>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>>>> settings, which
>>>> means that even though we have disabled the Anon THP configuration,
>>>> MADV_COLLAPSE
>>>> will still attempt to collapse into a Anon THP. This violates the
>>>> rule we have
>>>> agreed upon: never means never.
>>>>
>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>> for collapsing
>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>
>>>> To address this issue, should check whether the Anon THP
>>>> configuration is disabled
>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag
>>>> is set.
>>>>
>>>> In summary, the current strategy is:
>>>>
>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>> hugepage_global_enabled() == false
>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>> are prohibited
>>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>>
>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>> hugepage_global_enabled() == true
>>>> (global THP settings are enabled), and inherit & orders == 0, it
>>>> means mTHP of that
>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>> is not allowed
>>>> for that orders.
>>>>
>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>> vm_area_struct *vma,
>>>> unsigned long orders)
>>>> {
>>>> /* Optimization to check if required orders are enabled early. */
>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>> + if (vma_is_anonymous(vma)) {
>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>> + unsigned long mask = always | madvise;
>>>> +
>>>> + /*
>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>> + * then we should never allow hugepages.
>>> > + */> + if (!(mask & orders) &&
>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>> + return 0;
>>>
>>> I'm still trying to digest that. Isn't there a way for us to work
>>> with the orders,
>>> essentially masking off all orders that are forbidden globally.
>>> Similar to below, if !orders, then return 0?
>>> /* Orders disabled directly. */
>>> orders &= ~TODO;
>>> /* Orders disabled by inheriting from the global toggle. */
>>> if (!hugepage_global_enabled())
>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>
>>> TODO is probably a -1ULL and then clearing always/madvise/inherit.
>>> Could add a simple helper for that
>>>
>>> huge_anon_orders_never
>>
>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>> more readable?
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..3087ac7631e0 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long tva_flags,
>> unsigned long orders);
>>
>> +/* Strictly mask requested anonymous orders according to sysfs
>> settings. */
>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>> vm_flags,
>> + unsigned long tva_flags, unsigned long
>> orders)
>> +{
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + bool inherit_enabled = hugepage_global_enabled();
>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>
> Let us name this honour_madvise to indicate that this VMA must honour
> the madvise setting.
Not clear, because there is also "VM_NOHUGEPAGE" :/
See my reply where we maybe (if I got it right) only need a single check
and can just avoid a confusing temporary variable completely.
--
Cheers,
David / dhildenb
On Thu, Jun 5, 2025 at 2:01 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a Anon THP. This violates the rule we have
> agreed upon: never means never.
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> To address this issue, should check whether the Anon THP configuration is disabled
> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>
> In summary, the current strategy is:
>
> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> (global THP settings are not enabled), it means mTHP of that orders are prohibited
> from being used, then madvise_collapse() is forbidden for that orders.
>
> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> orders are still prohibited from being used, thus madvise_collapse() is not allowed
> for that orders.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..199ddc9f04a1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> + if (vma_is_anonymous(vma)) {
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + unsigned long mask = always | madvise;
> +
> + /*
> + * If the system-wide THP/mTHP sysfs settings are disabled,
> + * then we should never allow hugepages.
> + */
> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
> + return 0;
Hi Baolin,
Thanks for making this change so we enforce the 'never' being never
rule. As others have noted, this statement is hard to read-- is there
any improvements we can make to the readability as Lorenzo suggested?
Either way, I've tested this and the functionality works as intended.
Tested-by: Nico Pache <npache@redhat.com>
-- Nico
> +
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + goto skip;
>
> + mask = always;
> if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> + mask |= madvise;
> if (hugepage_global_always() ||
> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> + mask |= inherit;
>
> orders &= mask;
> if (!orders)
> return 0;
> }
>
> +skip:
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
>
> --
> 2.43.5
>
On 2025/6/9 02:37, Nico Pache wrote:
> On Thu, Jun 5, 2025 at 2:01 AM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
>> will still attempt to collapse into a Anon THP. This violates the rule we have
>> agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, should check whether the Anon THP configuration is disabled
>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
>> (global THP settings are not enabled), it means mTHP of that orders are prohibited
>> from being used, then madvise_collapse() is forbidden for that orders.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
>> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
>> orders are still prohibited from being used, thus madvise_collapse() is not allowed
>> for that orders.
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..199ddc9f04a1 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> + if (vma_is_anonymous(vma)) {
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + unsigned long mask = always | madvise;
>> +
>> + /*
>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>> + * then we should never allow hugepages.
>> + */
>> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
>> + return 0;
> Hi Baolin,
>
> Thanks for making this change so we enforce the 'never' being never
> rule. As others have noted, this statement is hard to read-- is there
> any improvements we can make to the readability as Lorenzo suggested?
Yes, will do in next version.
> Either way, I've tested this and the functionality works as intended.
>
> Tested-by: Nico Pache <npache@redhat.com>
Thanks for testing.
Not related to your patch at all, but man this whole thing (thp allowed orders)
needs significant improvement, it seems always perversely complicated for a
relatively simple operation.
Overall I LOVE what you're doing here, but I feel we can clarify things a
little while we're at it to make it clear exactly what we're doing.
This is a very important change so forgive my fiddling about here but I'm
hoping we can take the opportunity to make things a little simpler!
On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a Anon THP. This violates the rule we have
> agreed upon: never means never.
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
you're changing what THP is permitted across the board, I may have missed some
discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
users?
>
> To address this issue, should check whether the Anon THP configuration is disabled
> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>
> In summary, the current strategy is:
>
> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> (global THP settings are not enabled), it means mTHP of that orders are prohibited
> from being used, then madvise_collapse() is forbidden for that orders.
>
> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> orders are still prohibited from being used, thus madvise_collapse() is not allowed
> for that orders.
OK so it's already confusing that the global settings only impact 'inherit'
settings below, so they're not really global at all, but rather perhaps should
be called 'inherited'.
Maybe I need to submit a patch to rename thp_inherited_enabled(), or perhaps
that'd just add to the confusion :P
OK this is also not your fault just general commentary.
Anyway, I feel points 1 and 2 can more succinctly be summed up as below,
also there's no need to refer to the code, it's actually clearer I think to
refer to the underlying logic:
If no hugepage modes are enabled for the desired orders, nor can we
enable them by inheriting from a 'global' enabled setting - then it
must be the case that all desired orders either specify or inherit
'NEVER' - and we must abort.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..199ddc9f04a1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> + if (vma_is_anonymous(vma)) {
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + unsigned long mask = always | madvise;
> +
> + /*
> + * If the system-wide THP/mTHP sysfs settings are disabled,
> + * then we should never allow hugepages.
> + */
> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
> + return 0;
> +
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + goto skip;
>
> + mask = always;
> if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> + mask |= madvise;
> if (hugepage_global_always() ||
> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> + mask |= inherit;
>
> orders &= mask;
> if (!orders)
> return 0;
> }
>
> +skip:
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
I feel this is compressing a lot of logic in a way that took me several
readings to understand (hey I might not be the smartest cookie in the jar,
but we need to account for all levels of kernel developer ;)
I feel like we can make things a lot clearer here by separating out with a
helper function (means we can drop some indentation too), and also take
advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
exits with 0 early so no need for us to do so ourselves:
/* Strictly mask requested anonymous orders according to sysfs settings. */
static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
unsigned long tva_flags, unsigned long orders)
{
unsigned long always = READ_ONCE(huge_anon_orders_always);
unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
bool inherit_enabled = hugepage_global_enabled();
bool has_madvise = vm_flags & VM_HUGEPAGE;
unsigned long mask = always | madvise;
mask = always | madvise;
if (inherit_enabled)
mask |= inherit;
/* All set to/inherit NEVER - never means never globally, abort. */
if (!(mask & orders))
return 0;
/* Otherwise, we only enforce sysfs settings if asked. */
if (!(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
mask = always;
if (has_madvise)
mask |= madvise;
if (hugepage_global_always() || (has_madvise && inherit_enabled))
mask |= inherit;
return orders & mask;
}
...
static inline
unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
unsigned long orders)
{
if (vma_is_anonymous(vma))
orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
}
>
> --
> 2.43.5
>
On 2025/6/7 19:55, Lorenzo Stoakes wrote:
> Not related to your patch at all, but man this whole thing (thp allowed orders)
> needs significant improvement, it seems always perversely complicated for a
> relatively simple operation.
>
> Overall I LOVE what you're doing here, but I feel we can clarify things a
> little while we're at it to make it clear exactly what we're doing.
>
> This is a very important change so forgive my fiddling about here but I'm
> hoping we can take the opportunity to make things a little simpler!
>
> On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
>> will still attempt to collapse into a Anon THP. This violates the rule we have
>> agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
> you're changing what THP is permitted across the board, I may have missed some
> discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
> of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
> users?
We found that MADV_COLLAPSE ignores the THP configuration, meaning that
even when THP is set to 'never', MADV_COLLAPSE can still collapse into
THPs (and mTHPs in the future). This is because when MADV_COLLAPSE calls
thp_vma_allowable_orders(), it does not set the TVA_ENFORCE_SYSFS flag,
which means it ignores the system-wide Anon THP sysfs settings.
So this patch set is aimed to fix the THP policy for MADV_COLLAPSE.
>> To address this issue, should check whether the Anon THP configuration is disabled
>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
>> (global THP settings are not enabled), it means mTHP of that orders are prohibited
>> from being used, then madvise_collapse() is forbidden for that orders.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
>> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
>> orders are still prohibited from being used, thus madvise_collapse() is not allowed
>> for that orders.
>
> OK so it's already confusing that the global settings only impact 'inherit'
> settings below, so they're not really global at all, but rather perhaps should
> be called 'inherited'.
>
> Maybe I need to submit a patch to rename thp_inherited_enabled(), or perhaps
> that'd just add to the confusion :P
>
> OK this is also not your fault just general commentary.
>
> Anyway, I feel points 1 and 2 can more succinctly be summed up as below,
> also there's no need to refer to the code, it's actually clearer I think to
> refer to the underlying logic:
>
> If no hugepage modes are enabled for the desired orders, nor can we
> enable them by inheriting from a 'global' enabled setting - then it
> must be the case that all desired orders either specify or inherit
> 'NEVER' - and we must abort.
OK. Thanks for helping me make it simpler:)
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..199ddc9f04a1 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> + if (vma_is_anonymous(vma)) {
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + unsigned long mask = always | madvise;
>> +
>> + /*
>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>> + * then we should never allow hugepages.
>> + */
>> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
>> + return 0;
>> +
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + goto skip;
>>
>> + mask = always;
>> if (vm_flags & VM_HUGEPAGE)
>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>> + mask |= madvise;
>> if (hugepage_global_always() ||
>> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>> + mask |= inherit;
>>
>> orders &= mask;
>> if (!orders)
>> return 0;
>> }
>>
>> +skip:
>> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
>> }
>
> I feel this is compressing a lot of logic in a way that took me several
> readings to understand (hey I might not be the smartest cookie in the jar,
> but we need to account for all levels of kernel developer ;)
>
> I feel like we can make things a lot clearer here by separating out with a
> helper function (means we can drop some indentation too), and also take
> advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
> exits with 0 early so no need for us to do so ourselves:
Sure. Looks good to me. Thanks.
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> unsigned long always = READ_ONCE(huge_anon_orders_always);
> unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> bool inherit_enabled = hugepage_global_enabled();
> bool has_madvise = vm_flags & VM_HUGEPAGE;
> unsigned long mask = always | madvise;
>
> mask = always | madvise;
> if (inherit_enabled)
> mask |= inherit;
>
> /* All set to/inherit NEVER - never means never globally, abort. */
> if (!(mask & orders))
> return 0;
>
> /* Otherwise, we only enforce sysfs settings if asked. */
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> mask = always;
> if (has_madvise)
> mask |= madvise;
> if (hugepage_global_always() || (has_madvise && inherit_enabled))
> mask |= inherit;
>
> return orders & mask;
> }
>
> ...
>
> static inline
> unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> if (vma_is_anonymous(vma))
> orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
On Mon, Jun 09, 2025 at 02:10:12PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/7 19:55, Lorenzo Stoakes wrote:
> > Not related to your patch at all, but man this whole thing (thp allowed orders)
> > needs significant improvement, it seems always perversely complicated for a
> > relatively simple operation.
> >
> > Overall I LOVE what you're doing here, but I feel we can clarify things a
> > little while we're at it to make it clear exactly what we're doing.
> >
> > This is a very important change so forgive my fiddling about here but I'm
> > hoping we can take the opportunity to make things a little simpler!
> >
> > On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
> > > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> > > means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> > > will still attempt to collapse into a Anon THP. This violates the rule we have
> > > agreed upon: never means never.
> > >
> > > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> >
> > I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
> > you're changing what THP is permitted across the board, I may have missed some
> > discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
> > of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
> > users?
>
> We found that MADV_COLLAPSE ignores the THP configuration, meaning that even
> when THP is set to 'never', MADV_COLLAPSE can still collapse into THPs (and
> mTHPs in the future). This is because when MADV_COLLAPSE calls
> thp_vma_allowable_orders(), it does not set the TVA_ENFORCE_SYSFS flag,
> which means it ignores the system-wide Anon THP sysfs settings.
>
> So this patch set is aimed to fix the THP policy for MADV_COLLAPSE.
>
Yeah of course, and this is exactly why, but what I mean is, the patch
doesn't explicitly address MADV_COLLAPSE, it addresses a case that
MADV_COLLAPSE uses (which is as you say the motivating cause for the
change).
So I think the commit message should rather open something like:
If, when invoking thp_vma_allowable_orders(), the TVA_ENFORCE_SYSFS
flag is not specified, we ignore sysfs TLB settings.
Whilst it makes sense for the callers who do not specify this flag,
it creates a odd and surprising situation where a sysadmin
specifying 'never' for all THP sizes still observing THP pages
being allocated and used on the system.
The motivating case for this is MADV_COLLAPSE, <blah blah blah> :)
> > > To address this issue, should check whether the Anon THP configuration is disabled
> > > in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
> > >
> > > In summary, the current strategy is:
> > >
> > > 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> > > (global THP settings are not enabled), it means mTHP of that orders are prohibited
> > > from being used, then madvise_collapse() is forbidden for that orders.
> > >
> > > 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> > > (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> > > orders are still prohibited from being used, thus madvise_collapse() is not allowed
> > > for that orders.
> >
> > OK so it's already confusing that the global settings only impact 'inherit'
> > settings below, so they're not really global at all, but rather perhaps should
> > be called 'inherited'.
> >
> > Maybe I need to submit a patch to rename thp_inherited_enabled(), or perhaps
> > that'd just add to the confusion :P
> >
> > OK this is also not your fault just general commentary.
> >
> > Anyway, I feel points 1 and 2 can more succinctly be summed up as below,
> > also there's no need to refer to the code, it's actually clearer I think to
> > refer to the underlying logic:
> >
> > If no hugepage modes are enabled for the desired orders, nor can we
> > enable them by inheriting from a 'global' enabled setting - then it
> > must be the case that all desired orders either specify or inherit
> > 'NEVER' - and we must abort.
>
> OK. Thanks for helping me make it simpler:)
>
Thanks :)
> > >
> > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > ---
> > > include/linux/huge_mm.h | 23 +++++++++++++++++++----
> > > 1 file changed, 19 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index 2f190c90192d..199ddc9f04a1 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > unsigned long orders)
> > > {
> > > /* Optimization to check if required orders are enabled early. */
> > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > + if (vma_is_anonymous(vma)) {
> > > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > + unsigned long mask = always | madvise;
> > > +
> > > + /*
> > > + * If the system-wide THP/mTHP sysfs settings are disabled,
> > > + * then we should never allow hugepages.
> > > + */
> > > + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
> > > + return 0;
> > > +
> > > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > + goto skip;
> > >
> > > + mask = always;
> > > if (vm_flags & VM_HUGEPAGE)
> > > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > > + mask |= madvise;
> > > if (hugepage_global_always() ||
> > > ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > > + mask |= inherit;
> > >
> > > orders &= mask;
> > > if (!orders)
> > > return 0;
> > > }
> > >
> > > +skip:
> > > return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> > > }
> >
> > I feel this is compressing a lot of logic in a way that took me several
> > readings to understand (hey I might not be the smartest cookie in the jar,
> > but we need to account for all levels of kernel developer ;)
> >
> > I feel like we can make things a lot clearer here by separating out with a
> > helper function (means we can drop some indentation too), and also take
> > advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
> > exits with 0 early so no need for us to do so ourselves:
>
> Sure. Looks good to me. Thanks.
Great thanks!
>
> > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > unsigned long tva_flags, unsigned long orders)
> > {
> > unsigned long always = READ_ONCE(huge_anon_orders_always);
> > unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > bool inherit_enabled = hugepage_global_enabled();
> > bool has_madvise = vm_flags & VM_HUGEPAGE;
> > unsigned long mask = always | madvise;
> >
> > mask = always | madvise;
> > if (inherit_enabled)
> > mask |= inherit;
> >
> > /* All set to/inherit NEVER - never means never globally, abort. */
> > if (!(mask & orders))
> > return 0;
> >
> > /* Otherwise, we only enforce sysfs settings if asked. */
> > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > return orders;
> >
> > mask = always;
> > if (has_madvise)
> > mask |= madvise;
> > if (hugepage_global_always() || (has_madvise && inherit_enabled))
> > mask |= inherit;
> >
> > return orders & mask;
> > }
> >
> > ...
> >
> > static inline
> > unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > unsigned long vm_flags,
> > unsigned long tva_flags,
> > unsigned long orders)
> > {
> > if (vma_is_anonymous(vma))
> > orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> >
> > return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> > }
>
>
On 2025/6/9 23:17, Lorenzo Stoakes wrote: > On Mon, Jun 09, 2025 at 02:10:12PM +0800, Baolin Wang wrote: >> >> >> On 2025/6/7 19:55, Lorenzo Stoakes wrote: >>> Not related to your patch at all, but man this whole thing (thp allowed orders) >>> needs significant improvement, it seems always perversely complicated for a >>> relatively simple operation. >>> >>> Overall I LOVE what you're doing here, but I feel we can clarify things a >>> little while we're at it to make it clear exactly what we're doing. >>> >>> This is a very important change so forgive my fiddling about here but I'm >>> hoping we can take the opportunity to make things a little simpler! >>> >>> On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote: >>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which >>>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE >>>> will still attempt to collapse into a Anon THP. This violates the rule we have >>>> agreed upon: never means never. >>>> >>>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing >>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine". >>> >>> I'm generally not sure it's worth talking only about MADV_COLLAPSE here when >>> you're changing what THP is permitted across the board, I may have missed some >>> discussion and forgive me if so, but what is special about MADV_COLLAPSE's use >>> of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other >>> users? >> >> We found that MADV_COLLAPSE ignores the THP configuration, meaning that even >> when THP is set to 'never', MADV_COLLAPSE can still collapse into THPs (and >> mTHPs in the future). This is because when MADV_COLLAPSE calls >> thp_vma_allowable_orders(), it does not set the TVA_ENFORCE_SYSFS flag, >> which means it ignores the system-wide Anon THP sysfs settings. >> >> So this patch set is aimed to fix the THP policy for MADV_COLLAPSE. >> > > Yeah of course, and this is exactly why, but what I mean is, the patch > doesn't explicitly address MADV_COLLAPSE, it addresses a case that > MADV_COLLAPSE uses (which is as you say the motivating cause for the > change). > > So I think the commit message should rather open something like: > > If, when invoking thp_vma_allowable_orders(), the TVA_ENFORCE_SYSFS > flag is not specified, we ignore sysfs TLB settings. > > Whilst it makes sense for the callers who do not specify this flag, > it creates a odd and surprising situation where a sysadmin > specifying 'never' for all THP sizes still observing THP pages > being allocated and used on the system. > > The motivating case for this is MADV_COLLAPSE, <blah blah blah> :) OK. Will update the commit message. Thanks.
A couple follow up points that occurred to me:
On Sat, Jun 07, 2025 at 12:55:19PM +0100, Lorenzo Stoakes wrote:
> Not related to your patch at all, but man this whole thing (thp allowed orders)
> needs significant improvement, it seems always perversely complicated for a
> relatively simple operation.
>
> Overall I LOVE what you're doing here, but I feel we can clarify things a
> little while we're at it to make it clear exactly what we're doing.
>
> This is a very important change so forgive my fiddling about here but I'm
> hoping we can take the opportunity to make things a little simpler!
>
> On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
> > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> > means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> > will still attempt to collapse into a Anon THP. This violates the rule we have
> > agreed upon: never means never.
> >
> > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
> you're changing what THP is permitted across the board, I may have missed some
> discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
> of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
> users?
I'd mention that MADV_COLLAPSE is special because of not specifying
TVA_ENFORCE_SYSFS but you are making this change across the board for all
callers who do not specify this.
I'd also CLEARLY mention that you handle David's request re: madvise by
restricting yourself to checking only for NEVER and retaining the existing logic
of not enforcing sysfs settings when TVA_ENFORCE_SYSFS, which includes not
checking the VMA for VM_HUGEPAGE if the madvise mode is enabled.
(i.e. addressing David's request).
[snip]
> I feel this is compressing a lot of logic in a way that took me several
> readings to understand (hey I might not be the smartest cookie in the jar,
> but we need to account for all levels of kernel developer ;)
>
> I feel like we can make things a lot clearer here by separating out with a
> helper function (means we can drop some indentation too), and also take
> advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
> exits with 0 early so no need for us to do so ourselves:
>
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> unsigned long always = READ_ONCE(huge_anon_orders_always);
> unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> bool inherit_enabled = hugepage_global_enabled();
> bool has_madvise = vm_flags & VM_HUGEPAGE;
> unsigned long mask = always | madvise;
>
> mask = always | madvise;
> if (inherit_enabled)
> mask |= inherit;
>
> /* All set to/inherit NEVER - never means never globally, abort. */
> if (!(mask & orders))
> return 0;
>
> /* Otherwise, we only enforce sysfs settings if asked. */
Perhaps worth adding a comment here noting that, if the user sets a sysfs mode
of madvise and if TVA_ENFORCE_SYSFS is not set, we don't bother checking whether
the VMA has VM_HUGEPAGE set.
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> mask = always;
> if (has_madvise)
> mask |= madvise;
> if (hugepage_global_always() || (has_madvise && inherit_enabled))
> mask |= inherit;
>
> return orders & mask;
> }
>
> ...
>
> static inline
> unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> if (vma_is_anonymous(vma))
> orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
>
> >
> > --
> > 2.43.5
> >
On 2025/6/7 20:21, Lorenzo Stoakes wrote:
> A couple follow up points that occurred to me:
>
> On Sat, Jun 07, 2025 at 12:55:19PM +0100, Lorenzo Stoakes wrote:
>> Not related to your patch at all, but man this whole thing (thp allowed orders)
>> needs significant improvement, it seems always perversely complicated for a
>> relatively simple operation.
>>
>> Overall I LOVE what you're doing here, but I feel we can clarify things a
>> little while we're at it to make it clear exactly what we're doing.
>>
>> This is a very important change so forgive my fiddling about here but I'm
>> hoping we can take the opportunity to make things a little simpler!
>>
>> On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
>>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
>>> will still attempt to collapse into a Anon THP. This violates the rule we have
>>> agreed upon: never means never.
>>>
>>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
>> you're changing what THP is permitted across the board, I may have missed some
>> discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
>> of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
>> users?
>
> I'd mention that MADV_COLLAPSE is special because of not specifying
> TVA_ENFORCE_SYSFS but you are making this change across the board for all
> callers who do not specify this.
Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is
only one other instance where TVA_ENFORCE_SYSFS is not set, which is in
the collapse_pte_mapped_thp() function, but I believe this is reasonable
from the comments:
/*
* If we are here, we've succeeded in replacing all the native
pages
* in the page cache with a single hugepage. If a mm were to
fault-in
* this memory (mapped by a suitably aligned VMA), we'd get the
hugepage
* and map it by a PMD, regardless of sysfs THP settings. As
such, let's
* analogously elide sysfs THP settings here.
*/
if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
return SCAN_VMA_CHECK;
>
> I'd also CLEARLY mention that you handle David's request re: madvise by
> restricting yourself to checking only for NEVER and retaining the existing logic
> of not enforcing sysfs settings when TVA_ENFORCE_SYSFS, which includes not
> checking the VMA for VM_HUGEPAGE if the madvise mode is enabled.
Sure.
>
> (i.e. addressing David's request).
>
> [snip]
>
>> I feel this is compressing a lot of logic in a way that took me several
>> readings to understand (hey I might not be the smartest cookie in the jar,
>> but we need to account for all levels of kernel developer ;)
>>
>> I feel like we can make things a lot clearer here by separating out with a
>> helper function (means we can drop some indentation too), and also take
>> advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
>> exits with 0 early so no need for us to do so ourselves:
>>
>> /* Strictly mask requested anonymous orders according to sysfs settings. */
>> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>> unsigned long tva_flags, unsigned long orders)
>> {
>> unsigned long always = READ_ONCE(huge_anon_orders_always);
>> unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
>> bool inherit_enabled = hugepage_global_enabled();
>> bool has_madvise = vm_flags & VM_HUGEPAGE;
>> unsigned long mask = always | madvise;
>>
>> mask = always | madvise;
>> if (inherit_enabled)
>> mask |= inherit;
>>
>> /* All set to/inherit NEVER - never means never globally, abort. */
>> if (!(mask & orders))
>> return 0;
>>
>> /* Otherwise, we only enforce sysfs settings if asked. */
>
> Perhaps worth adding a comment here noting that, if the user sets a sysfs mode
> of madvise and if TVA_ENFORCE_SYSFS is not set, we don't bother checking whether
> the VMA has VM_HUGEPAGE set.
Sure, will do. Thanks for reviewing.
On Mon, Jun 09, 2025 at 02:18:07PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/7 20:21, Lorenzo Stoakes wrote:
> > A couple follow up points that occurred to me:
> >
> > On Sat, Jun 07, 2025 at 12:55:19PM +0100, Lorenzo Stoakes wrote:
> > > Not related to your patch at all, but man this whole thing (thp allowed orders)
> > > needs significant improvement, it seems always perversely complicated for a
> > > relatively simple operation.
> > >
> > > Overall I LOVE what you're doing here, but I feel we can clarify things a
> > > little while we're at it to make it clear exactly what we're doing.
> > >
> > > This is a very important change so forgive my fiddling about here but I'm
> > > hoping we can take the opportunity to make things a little simpler!
> > >
> > > On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
> > > > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> > > > means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> > > > will still attempt to collapse into a Anon THP. This violates the rule we have
> > > > agreed upon: never means never.
> > > >
> > > > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> > >
> > > I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
> > > you're changing what THP is permitted across the board, I may have missed some
> > > discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
> > > of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
> > > users?
> >
> > I'd mention that MADV_COLLAPSE is special because of not specifying
> > TVA_ENFORCE_SYSFS but you are making this change across the board for all
> > callers who do not specify this.
>
> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is
> only one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
> collapse_pte_mapped_thp() function, but I believe this is reasonable from
> the comments:
>
> /*
> * If we are here, we've succeeded in replacing all the native pages
> * in the page cache with a single hugepage. If a mm were to
> fault-in
> * this memory (mapped by a suitably aligned VMA), we'd get the
> hugepage
> * and map it by a PMD, regardless of sysfs THP settings. As such,
> let's
> * analogously elide sysfs THP settings here.
> */
> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
> return SCAN_VMA_CHECK;
Thanks for referencing that - it's a nicely worded comment that does explain it
indeed :)
Definitely worth mentioning that in the commit msg I think.
>
> >
> > I'd also CLEARLY mention that you handle David's request re: madvise by
> > restricting yourself to checking only for NEVER and retaining the existing logic
> > of not enforcing sysfs settings when TVA_ENFORCE_SYSFS, which includes not
> > checking the VMA for VM_HUGEPAGE if the madvise mode is enabled.
>
> Sure.
Thanks!
>
> >
> > (i.e. addressing David's request).
> >
> > [snip]
> >
> > > I feel this is compressing a lot of logic in a way that took me several
> > > readings to understand (hey I might not be the smartest cookie in the jar,
> > > but we need to account for all levels of kernel developer ;)
> > >
> > > I feel like we can make things a lot clearer here by separating out with a
> > > helper function (means we can drop some indentation too), and also take
> > > advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
> > > exits with 0 early so no need for us to do so ourselves:
> > >
> > > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > unsigned long tva_flags, unsigned long orders)
> > > {
> > > unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > > bool inherit_enabled = hugepage_global_enabled();
> > > bool has_madvise = vm_flags & VM_HUGEPAGE;
> > > unsigned long mask = always | madvise;
> > >
> > > mask = always | madvise;
> > > if (inherit_enabled)
> > > mask |= inherit;
> > >
> > > /* All set to/inherit NEVER - never means never globally, abort. */
> > > if (!(mask & orders))
> > > return 0;
> > >
> > > /* Otherwise, we only enforce sysfs settings if asked. */
> >
> > Perhaps worth adding a comment here noting that, if the user sets a sysfs mode
> > of madvise and if TVA_ENFORCE_SYSFS is not set, we don't bother checking whether
> > the VMA has VM_HUGEPAGE set.
>
> Sure, will do. Thanks for reviewing.
Thanks!
On 05/06/25 1:30 pm, Baolin Wang wrote:
> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a Anon THP. This violates the rule we have
> agreed upon: never means never.
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> To address this issue, should check whether the Anon THP configuration is disabled
> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
Did you mean to say "even when the TVA_ENFORCE_SYSFS flag is *not* set"? Because if
is set, then we have to check the anon THP sysfs config.
>
> In summary, the current strategy is:
>
> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> (global THP settings are not enabled), it means mTHP of that orders are prohibited
> from being used, then madvise_collapse() is forbidden for that orders.
>
> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> orders are still prohibited from being used, thus madvise_collapse() is not allowed
> for that orders.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..199ddc9f04a1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> + if (vma_is_anonymous(vma)) {
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + unsigned long mask = always | madvise;
> +
> + /*
> + * If the system-wide THP/mTHP sysfs settings are disabled,
> + * then we should never allow hugepages.
> + */
> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
> + return 0;
> +
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + goto skip;
>
> + mask = always;
> if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> + mask |= madvise;
> if (hugepage_global_always() ||
> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> + mask |= inherit;
>
> orders &= mask;
> if (!orders)
> return 0;
> }
>
> +skip:
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
>
On 06/06/25 10:19 pm, Dev Jain wrote:
>
> On 05/06/25 1:30 pm, Baolin Wang wrote:
>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>> settings, which
>> means that even though we have disabled the Anon THP configuration,
>> MADV_COLLAPSE
>> will still attempt to collapse into a Anon THP. This violates the
>> rule we have
>> agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing
>> for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, should check whether the Anon THP
>> configuration is disabled
>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag
>> is set.
>
> Did you mean to say "even when the TVA_ENFORCE_SYSFS flag is *not*
> set"? Because if
> is set, then we have to check the anon THP sysfs config.
Otherwise the patch itself looks good to me, so:
Reviewed-by: Dev Jain <dev.jain@arm.com>
>
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == false
>> (global THP settings are not enabled), it means mTHP of that orders
>> are prohibited
>> from being used, then madvise_collapse() is forbidden for that orders.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == true
>> (global THP settings are enabled), and inherit & orders == 0, it
>> means mTHP of that
>> orders are still prohibited from being used, thus madvise_collapse()
>> is not allowed
>> for that orders.
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..199ddc9f04a1 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> + if (vma_is_anonymous(vma)) {
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + unsigned long mask = always | madvise;
>> +
>> + /*
>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>> + * then we should never allow hugepages.
>> + */
>> + if (!(mask & orders) && !(hugepage_global_enabled() &&
>> (inherit & orders)))
>> + return 0;
>> +
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + goto skip;
>> + mask = always;
>> if (vm_flags & VM_HUGEPAGE)
>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>> + mask |= madvise;
>> if (hugepage_global_always() ||
>> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>> + mask |= inherit;
>> orders &= mask;
>> if (!orders)
>> return 0;
>> }
>> +skip:
>> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags,
>> orders);
>> }
>
On 2025/6/7 02:47, Dev Jain wrote: > > On 06/06/25 10:19 pm, Dev Jain wrote: >> >> On 05/06/25 1:30 pm, Baolin Wang wrote: >>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs >>> settings, which >>> means that even though we have disabled the Anon THP configuration, >>> MADV_COLLAPSE >>> will still attempt to collapse into a Anon THP. This violates the >>> rule we have >>> agreed upon: never means never. >>> >>> Another rule for madvise, referring to David's suggestion: “allowing >>> for collapsing >>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine". >>> >>> To address this issue, should check whether the Anon THP >>> configuration is disabled >>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag >>> is set. >> >> Did you mean to say "even when the TVA_ENFORCE_SYSFS flag is *not* >> set"? Because if >> is set, then we have to check the anon THP sysfs config. Sorry for the confusion. What I mean is that we should also check whether the Anon THP is disabled when the TVA_ENFORCE_SYSFS flag is set. Will update the commit message. > Otherwise the patch itself looks good to me, so: > > Reviewed-by: Dev Jain <dev.jain@arm.com> Thanks for reviewing.
© 2016 - 2025 Red Hat, Inc.