Refactor enabled_store() to use a new set_global_enabled_mode() helper.
Introduce a separate enum global_enabled_mode and
global_enabled_mode_strings[], mirroring the anon_enabled_mode
pattern from the previous commit.
A separate enum is necessary because the global THP setting does
not support "inherit", only "always", "madvise", and "never".
Reusing anon_enabled_mode would leave a NULL gap in the string
array, causing sysfs_match_string() to stop early and fail to
match entries after the gap.
The helper uses the same loop pattern as set_anon_enabled_mode(),
iterating over an array of flag bit positions and using
__test_and_set_bit()/__test_and_clear_bit() to track whether the state
actually changed.
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
mm/huge_memory.c | 63 ++++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 48 insertions(+), 15 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f6af90e6cf05d..8949393de7a51 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -330,30 +330,63 @@ static const char * const anon_enabled_mode_strings[] = {
[ANON_ENABLED_NEVER] = "never",
};
+enum global_enabled_mode {
+ GLOBAL_ENABLED_ALWAYS = 0,
+ GLOBAL_ENABLED_MADVISE = 1,
+ GLOBAL_ENABLED_NEVER = 2,
+};
+
+static const char * const global_enabled_mode_strings[] = {
+ [GLOBAL_ENABLED_ALWAYS] = "always",
+ [GLOBAL_ENABLED_MADVISE] = "madvise",
+ [GLOBAL_ENABLED_NEVER] = "never",
+};
+
+static bool set_global_enabled_mode(enum global_enabled_mode mode)
+{
+ static const unsigned long thp_flags[] = {
+ TRANSPARENT_HUGEPAGE_FLAG,
+ TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
+ };
+ enum global_enabled_mode m;
+ bool changed = false;
+
+ for (m = 0; m < ARRAY_SIZE(thp_flags); m++) {
+ if (m == mode)
+ changed |= !__test_and_set_bit(thp_flags[m],
+ &transparent_hugepage_flags);
+ else
+ changed |= __test_and_clear_bit(thp_flags[m],
+ &transparent_hugepage_flags);
+ }
+
+ return changed;
+}
+
static ssize_t enabled_store(struct kobject *kobj,
struct kobj_attribute *attr,
const char *buf, size_t count)
{
- ssize_t ret = count;
+ int mode;
- if (sysfs_streq(buf, "always")) {
- clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
- set_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
- } else if (sysfs_streq(buf, "madvise")) {
- clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
- set_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
- } else if (sysfs_streq(buf, "never")) {
- clear_bit(TRANSPARENT_HUGEPAGE_FLAG, &transparent_hugepage_flags);
- clear_bit(TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, &transparent_hugepage_flags);
- } else
- ret = -EINVAL;
+ mode = sysfs_match_string(global_enabled_mode_strings, buf);
+ if (mode < 0)
+ return -EINVAL;
- if (ret > 0) {
+ if (set_global_enabled_mode(mode)) {
int err = start_stop_khugepaged();
+
if (err)
- ret = err;
+ return err;
+ } else {
+ /*
+ * Recalculate watermarks even when the mode didn't
+ * change, as the previous code always called
+ * start_stop_khugepaged() which does this internally.
+ */
+ set_recommended_min_free_kbytes();
}
- return ret;
+ return count;
}
static struct kobj_attribute enabled_attr = __ATTR_RW(enabled);
--
2.52.0
Hello,
On 3/11/2026 3:17 AM, Breno Leitao wrote:
> +static bool set_global_enabled_mode(enum global_enabled_mode mode)
> +{
> + static const unsigned long thp_flags[] = {
> + TRANSPARENT_HUGEPAGE_FLAG,
> + TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
> + };
> + enum global_enabled_mode m;
> + bool changed = false;
> +
> + for (m = 0; m < ARRAY_SIZE(thp_flags); m++) {
> + if (m == mode)
> + changed |= !__test_and_set_bit(thp_flags[m],
> + &transparent_hugepage_flags);
> + else
> + changed |= __test_and_clear_bit(thp_flags[m],
> + &transparent_hugepage_flags);
I am not a mm expert and typically do not follow the mm list. Is there
an issue with the usage of non-atomic variants here? The commit message
says this uses the same pattern as set_anon_enabled_mode().
However, set_anon_enabled_mode() has a spinlock, huge_anon_orders_lock,
protecting the access. But transparent_hugepage_flags seems to be
unprotected in that regard.
IIUC, David's suggestion to use the lockless version was also based on
the use of a lock in that context.
https://lore.kernel.org/all/4f2abf42-983f-4cc2-92f5-c81827e7b7e2@kernel.org/
Note: We ran into this issue while experimenting with an AI review agent
internally. The patch was selected at random from the mailing list. I
analyzed the issue independently and came to the same conclusion. I
apologize if this is a false alarm.
> + }
> +
> + return changed;
> +}
> +
Hello Sohil,
On Fri, Mar 13, 2026 at 03:31:25PM -0700, Sohil Mehta wrote:
> Hello,
>
> On 3/11/2026 3:17 AM, Breno Leitao wrote:
>
> > +static bool set_global_enabled_mode(enum global_enabled_mode mode)
> > +{
> > + static const unsigned long thp_flags[] = {
> > + TRANSPARENT_HUGEPAGE_FLAG,
> > + TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
> > + };
> > + enum global_enabled_mode m;
> > + bool changed = false;
> > +
> > + for (m = 0; m < ARRAY_SIZE(thp_flags); m++) {
> > + if (m == mode)
> > + changed |= !__test_and_set_bit(thp_flags[m],
> > + &transparent_hugepage_flags);
> > + else
> > + changed |= __test_and_clear_bit(thp_flags[m],
> > + &transparent_hugepage_flags);
>
> I am not a mm expert and typically do not follow the mm list. Is there
> an issue with the usage of non-atomic variants here? The commit message
> says this uses the same pattern as set_anon_enabled_mode().
>
> However, set_anon_enabled_mode() has a spinlock=>huge_anon_orders_lock
> protecting the access. But, transparent_hugepage_flags seems to be
> unprotected in that regard.
I don't think the atomic vs non-atomic choice will help much here, given
this is a compound operation. Whether or not each access is atomic, it
is racy with anyone else changing these fields (transparent_hugepage_flags).
In other words, atomic ops make each individual bit flip safe, but
set_global_enabled_mode() and defrag_store() need to flip multiple bits
as a group. With atomic ops, two concurrent writers can still interleave
and leave the flags in an invalid state.
That said, although I don't think this patch is making things worse, I
think there is a race here that we can improve.
My suggestion is to move the remaining helpers (defrag_store()) to use
sysfs_match_string(), and then add a thp_flags_lock spinlock to protect
accesses to transparent_hugepage_flags. Any concern about
this approach?
Thanks for reviewing it,
--breno
On 3/16/2026 3:12 AM, Breno Leitao wrote:
>> I am not a mm expert and typically do not follow the mm list. Is there
>> an issue with the usage of non-atomic variants here? The commit message
>> says this uses the same pattern as set_anon_enabled_mode().
>>
>> However, set_anon_enabled_mode() has a spinlock=>huge_anon_orders_lock
>> protecting the access. But, transparent_hugepage_flags seems to be
>> unprotected in that regard.
>
> I don't think that the atomic vs non-atomic will not help much, given
> this is a compoud operation. Independently if this is atomic or not, it
> is racy with anyone changing these fields (transparent_hugepage_flags).
> In other words, Atomic ops make each individual bit flip safe, but
> set_global_enabled_mode() and defrag_store() need to flip multiple bits
> as a group. With atomic ops, two concurrent writers can still interleave
> and leave the flags in an invalid state.
You are right it is a compound operation. So, there is an existing issue
with two concurrent writers which can leave the flags in an invalid state.
But, I was wondering if there is a slightly different issue now due to
the non-atomic part. Some updates could be lost depending on the timing
of the operation.
For example:
CPU1                                            CPU2
set_global_enabled_mode(GLOBAL_ENABLED_ALWAYS)  defrag_store("always")
__test_and_set_bit():
  // trying to set TRANSPARENT_HUGEPAGE_FLAG
  old = *p
  // reads flags, sees defrag bit = 0
                                                set_bit(DEFRAG_DIRECT_FLAG)
                                                // atomic: sets the defrag bit
  *p = old | BIT(TRANSPARENT_HUGEPAGE_FLAG);
  // writes back the stale value with only its own bit set
  // DEFRAG_DIRECT_FLAG is lost (reverted to 0)
IIUC, this issue didn't exist before. I think it might be safer to use
the atomic test_and_set_bit() to be compatible with the older code.
Though, I'll leave it up to you as I don't have expertise here.
Overall, as you mentioned below, protecting transparent_hugepage_flags
with a spinlock seems like a better, long-term solution to me as well.
>
> That said, Although I don't think this patch is making it worse, I think
> the is a racy issue here that we can make better.
>
> My suggestion is to move the rest of the helpers (defrag_store()) to use
> sysfs_match_string(), and then create a thp_flags_lock spinlock to
> protect operations against transparent_hugepage_flags. Any concern about
> this approach?
>
On Mon, Mar 16, 2026 at 04:26:30PM -0700, Sohil Mehta wrote:
> On 3/16/2026 3:12 AM, Breno Leitao wrote:
>
> >> I am not a mm expert and typically do not follow the mm list. Is there
> >> an issue with the usage of non-atomic variants here? The commit message
> >> says this uses the same pattern as set_anon_enabled_mode().
> >>
> >> However, set_anon_enabled_mode() has a spinlock=>huge_anon_orders_lock
> >> protecting the access. But, transparent_hugepage_flags seems to be
> >> unprotected in that regard.
> >
> > I don't think that the atomic vs non-atomic will not help much, given
> > this is a compoud operation. Independently if this is atomic or not, it
> > is racy with anyone changing these fields (transparent_hugepage_flags).
> > In other words, Atomic ops make each individual bit flip safe, but
> > set_global_enabled_mode() and defrag_store() need to flip multiple bits
> > as a group. With atomic ops, two concurrent writers can still interleave
> > and leave the flags in an invalid state.
>
> You are right it is a compound operation. So, there is an existing issue
> with two concurrent writers which can leave the flags in an invalid state.
>
> But, I was wondering if there is a slightly different issue now due to
> the non-atomic part. Some updates could be lost depending on the timing
> of the operation.
>
> For example,
>
> CPU1 does: CPU2 does:
> set_global_enabled_mode("always") defrag_store("always")
>
> ___test_and_set_bit():
> // Trying to set bit 1
>
> old = *p
> // reads flags, sees defrag bit=0
>
> set_bit(DEFRAG_DIRECT_FLAG)
> // atomic: sets bit 3
>
>
> *p = old | TRANSPARENT_HUGEPAGE_FLAG;
> // writes back old value with bit 1 set
> // DEFRAG_DIRECT_FLAG is lost (reverted to 0)
>
> IIUC, this issue didn't exist before. I think it might be safer to use
> the atomic test_and_set_bit() to be compatible with the older code.
> Though, I'll leave it up to you as I don't have expertise here.
No, it's up to the maintainers :)
Given the above I think we should switch back to the atomic accessors.
We can address the broader issues with this horrible code in a separate
series.
>
> Overall, as you mentioned below, protecting transparent_hugepage_flags
> with a spinlock seems like a better, long-term solution to me as well.
Yeah, let's look at doing a follow up that cleans this up in general and
address that then.
>
> >
> > That said, Although I don't think this patch is making it worse, I think
> > the is a racy issue here that we can make better.
> >
> > My suggestion is to move the rest of the helpers (defrag_store()) to use
> > sysfs_match_string(), and then create a thp_flags_lock spinlock to
> > protect operations against transparent_hugepage_flags. Any concern about
> > this approach?
> >
>
>
Cheers, Lorenzo
On Tue, Mar 17, 2026 at 09:37:39AM +0000, Lorenzo Stoakes (Oracle) wrote:
> On Mon, Mar 16, 2026 at 04:26:30PM -0700, Sohil Mehta wrote:
> > On 3/16/2026 3:12 AM, Breno Leitao wrote:
> >
> > >> I am not a mm expert and typically do not follow the mm list. Is there
> > >> an issue with the usage of non-atomic variants here? The commit message
> > >> says this uses the same pattern as set_anon_enabled_mode().
> > >>
> > >> However, set_anon_enabled_mode() has a spinlock=>huge_anon_orders_lock
> > >> protecting the access. But, transparent_hugepage_flags seems to be
> > >> unprotected in that regard.
> > >
> > > I don't think that the atomic vs non-atomic will not help much, given
> > > this is a compoud operation. Independently if this is atomic or not, it
> > > is racy with anyone changing these fields (transparent_hugepage_flags).
> > > In other words, Atomic ops make each individual bit flip safe, but
> > > set_global_enabled_mode() and defrag_store() need to flip multiple bits
> > > as a group. With atomic ops, two concurrent writers can still interleave
> > > and leave the flags in an invalid state.
> >
> > You are right it is a compound operation. So, there is an existing issue
> > with two concurrent writers which can leave the flags in an invalid state.
> >
> > But, I was wondering if there is a slightly different issue now due to
> > the non-atomic part. Some updates could be lost depending on the timing
> > of the operation.
> >
> > For example,
> >
> > CPU1 does: CPU2 does:
> > set_global_enabled_mode("always") defrag_store("always")
> >
> > ___test_and_set_bit():
> > // Trying to set bit 1
> >
> > old = *p
> > // reads flags, sees defrag bit=0
> >
> > set_bit(DEFRAG_DIRECT_FLAG)
> > // atomic: sets bit 3
> >
> >
> > *p = old | TRANSPARENT_HUGEPAGE_FLAG;
>
> > // writes back old value with bit 1 set
> > // DEFRAG_DIRECT_FLAG is lost (reverted to 0)
> >
> > IIUC, this issue didn't exist before. I think it might be safer to use
> > the atomic test_and_set_bit() to be compatible with the older code.
> > Though, I'll leave it up to you as I don't have expertise here.
>
> No, it's up to the maintainers :)
>
> Given the above I think we should switch back to the atomic accessors.
>
> We can address the broader issues with this horrible code in a separate
> series.
Ack. let me respin then.
> > Overall, as you mentioned below, protecting transparent_hugepage_flags
> > with a spinlock seems like a better, long-term solution to me as well.
>
> Yeah, let's look at doing a follow up that cleans this up in general and
> address that then.
Sure. I am planning to improve defrag_store() as the next work, and then
come up with this additional spinlock for transparent_hugepage_flags.
On Tue, Mar 17, 2026 at 04:23:37AM -0700, Breno Leitao wrote:
> On Tue, Mar 17, 2026 at 09:37:39AM +0000, Lorenzo Stoakes (Oracle) wrote:
> >
> > Given the above I think we should switch back to the atomic accessors.
> >
> > We can address the broader issues with this horrible code in a separate
> > series.
>
> Ack. let me respin then.

Thanks!

> > > Overall, as you mentioned below, protecting transparent_hugepage_flags
> > > with a spinlock seems like a better, long-term solution to me as well.
> >
> > Yeah, let's look at doing a follow up that cleans this up in general and
> > address that then.
>
> Sure. I am planning to improve defrag_store() as the next work, and then
> come up with this additional spinlock for transparent_hugepage_flags.
>

To be clear by the way by 'horrible code' I meant the existing logic with the
globals etc. not your change which is positive and welcome :)

Cheers, Lorenzo
On Tue, Mar 17, 2026 at 11:25:23AM +0000, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 17, 2026 at 04:23:37AM -0700, Breno Leitao wrote:
> > On Tue, Mar 17, 2026 at 09:37:39AM +0000, Lorenzo Stoakes (Oracle) wrote:
> > >
> > > Given the above I think we should switch back to the atomic accessors.
> > >
> > > We can address the broader issues with this horrible code in a separate
> > > series.
> >
> > Ack. let me respin then.
>
> Thanks!
>
> > > > Overall, as you mentioned below, protecting transparent_hugepage_flags
> > > > with a spinlock seems like a better, long-term solution to me as well.
> > >
> > > Yeah, let's look at doing a follow up that cleans this up in general and
> > > address that then.
> >
> > Sure. I am planning to improve defrag_store() as the next work, and then
> > come up with this additional spinlock for transparent_hugepage_flags.
> >
>
> To be clear by the way by 'horrible code' I meant the existing logic with the
> globals etc. not your change which is positive and welcome :)

lol. Not once did it cross my mind that you might be referring to my changes.
I have never written horrible code in my life. :-P
On 2026/3/11 18:17, Breno Leitao wrote:
> Refactor enabled_store() to use a new change_enabled() helper.
> Introduce a separate enum global_enabled_mode and
> global_enabled_mode_strings[], mirroring the anon_enabled_mode
> pattern from the previous commit.
>
> A separate enum is necessary because the global THP setting does
> not support "inherit", only "always", "madvise", and "never".
> Reusing anon_enabled_mode would leave a NULL gap in the string
> array, causing sysfs_match_string() to stop early and fail to
> match entries after the gap.
>
> The helper uses the same loop pattern as set_anon_enabled_mode(),
> iterating over an array of flag bit positions and using
> __test_and_set_bit()/__test_and_clear_bit() to track whether the state
> actually changed.
>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---

Thanks!

Tested-by: Lance Yang <lance.yang@linux.dev>