mm/huge_memory.c | 48 ++++++++++++++++++++++++++++-------------------- 1 file changed, 28 insertions(+), 20 deletions(-)
Writing to /sys/kernel/mm/transparent_hugepage/enabled causes
start_stop_khugepaged() called independent of any change.
start_stop_khugepaged() SPAMs the printk ring buffer overflow with the
exact same message, even when nothing changes.
For instance, if you have a custom vm.min_free_kbytes, just touching
/sys/kernel/mm/transparent_hugepage/enabled causes a printk message.
Example:
# sysctl -w vm.min_free_kbytes=112382
# for i in $(seq 100); do echo never > /sys/kernel/mm/transparent_hugepage/enabled ; done
and you have 100 WARN messages like the following, which is pretty dull:
khugepaged: min_free_kbytes is not updated to 112381 because user defined value 112382 is preferred
A similar message shows up when setting thp to "always":
# for i in $(seq 100); do
# echo 1024 > /proc/sys/vm/min_free_kbytes
# echo always > /sys/kernel/mm/transparent_hugepage/enabled
# done
And then, we have 100K messages like:
khugepaged: raising min_free_kbytes from 1024 to 67584 to help transparent hugepage allocations
This is more common when you have a configuration management system that
writes the THP configuration without an extra read, assuming that
nothing will happen if there is no change in the configuration, but it
prints these annoying messages.
For instance, at Meta's fleet, ~10K servers were producing 3.5M of
these messages per day.
Fix this by doing two things:
* Make the sysfs _store helpers a no-op if there is no state change
* Ratelimit these messages to avoid SPAM on config change
---
Breno Leitao (2):
mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()
mm: thp: avoid calling start_stop_khugepaged() in enabled_store()
mm/huge_memory.c | 48 ++++++++++++++++++++++++++++--------------------
1 file changed, 28 insertions(+), 20 deletions(-)
---
base-commit: 9dd5012f78d699f7a6051583dc53adeb401e28f0
change-id: 20260303-thp_logs-059d6b80f6d6
Best regards,
--
Breno Leitao <leitao@debian.org>
On Wed, Mar 04, 2026 at 02:22:32AM -0800, Breno Leitao wrote: > Writing to /sys/kernel/mm/transparent_hugepage/enabled causes > start_stop_khugepaged() called independent of any change. > start_stop_khugepaged() SPAMs the printk ring buffer overflow with the > exact same message, even when nothing changes. > > For instance, if you have a custom vm.min_free_kbytes, just touching > /sys/kernel/mm/transparent_hugepage/enabled causes a printk message. > Example: > > # sysctl -w vm.min_free_kbytes=112382 > # for i in $(seq 100); do echo never > /sys/kernel/mm/transparent_hugepage/enabled ; done > > and you have 100 WARN messages like the following, which is pretty dull: > > khugepaged: min_free_kbytes is not updated to 112381 because user defined value 112382 is preferred > > A similar message shows up when setting thp to "always": > > # for i in $(seq 100); do > # echo 1024 > /proc/sys/vm/min_free_kbytes > # echo always > /sys/kernel/mm/transparent_hugepage/enabled > # done > > And then, we have 100K messages like: You mean 100? :) It's 100 locally here. > > khugepaged: raising min_free_kbytes from 1024 to 67584 to help transparent hugepage allocations > > This is more common when you have a configuration management system that > writes the THP configuration without an extra read, assuming that > nothing will happen if there is no change in the configuration, but it > prints these annoying messages. Right. > > For instance, at Meta's fleet, ~10K servers were producing 3.5M of > these messages per day. Yeah... that's not so fun :) > > Fix this by doing two things: > * Make the sysfs _store helpers a no-op if there is no state change > * Ratelimit these messages to avoid SPAM on config change Thanks for taking a look at this! > > --- > Breno Leitao (2): > mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store() > mm: thp: avoid calling start_stop_khugepaged() in enabled_store() > > mm/huge_memory.c | 48 ++++++++++++++++++++++++++++-------------------- > 1 file changed, 28 insertions(+), 20 deletions(-) > --- > base-commit: 9dd5012f78d699f7a6051583dc53adeb401e28f0 > change-id: 20260303-thp_logs-059d6b80f6d6 > > Best regards, > -- > Breno Leitao <leitao@debian.org> > Cheers, Lorenzo
On Wed, Mar 04, 2026 at 02:22:32AM -0800, Breno Leitao wrote:
> Breno Leitao (2):
> mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()
> mm: thp: avoid calling start_stop_khugepaged() in enabled_store()
I think the same can be achieved cleaner from within start_stop_khugepaged().
Completely untested patch is below.
One noticeable difference that with the patch we don't kick
khugepaged_wait if khugepaged_thread is there. But I don't think it
should make a difference.
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1dd3cfca610d..80f818d3a094 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2683,18 +2683,18 @@ static void set_recommended_min_free_kbytes(void)
int start_stop_khugepaged(void)
{
- int err = 0;
+ guard(mutex)(&khugepaged_mutex);
+
+ /* Check if anything has to be done */
+ if (hugepage_pmd_enabled() == !!khugepaged_thread)
+ return 0;
- mutex_lock(&khugepaged_mutex);
if (hugepage_pmd_enabled()) {
- if (!khugepaged_thread)
- khugepaged_thread = kthread_run(khugepaged, NULL,
- "khugepaged");
+ khugepaged_thread = kthread_run(khugepaged, NULL, "khugepaged");
if (IS_ERR(khugepaged_thread)) {
pr_err("khugepaged: kthread_run(khugepaged) failed\n");
- err = PTR_ERR(khugepaged_thread);
khugepaged_thread = NULL;
- goto fail;
+ return PTR_ERR(khugepaged_thread);
}
if (!list_empty(&khugepaged_scan.mm_head))
@@ -2703,10 +2703,9 @@ int start_stop_khugepaged(void)
kthread_stop(khugepaged_thread);
khugepaged_thread = NULL;
}
+
set_recommended_min_free_kbytes();
-fail:
- mutex_unlock(&khugepaged_mutex);
- return err;
+ return 0;
}
void khugepaged_min_free_kbytes_update(void)
--
Kiryl Shutsemau / Kirill A. Shutemov
On Wed, Mar 04, 2026 at 11:18:37AM +0000, Kiryl Shutsemau wrote:
> On Wed, Mar 04, 2026 at 02:22:32AM -0800, Breno Leitao wrote:
> > Breno Leitao (2):
> > mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()
> > mm: thp: avoid calling start_stop_khugepaged() in enabled_store()
>
> I think the same can be achieved cleaner from within start_stop_khugepaged().
I'm kind of with you in doing it here but...
>
> Completely untested patch is below.
>
> One noticeable difference that with the patch we don't kick
> khugepaged_wait if khugepaged_thread is there. But I don't think it
> should make a difference.
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1dd3cfca610d..80f818d3a094 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2683,18 +2683,18 @@ static void set_recommended_min_free_kbytes(void)
>
> int start_stop_khugepaged(void)
> {
> - int err = 0;
> + guard(mutex)(&khugepaged_mutex);
This is nice :) we need more guard usage in mm.
> +
> + /* Check if anything has to be done */
> + if (hugepage_pmd_enabled() == !!khugepaged_thread)
This is horrible, better to break out the latter expression as a small static
function like:
static bool is_khugepaged_thread_running(void)
{
if (khugepaged_thread)
return true;
return false;
}
> + return 0;
>
> - mutex_lock(&khugepaged_mutex);
> if (hugepage_pmd_enabled()) {
Can we race on this function call, meaning check above might vary from one here?
Better to have
bool enabled = hugepage_pmd_enabled();
etc. etc.
> - if (!khugepaged_thread)
> - khugepaged_thread = kthread_run(khugepaged, NULL,
> - "khugepaged");
> + khugepaged_thread = kthread_run(khugepaged, NULL, "khugepaged");
> if (IS_ERR(khugepaged_thread)) {
> pr_err("khugepaged: kthread_run(khugepaged) failed\n");
> - err = PTR_ERR(khugepaged_thread);
> khugepaged_thread = NULL;
> - goto fail;
> + return PTR_ERR(khugepaged_thread);
> }
>
> if (!list_empty(&khugepaged_scan.mm_head))
> @@ -2703,10 +2703,9 @@ int start_stop_khugepaged(void)
> kthread_stop(khugepaged_thread);
> khugepaged_thread = NULL;
> }
> +
> set_recommended_min_free_kbytes();
We are setting recommended min free kbytes each time somebody touches these
flags, maybe somebody somewhere relies on that and this change would break them?
So maybe we just want to do this each time anyway, but repress/rate limit
output?
> -fail:
> - mutex_unlock(&khugepaged_mutex);
> - return err;
> + return 0;
> }
>
> void khugepaged_min_free_kbytes_update(void)
> --
> Kiryl Shutsemau / Kirill A. Shutemov
Thanks, Lorenzo
>> +
>> + /* Check if anything has to be done */
>> + if (hugepage_pmd_enabled() == !!khugepaged_thread)
>
> This is horrible, better to break out the latter expression as a small static
> function like:
>
> static bool is_khugepaged_thread_running(void)
> {
> if (khugepaged_thread)
> return true;
> return false;
return khugepaged_thread;
should do the trick, given taht the return type of the function is a bool :)
> }
--
Cheers,
David
On Thu, Mar 05, 2026 at 01:41:33PM +0100, David Hildenbrand (Arm) wrote:
>
> >> +
> >> + /* Check if anything has to be done */
> >> + if (hugepage_pmd_enabled() == !!khugepaged_thread)
> >
> > This is horrible, better to break out the latter expression as a small static
> > function like:
> >
> > static bool is_khugepaged_thread_running(void)
> > {
> > if (khugepaged_thread)
> > return true;
> > return false;
>
> return khugepaged_thread;
>
> should do the trick, given taht the return type of the function is a bool :)
Yeah I know :)
I mean it's subjective, but this form makes it clear that what you're returning
is _not_ a boolean.
I was a bit 'meh' when I first saw this pattern, but it's grown on me, sort of a
'make it so simple you can't get confused' kinda thing.
>
> > }
>
> --
> Cheers,
>
> David
Cheers, Lorenzo
On 3/5/26 13:45, Lorenzo Stoakes (Oracle) wrote:
> On Thu, Mar 05, 2026 at 01:41:33PM +0100, David Hildenbrand (Arm) wrote:
>>
>>>
>>> This is horrible, better to break out the latter expression as a small static
>>> function like:
>>>
>>> static bool is_khugepaged_thread_running(void)
>>> {
>>> if (khugepaged_thread)
>>> return true;
>>> return false;
>>
>> return khugepaged_thread;
>>
>> should do the trick, given taht the return type of the function is a bool :)
>
> Yeah I know :)
>
> I mean it's subjective, but this form makes it clear that what you're returning
> is _not_ a boolean.
>
> I was a bit 'meh' when I first saw this pattern, but it's grown on me, sort of a
> 'make it so simple you can't get confused' kinda thing.
I consider
if (x)
return true;
return false;
rather ... questionable :)
--
Cheers,
David
On Thu, Mar 05, 2026 at 12:45:56PM +0000, Lorenzo Stoakes (Oracle) wrote:
> On Thu, Mar 05, 2026 at 01:41:33PM +0100, David Hildenbrand (Arm) wrote:
> >
> > >> +
> > >> + /* Check if anything has to be done */
> > >> + if (hugepage_pmd_enabled() == !!khugepaged_thread)
> > >
> > > This is horrible, better to break out the latter expression as a small static
> > > function like:
> > >
> > > static bool is_khugepaged_thread_running(void)
> > > {
> > > if (khugepaged_thread)
> > > return true;
> > > return false;
> >
> > return khugepaged_thread;
> >
> > should do the trick, given taht the return type of the function is a bool :)
>
> Yeah I know :)
>
> I mean it's subjective, but this form makes it clear that what you're returning
> is _not_ a boolean.
Sorry obviously meant to say what you're TESTING not what you're RETURNING.
>
> I was a bit 'meh' when I first saw this pattern, but it's grown on me, sort of a
> 'make it so simple you can't get confused' kinda thing.
>
> >
> > > }
> >
> > --
> > Cheers,
> >
> > David
>
> Cheers, Lorenzo
Hello Kiryl, On Wed, Mar 04, 2026 at 11:18:37AM +0000, Kiryl Shutsemau wrote: > On Wed, Mar 04, 2026 at 02:22:32AM -0800, Breno Leitao wrote: > > Breno Leitao (2): > > mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store() > > mm: thp: avoid calling start_stop_khugepaged() in enabled_store() > > I think the same can be achieved cleaner from within start_stop_khugepaged(). Thanks for the review. I considered this approach as well. From my limited experience, the preference is to return early in the _store() callbacks when the operation is a no-op, rather than pushing that logic deeper into nested code. Handling it at the _store() level results in more changes overall, but it is arguably more straightforward to reason about (!?). That said, I am no MM subsystem expert, and I am happy to go either way. --breno
© 2016 - 2026 Red Hat, Inc.