[PATCH 1/2] mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()

Breno Leitao posted 2 patches 1 month, 1 week ago
[PATCH 1/2] mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()
Posted by Breno Leitao 1 month, 1 week ago
Writing "never" (or any other value) multiple times to
/sys/kernel/mm/transparent_hugepage/hugepages-*/enabled calls
start_stop_khugepaged() each time, even when nothing actually changed.
This causes set_recommended_min_free_kbytes() to run unconditionally,
which is unnecessary and floods the printk buffer with "min_free_kbytes is
not updated" messages. Example:

  # for i in $(seq 100); do
  #       echo never > /sys/kernel/mm/transparent_hugepage/enabled
  # done

  # dmesg | grep "min_free_kbytes is not updated" | wc -l
  100

Use test_and_set_bit()/test_and_clear_bit() instead of the plain
variants to detect whether any bit actually flipped, and skip the
start_stop_khugepaged() call entirely when the configuration is
unchanged.

With this patch, redoing the same operation becomes a no-op.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 mm/huge_memory.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8e2746ea74adf..9abfb115e9329 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -520,36 +520,37 @@ static ssize_t anon_enabled_store(struct kobject *kobj,
 				  const char *buf, size_t count)
 {
 	int order = to_thpsize(kobj)->order;
+	bool changed = false;
 	ssize_t ret = count;
 
 	if (sysfs_streq(buf, "always")) {
 		spin_lock(&huge_anon_orders_lock);
-		clear_bit(order, &huge_anon_orders_inherit);
-		clear_bit(order, &huge_anon_orders_madvise);
-		set_bit(order, &huge_anon_orders_always);
+		changed = test_and_clear_bit(order, &huge_anon_orders_inherit);
+		changed |= test_and_clear_bit(order, &huge_anon_orders_madvise);
+		changed |= !test_and_set_bit(order, &huge_anon_orders_always);
 		spin_unlock(&huge_anon_orders_lock);
 	} else if (sysfs_streq(buf, "inherit")) {
 		spin_lock(&huge_anon_orders_lock);
-		clear_bit(order, &huge_anon_orders_always);
-		clear_bit(order, &huge_anon_orders_madvise);
-		set_bit(order, &huge_anon_orders_inherit);
+		changed = test_and_clear_bit(order, &huge_anon_orders_always);
+		changed |= test_and_clear_bit(order, &huge_anon_orders_madvise);
+		changed |= !test_and_set_bit(order, &huge_anon_orders_inherit);
 		spin_unlock(&huge_anon_orders_lock);
 	} else if (sysfs_streq(buf, "madvise")) {
 		spin_lock(&huge_anon_orders_lock);
-		clear_bit(order, &huge_anon_orders_always);
-		clear_bit(order, &huge_anon_orders_inherit);
-		set_bit(order, &huge_anon_orders_madvise);
+		changed = test_and_clear_bit(order, &huge_anon_orders_always);
+		changed |= test_and_clear_bit(order, &huge_anon_orders_inherit);
+		changed |= !test_and_set_bit(order, &huge_anon_orders_madvise);
 		spin_unlock(&huge_anon_orders_lock);
 	} else if (sysfs_streq(buf, "never")) {
 		spin_lock(&huge_anon_orders_lock);
-		clear_bit(order, &huge_anon_orders_always);
-		clear_bit(order, &huge_anon_orders_inherit);
-		clear_bit(order, &huge_anon_orders_madvise);
+		changed = test_and_clear_bit(order, &huge_anon_orders_always);
+		changed |= test_and_clear_bit(order, &huge_anon_orders_inherit);
+		changed |= test_and_clear_bit(order, &huge_anon_orders_madvise);
 		spin_unlock(&huge_anon_orders_lock);
 	} else
 		ret = -EINVAL;
 
-	if (ret > 0) {
+	if (ret > 0 && changed) {
 		int err;
 
 		err = start_stop_khugepaged();

-- 
2.47.3
Re: [PATCH 1/2] mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()
Posted by Lorenzo Stoakes (Oracle) 1 month, 1 week ago
On Wed, Mar 04, 2026 at 02:22:33AM -0800, Breno Leitao wrote:
> Writing "never" (or any other value) multiple times to
> /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled calls
> start_stop_khugepaged() each time, even when nothing actually changed.
> This causes set_recommended_min_free_kbytes() to run unconditionally,
> which is unnecessary and floods the printk buffer with "raising
> min_free_kbytes" messages. Example:
>
>   # for i in $(seq 100); do
>   #       echo never > /sys/kernel/mm/transparent_hugepage/enabled
>   # done
>
>   # dmesg | grep "min_free_kbytes is not updated" | wc -l
>   100
>
> Use test_and_set_bit()/test_and_clear_bit() instead of the plain
> variants to detect whether any bit actually flipped, and skip the
> start_stop_khugepaged() call entirely when the configuration is
> unchanged.
>
> With this patch, redoing the same operation becomes a no-op.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>

General concept is sensible, but let's improve this code please.

> ---
>  mm/huge_memory.c | 27 ++++++++++++++-------------
>  1 file changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 8e2746ea74adf..9abfb115e9329 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -520,36 +520,37 @@ static ssize_t anon_enabled_store(struct kobject *kobj,
>  				  const char *buf, size_t count)
>  {
>  	int order = to_thpsize(kobj)->order;
> +	bool changed = false;
>  	ssize_t ret = count;
>
>  	if (sysfs_streq(buf, "always")) {
>  		spin_lock(&huge_anon_orders_lock);
> -		clear_bit(order, &huge_anon_orders_inherit);
> -		clear_bit(order, &huge_anon_orders_madvise);
> -		set_bit(order, &huge_anon_orders_always);
> +		changed = test_and_clear_bit(order, &huge_anon_orders_inherit);
> +		changed |= test_and_clear_bit(order, &huge_anon_orders_madvise);
> +		changed |= !test_and_set_bit(order, &huge_anon_orders_always);
>  		spin_unlock(&huge_anon_orders_lock);
>  	} else if (sysfs_streq(buf, "inherit")) {
>  		spin_lock(&huge_anon_orders_lock);
> -		clear_bit(order, &huge_anon_orders_always);
> -		clear_bit(order, &huge_anon_orders_madvise);
> -		set_bit(order, &huge_anon_orders_inherit);
> +		changed = test_and_clear_bit(order, &huge_anon_orders_always);
> +		changed |= test_and_clear_bit(order, &huge_anon_orders_madvise);
> +		changed |= !test_and_set_bit(order, &huge_anon_orders_inherit);
>  		spin_unlock(&huge_anon_orders_lock);
>  	} else if (sysfs_streq(buf, "madvise")) {
>  		spin_lock(&huge_anon_orders_lock);
> -		clear_bit(order, &huge_anon_orders_always);
> -		clear_bit(order, &huge_anon_orders_inherit);
> -		set_bit(order, &huge_anon_orders_madvise);
> +		changed = test_and_clear_bit(order, &huge_anon_orders_always);
> +		changed |= test_and_clear_bit(order, &huge_anon_orders_inherit);
> +		changed |= !test_and_set_bit(order, &huge_anon_orders_madvise);
>  		spin_unlock(&huge_anon_orders_lock);
>  	} else if (sysfs_streq(buf, "never")) {
>  		spin_lock(&huge_anon_orders_lock);
> -		clear_bit(order, &huge_anon_orders_always);
> -		clear_bit(order, &huge_anon_orders_inherit);
> -		clear_bit(order, &huge_anon_orders_madvise);
> +		changed = test_and_clear_bit(order, &huge_anon_orders_always);
> +		changed |= test_and_clear_bit(order, &huge_anon_orders_inherit);
> +		changed |= test_and_clear_bit(order, &huge_anon_orders_madvise);

This is badly implemented already (sigh) so a little tricky as to how to
abstract.

Yes, the existing logic is duplicated, but that doesn't mean we have to keep doing so :)

To put my money where my mouth is, I've attached a (totally untested, in line
with Kiryl's :P) patch to give a sense of how one might achieve this.

As to this vs. Kiryl's... I mean it might be nice to fix this crap up here to be
honest.

Maybe David can have deciding vote ;)

But see below for a caveat...

>  		spin_unlock(&huge_anon_orders_lock);
>  	} else
>  		ret = -EINVAL;
>
> -	if (ret > 0) {
> +	if (ret > 0 && changed) {
>  		int err;
>
>  		err = start_stop_khugepaged();

There's a caveat here, as mentioned in reply to Kiryl - I'm concerned users
might rely on set_recommended_min_free_kbytes() being called even when things
don't change.

Not sure how likely that is, but it's a user-visible change in how this behaves.

Cheers, Lorenzo

>
> --
> 2.47.3
>

----8<----
From cb2c4c8bf183ef0d10068cfd12c12d19cb17a241 Mon Sep 17 00:00:00 2001
From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Date: Wed, 4 Mar 2026 16:37:20 +0000
Subject: [PATCH] idea

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 74 ++++++++++++++++++++++++++++++------------------
 1 file changed, 46 insertions(+), 28 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0df1f4a17430..97dabbeb9112 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -515,46 +515,64 @@ static ssize_t anon_enabled_show(struct kobject *kobj,
 	return sysfs_emit(buf, "%s\n", output);
 }

+enum huge_mode {
+	HUGE_ALWAYS,
+	HUGE_INHERIT,
+	HUGE_MADVISE,
+	HUGE_NUM_MODES,
+	HUGE_NEVER,
+};
+
+static bool change_anon_orders(int order, enum huge_mode mode)
+{
+	static unsigned long *orders[] = {
+		&huge_anon_orders_always,
+		&huge_anon_orders_inherit,
+		&huge_anon_orders_madvise,
+	};
+	bool changed = false;
+	int i;
+
+	spin_lock(&huge_anon_orders_lock);
+	for (i = 0; i < HUGE_NUM_MODES; i++) {
+		if (i == mode)
+			changed |= !test_and_set_bit(order, orders[mode]);
+		else
+			changed |= test_and_clear_bit(order, orders[mode]);
+	}
+	spin_unlock(&huge_anon_orders_lock);
+
+	return changed;
+}
+
 static ssize_t anon_enabled_store(struct kobject *kobj,
 				 struct kobj_attribute *attr,
 				 const char *buf, size_t count)
 {
 	int order = to_thpsize(kobj)->order;
 	ssize_t ret = count;
+	bool changed;
+
+	if (sysfs_streq(buf, "always"))
+		changed = change_anon_orders(order, HUGE_ALWAYS);
+	else if (sysfs_streq(buf, "inherit"))
+		changed = change_anon_orders(order, HUGE_INHERIT);
+	else if (sysfs_streq(buf, "madvise"))
+		changed = change_anon_orders(order, HUGE_MADVISE);
+	else if (sysfs_streq(buf, "never"))
+		changed = change_anon_orders(order, HUGE_NEVER);
+	else
+		return -EINVAL;

-	if (sysfs_streq(buf, "always")) {
-		spin_lock(&huge_anon_orders_lock);
-		clear_bit(order, &huge_anon_orders_inherit);
-		clear_bit(order, &huge_anon_orders_madvise);
-		set_bit(order, &huge_anon_orders_always);
-		spin_unlock(&huge_anon_orders_lock);
-	} else if (sysfs_streq(buf, "inherit")) {
-		spin_lock(&huge_anon_orders_lock);
-		clear_bit(order, &huge_anon_orders_always);
-		clear_bit(order, &huge_anon_orders_madvise);
-		set_bit(order, &huge_anon_orders_inherit);
-		spin_unlock(&huge_anon_orders_lock);
-	} else if (sysfs_streq(buf, "madvise")) {
-		spin_lock(&huge_anon_orders_lock);
-		clear_bit(order, &huge_anon_orders_always);
-		clear_bit(order, &huge_anon_orders_inherit);
-		set_bit(order, &huge_anon_orders_madvise);
-		spin_unlock(&huge_anon_orders_lock);
-	} else if (sysfs_streq(buf, "never")) {
-		spin_lock(&huge_anon_orders_lock);
-		clear_bit(order, &huge_anon_orders_always);
-		clear_bit(order, &huge_anon_orders_inherit);
-		clear_bit(order, &huge_anon_orders_madvise);
-		spin_unlock(&huge_anon_orders_lock);
-	} else
-		ret = -EINVAL;
-
-	if (ret > 0) {
+	if (changed) {
 		int err;

 		err = start_stop_khugepaged();
 		if (err)
 			ret = err;
+	} else {
+		/* Users expect this even if unchanged. TODO: Put in header... */
+		//set_recommended_min_free_kbytes();
 	}
 	return ret;
 }
--
2.53.0
Re: [PATCH 1/2] mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()
Posted by David Hildenbrand (Arm) 1 month, 1 week ago
> This is badly implemented already (sigh) so a little tricky as to how to
> abstract.
> 
> Yes the existing logic duplicated, doesn't mean we have to keep doing so :)
> 
> To put money where my mouth is attached a (totally untested, in line with
> Kiryl's :P) patch to give a sense of how one might achieve this.
> 
> As to this vs. Kiryl's... I mean it might be nice to fix this crap up here to be
> honest.
> 
> Maybe David can have deciding vote ;)
I guess there are plenty of cleanups to be had here. And yes, we can
have multiple ones! :)

-- 
Cheers,

David
Re: [PATCH 1/2] mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()
Posted by Breno Leitao 1 month, 1 week ago
On Wed, Mar 04, 2026 at 04:40:22PM +0000, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 04, 2026 at 02:22:33AM -0800, Breno Leitao wrote:
> > Writing "never" (or any other value) multiple times to
> > /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled calls
> > start_stop_khugepaged() each time, even when nothing actually changed.
> > This causes set_recommended_min_free_kbytes() to run unconditionally,
> > which is unnecessary and floods the printk buffer with "raising
> > min_free_kbytes" messages. Example:
> >
> >   # for i in $(seq 100); do
> >   #       echo never > /sys/kernel/mm/transparent_hugepage/enabled
> >   # done
> >
> >   # dmesg | grep "min_free_kbytes is not updated" | wc -l
> >   100
> >
> > Use test_and_set_bit()/test_and_clear_bit() instead of the plain
> > variants to detect whether any bit actually flipped, and skip the
> > start_stop_khugepaged() call entirely when the configuration is
> > unchanged.
> >
> > With this patch, redoing the same operation becomes a no-op.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> General concept is sensible, but let's improve this code please.

Ack! Thanks for the suggestions.

> >  		spin_unlock(&huge_anon_orders_lock);
> >  	} else
> >  		ret = -EINVAL;
> >
> > -	if (ret > 0) {
> > +	if (ret > 0 && changed) {
> >  		int err;
> >
> >  		err = start_stop_khugepaged();
> 
> There's a caveat here as mentioned in reply to Kiryl - I'm concerned users
> might rely on the set recommended min kbytes even when things don't change.
> 
> Not sure how likely that is, but it's a user-visible change in how this behaves.

Is there any indication that users are retouching
/sys/kernel/mm/transparent_hugepage just to trigger
set_recommended_min_free_kbytes()? That seems weird, but I will keep that
behaviour in the change.

> From cb2c4c8bf183ef0d10068cfd12c12d19cb17a241 Mon Sep 17 00:00:00 2001
> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> Date: Wed, 4 Mar 2026 16:37:20 +0000
> Subject: [PATCH] idea
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Thanks for the idea. Let me hack on top of it, and propose a v2.

> ---
>  mm/huge_memory.c | 74 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 46 insertions(+), 28 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 0df1f4a17430..97dabbeb9112 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -515,46 +515,64 @@ static ssize_t anon_enabled_show(struct kobject *kobj,
>  	return sysfs_emit(buf, "%s\n", output);
>  }
> 
> +enum huge_mode {
> +	HUGE_ALWAYS,
> +	HUGE_INHERIT,
> +	HUGE_MADVISE,
> +	HUGE_NUM_MODES,
> +	HUGE_NEVER,
> +};
> +
> +static bool change_anon_orders(int order, enum huge_mode mode)
> +{
> +	static unsigned long *orders[] = {
> +		&huge_anon_orders_always,
> +		&huge_anon_orders_inherit,
> +		&huge_anon_orders_madvise,
> +	};
> +	bool changed = false;
> +	int i;
> +
> +	spin_lock(&huge_anon_orders_lock);
> +	for (i = 0; i < HUGE_NUM_MODES; i++) {

> +		if (i == mode)
> +			changed |= !test_and_set_bit(order, orders[mode]);
> +		else
> +			changed |= test_and_clear_bit(order, orders[mode]);

I suppose we want s/mode/i in the test_and_{clear,set}_bit() here:

		if (i == mode)
			// set for mode
			changed |= !test_and_set_bit(order, orders[i]);
		else
			// clear for !mode
			changed |= test_and_clear_bit(order, orders[i]);

For two reasons:
	* you want to clear orders[i] when i != mode.
	* you would get an out-of-bounds access at orders[HUGE_NEVER], i.e.
	  orders[4], when mode == HUGE_NEVER.


>  static ssize_t anon_enabled_store(struct kobject *kobj,
>  				 struct kobj_attribute *attr,
>  				 const char *buf, size_t count)
>  {
>  	int order = to_thpsize(kobj)->order;
>  	ssize_t ret = count;
> +	bool changed;
> +
> +	if (sysfs_streq(buf, "always"))
> +		changed = change_anon_orders(order, HUGE_ALWAYS);
> +	else if (sysfs_streq(buf, "inherit"))
> +		changed = change_anon_orders(order, HUGE_INHERIT);
> +	else if (sysfs_streq(buf, "madvise"))
> +		changed = change_anon_orders(order, HUGE_MADVISE);
> +	else if (sysfs_streq(buf, "never"))
> +		changed = change_anon_orders(order, HUGE_NEVER);
> +	else
> +		return -EINVAL;

I think we can simplify anon_enabled_store() even more by leveraging sysfs_match_string().
Something like:

	static const char *const anon_mode_strings[] = {
		[HUGE_ALWAYS]   = "always",
		[HUGE_INHERIT]  = "inherit",
		[HUGE_MADVISE]  = "madvise",
		[HUGE_NEVER]    = "never",
		NULL,
	};

and then

	static ssize_t anon_enabled_store(struct kobject *kobj,
					struct kobj_attribute *attr,
					const char *buf, size_t count)
	{
		int order = to_thpsize(kobj)->order;
		int mode;

		mode = sysfs_match_string(anon_mode_strings, buf);
		if (mode < 0)
			return mode;

		if (change_anon_orders(order, mode)) {
			int err = start_stop_khugepaged();

			if (err)
				return err;
		} else {
			/* Users expect this even if unchanged. TODO: Put in header... */
			//set_recommended_min_free_kbytes();
		}
		return count;
	}



Anyway, I like this approach, thanks! Let me hack a v2 based on it.

--breno
Re: [PATCH 1/2] mm: thp: avoid calling start_stop_khugepaged() in anon_enabled_store()
Posted by Lorenzo Stoakes (Oracle) 1 month, 1 week ago
On Thu, Mar 05, 2026 at 03:48:07AM -0800, Breno Leitao wrote:
> On Wed, Mar 04, 2026 at 04:40:22PM +0000, Lorenzo Stoakes (Oracle) wrote:
> > On Wed, Mar 04, 2026 at 02:22:33AM -0800, Breno Leitao wrote:
> > > Writing "never" (or any other value) multiple times to
> > > /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled calls
> > > start_stop_khugepaged() each time, even when nothing actually changed.
> > > This causes set_recommended_min_free_kbytes() to run unconditionally,
> > > which is unnecessary and floods the printk buffer with "raising
> > > min_free_kbytes" messages. Example:
> > >
> > >   # for i in $(seq 100); do
> > >   #       echo never > /sys/kernel/mm/transparent_hugepage/enabled
> > >   # done
> > >
> > >   # dmesg | grep "min_free_kbytes is not updated" | wc -l
> > >   100
> > >
> > > Use test_and_set_bit()/test_and_clear_bit() instead of the plain
> > > variants to detect whether any bit actually flipped, and skip the
> > > start_stop_khugepaged() call entirely when the configuration is
> > > unchanged.
> > >
> > > With this patch, redoing the same operation becomes a no-op.
> > >
> > > Signed-off-by: Breno Leitao <leitao@debian.org>
> >
> > General concept is sensible, but let's improve this code please.
>
> Ack! Thanks for the suggestions.

No problem, thanks for the patch! :)

>
> > >  		spin_unlock(&huge_anon_orders_lock);
> > >  	} else
> > >  		ret = -EINVAL;
> > >
> > > -	if (ret > 0) {
> > > +	if (ret > 0 && changed) {
> > >  		int err;
> > >
> > >  		err = start_stop_khugepaged();
> >
> > There's a caveat here as mentioned in reply to Kiryl - I'm concerned users
> > might rely on the set recommended min kbytes even when things don't change.
> >
> > Not sure how likely that is, but it's a user-visible change in how this behaves.
>
> Is there any motivation that users are retouching
> /sys/kernel/mm/transparent_hugepage just to trigger
> set_recommended_min_free_kbytes() ? That seems weird, but, I will keep it in
> the change.

I mean I can't really think of any, but I don't want to risk breaking (weird)
userspace.

>
> > From cb2c4c8bf183ef0d10068cfd12c12d19cb17a241 Mon Sep 17 00:00:00 2001
> > From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> > Date: Wed, 4 Mar 2026 16:37:20 +0000
> > Subject: [PATCH] idea
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>
> Thanks for the idea. Let me hack on top of it, and propose a v2.

Thanks!

>
> > ---
> >  mm/huge_memory.c | 74 ++++++++++++++++++++++++++++++------------------
> >  1 file changed, 46 insertions(+), 28 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 0df1f4a17430..97dabbeb9112 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -515,46 +515,64 @@ static ssize_t anon_enabled_show(struct kobject *kobj,
> >  	return sysfs_emit(buf, "%s\n", output);
> >  }
> >
> > +enum huge_mode {
> > +	HUGE_ALWAYS,
> > +	HUGE_INHERIT,
> > +	HUGE_MADVISE,
> > +	HUGE_NUM_MODES,
> > +	HUGE_NEVER,
> > +};
> > +
> > +static bool change_anon_orders(int order, enum huge_mode mode)
> > +{
> > +	static unsigned long *orders[] = {
> > +		&huge_anon_orders_always,
> > +		&huge_anon_orders_inherit,
> > +		&huge_anon_orders_madvise,
> > +	};
> > +	bool changed = false;
> > +	int i;
> > +
> > +	spin_lock(&huge_anon_orders_lock);
> > +	for (i = 0; i < HUGE_NUM_MODES; i++) {
>
> > +		if (i == mode)
> > +			changed |= !test_and_set_bit(order, orders[mode]);
> > +		else
> > +			changed |= test_and_clear_bit(order, orders[mode]);
>
> I suppose we want s/mode/i in the test_and_{clear,set}_bit() here:
>
> 		if (i == mode)
> 			// set for mode
> 			changed |= !test_and_set_bit(order, orders[i]);
> 		else
> 			// clear for !mode
> 			changed |= test_and_clear_bit(order, orders[i]);
>
> For two reasons:
> 	* you want to unset "i" when i != mode.
> 	* you would have an OOB when accessing orders[HUGE_NEVER == 4]
>
>
> >  static ssize_t anon_enabled_store(struct kobject *kobj,
> >  				 struct kobj_attribute *attr,
> >  				 const char *buf, size_t count)
> >  {
> >  	int order = to_thpsize(kobj)->order;
> >  	ssize_t ret = count;
> > +	bool changed;
> > +
> > +	if (sysfs_streq(buf, "always"))
> > +		changed = change_anon_orders(order, HUGE_ALWAYS);
> > +	else if (sysfs_streq(buf, "inherit"))
> > +		changed = change_anon_orders(order, HUGE_INHERIT);
> > +	else if (sysfs_streq(buf, "madvise"))
> > +		changed = change_anon_orders(order, HUGE_MADVISE);
> > +	else if (sysfs_streq(buf, "never"))
> > +		changed = change_anon_orders(order, HUGE_NEVER);
> > +	else
> > +		return -EINVAL;
>
> I think we can simplify anon_enabled_store() even more, by leveraging sysfs_match_string().
> Something like:
>
> 	static const char *const anon_mode_strings[] = {
> 		[HUGE_ALWAYS]   = "always",
> 		[HUGE_INHERIT]  = "inherit",
> 		[HUGE_MADVISE]  = "madvise",
> 		[HUGE_NEVER]    = "never",
> 		NULL,
> 	};
>
> and then
>
> 	static ssize_t anon_enabled_store(struct kobject *kobj,
> 					struct kobj_attribute *attr,
> 					const char *buf, size_t count)
> 	{
> 		int order = to_thpsize(kobj)->order;
> 		int mode;
>
> 		mode = sysfs_match_string(enabled_mode_strings, buf);
> 		if (mode < 0)
> 			return mode;

Nice!

>
> 		if (change_anon_orders(order, mode)) {
> 			int err = start_stop_khugepaged();
>
> 			if (err)
> 				return err;
> 		} else {
> 			/* Users expect this even if unchanged. TODO: Put in header... */
> 			//set_recommended_min_free_kbytes();
> 		}
> 		return count;
> 	}
>
>
>
> Anyway, I like this approach, thanks!. Let me hack a v2 based on it.

Great, thanks!

Note that my code seemed to introduce a splat, so it's buggy; make sure to check
it carefully (that 'untested' proviso was apposite, it turns out! :)

>
> --breno

Cheers, Lorenzo