mm/vmstat: reject zero vm.stat_interval to prevent busy-loop

[PATCH] mm/vmstat: reject zero vm.stat_interval to prevent busy-loop

Posted by Maximilian Pezzullo via B4 Relay 1 month, 1 week ago

From: Maximilian Pezzullo <maximilianpezzullo@gmail.com>

Setting vm.stat_interval to 0 causes excessive kworker CPU usage
because vmstat_shepherd() and vmstat_update() reschedule themselves
with round_jiffies_relative(0), which resolves to an immediate
reschedule and creates a busy-loop.

Add a custom sysctl handler that rejects 0 and restores the previous
value, similar to how dirtytime_interval_handler() handles
vm.dirtytime_expire_seconds.

Reported-by: Terry M <terrym3201@bugzilla.kernel.org>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220226
Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
---
 mm/vmstat.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 86b14b0f77b5..6eeb4341b215 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2114,6 +2114,18 @@ void vmstat_flush_workqueue(void)
 	flush_workqueue(mm_percpu_wq);
 }
 
+static int vmstat_interval_handler(const struct ctl_table *table, int write,
+				    void *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret = proc_dointvec_jiffies(table, write, buffer, lenp, ppos);
+
+	if (ret == 0 && write && sysctl_stat_interval == 0) {
+		sysctl_stat_interval = HZ;
+		return -EINVAL;
+	}
+	return ret;
+}
+
 static void vmstat_shepherd(struct work_struct *w)
 {
 	int cpu;
@@ -2236,7 +2248,7 @@ static const struct ctl_table vmstat_table[] = {
 		.data		= &sysctl_stat_interval,
 		.maxlen		= sizeof(sysctl_stat_interval),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_jiffies,
+		.proc_handler	= vmstat_interval_handler,
 	},
 	{
 		.procname	= "stat_refresh",

---
base-commit: af4e9ef3d78420feb8fe58cd9a1ab80c501b3c08
change-id: 20260304-mm-vmstat-reject-zero-vm-stat_interval-to-prevent-busy-loop-ba5446a527aa

Best regards,
-- 
Maximilian Pezzullo <maximilianpezzullo@gmail.com>

Re: [PATCH] mm/vmstat: reject zero vm.stat_interval to prevent busy-loop

Posted by kernel test robot 1 month, 1 week ago

Hi Maximilian,

kernel test robot noticed the following build warnings:

[auto build test WARNING on af4e9ef3d78420feb8fe58cd9a1ab80c501b3c08]

url:    https://github.com/intel-lab-lkp/linux/commits/Maximilian-Pezzullo-via-B4-Relay/mm-vmstat-reject-zero-vm-stat_interval-to-prevent-busy-loop/20260304-153052
base:   af4e9ef3d78420feb8fe58cd9a1ab80c501b3c08
patch link:    https://lore.kernel.org/r/20260304-mm-vmstat-reject-zero-vm-stat_interval-to-prevent-busy-loop-v1-1-c03c9555ff15%40gmail.com
patch subject: [PATCH] mm/vmstat: reject zero vm.stat_interval to prevent busy-loop
config: parisc-randconfig-002-20260305 (https://download.01.org/0day-ci/archive/20260305/202603050819.oMnaHoIC-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260305/202603050819.oMnaHoIC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603050819.oMnaHoIC-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> mm/vmstat.c:2117:12: warning: 'vmstat_interval_handler' defined but not used [-Wunused-function]
    static int vmstat_interval_handler(const struct ctl_table *table, int write,
               ^~~~~~~~~~~~~~~~~~~~~~~


vim +/vmstat_interval_handler +2117 mm/vmstat.c

  2116	
> 2117	static int vmstat_interval_handler(const struct ctl_table *table, int write,
  2118					    void *buffer, size_t *lenp, loff_t *ppos)
  2119	{
  2120		int ret = proc_dointvec_jiffies(table, write, buffer, lenp, ppos);
  2121	
  2122		if (ret == 0 && write && sysctl_stat_interval == 0) {
  2123			sysctl_stat_interval = HZ;
  2124			return -EINVAL;
  2125		}
  2126		return ret;
  2127	}
  2128	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH] mm/vmstat: reject zero vm.stat_interval to prevent busy-loop

Posted by Michal Hocko 1 month, 1 week ago

On Wed 04-03-26 08:27:38, Maximilian Pezzullo via B4 Relay wrote:
> From: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
> 
> Setting vm.stat_interval to 0 causes excessive kworker CPU usage
> because vmstat_shepherd() and vmstat_update() reschedule themselves
> with round_jiffies_relative(0), which resolves to an immediate
> reschedule and creates a busy-loop.
> 
> Add a custom sysctl handler that rejects 0 and restores the previous
> value, similar to how dirtytime_interval_handler() handles
> vm.dirtytime_expire_seconds.
> 
> Reported-by: Terry M <terrym3201@bugzilla.kernel.org>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220226
> Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
> ---
>  mm/vmstat.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 86b14b0f77b5..6eeb4341b215 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -2114,6 +2114,18 @@ void vmstat_flush_workqueue(void)
>  	flush_workqueue(mm_percpu_wq);
>  }
>  
> +static int vmstat_interval_handler(const struct ctl_table *table, int write,
> +				    void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	int ret = proc_dointvec_jiffies(table, write, buffer, lenp, ppos);
> +
> +	if (ret == 0 && write && sysctl_stat_interval == 0) {
> +		sysctl_stat_interval = HZ;
> +		return -EINVAL;

So you update the value and report the failure. Nope, this is not
correct way to handle that. Either tou check the value and fail before
any side effects or you correct the value, report that to the log and
return success.

I would preffer to not do that at all. Setting any arbitrary small value
will have some side effects. This is admin only interface and we expect
those do know what they are doing.

NAK to the patch
-- 
Michal Hocko
SUSE Labs

Re: [PATCH] mm/vmstat: reject zero vm.stat_interval to prevent busy-loop

Posted by Vlastimil Babka (SUSE) 1 month, 1 week ago

On 3/4/26 10:27 AM, Michal Hocko wrote:
> On Wed 04-03-26 08:27:38, Maximilian Pezzullo via B4 Relay wrote:
>> From: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
>>
>> Setting vm.stat_interval to 0 causes excessive kworker CPU usage
>> because vmstat_shepherd() and vmstat_update() reschedule themselves
>> with round_jiffies_relative(0), which resolves to an immediate
>> reschedule and creates a busy-loop.
>>
>> Add a custom sysctl handler that rejects 0 and restores the previous
>> value, similar to how dirtytime_interval_handler() handles
>> vm.dirtytime_expire_seconds.
>>
>> Reported-by: Terry M <terrym3201@bugzilla.kernel.org>
>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220226
>> Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
>> ---
>>  mm/vmstat.c | 14 +++++++++++++-
>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 86b14b0f77b5..6eeb4341b215 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -2114,6 +2114,18 @@ void vmstat_flush_workqueue(void)
>>  	flush_workqueue(mm_percpu_wq);
>>  }
>>  
>> +static int vmstat_interval_handler(const struct ctl_table *table, int write,
>> +				    void *buffer, size_t *lenp, loff_t *ppos)
>> +{
>> +	int ret = proc_dointvec_jiffies(table, write, buffer, lenp, ppos);
>> +
>> +	if (ret == 0 && write && sysctl_stat_interval == 0) {
>> +		sysctl_stat_interval = HZ;
>> +		return -EINVAL;
> 
> So you update the value and report the failure. Nope, this is not
> correct way to handle that. Either tou check the value and fail before
> any side effects or you correct the value, report that to the log and
> return success.
> 
> I would preffer to not do that at all. Setting any arbitrary small value
> will have some side effects. This is admin only interface and we expect
> those do know what they are doing.

I think it would make some sense to reject a value that leads to
unrecoverable situation due to something in the kernel looping endlessly
with no preemption, causing e.g. softlockups, and making it impossible
to set a new sane value again. I don't know if that's the case here for
value 0. If it only leads to 20-30% cpu utilization (per the bugzilla)
and the admin can recover by setting a sane value again, we can indeed
leave it that way.

> NAK to the patch