[PATCH 1/2] writeback: fix 100% CPU usage when dirtytime_expire_interval is 0

Laveesh Bansal posted 2 patches 1 month, 1 week ago
There is a newer version of this series
Posted by Laveesh Bansal 1 month, 1 week ago
When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback()
schedules delayed work with a delay of 0, causing immediate execution.
The function then reschedules itself with 0 delay again, creating an
infinite busy loop that causes 100% kworker CPU usage.

Fix by:
- Only scheduling delayed work in wakeup_dirtytime_writeback() when
  dirtytime_expire_interval is non-zero
- Cancelling the delayed work in dirtytime_interval_handler() when
  the interval is set to 0
- Adding a guard in start_dirtytime_writeback() for defensive coding

Tested by booting kernel in QEMU with virtme-ng:
- Before fix: kworker CPU spikes to ~73%
- After fix: CPU remains at normal levels
- Setting interval back to non-zero correctly resumes writeback

Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227
Signed-off-by: Laveesh Bansal <laveeshb@laveeshbansal.com>
---
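Illustration only, not part of the commit: the re-arm loop described above
can be sketched as a small userspace model. Every name below (struct
dirtytime_model, model_schedule(), model_work(), simulate()) is hypothetical
and is not a kernel API. Time is counted in abstract ticks, and a run cap
stands in for the otherwise endless loop. The `guarded` flag mimics the
checks this patch adds; the sysctl write is modelled on
dirtytime_interval_handler().

```c
#include <stdbool.h>

/* Userspace model only: these names are hypothetical, not kernel APIs. */
struct dirtytime_model {
	unsigned int interval;	/* stands in for dirtytime_expire_interval */
	bool pending;		/* true while the delayed work is queued */
	unsigned long due;	/* tick at which the queued work should run */
	unsigned long runs;	/* how many times the work function ran */
};

/* Queue the work to run `delay` ticks after `now`. */
static void model_schedule(struct dirtytime_model *m, unsigned long now,
			   unsigned long delay)
{
	m->pending = true;
	m->due = now + delay;
}

/* Work function: count the run, then re-arm unless the guard forbids it. */
static void model_work(struct dirtytime_model *m, unsigned long now,
		       bool guarded)
{
	m->pending = false;
	m->runs++;
	if (!guarded || m->interval)
		model_schedule(m, now, m->interval);
}

/*
 * Write `interval` at tick 0, then advance `ticks` ticks.
 * Returns the number of work runs, stopping early once `cap` is reached.
 */
unsigned long simulate(unsigned int interval, bool guarded,
		       unsigned long ticks, unsigned long cap)
{
	struct dirtytime_model m = { .interval = interval };

	/* Model of the sysctl write: cancel on 0 when guarded, else kick now. */
	if (guarded && interval == 0)
		m.pending = false;
	else
		model_schedule(&m, 0, 0);

	for (unsigned long now = 0; now < ticks; now++) {
		while (m.pending && m.due <= now) {
			if (m.runs >= cap)
				return m.runs;	/* busy loop detected */
			model_work(&m, now, guarded);
		}
	}
	return m.runs;
}
```

In this model, an interval of 0 without the guard hits the cap within the
first tick, because each run re-queues itself with zero delay. With the
guard the work never runs, and a non-zero interval behaves the same with or
without it.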
 fs/fs-writeback.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6800886c4d10..cd21c74cd0e5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2492,7 +2492,8 @@ static void wakeup_dirtytime_writeback(struct work_struct *w)
 				wb_wakeup(wb);
 	}
 	rcu_read_unlock();
-	schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+	if (dirtytime_expire_interval)
+		schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
 }
 
 static int dirtytime_interval_handler(const struct ctl_table *table, int write,
@@ -2501,8 +2502,12 @@ static int dirtytime_interval_handler(const struct ctl_table *table, int write,
 	int ret;
 
 	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-	if (ret == 0 && write)
-		mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+	if (ret == 0 && write) {
+		if (dirtytime_expire_interval)
+			mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
+		else
+			cancel_delayed_work_sync(&dirtytime_work);
+	}
 	return ret;
 }
 
@@ -2519,7 +2524,8 @@ static const struct ctl_table vm_fs_writeback_table[] = {
 
 static int __init start_dirtytime_writeback(void)
 {
-	schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
+	if (dirtytime_expire_interval)
+		schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
 	register_sysctl_init("vm", vm_fs_writeback_table);
 	return 0;
 }
-- 
2.43.0
Re: [PATCH 1/2] writeback: fix 100% CPU usage when dirtytime_expire_interval is 0
Posted by Jan Kara 1 month ago
On Fri 02-01-26 20:16:56, Laveesh Bansal wrote:
> When vm.dirtytime_expire_seconds is set to 0, wakeup_dirtytime_writeback()
> schedules delayed work with a delay of 0, causing immediate execution.
> The function then reschedules itself with 0 delay again, creating an
> infinite busy loop that causes 100% kworker CPU usage.
> 
> Fix by:
> - Only scheduling delayed work in wakeup_dirtytime_writeback() when
>   dirtytime_expire_interval is non-zero
> - Cancelling the delayed work in dirtytime_interval_handler() when
>   the interval is set to 0
> - Adding a guard in start_dirtytime_writeback() for defensive coding
> 
> Tested by booting kernel in QEMU with virtme-ng:
> - Before fix: kworker CPU spikes to ~73%
> - After fix: CPU remains at normal levels
> - Setting interval back to non-zero correctly resumes writeback
> 
> Fixes: a2f4870697a5 ("fs: make sure the timestamps for lazytime inodes eventually get written")
> Cc: stable@vger.kernel.org
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220227
> Signed-off-by: Laveesh Bansal <laveeshb@laveeshbansal.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/fs-writeback.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 6800886c4d10..cd21c74cd0e5 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -2492,7 +2492,8 @@ static void wakeup_dirtytime_writeback(struct work_struct *w)
>  				wb_wakeup(wb);
>  	}
>  	rcu_read_unlock();
> -	schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
> +	if (dirtytime_expire_interval)
> +		schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
>  }
>  
>  static int dirtytime_interval_handler(const struct ctl_table *table, int write,
> @@ -2501,8 +2502,12 @@ static int dirtytime_interval_handler(const struct ctl_table *table, int write,
>  	int ret;
>  
>  	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> -	if (ret == 0 && write)
> -		mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
> +	if (ret == 0 && write) {
> +		if (dirtytime_expire_interval)
> +			mod_delayed_work(system_percpu_wq, &dirtytime_work, 0);
> +		else
> +			cancel_delayed_work_sync(&dirtytime_work);
> +	}
>  	return ret;
>  }
>  
> @@ -2519,7 +2524,8 @@ static const struct ctl_table vm_fs_writeback_table[] = {
>  
>  static int __init start_dirtytime_writeback(void)
>  {
> -	schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
> +	if (dirtytime_expire_interval)
> +		schedule_delayed_work(&dirtytime_work, dirtytime_expire_interval * HZ);
>  	register_sysctl_init("vm", vm_fs_writeback_table);
>  	return 0;
>  }
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR