[PATCH] jbd2: prevent softlockup in jbd2_log_do_checkpoint()

libaokun@huaweicloud.com posted 1 patch 1 month, 3 weeks ago
From: Baokun Li <libaokun1@huawei.com>

Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list()
periodically release j_list_lock after processing a batch of buffers to
avoid long hold times on the j_list_lock. However, since both functions
contend for j_list_lock, the combined time spent waiting and processing
can be significant.
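A minimal userspace sketch of the batch-and-release pattern described above. It assumes POSIX threads, with a pthread mutex standing in for j_list_lock. BATCH_SIZE, struct batch_queue and drain_in_batches() are hypothetical names for illustration; this is not jbd2 code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical batch size for this sketch; not the kernel's value. */
#define BATCH_SIZE 4

/* Hypothetical shared queue guarded by one lock (stands in for j_list_lock). */
struct batch_queue {
	pthread_mutex_t lock;
	long items[32];
	size_t count;          /* number of items still queued */
};

/*
 * Drain the queue, releasing the lock after every BATCH_SIZE items so that
 * other threads contending for it get a window to acquire it.
 * Returns how many times the lock was dropped mid-drain; *sum receives the
 * total of the processed items.
 */
static size_t drain_in_batches(struct batch_queue *q, long *sum)
{
	size_t drops = 0, in_batch = 0;

	*sum = 0;
	pthread_mutex_lock(&q->lock);
	while (q->count > 0) {
		*sum += q->items[--q->count];        /* "process" one item */
		if (++in_batch == BATCH_SIZE && q->count > 0) {
			/* Bound the hold time and let contenders in.
			 * Unlocking does not by itself give up the CPU. */
			pthread_mutex_unlock(&q->lock);
			drops++;
			pthread_mutex_lock(&q->lock);
			in_batch = 0;
		}
	}
	pthread_mutex_unlock(&q->lock);
	return drops;
}
```

For a queue holding the values 1 to 10, drain_in_batches() drops the lock twice (after the 4th and 8th items) and sets the sum to 55. Dropping a lock bounds how long it is held, but it does not bound how long the thread keeps the CPU.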

jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when
need_resched() is true to avoid softlockups during prolonged operations.
But jbd2_log_do_checkpoint() only exits its loop when need_resched() is
true, relying on potentially sleeping functions like __flush_batch() or
wait_on_buffer() to trigger rescheduling. If those functions do not sleep,
the kernel may hit a softlockup.

watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373]
CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ #10
Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017
Workqueue: writeback wb_workfn (flush-7:2)
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : native_queued_spin_lock_slowpath+0x358/0x418
lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
Call trace:
 native_queued_spin_lock_slowpath+0x358/0x418
 jbd2_log_do_checkpoint+0x31c/0x438 [jbd2]
 __jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2]
 add_transaction_credits+0x3bc/0x418 [jbd2]
 start_this_handle+0xf8/0x560 [jbd2]
 jbd2__journal_start+0x118/0x228 [jbd2]
 __ext4_journal_start_sb+0x110/0x188 [ext4]
 ext4_do_writepages+0x3dc/0x740 [ext4]
 ext4_writepages+0xa4/0x190 [ext4]
 do_writepages+0x94/0x228
 __writeback_single_inode+0x48/0x318
 writeback_sb_inodes+0x204/0x590
 __writeback_inodes_wb+0x54/0xf8
 wb_writeback+0x2cc/0x3d8
 wb_do_writeback+0x2e0/0x2f8
 wb_workfn+0x80/0x2a8
 process_one_work+0x178/0x3e8
 worker_thread+0x234/0x3b8
 kthread+0xf0/0x108
 ret_from_fork+0x10/0x20

So explicitly call cond_resched() in jbd2_log_do_checkpoint() before
re-acquiring j_list_lock to avoid the softlockup.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/jbd2/checkpoint.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index b3971e91e8eb..38861ca04899 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -285,6 +285,7 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 		retry:
 			if (batch_count)
 				__flush_batch(journal, &batch_count);
+			cond_resched();
 			spin_lock(&journal->j_list_lock);
 			goto restart;
 	}
-- 
2.39.2
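The following is a hypothetical userspace model of the retry path the patch changes, not kernel code. A pthread mutex stands in for j_list_lock and sched_yield() stands in for the scheduler. The names list_lock, flush_batch_model(), cond_resched_model() and checkpoint_passes() are invented for illustration. The model counts how often the CPU is voluntarily given up when __flush_batch() does or does not sleep, with and without the added cond_resched().

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Stand-in for journal->j_list_lock in this hypothetical model. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for __flush_batch(). If the buffers still need I/O, the caller
 * blocks and the scheduler runs something else (modelled by sched_yield()).
 * If they are already clean, it returns without giving up the CPU.
 * Returns true if the CPU was given up.
 */
static bool flush_batch_model(bool io_blocks)
{
	if (io_blocks) {
		sched_yield();
		return true;
	}
	return false;
}

/* Stand-in for cond_resched(): yield only if a reschedule is pending. */
static bool cond_resched_model(bool resched_pending)
{
	if (resched_pending) {
		sched_yield();
		return true;
	}
	return false;
}

/*
 * Walk the unlock_and_flush -> retry -> restart path `passes` times, as the
 * scan loop does each time it bails out with need_resched() set. With
 * patched == true, a cond_resched()-style yield is taken before the lock is
 * re-acquired, mirroring the one-line fix. Returns the number of times the
 * CPU was voluntarily given up.
 */
static int checkpoint_passes(int passes, bool io_blocks, bool need_resched,
			     bool patched)
{
	int yields = 0;

	pthread_mutex_lock(&list_lock);
	for (int i = 0; i < passes; i++) {
		bool pending = need_resched;   /* why the scan bailed out */

		/* ... scan a batch of checkpoint buffers under the lock ... */
		pthread_mutex_unlock(&list_lock);          /* unlock_and_flush: */
		if (flush_batch_model(io_blocks)) {        /* retry: */
			yields++;
			pending = false;   /* sleeping already rescheduled us */
		}
		if (patched && cond_resched_model(pending))
			yields++;          /* the added cond_resched() */
		pthread_mutex_lock(&list_lock);            /* goto restart */
	}
	pthread_mutex_unlock(&list_lock);
	return yields;
}
```

With io_blocks false and need_resched true, checkpoint_passes(1000, false, true, false) returns 0: the thread loops without ever giving up the CPU, which is the softlockup scenario from the commit message. The patched variant returns 1000, one yield per pass. When the flush does sleep, both variants yield once per pass, because the sleep already cleared the pending reschedule.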
Re: [PATCH] jbd2: prevent softlockup in jbd2_log_do_checkpoint()
Posted by Theodore Ts'o 1 month, 3 weeks ago
On Tue, 12 Aug 2025 14:37:52 +0800, libaokun@huaweicloud.com wrote:
> Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list()
> periodically release j_list_lock after processing a batch of buffers to
> avoid long hold times on the j_list_lock. However, since both functions
> contend for j_list_lock, the combined time spent waiting and processing
> can be significant.
> 
> jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when
> need_resched() is true to avoid softlockups during prolonged operations.
> But jbd2_log_do_checkpoint() only exits its loop when need_resched() is
> true, relying on potentially sleeping functions like __flush_batch() or
> wait_on_buffer() to trigger rescheduling. If those functions do not sleep,
> the kernel may hit a softlockup.
> 
> [...]

Applied, thanks!

[1/1] jbd2: prevent softlockup in jbd2_log_do_checkpoint()
      commit: 9d98cf4632258720f18265a058e62fde120c0151

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>