ext4: fast commit: avoid fs_reclaim inversion in perform_commit

[PATCH] ext4: fast commit: avoid fs_reclaim inversion in perform_commit

Posted by Li Chen 1 month, 2 weeks ago

lockdep reports a possible deadlock due to lock order inversion:

     CPU0                    CPU1
     ----                    ----
lock(fs_reclaim);
                             lock(&sbi->s_fc_lock);
                             lock(fs_reclaim);
lock(&sbi->s_fc_lock);

ext4_fc_perform_commit() holds s_fc_lock while writing the fast commit
log. Allocations made here can enter reclaim and acquire fs_reclaim,
which inverts the lock order used by ext4_fc_del(): during inode
eviction it runs under fs_reclaim and then takes s_fc_lock.
Wrap Step 6 in memalloc_nofs_save()/memalloc_nofs_restore() so that
allocations made while s_fc_lock is held cannot recurse into
filesystem reclaim.
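
The report above describes a possible deadlock, not one that actually occurred. The toy userspace model below shows why lockdep can flag it anyway. It is not the real lockdep, which tracks full dependency chains between lock classes. Note that fs_reclaim is not a real lock; it is a lockdep annotation wrapped around reclaim that may recurse into the filesystem. The names record_acquire() and reset_deps() are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: two lock classes and a table of observed
 * "acquired while holding" dependencies. */
enum lock_class { FS_RECLAIM, S_FC_LOCK, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = { "fs_reclaim", "s_fc_lock" };

/* dep[a][b] becomes true once class b is acquired while class a is held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Record "acquire next while holding held". Returns false and prints a
 * report if the opposite order has already been observed. */
static bool record_acquire(enum lock_class held, enum lock_class next)
{
	assert(held != next);
	if (dep[next][held]) {
		printf("possible deadlock: %s -> %s inverts %s -> %s\n",
		       class_name[held], class_name[next],
		       class_name[next], class_name[held]);
		return false;
	}
	dep[held][next] = true;
	return true;
}

/* Forget all recorded dependencies (used between scenarios). */
static void reset_deps(void)
{
	memset(dep, 0, sizeof(dep));
}
```

Feeding it the CPU1 order first (an allocation under s_fc_lock entering reclaim) and then the CPU0 order (eviction under fs_reclaim taking s_fc_lock in ext4_fc_del()) makes the second call report the inversion. The two paths never have to race for the report to fire.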

Fixes: 6593714d67ba ("ext4: hold s_fc_lock while during fast commit")
Signed-off-by: Li Chen <me@linux.beauty>
---
 fs/ext4/fast_commit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 3bcdd4619de1..b0c458082997 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1045,6 +1045,7 @@ static int ext4_fc_perform_commit(journal_t *journal)
 	struct ext4_fc_head head;
 	struct inode *inode;
 	struct blk_plug plug;
+	unsigned int nofs;
 	int ret = 0;
 	u32 crc = 0;
 
@@ -1118,6 +1119,7 @@ static int ext4_fc_perform_commit(journal_t *journal)
 		blkdev_issue_flush(journal->j_fs_dev);
 
 	blk_start_plug(&plug);
+	nofs = memalloc_nofs_save();
 	/* Step 6: Write fast commit blocks to disk. */
 	if (sbi->s_fc_bytes == 0) {
 		/*
@@ -1158,6 +1160,7 @@ static int ext4_fc_perform_commit(journal_t *journal)
 
 out:
 	mutex_unlock(&sbi->s_fc_lock);
+	memalloc_nofs_restore(nofs);
 	blk_finish_plug(&plug);
 	return ret;
 }
-- 
2.52.0
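
Below is a minimal userspace model of the scoped NOFS API used in the diff above. It assumes the kernel behavior where memalloc_nofs_save() sets PF_MEMALLOC_NOFS in current->flags and returns the previous state, and where the page allocator then drops __GFP_FS from every allocation the task makes. The model_* names, the global task_flags, and the bit value are stand-ins chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Model only: one global stands in for current->flags, and the bit
 * value is arbitrary (not the kernel's PF_MEMALLOC_NOFS value). */
#define MODEL_PF_MEMALLOC_NOFS 0x1u

static unsigned int task_flags;

/* Enter a NOFS scope; the returned cookie records whether NOFS was
 * already set, so scopes can nest. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_MEMALLOC_NOFS;

	task_flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Leave the scope: put only the NOFS bit back to its saved state. */
static void model_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Would an allocation made now keep __GFP_FS, i.e. be allowed to
 * recurse into filesystem reclaim (and thus "take" fs_reclaim)? */
static bool alloc_may_enter_fs(void)
{
	return !(task_flags & MODEL_PF_MEMALLOC_NOFS);
}
```

Because the cookie stores the previous state rather than unconditionally clearing the bit, an inner save/restore pair leaves NOFS set until the outermost scope ends. That is what makes it safe to open a scope inside code that may already be running under one.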

Re: [PATCH] ext4: fast commit: avoid fs_reclaim inversion in perform_commit

Posted by Jan Kara 1 month ago

On Tue 23-12-25 21:13:42, Li Chen wrote:
> lockdep reports a possible deadlock due to lock order inversion:
> 
>      CPU0                    CPU1
>      ----                    ----
> lock(fs_reclaim);
>                              lock(&sbi->s_fc_lock);
>                              lock(fs_reclaim);
> lock(&sbi->s_fc_lock);
> 
> ext4_fc_perform_commit() holds s_fc_lock while writing the fast commit
> log. Allocations here can enter reclaim and take fs_reclaim, inverting
> with ext4_fc_del() which runs under fs_reclaim during inode eviction.
> Wrap Step 6 in memalloc_nofs_save()/restore() so reclaim is skipped
> while s_fc_lock is held.
> 
> Fixes: 6593714d67ba ("ext4: hold s_fc_lock while during fast commit")
> Signed-off-by: Li Chen <me@linux.beauty>

Thanks for the analysis and the patch! Your solution is in principle
correct but it's a bit fragile because there can be other instances (or we
can grow them in the future) where sbi->s_fc_lock is held when doing
allocation. The situation is that sbi->s_fc_lock can be acquired from inode
eviction path (ext4_clear_inode()) and thus this lock is inherently reclaim
unsafe. What we do in such cases is that we create helper functions for
acquiring / releasing the lock while also setting proper context and using
these helpers - like in commit 00d873c17e29 ("ext4: avoid deadlock in fs
reclaim with page writeback"). Can you perhaps modify your patch to follow
that behavior as well?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
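
The helper pattern suggested above can be sketched as a lock/unlock pair that enters the NOFS scope right after taking s_fc_lock and leaves it right before dropping the lock. No caller can then allocate under s_fc_lock with FS reclaim enabled, wherever the lock is taken. This is a userspace model: a pthread mutex stands in for the kernel mutex, and model_fc_lock()/model_fc_unlock() are hypothetical names, not necessarily what the v2 patch or commit 00d873c17e29 use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Model only: a global stands in for current->flags. */
#define MODEL_PF_MEMALLOC_NOFS 0x1u

static unsigned int task_flags;

static unsigned int model_nofs_save(void)
{
	unsigned int old = task_flags & MODEL_PF_MEMALLOC_NOFS;

	task_flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

static void model_nofs_restore(unsigned int old)
{
	task_flags = (task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* Stand-in for struct ext4_sb_info, reduced to the one lock. */
struct model_sbi {
	pthread_mutex_t s_fc_lock;
};

/* Take s_fc_lock and enter a NOFS scope in one step. The returned
 * cookie must be handed back to model_fc_unlock(). */
static unsigned int model_fc_lock(struct model_sbi *sbi)
{
	pthread_mutex_lock(&sbi->s_fc_lock);
	return model_nofs_save();
}

/* Leave the NOFS scope, then drop the lock (reverse of acquire). */
static void model_fc_unlock(struct model_sbi *sbi, unsigned int cookie)
{
	model_nofs_restore(cookie);
	pthread_mutex_unlock(&sbi->s_fc_lock);
}

static bool alloc_may_enter_fs(void)
{
	return !(task_flags & MODEL_PF_MEMALLOC_NOFS);
}
```

A caller keeps the cookie in a local, e.g. `unsigned int ctx = model_fc_lock(sbi); ... model_fc_unlock(sbi, ctx);`. Tying the scope to the lock helpers, instead of wrapping one code region as the first version did, also covers any s_fc_lock critical section added later.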

Re: [PATCH] ext4: fast commit: avoid fs_reclaim inversion in perform_commit

Posted by Li Chen 1 month ago

Hi Jan,

 ---- On Tue, 06 Jan 2026 00:17:31 +0800  Jan Kara <jack@suse.cz> wrote --- 
 > Thanks for the analysis and the patch! Your solution is in principle
 > correct but it's a bit fragile because there can be other instances (or we
 > can grow them in the future) where sbi->s_fc_lock is held when doing
 > allocation. The situation is that sbi->s_fc_lock can be acquired from inode
 > eviction path (ext4_clear_inode()) and thus this lock is inherently reclaim
 > unsafe. What we do in such cases is that we create helper functions for
 > acquiring / releasing the lock while also setting proper context and using
 > these helpers - like in commit 00d873c17e29 ("ext4: avoid deadlock in fs
 > reclaim with page writeback"). Can you perhaps modify your patch to follow
 > that behavior as well?

Thanks a lot for the suggestion. I have added the helpers here: https://lore.kernel.org/linux-ext4/20260106120621.440126-1-me@linux.beauty/T/#u
Please take a look, thanks.
(I didn't add a v2 reroll count there because I mistakenly remembered this as an RFC; sorry about that.)

Regards,
Li