[v2] jbd2: fix deadlock in jbd2_journal_cancel_revoke()

[PATCH v2] jbd2: fix deadlock in jbd2_journal_cancel_revoke()

Posted by Zhang Yi 2 months ago

From: Zhang Yi <yi.zhang@huawei.com>

Commit f76d4c28a46a ("fs/jbd2: use sleeping version of
__find_get_block()") changed jbd2_journal_cancel_revoke() to use
__find_get_block_nonatomic() which holds the folio lock instead of
i_private_lock. This breaks the lock ordering (folio -> buffer) and
causes an ABBA deadlock when the filesystem blocksize < pagesize:

     T1                                T2
ext4_mkdir()
 ext4_init_new_dir()
  ext4_append()
   ext4_getblk()
    lock_buffer()    <- A
                                   sync_blockdev()
                                    blkdev_writepages()
                                     writeback_iter()
                                      writeback_get_folio()
                                       folio_lock()   <- B
     ext4_journal_get_create_access()
      jbd2_journal_cancel_revoke()
       __find_get_block_nonatomic()
        folio_lock()  <- B
                                     block_write_full_folio()
                                      lock_buffer()   <- A

This can occasionally cause generic/013 to hang.

Fix by only calling __find_get_block_nonatomic() when the passed
buffer_head doesn't belong to the bdev, which is the only case that we
need to look up its bdev alias. Otherwise, the lookup is redundant since
the found buffer_head is equal to the one we passed in.

Fixes: f76d4c28a46a ("fs/jbd2: use sleeping version of __find_get_block()")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
v1->v2:
 - Switch to using sb_is_blkdev_sb() to check whether the bh belongs to
   the bdev.

 fs/jbd2/revoke.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 9016ddb82447..e4c2fbd381f1 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -428,6 +428,7 @@ void jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
 	journal_t *journal = handle->h_transaction->t_journal;
 	int need_cancel;
 	struct buffer_head *bh = jh2bh(jh);
+	struct address_space *bh_mapping = bh->b_folio->mapping;
 
 	jbd2_debug(4, "journal_head %p, cancelling revoke\n", jh);
 
@@ -464,13 +465,14 @@ void jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
 	 * buffer_head?  If so, we'd better make sure we clear the
 	 * revoked status on any hashed alias too, otherwise the revoke
 	 * state machine will get very upset later on. */
-	if (need_cancel) {
+	if (need_cancel && !sb_is_blkdev_sb(bh_mapping->host->i_sb)) {
 		struct buffer_head *bh2;
+
 		bh2 = __find_get_block_nonatomic(bh->b_bdev, bh->b_blocknr,
 						 bh->b_size);
 		if (bh2) {
-			if (bh2 != bh)
-				clear_buffer_revoked(bh2);
+			WARN_ON_ONCE(bh2 == bh);
+			clear_buffer_revoked(bh2);
 			__brelse(bh2);
 		}
 	}
-- 
2.52.0

Re: [PATCH v2] jbd2: fix deadlock in jbd2_journal_cancel_revoke()

Posted by Theodore Ts'o 2 months ago

On Thu, 09 Apr 2026 19:42:03 +0800, Zhang Yi wrote:
> Commit f76d4c28a46a ("fs/jbd2: use sleeping version of
> __find_get_block()") changed jbd2_journal_cancel_revoke() to use
> __find_get_block_nonatomic() which holds the folio lock instead of
> i_private_lock. This breaks the lock ordering (folio -> buffer) and
> causes an ABBA deadlock when the filesystem blocksize < pagesize:
> 
>      T1                                T2
> ext4_mkdir()
>  ext4_init_new_dir()
>   ext4_append()
>    ext4_getblk()
>     lock_buffer()    <- A
>                                    sync_blockdev()
>                                     blkdev_writepages()
>                                      writeback_iter()
>                                       writeback_get_folio()
>                                        folio_lock()   <- B
>      ext4_journal_get_create_access()
>       jbd2_journal_cancel_revoke()
>        __find_get_block_nonatomic()
>         folio_lock()  <- B
>                                      block_write_full_folio()
>                                       lock_buffer()   <- A
> 
> [...]

Applied, thanks!

[1/1] jbd2: fix deadlock in jbd2_journal_cancel_revoke()
      commit: 981fcc5674e67158d24d23e841523eccba19d0e7

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>

Re: [PATCH v2] jbd2: fix deadlock in jbd2_journal_cancel_revoke()

Posted by Jan Kara 2 months ago

On Thu 09-04-26 19:42:03, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Commit f76d4c28a46a ("fs/jbd2: use sleeping version of
> __find_get_block()") changed jbd2_journal_cancel_revoke() to use
> __find_get_block_nonatomic() which holds the folio lock instead of
> i_private_lock. This breaks the lock ordering (folio -> buffer) and
> causes an ABBA deadlock when the filesystem blocksize < pagesize:
> 
>      T1                                T2
> ext4_mkdir()
>  ext4_init_new_dir()
>   ext4_append()
>    ext4_getblk()
>     lock_buffer()    <- A
>                                    sync_blockdev()
>                                     blkdev_writepages()
>                                      writeback_iter()
>                                       writeback_get_folio()
>                                        folio_lock()   <- B
>      ext4_journal_get_create_access()
>       jbd2_journal_cancel_revoke()
>        __find_get_block_nonatomic()
>         folio_lock()  <- B
>                                      block_write_full_folio()
>                                       lock_buffer()   <- A
> 
> This can occasionally cause generic/013 to hang.
> 
> Fix by only calling __find_get_block_nonatomic() when the passed
> buffer_head doesn't belong to the bdev, which is the only case that we
> need to look up its bdev alias. Otherwise, the lookup is redundant since
> the found buffer_head is equal to the one we passed in.
> 
> Fixes: f76d4c28a46a ("fs/jbd2: use sleeping version of __find_get_block()")
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Indeed, good catch! Sadly, lockep didn't tell us about this as it tracks
neither buffer locks nor folio locks... The patch looks good. Feel free to
add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
> ---
> v1->v2:
>  - Switch to using sb_is_blkdev_sb() to check whether the bh belongs to
>    the bdev.
> 
>  fs/jbd2/revoke.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
> index 9016ddb82447..e4c2fbd381f1 100644
> --- a/fs/jbd2/revoke.c
> +++ b/fs/jbd2/revoke.c
> @@ -428,6 +428,7 @@ void jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
>  	journal_t *journal = handle->h_transaction->t_journal;
>  	int need_cancel;
>  	struct buffer_head *bh = jh2bh(jh);
> +	struct address_space *bh_mapping = bh->b_folio->mapping;
>  
>  	jbd2_debug(4, "journal_head %p, cancelling revoke\n", jh);
>  
> @@ -464,13 +465,14 @@ void jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
>  	 * buffer_head?  If so, we'd better make sure we clear the
>  	 * revoked status on any hashed alias too, otherwise the revoke
>  	 * state machine will get very upset later on. */
> -	if (need_cancel) {
> +	if (need_cancel && !sb_is_blkdev_sb(bh_mapping->host->i_sb)) {
>  		struct buffer_head *bh2;
> +
>  		bh2 = __find_get_block_nonatomic(bh->b_bdev, bh->b_blocknr,
>  						 bh->b_size);
>  		if (bh2) {
> -			if (bh2 != bh)
> -				clear_buffer_revoked(bh2);
> +			WARN_ON_ONCE(bh2 == bh);
> +			clear_buffer_revoked(bh2);
>  			__brelse(bh2);
>  		}
>  	}
> -- 
> 2.52.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR