ext4: possible circular locking dependency at ext4_xattr_inode_create

Posted by Sergey Senozhatsky 1 year, 2 months ago

Hi,

I have the following syzkaller report (no reproducer); the report is
against 5.15, but the same call-chain seems possible in current
upstream as well.  So I suspect that maybe ext4_xattr_inode_create()
should take nested inode_lock (I_MUTEX_XATTR) instead.  Does the
patch below make any sense?

======================================================
WARNING: possible circular locking dependency detected
5.15.168-syzkaller-23766-g3f37c55c6291 #0 Not tainted
------------------------------------------------------
syz-executor297/1452 is trying to acquire lock:
ffff888120b5e750 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: inode_lock
ffff888120b5e750 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: ext4_xattr_inode_create
ffff888120b5e750 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: ext4_xattr_inode_lookup_create
ffff888120b5e750 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: ext4_xattr_set_entry+0x2aeb/0x3200

but task is already holding lock:
ffff888120b58c68 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x12b5/0x1950

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&ei->i_data_sem/3){++++}-{3:3}:
       down_write+0x38/0x60
       ext4_update_i_disksize
       ext4_xattr_inode_write
       ext4_xattr_inode_lookup_create
       ext4_xattr_set_entry+0x2839/0x3200
       ext4_xattr_ibody_set+0x113/0x320
       ext4_xattr_set_handle+0xa31/0x1440
       ext4_xattr_set+0x266/0x3d0
       __vfs_setxattr+0x15e/0x1c0
       __vfs_setxattr_noperm+0x128/0x5e0
       vfs_setxattr+0x1c6/0x410
       setxattr+0x1d6/0x270
       path_setxattr+0x1cc/0x2b0
       __do_sys_lsetxattr
       __se_sys_lsetxattr
       __x64_sys_lsetxattr+0xb4/0xd0
       do_syscall_x64
       do_syscall_64+0x69/0xc0
       entry_SYSCALL_64_after_hwframe+0x66/0xd0

-> #0 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}:
       check_prev_add
       check_prevs_add
       validate_chain
       __lock_acquire+0x2c95/0x7850
       lock_acquire+0x1d2/0x4e0
       down_write+0x38/0x60
       inode_lock
       ext4_xattr_inode_create
       ext4_xattr_inode_lookup_create
       ext4_xattr_set_entry+0x2aeb/0x3200
       ext4_xattr_block_set+0xdc1/0x2de0
       ext4_xattr_move_to_block
       ext4_xattr_make_inode_space
       ext4_expand_extra_isize_ea+0xe58/0x19c0
       __ext4_expand_extra_isize+0x2fd/0x400
       ext4_try_to_expand_extra_isize
       __ext4_mark_inode_dirty+0x58b/0x840
       ext4_setattr+0x1341/0x1950
       notify_change+0xafb/0xd80
       do_truncate+0x218/0x2f0
       handle_truncate
       do_open
       path_openat+0x27d3/0x2e10
       do_filp_open+0x23a/0x360
       do_sys_openat2+0x188/0x720
       do_sys_open+0x1d1/0x220
       do_syscall_x64
       do_syscall_64+0x69/0xc0
       entry_SYSCALL_64_after_hwframe+0x66/0xd0

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->i_data_sem/3);
                               lock(&ea_inode->i_rwsem#8/1);
                               lock(&ei->i_data_sem/3);
  lock(&ea_inode->i_rwsem#8/1);

 *** DEADLOCK ***

5 locks held by syz-executor297/1452:
 #0: ffff88811231c460 (sb_writers#5){.+.+}-{0:0}, at: mnt_want_write+0x3b/0x80
 #1: ffff888120b58de0 (&sb->s_type->i_mutex_key#8){++++}-{3:3}, at: inode_lock
 #1: ffff888120b58de0 (&sb->s_type->i_mutex_key#8){++++}-{3:3}, at: do_truncate+0x204/0x2f0
 #2: ffff888120b58f80 (mapping.invalidate_lock){++++}-{3:3}, at: filemap_invalidate_lock
 #2: ffff888120b58f80 (mapping.invalidate_lock){++++}-{3:3}, at: ext4_setattr+0xd49/0x1950
 #3: ffff888120b58c68 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x12b5/0x1950
 #4: ffff888120b58ab8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_write_trylock_xattr
 #4: ffff888120b58ab8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_try_to_expand_extra_isize
 #4: ffff888120b58ab8 (&ei->xattr_sem){++++}-{3:3}, at: __ext4_mark_inode_dirty+0x4f7/0x840

stack backtrace:
Call Trace:
 <TASK>
 __dump_stack
 dump_stack_lvl+0x1e3/0x2d0
 check_noncircular+0x2f3/0x3a0
 check_prev_add
 check_prevs_add
 validate_chain
 __lock_acquire+0x2c95/0x7850
 lock_acquire+0x1d2/0x4e0
 down_write+0x38/0x60
 inode_lock
 ext4_xattr_inode_create
 ext4_xattr_inode_lookup_create
 ext4_xattr_set_entry+0x2aeb/0x3200
 ext4_xattr_block_set+0xdc1/0x2de0
 ext4_xattr_move_to_block
 ext4_xattr_make_inode_space
 ext4_expand_extra_isize_ea+0xe58/0x19c0
 __ext4_expand_extra_isize+0x2fd/0x400
 ext4_try_to_expand_extra_isize
 __ext4_mark_inode_dirty+0x58b/0x840
 ext4_setattr+0x1341/0x1950
 notify_change+0xafb/0xd80
 do_truncate+0x218/0x2f0
 handle_truncate
 do_open
 path_openat+0x27d3/0x2e10
 do_filp_open+0x23a/0x360
 do_sys_openat2+0x188/0x720
 do_sys_open+0x1d1/0x220
 do_syscall_x64
 do_syscall_64+0x69/0xc0
 entry_SYSCALL_64_after_hwframe+0x66/0xd0

---

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 7647e9f6e190..db3c68fbbadf 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1511,7 +1511,7 @@ static struct inode *ext4_xattr_inode_create(handle_t *handle,
 		 */
 		dquot_free_inode(ea_inode);
 		dquot_drop(ea_inode);
-		inode_lock(ea_inode);
+		inode_lock_nested(ea_inode, I_MUTEX_XATTR);
 		ea_inode->i_flags |= S_NOQUOTA;
 		inode_unlock(ea_inode);
 	}

Re: ext4: possible circular locking dependency at ext4_xattr_inode_create

Posted by Theodore Ts'o 1 year, 2 months ago

On Tue, Nov 12, 2024 at 04:34:21PM +0900, Sergey Senozhatsky wrote:
> 
> I have the following syzkaller report (no reproducer); the report is
> against 5.15, but the same call-chain seems possible in current
> upstream as well.  So I suspect that maybe ext4_xattr_inode_create()
> should take nested inode_lock (I_MUTEX_XATTR) instead.  Does the
> patch below make any sense?

These syzkaller reports result from mounting a corrupted (fuzzed) file
system typically when an inode is used in multiple contexts (e.g., as
a directory and an EA inode, etc.) at the same time.

I'd have to take a closer look to see if it makes sense, but in
general, very often whenever we try to fix one of these it ends up
triggering some other syzkaller failure.  And, these sorts of things
don't actually result in actual security problems (at worst, a hang /
denial of service attack), and the right thing to do is to just run
fsck on the !@#?!? file system before mounting the thing.

The best way to protect systems against the threat model of users picking
up a random USB stick dropped in a parking lot that contains a
maliciously fuzzed file system is to either (a) run fsck before
allowing the file system to be mounted, (b) enable the enterprise
policy that prohibits USB thumb drives from being automounted, or (c)
mount the USB stick in some kind of VM (e.g., CrosVM) and then use a
reverse virtiofs / 9pfs / fuse to make the file system available in
the host system.

The last would be the best solution, but it would require development
work.  So I mention it in the hopes that at some point I can convince
some company to pick it up, since it would significantly improve
security for all desktops, laptops, and mobile systems that want to
support mounting removable storage.

In any case, trying to fix these sorts of syzkaller warnings is
essentially playing whack-a-mole, and so while I don't have objections
to these sorts of fixes, if one causes any kind of regression or, worse,
*two* new syzkaller failures, it just makes life harder for overworked
ext4 developers.  :-)

Cheers,

						- Ted

Re: ext4: possible circular locking dependency at ext4_xattr_inode_create

Posted by Sergey Senozhatsky 1 year, 2 months ago

Hi Ted,

On (24/11/12 10:29), Theodore Ts'o wrote:
> > I have the following syzkaller report (no reproducer); the report is
> > against 5.15, but the same call-chain seems possible in current
> > upstream as well.  So I suspect that maybe ext4_xattr_inode_create()
> > should take nested inode_lock (I_MUTEX_XATTR) instead.  Does the
> > patch below make any sense?
> 
> These syzkaller reports result from mounting a corrupted (fuzzed) file
> system typically when an inode is used in multiple contexts (e.g., as
> a directory and an EA inode, etc.) at the same time.

I certainly see your point, and I don't argue.

> I'd have to take a closer look to see if it makes sense, but in
> general, very often whenever we try to fix one of these it ends up
> triggering some other syzkaller failure.

I see.  The one-liner that I posted sort of looks like an addition to
d1bc560e9a9c7, which landed in ext4 recently.

> And, these sorts of things don't actually result in actual security
> problems (at worst, a hang / denial of service attack), and the right
> thing to do is to just run fsck on the !@#?!? file system before
> mounting the thing.

So in our particular case a reboot is a bad scenario.  Looking at reports
from the fleet I see a bunch of hung-task reboots with ext4 frames,
e.g. ext4_update_i_disksize()->down_write()->schedule() /* forever */,
but I can't claim that this is the deadlock that syzkaller has reported;
it very well might not be.