[PATCH v5 00/14] Create and use APIs to centralise locking for directory ops.

NeilBrown posted 14 patches 1 month, 1 week ago
There is a newer version of this series
[PATCH v5 00/14] Create and use APIs to centralise locking for directory ops.
Posted by NeilBrown 1 month, 1 week ago
This version contains a couple of fixes thanks to Al's review and Dan's
robot, and a couple of s-o-b and a-b tags.
It is based on vfs-all (e90fbb585e64).

Previous description:

 this series is the next part of my effort to change directory-op
 locking to allow multiple concurrent ops in a directory.  Ultimately we
 will (in my plan) lock the target dentry(s) rather than the whole
 parent directory.

 To help with changing the locking protocol, this series centralises
 locking and lookup in some helpers.  The various helpers are introduced
 and then used in the same patch - roughly one patch per helper though
 with various exceptions.

 I haven't introduced these helpers into the various filesystems that
 Al's tree-in-dcache series is changing.  That series introduces and
 uses similar helpers tuned to the specific needs of that set of
 filesystems.  Ultimately all the helpers will use the same backends
 which can then be adjusted when it is time to change the locking
 protocol.

 One change that deserves highlighting is in patch 13 where vfs_mkdir()
 is changed to unlock the parent on failure, as well as the current
 behaviour of dput()ing the dentry on failure.  Once this change is in
 place, the final step of both create and an remove sequences only
 requires the target dentry, not the parent.  So e.g.  end_creating() is
 only given the dentry (which may be IS_ERR() after vfs_mkdir()).  This
 helps establish the pattern that it is the dentry that is being locked
 and unlocked (the lock is currently held on dentry->d_parent->d_inode,
 but that can change).

 Please review the changes I've made to your respective code areas and
 let us know of any problems.

Thanks,
NeilBrown

 [PATCH v5 01/14] debugfs: rename end_creating() to
 [PATCH v5 02/14] VFS: introduce start_dirop() and end_dirop()
 [PATCH v5 03/14] VFS: tidy up do_unlinkat()
 [PATCH v5 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and
 [PATCH v5 05/14] VFS/nfsd/cachefiles/ovl: introduce start_removing()
 [PATCH v5 06/14] VFS: introduce start_creating_noperm() and
 [PATCH v5 07/14] VFS: introduce start_removing_dentry()
 [PATCH v5 08/14] VFS: add start_creating_killable() and
 [PATCH v5 09/14] VFS/nfsd/ovl: introduce start_renaming() and
 [PATCH v5 10/14] VFS/ovl/smb: introduce start_renaming_dentry()
 [PATCH v5 11/14] Add start_renaming_two_dentries()
 [PATCH v5 12/14] ecryptfs: use new start_creating/start_removing APIs
 [PATCH v5 13/14] VFS: change vfs_mkdir() to unlock on failure.
 [PATCH v5 14/14] VFS: introduce end_creating_keep()
[syzbot ci] Re: Create and use APIs to centralise locking for directory ops.
Posted by syzbot ci 1 month, 1 week ago
syzbot ci has tested the following series

[v5] Create and use APIs to centralise locking for directory ops.
https://lore.kernel.org/all/20251106005333.956321-1-neilb@ownmail.net
* [PATCH v5 01/14] debugfs: rename end_creating() to debugfs_end_creating()
* [PATCH v5 02/14] VFS: introduce start_dirop() and end_dirop()
* [PATCH v5 03/14] VFS: tidy up do_unlinkat()
* [PATCH v5 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
* [PATCH v5 05/14] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
* [PATCH v5 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm()
* [PATCH v5 07/14] VFS: introduce start_removing_dentry()
* [PATCH v5 08/14] VFS: add start_creating_killable() and start_removing_killable()
* [PATCH v5 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
* [PATCH v5 10/14] VFS/ovl/smb: introduce start_renaming_dentry()
* [PATCH v5 11/14] Add start_renaming_two_dentries()
* [PATCH v5 12/14] ecryptfs: use new start_creating/start_removing APIs
* [PATCH v5 13/14] VFS: change vfs_mkdir() to unlock on failure.
* [PATCH v5 14/14] VFS: introduce end_creating_keep()

and found the following issues:
* WARNING: lock held when returning to user space in start_creating
* possible deadlock in mnt_want_write

Full report is available here:
https://ci.syzbot.org/series/4f406e4d-6aba-457a-b9c1-21f4407176a0

***

WARNING: lock held when returning to user space in start_creating

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      6146a0f1dfae5d37442a9ddcba012add260bceb0
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/49013fb4-56ed-423c-8e15-252d65d5c1b4/config
C repro:   https://ci.syzbot.org/findings/403597e5-81d3-4a9e-8d43-cf15c00b3265/c_repro
syz repro: https://ci.syzbot.org/findings/403597e5-81d3-4a9e-8d43-cf15c00b3265/syz_repro

UDF-fs: INFO Mounting volume 'LinuxUDF', timestamp 2022/11/22 14:59 (1000)
overlayfs: upper fs needs to support d_type.
overlayfs: upper fs does not support tmpfile.
================================================
WARNING: lock held when returning to user space!
syzkaller #0 Not tainted
------------------------------------------------
syz.0.17/5964 is leaving the kernel with locks still held!
1 lock held by syz.0.17/5964:
 #0: ffff888119a282a0 (&type->i_mutex_dir_key#8/1){+.+.}-{4:4}, at: inode_lock_nested include/linux/fs.h:1025 [inline]
 #0: ffff888119a282a0 (&type->i_mutex_dir_key#8/1){+.+.}-{4:4}, at: __start_dirop fs/namei.c:2794 [inline]
 #0: ffff888119a282a0 (&type->i_mutex_dir_key#8/1){+.+.}-{4:4}, at: start_dirop fs/namei.c:2805 [inline]
 #0: ffff888119a282a0 (&type->i_mutex_dir_key#8/1){+.+.}-{4:4}, at: start_creating+0xbe/0x100 fs/namei.c:3261


***

possible deadlock in mnt_want_write

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      6146a0f1dfae5d37442a9ddcba012add260bceb0
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/49013fb4-56ed-423c-8e15-252d65d5c1b4/config
syz repro: https://ci.syzbot.org/findings/7d1f626d-9979-4c5b-b36b-5616a983b0ac/syz_repro

======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Not tainted
------------------------------------------------------
syz.0.17/6011 is trying to acquire lock:
ffff88810943c420
 (sb_writers#12){.+.+}-{0:0}, at: mnt_want_write+0x41/0x90 fs/namespace.c:508

but task is already holding lock:
ffff888169f40940 (&type->i_mutex_dir_key#5/1){+.+.}-{4:4}, at: inode_lock_nested include/linux/fs.h:1025 [inline]
ffff888169f40940 (&type->i_mutex_dir_key#5/1){+.+.}-{4:4}, at: __start_dirop fs/namei.c:2794 [inline]
ffff888169f40940 (&type->i_mutex_dir_key#5/1){+.+.}-{4:4}, at: start_dirop fs/namei.c:2805 [inline]
ffff888169f40940 (&type->i_mutex_dir_key#5/1){+.+.}-{4:4}, at: start_creating+0xbe/0x100 fs/namei.c:3261

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&type->i_mutex_dir_key#5/1){+.+.}-{4:4}:
       reacquire_held_locks+0x127/0x1d0 kernel/locking/lockdep.c:5385
       __lock_release kernel/locking/lockdep.c:5574 [inline]
       lock_release+0x1b4/0x3e0 kernel/locking/lockdep.c:5889
       up_write+0x2d/0x420 kernel/locking/rwsem.c:1642
       inode_unlock include/linux/fs.h:990 [inline]
       end_dirop fs/namei.c:2818 [inline]
       end_creating include/linux/namei.h:125 [inline]
       vfs_mkdir+0x111/0x570 fs/namei.c:5037
       do_mkdirat+0x247/0x5e0 fs/namei.c:5058
       __do_sys_mkdir fs/namei.c:5080 [inline]
       __se_sys_mkdir fs/namei.c:5078 [inline]
       __x64_sys_mkdir+0x6c/0x80 fs/namei.c:5078
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (sb_writers#12){.+.+}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
       __lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
       lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
       percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
       percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
       __sb_start_write include/linux/fs.h:1916 [inline]
       sb_start_write+0x4d/0x1c0 include/linux/fs.h:2052
       mnt_want_write+0x41/0x90 fs/namespace.c:508
       filename_create+0x14f/0x360 fs/namei.c:4785
       do_mkdirat+0x32c/0x5e0 fs/namei.c:5050
       __do_sys_mkdir fs/namei.c:5080 [inline]
       __se_sys_mkdir fs/namei.c:5078 [inline]
       __x64_sys_mkdir+0x6c/0x80 fs/namei.c:5078
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&type->i_mutex_dir_key#5/1);
                               lock(sb_writers#12);
                               lock(&type->i_mutex_dir_key#5/1);
  rlock(sb_writers#12);

 *** DEADLOCK ***

1 lock held by syz.0.17/6011:
 #0: ffff888169f40940 (&type->i_mutex_dir_key#5/1){+.+.}-{4:4}, at: inode_lock_nested include/linux/fs.h:1025 [inline]
 #0: ffff888169f40940 (&type->i_mutex_dir_key#5/1){+.+.}-{4:4}, at: __start_dirop fs/namei.c:2794 [inline]
 #0: ffff888169f40940 (&type->i_mutex_dir_key#5/1){+.+.}-{4:4}, at: start_dirop fs/namei.c:2805 [inline]
 #0: ffff888169f40940 (&type->i_mutex_dir_key#5/1){+.+.}-{4:4}, at: start_creating+0xbe/0x100 fs/namei.c:3261

stack backtrace:
CPU: 1 UID: 0 PID: 6011 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043
 check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3165 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
 __lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
 lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
 percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
 percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
 __sb_start_write include/linux/fs.h:1916 [inline]
 sb_start_write+0x4d/0x1c0 include/linux/fs.h:2052
 mnt_want_write+0x41/0x90 fs/namespace.c:508
 filename_create+0x14f/0x360 fs/namei.c:4785
 do_mkdirat+0x32c/0x5e0 fs/namei.c:5050
 __do_sys_mkdir fs/namei.c:5080 [inline]
 __se_sys_mkdir fs/namei.c:5078 [inline]
 __x64_sys_mkdir+0x6c/0x80 fs/namei.c:5078
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fdc9a98efc9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fdc9b79b038 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00007fdc9abe5fa0 RCX: 00007fdc9a98efc9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00002000000008c0
RBP: 00007fdc9aa11f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fdc9abe6038 R14: 00007fdc9abe5fa0 R15: 00007ffe4d481c38
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
Re: [syzbot ci] Re: Create and use APIs to centralise locking for directory ops.
Posted by NeilBrown 1 month ago
On Thu, 06 Nov 2025, syzbot ci wrote:
> syzbot ci has tested the following series
> 
> [v5] Create and use APIs to centralise locking for directory ops.
> https://lore.kernel.org/all/20251106005333.956321-1-neilb@ownmail.net
> * [PATCH v5 01/14] debugfs: rename end_creating() to debugfs_end_creating()
> * [PATCH v5 02/14] VFS: introduce start_dirop() and end_dirop()
> * [PATCH v5 03/14] VFS: tidy up do_unlinkat()
> * [PATCH v5 04/14] VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
> * [PATCH v5 05/14] VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
> * [PATCH v5 06/14] VFS: introduce start_creating_noperm() and start_removing_noperm()
> * [PATCH v5 07/14] VFS: introduce start_removing_dentry()
> * [PATCH v5 08/14] VFS: add start_creating_killable() and start_removing_killable()
> * [PATCH v5 09/14] VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
> * [PATCH v5 10/14] VFS/ovl/smb: introduce start_renaming_dentry()
> * [PATCH v5 11/14] Add start_renaming_two_dentries()
> * [PATCH v5 12/14] ecryptfs: use new start_creating/start_removing APIs
> * [PATCH v5 13/14] VFS: change vfs_mkdir() to unlock on failure.
> * [PATCH v5 14/14] VFS: introduce end_creating_keep()
> 
> and found the following issues:
> * WARNING: lock held when returning to user space in start_creating
> * possible deadlock in mnt_want_write
> 
> Full report is available here:
> https://ci.syzbot.org/series/4f406e4d-6aba-457a-b9c1-21f4407176a0
> 
> ***
> 
> WARNING: lock held when returning to user space in start_creating

I think this was due to a bug in 
   VFS: change vfs_mkdir() to unlock on failure.
in ovl_create_real()

That patch removed a end_creating() call that was after
ovl_create_real() returned failure, but didn't add end_creating() in
ovl_create_real() on failure.  So it could exit with the lock still
held.

This patch should fix it, particularly the second hunk.

Thanks,
NeilBrown

diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index a4a0dc261310..739f974dc258 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -187,7 +187,7 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs, struct dentry *parent,
 			if (!err && ofs->casefold != ovl_dentry_casefolded(newdentry)) {
 				pr_warn_ratelimited("wrong inherited casefold (%pd2)\n",
 						    newdentry);
-				dput(newdentry);
+				end_creating(newdentry);
 				err = -EINVAL;
 			}
 			break;
@@ -237,8 +237,7 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs, struct dentry *parent,
 	}
 out:
 	if (err) {
-		if (!IS_ERR(newdentry))
-			dput(newdentry);
+		end_creating(newdentry);
 		return ERR_PTR(err);
 	}
 	return newdentry;