From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 09B9D639; Fri, 22 Aug 2025 00:11:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821482; cv=none; b=O9mIyW6Qet+xfnRMAsriwttxfk6psrt5cVJc8Umr6XZa8FjJFIyiMJ6AEq+TuravP6FZ7tIsPMY8FjXEKQzcJBJN/9E3hLyO7QBFuIgTmG0i6MZSeS8z7MSr1psnbF1j5nS2xuyIeKm7ZUUGxW+VhYauIDuQ9wH4LOjx0ch08b0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821482; c=relaxed/simple; bh=Wv73anJ/o0Drj+xc9I6pO/lz+Q65M5okLbn7MxX1YzE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=StfYszPxgvJkNJdkWUzVE2Oy90Yd0xnkAujAjsKK85UJa3EwscRuJaHUBp0uTh3J0jQ5KA6Aq0Cxhjex9DpEehChZIbH8cWUkZpgbedvS4SqEvo4R6mGuJPd1UdoZMgi+tQ25sLmf5GyYKv8NdaYOZxVjb1wRDgUbqMEyyaXkO4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFN8-006naZ-2A; Fri, 22 Aug 2025 00:11:11 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 01/16] VFS: discard err2 in filename_create() Date: Fri, 22 Aug 2025 10:00:19 +1000 Message-ID: <20250822000818.1086550-2-neil@brown.name> X-Mailer: git-send-email 
2.50.0.107.gf914562f5916.dirty
In-Reply-To: <20250822000818.1086550-1-neil@brown.name>
References: <20250822000818.1086550-1-neil@brown.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Since 204a575e91f3 "VFS: add common error checks to
lookup_one_qstr_excl()" filename_create() does not need to stash the
error value from mnt_want_write() into a separate variable - the logic
that used to clobber 'error' after the call of mnt_want_write() has
migrated into lookup_one_qstr_excl().

So there is no need for two different err variables. This patch
discards "err2" and uses "error" throughout.

Signed-off-by: NeilBrown
---
 fs/namei.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cd43ff89fbaa..62c1e2268942 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4114,7 +4114,6 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 	unsigned int reval_flag = lookup_flags & LOOKUP_REVAL;
 	unsigned int create_flags = LOOKUP_CREATE | LOOKUP_EXCL;
 	int type;
-	int err2;
 	int error;
 
 	error = filename_parentat(dfd, name, reval_flag, path, &last, &type);
@@ -4129,7 +4128,7 @@ static struct dentry *filename_create(int dfd, struct filename *name,
 		goto out;
 
 	/* don't fail immediately if it's r/o, at least try to report other errors */
-	err2 = mnt_want_write(path->mnt);
+	error = mnt_want_write(path->mnt);
 	/*
 	 * Do the final lookup.  Suppress 'create' if there is a trailing
 	 * '/', and a directory wasn't requested.
@@ -4142,17 +4141,16 @@ static struct dentry *filename_create(int dfd, stru= ct filename *name, if (IS_ERR(dentry)) goto unlock; =20 - if (unlikely(err2)) { - error =3D err2; + if (unlikely(error)) goto fail; - } + return dentry; fail: dput(dentry); dentry =3D ERR_PTR(error); unlock: inode_unlock(path->dentry->d_inode); - if (!err2) + if (!error) mnt_drop_write(path->mnt); out: path_put(path); --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 09BCFA48; Fri, 22 Aug 2025 00:11:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821483; cv=none; b=fjS+7p1W4Lna0U4wG1ewXP7gxSgxWi8ZcQrtDSuHTo1edEH6/5uPy5rvfmPw01KUbvoz+qsrb9DLBPIGjAh2nIYx7Oist7nfNsigHyHJZqsBuazXBaEY0uy93nldohhrO10sjD2Xq8o/rHmb9CsZCjYAErW/XVK8+lvkjiMfEAs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821483; c=relaxed/simple; bh=w9NqnSfAAzN7pyPgJqvIpY2bJ5Qzw/yvyMsNLpzqYdE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JandktBdpVkdnzazD+JqF4C9I807QEo/YQ/xpRvuLXbGEjYPzkbVr7XCqFRmeMssoLBXKZxdMPLHh7DAUISmQQMAsLiraFeokpHBw4M+wi088GsoyFJe4uW6r2rfwsoT5xbGC1f/C+agcQJ34lrNQQqT+ItquTtcUuxwIXcL4M4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by 
neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFN8-006nab-HQ; Fri, 22 Aug 2025 00:11:12 +0000
From: NeilBrown
To: Alexander Viro , Christian Brauner
Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 02/16] VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
Date: Fri, 22 Aug 2025 10:00:20 +1000
Message-ID: <20250822000818.1086550-3-neil@brown.name>
X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty
In-Reply-To: <20250822000818.1086550-1-neil@brown.name>
References: <20250822000818.1086550-1-neil@brown.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

A rename can only rename within a single mount. Callers of vfs_rename()
must and do ensure this is the case.

So there is no point in having two mnt_idmaps in renamedata as they are
always the same. Only one of them is passed to ->rename in any case.

This patch replaces both with a single "mnt_idmap" and changes all
callers.
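
For illustration only (not part of the patch): a minimal userspace
sketch of what a caller looks like once renamedata carries a single
idmap. The struct bodies, nop_idmap_model and rename_setup() are
hypothetical stand-ins, assumed here so the example compiles outside
the kernel; only the renamedata field names follow the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; the real types live in the kernel headers. */
struct mnt_idmap { int unused; };
struct dentry { const char *name; };

/* Models the post-patch layout: one idmap covers both ends of the rename. */
struct renamedata {
	struct mnt_idmap *mnt_idmap;
	struct dentry *old_parent;
	struct dentry *old_dentry;
	struct dentry *new_parent;
	struct dentry *new_dentry;
	unsigned int flags;
};

/* Stand-in for the kernel's nop_mnt_idmap object (assumption). */
static struct mnt_idmap nop_idmap_model;

/* Hypothetical helper: fill renamedata the way the converted callers do. */
static struct renamedata rename_setup(struct mnt_idmap *idmap,
				      struct dentry *old_parent,
				      struct dentry *old_dentry,
				      struct dentry *new_parent,
				      struct dentry *new_dentry)
{
	struct renamedata rd = {
		.mnt_idmap  = idmap,	/* set once, used for every check */
		.old_parent = old_parent,
		.old_dentry = old_dentry,
		.new_parent = new_parent,
		.new_dentry = new_dentry,
	};
	return rd;
}
```

With one field there is no longer a way for a caller to pass two
different idmaps by mistake.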
Reviewed-by: Jeff Layton Signed-off-by: NeilBrown --- fs/cachefiles/namei.c | 3 +-- fs/ecryptfs/inode.c | 3 +-- fs/namei.c | 17 ++++++++--------- fs/nfsd/vfs.c | 3 +-- fs/overlayfs/overlayfs.h | 3 +-- fs/smb/server/vfs.c | 3 +-- include/linux/fs.h | 6 ++---- 7 files changed, 15 insertions(+), 23 deletions(-) diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 91dfd0231877..d1edb2ac3837 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -387,10 +387,9 @@ int cachefiles_bury_object(struct cachefiles_cache *ca= che, cachefiles_io_error(cache, "Rename security error %d", ret); } else { struct renamedata rd =3D { - .old_mnt_idmap =3D &nop_mnt_idmap, + .mnt_idmap =3D &nop_mnt_idmap, .old_parent =3D dir, .old_dentry =3D rep, - .new_mnt_idmap =3D &nop_mnt_idmap, .new_parent =3D cache->graveyard, .new_dentry =3D grave, }; diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c index 72fbe1316ab8..abd954c6a14e 100644 --- a/fs/ecryptfs/inode.c +++ b/fs/ecryptfs/inode.c @@ -634,10 +634,9 @@ ecryptfs_rename(struct mnt_idmap *idmap, struct inode = *old_dir, goto out_lock; } =20 - rd.old_mnt_idmap =3D &nop_mnt_idmap; + rd.mnt_idmap =3D &nop_mnt_idmap; rd.old_parent =3D lower_old_dir_dentry; rd.old_dentry =3D lower_old_dentry; - rd.new_mnt_idmap =3D &nop_mnt_idmap; rd.new_parent =3D lower_new_dir_dentry; rd.new_dentry =3D lower_new_dentry; rc =3D vfs_rename(&rd); diff --git a/fs/namei.c b/fs/namei.c index 62c1e2268942..1d5fdcbe1828 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -5022,20 +5022,20 @@ int vfs_rename(struct renamedata *rd) if (source =3D=3D target) return 0; =20 - error =3D may_delete(rd->old_mnt_idmap, old_dir, old_dentry, is_dir); + error =3D may_delete(rd->mnt_idmap, old_dir, old_dentry, is_dir); if (error) return error; =20 if (!target) { - error =3D may_create(rd->new_mnt_idmap, new_dir, new_dentry); + error =3D may_create(rd->mnt_idmap, new_dir, new_dentry); } else { new_is_dir =3D d_is_dir(new_dentry); =20 if (!(flags & RENAME_EXCHANGE)) 
- error =3D may_delete(rd->new_mnt_idmap, new_dir, + error =3D may_delete(rd->mnt_idmap, new_dir, new_dentry, is_dir); else - error =3D may_delete(rd->new_mnt_idmap, new_dir, + error =3D may_delete(rd->mnt_idmap, new_dir, new_dentry, new_is_dir); } if (error) @@ -5050,13 +5050,13 @@ int vfs_rename(struct renamedata *rd) */ if (new_dir !=3D old_dir) { if (is_dir) { - error =3D inode_permission(rd->old_mnt_idmap, source, + error =3D inode_permission(rd->mnt_idmap, source, MAY_WRITE); if (error) return error; } if ((flags & RENAME_EXCHANGE) && new_is_dir) { - error =3D inode_permission(rd->new_mnt_idmap, target, + error =3D inode_permission(rd->mnt_idmap, target, MAY_WRITE); if (error) return error; @@ -5124,7 +5124,7 @@ int vfs_rename(struct renamedata *rd) if (error) goto out; } - error =3D old_dir->i_op->rename(rd->new_mnt_idmap, old_dir, old_dentry, + error =3D old_dir->i_op->rename(rd->mnt_idmap, old_dir, old_dentry, new_dir, new_dentry, flags); if (error) goto out; @@ -5267,10 +5267,9 @@ int do_renameat2(int olddfd, struct filename *from, = int newdfd, =20 rd.old_parent =3D old_path.dentry; rd.old_dentry =3D old_dentry; - rd.old_mnt_idmap =3D mnt_idmap(old_path.mnt); + rd.mnt_idmap =3D mnt_idmap(old_path.mnt); rd.new_parent =3D new_path.dentry; rd.new_dentry =3D new_dentry; - rd.new_mnt_idmap =3D mnt_idmap(new_path.mnt); rd.delegated_inode =3D &delegated_inode; rd.flags =3D flags; error =3D vfs_rename(&rd); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 98ab55ba3ced..5f3e99f956ca 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1943,10 +1943,9 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *f= fhp, char *fname, int flen, goto out_dput_old; } else { struct renamedata rd =3D { - .old_mnt_idmap =3D &nop_mnt_idmap, + .mnt_idmap =3D &nop_mnt_idmap, .old_parent =3D fdentry, .old_dentry =3D odentry, - .new_mnt_idmap =3D &nop_mnt_idmap, .new_parent =3D tdentry, .new_dentry =3D ndentry, }; diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index 
bb0d7ded8e76..4f84abaa0d68 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -361,10 +361,9 @@ static inline int ovl_do_rename(struct ovl_fs *ofs, st= ruct dentry *olddir, { int err; struct renamedata rd =3D { - .old_mnt_idmap =3D ovl_upper_mnt_idmap(ofs), + .mnt_idmap =3D ovl_upper_mnt_idmap(ofs), .old_parent =3D olddir, .old_dentry =3D olddentry, - .new_mnt_idmap =3D ovl_upper_mnt_idmap(ofs), .new_parent =3D newdir, .new_dentry =3D newdentry, .flags =3D flags, diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c index 04539037108c..07739055ac9f 100644 --- a/fs/smb/server/vfs.c +++ b/fs/smb/server/vfs.c @@ -770,10 +770,9 @@ int ksmbd_vfs_rename(struct ksmbd_work *work, const st= ruct path *old_path, goto out4; } =20 - rd.old_mnt_idmap =3D mnt_idmap(old_path->mnt), + rd.mnt_idmap =3D mnt_idmap(old_path->mnt), rd.old_parent =3D old_parent, rd.old_dentry =3D old_child, - rd.new_mnt_idmap =3D mnt_idmap(new_path.mnt), rd.new_parent =3D new_path.dentry, rd.new_dentry =3D new_dentry, rd.flags =3D flags, diff --git a/include/linux/fs.h b/include/linux/fs.h index d7ab4f96d705..73b39e5bb9e4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2008,20 +2008,18 @@ int vfs_unlink(struct mnt_idmap *, struct inode *, = struct dentry *, =20 /** * struct renamedata - contains all information required for renaming - * @old_mnt_idmap: idmap of the old mount the inode was found from + * @mnt_idmap: idmap of the mount in which the rename is happening. 
* @old_parent: parent of source * @old_dentry: source - * @new_mnt_idmap: idmap of the new mount the inode was found from * @new_parent: parent of destination * @new_dentry: destination * @delegated_inode: returns an inode needing a delegation break * @flags: rename flags */ struct renamedata { - struct mnt_idmap *old_mnt_idmap; + struct mnt_idmap *mnt_idmap; struct dentry *old_parent; struct dentry *old_dentry; - struct mnt_idmap *new_mnt_idmap; struct dentry *new_parent; struct dentry *new_dentry; struct inode **delegated_inode; --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A5AF155C97; Fri, 22 Aug 2025 00:11:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821488; cv=none; b=Fxf9ilFmM6mnWY414bDxKzcfXodyU5HuuuS3ULquYF3ju03t3QWxmfs1HTN+aBgtqm2GaIL3QSrGVnkSgd4LIJbmzoXBXhMyHumR1zCSHCNFlHBDlFj/azIWM7hlgfQI5lVgPGNZ6wgwMaCUQbR4ZjoA2MSGxHS7iKITa1wwvYE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821488; c=relaxed/simple; bh=UDkmKL/2p8aQSv8lXFxCvcefVp3lmKhPsUMxUO+pPAU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jCFz0g4cdGf1xFEl5kusO4ZpWd6dCSvYoiJSI7Z7VHcPx83YXLOQQ7oh4+2hR7zdC0LkSB/nBh5azQn9W70zmNBhwuqsfKX2aDhda6XInkIJzWfYfehEYn2nxftO2/mpmSVpzCDi4I7G0VsBB6riMDMJp6mKcoUcJ3kEhPzh7/c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; 
spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFN9-006nad-1f; Fri, 22 Aug 2025 00:11:12 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Peter Zijlstra Subject: [PATCH v2 03/16] Introduce wake_up_key() Date: Fri, 22 Aug 2025 10:00:21 +1000 Message-ID: <20250822000818.1086550-4-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There is a common pattern of passing a key to __wake_up(). The key is used to select which waiter to wake. Callers must currently use __wake_up() directly, which requires that a TASK state and exclusive-waiter-count also be passed. The desired state is almost always TASK_NORMAL and these cases do not have exclusive waiters so the count is not relevant. This patch introduces wake_up_key(wq, key) which simplifies the call, and changes relevant callers. An exclusive waiter count of '1' is used for consistency with wake_up(). In all current cases this number is irrelevant. Most callers (all but one) of __wake_up() are converted either to wake_up_key(), or to the existing wake_up_poll(). 
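
For illustration only (not part of the patch): a simplified userspace
model of a keyed wake-up. Every name below (wait_entry, wait_queue,
keyed_wake_fn, model_add_waiter, model_wake_up_key) is a hypothetical
stand-in, not the kernel API, and the model ignores task state, locking
and exclusive waiters. It only shows how the key lets each waiter's
wake function decide whether a wake-up is meant for it.

```c
#include <assert.h>
#include <stddef.h>

struct wait_entry {
	/* Returns 1 if this waiter accepted the wake-up, 0 otherwise. */
	int (*func)(struct wait_entry *we, void *key);
	void *wanted_key;		/* the key this waiter is waiting for */
	int woken;
	struct wait_entry *next;
};

struct wait_queue {
	struct wait_entry *head;
};

/* Wake function: ignore wake-ups sent for a different key. */
static int keyed_wake_fn(struct wait_entry *we, void *key)
{
	if (we->wanted_key != key)
		return 0;
	we->woken = 1;
	return 1;
}

/* Queue a waiter that is interested in exactly one key. */
static void model_add_waiter(struct wait_queue *wq, struct wait_entry *we,
			     void *key)
{
	we->func = keyed_wake_fn;
	we->wanted_key = key;
	we->woken = 0;
	we->next = wq->head;
	wq->head = we;
}

/* Offer the key to every waiter; return how many accepted it. */
static int model_wake_up_key(struct wait_queue *wq, void *key)
{
	struct wait_entry *we;
	int woken = 0;

	for (we = wq->head; we; we = we->next)
		woken += we->func(we, key);
	return woken;
}
```

Two waiters with different keys can share one queue in this model; a
wake-up for the first key leaves the second waiter asleep.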
Suggested-by: Al Viro Signed-off-by: NeilBrown --- fs/gfs2/glock.c | 2 +- fs/nfs/callback_proc.c | 2 +- fs/userfaultfd.c | 4 ++-- include/linux/wait.h | 3 ++- io_uring/io_uring.h | 2 +- kernel/locking/percpu-rwsem.c | 2 +- kernel/sched/wait.c | 2 +- kernel/sched/wait_bit.c | 2 +- kernel/signal.c | 3 +-- mm/memcontrol-v1.c | 2 +- 10 files changed, 12 insertions(+), 12 deletions(-) diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index b6fd1cb17de7..cbf8f264f908 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -121,7 +121,7 @@ static void wake_up_glock(struct gfs2_glock *gl) wait_queue_head_t *wq =3D glock_waitqueue(&gl->gl_name); =20 if (waitqueue_active(wq)) - __wake_up(wq, TASK_NORMAL, 1, &gl->gl_name); + wake_up_key(wq, &gl->gl_name); } =20 static void gfs2_glock_dealloc(struct rcu_head *rcu) diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c index 8397c43358bd..b1d71c43c87d 100644 --- a/fs/nfs/callback_proc.c +++ b/fs/nfs/callback_proc.c @@ -689,7 +689,7 @@ __be32 nfs4_callback_notify_lock(void *argp, void *resp, =20 /* Don't wake anybody if the string looked bogus */ if (args->cbnl_valid) - __wake_up(&cps->clp->cl_lock_waitq, TASK_NORMAL, 0, args); + wake_up_key(&cps->clp->cl_lock_waitq, args); =20 return htonl(NFS4_OK); } diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 54c6cc7fe9c6..0d58c53bd583 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -874,7 +874,7 @@ static int userfaultfd_release(struct inode *inode, str= uct file *file) */ spin_lock_irq(&ctx->fault_pending_wqh.lock); __wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL, &range); - __wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, &range); + wake_up_key(&ctx->fault_wqh, &range); spin_unlock_irq(&ctx->fault_pending_wqh.lock); =20 /* Flush pending events that may still wait on event_wqh */ @@ -1175,7 +1175,7 @@ static void __wake_userfault(struct userfaultfd_ctx *= ctx, __wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL, range); if (waitqueue_active(&ctx->fault_wqh)) - 
		__wake_up(&ctx->fault_wqh, TASK_NORMAL, 1, range);
+		wake_up_key(&ctx->fault_wqh, range);
 	spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 }
 
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 09855d819418..86d751893c9f 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -224,6 +224,7 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head);
 #define wake_up_locked(x)		__wake_up_locked((x), TASK_NORMAL, 1)
 #define wake_up_all_locked(x)		__wake_up_locked((x), TASK_NORMAL, 0)
 #define wake_up_sync(x)			__wake_up_sync(x, TASK_NORMAL)
+#define wake_up_key(x, k)		__wake_up(x, TASK_NORMAL, 1, k)
 
 #define wake_up_interruptible(x)	__wake_up(x, TASK_INTERRUPTIBLE, 1, NULL)
 #define wake_up_interruptible_nr(x, nr)	__wake_up(x, TASK_INTERRUPTIBLE, nr, NULL)
@@ -236,7 +237,7 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head);
 #define poll_to_key(m) ((void *)(__force uintptr_t)(__poll_t)(m))
 #define key_to_poll(m) ((__force __poll_t)(uintptr_t)(void *)(m))
 #define wake_up_poll(x, m)							\
-	__wake_up(x, TASK_NORMAL, 1, poll_to_key(m))
+	wake_up_key(x, poll_to_key(m))
 #define wake_up_poll_on_current_cpu(x, m)					\
 	__wake_up_on_current_cpu(x, TASK_NORMAL, poll_to_key(m))
 #define wake_up_locked_poll(x, m)						\
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index abc6de227f74..11d8b656ad64 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -304,7 +304,7 @@ static inline void __io_wq_wake(struct wait_queue_head *wq)
 	 * epoll and should terminate multishot poll at that point.
*/ if (wq_has_sleeper(wq)) - __wake_up(wq, TASK_NORMAL, 0, poll_to_key(EPOLL_URING_WAKE | EPOLLIN)); + wake_up_poll(wq, EPOLL_URING_WAKE | EPOLLIN); } =20 static inline void io_poll_wq_wake(struct io_ring_ctx *ctx) diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c index ef234469baac..1d3e5f03e0f1 100644 --- a/kernel/locking/percpu-rwsem.c +++ b/kernel/locking/percpu-rwsem.c @@ -278,7 +278,7 @@ void percpu_up_write(struct percpu_rw_semaphore *sem) /* * Prod any pending reader/writer to make progress. */ - __wake_up(&sem->waiters, TASK_NORMAL, 1, sem); + wake_up_key(&sem->waiters, sem); =20 /* * Once this completes (at least one RCU-sched grace period hence) the diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c index 20f27e2cf7ae..201d9827580f 100644 --- a/kernel/sched/wait.c +++ b/kernel/sched/wait.c @@ -227,7 +227,7 @@ EXPORT_SYMBOL_GPL(__wake_up_sync); /* For internal use = only */ =20 void __wake_up_pollfree(struct wait_queue_head *wq_head) { - __wake_up(wq_head, TASK_NORMAL, 0, poll_to_key(EPOLLHUP | POLLFREE)); + wake_up_poll(wq_head, EPOLLHUP | POLLFREE); /* POLLFREE must have cleared the queue. 
*/ WARN_ON_ONCE(waitqueue_active(wq_head)); } diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c index 1088d3b7012c..87f9d1428e62 100644 --- a/kernel/sched/wait_bit.c +++ b/kernel/sched/wait_bit.c @@ -126,7 +126,7 @@ void __wake_up_bit(struct wait_queue_head *wq_head, uns= igned long *word, int bit struct wait_bit_key key =3D __WAIT_BIT_KEY_INITIALIZER(word, bit); =20 if (waitqueue_active(wq_head)) - __wake_up(wq_head, TASK_NORMAL, 1, &key); + wake_up_key(wq_head, &key); } EXPORT_SYMBOL(__wake_up_bit); =20 diff --git a/kernel/signal.c b/kernel/signal.c index e2c928de7d2c..e4adffab3a8d 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2154,8 +2154,7 @@ void do_notify_pidfd(struct task_struct *task) =20 WARN_ON(task->exit_state =3D=3D 0); =20 - __wake_up(&pid->wait_pidfd, TASK_NORMAL, 0, - poll_to_key(EPOLLIN | EPOLLRDNORM)); + wake_up_poll(&pid->wait_pidfd, EPOLLIN | EPOLLRDNORM); } =20 /* diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 4b94731305b9..8cb251aa2dcc 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -1328,7 +1328,7 @@ void memcg1_oom_recover(struct mem_cgroup *memcg) * triggering notification. 
*/ if (memcg && memcg->under_oom) - __wake_up(&memcg_oom_waitq, TASK_NORMAL, 0, memcg); + wake_up_key(&memcg_oom_waitq, memcg); } =20 /** --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E2841373; Fri, 22 Aug 2025 00:11:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821483; cv=none; b=CDRv+zI3zQBeFs2XTcgFPomCkjFn6LzZOHIf9djHWwSI7Ekvn1o0K+SLrKID1aVTPOm0zvPE7TfZg+aGAqfndUpgCnwom82QDqseNuzyEI6J8NbAjcC+PbWjxP03rCNzHOcp0YMxlCFBpM9dxLxAbQOWrJDrW+TIuRMIwU9rk9k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821483; c=relaxed/simple; bh=6seksQbHnpEHkKSoawkYuBxKLOfnGApYqhYzmA3oVgc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=AxP99mLpDyGS533U3xRu+kWG1DgLocxioW/y/bEFU1Cl3ayZiN58zjltXrln+46ZJaP6UdIyHpDCS+xN0czqKMM3h0Ra7EE48wW1iQt4DzFPIVNGUPUUT74L9Z70QqGnfjqf1HZQKuqRcwyKUl8Ti4HZD3Wi/jiPx/618GhpY/w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFN9-006naj-JP; Fri, 22 Aug 2025 00:11:13 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 04/16] 
VFS: use global wait-queue table for d_alloc_parallel()
Date: Fri, 22 Aug 2025 10:00:22 +1000
Message-ID: <20250822000818.1086550-5-neil@brown.name>
X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty
In-Reply-To: <20250822000818.1086550-1-neil@brown.name>
References: <20250822000818.1086550-1-neil@brown.name>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

d_alloc_parallel() is currently given a wait_queue_head to be used if
other threads need to wait for the lookup of the resulting dentry to
complete. This must have a lifetime which extends until the lookup is
completed.

Future proposed patches will use d_alloc_parallel() for names being
created/unlinked etc. Some filesystems combine lookup with create,
making a longer code path that the wq needs to live for. If it is still
to be allocated on-stack this can be cumbersome.

This patch replaces the on-stack wqs with a global array of wqs which
are used as needed. A wq is NOT allocated when a dentry is first
created but only when a second thread attempts to use the same name and
so is forced to wait. At this moment a wq is chosen using a hash of the
dentry pointer and that wq is assigned to ->d_wait. The ->d_lock is
then dropped and the task waits.

When the dentry is finally moved out of "d_in_lookup" a wake up is only
sent if ->d_wait is not NULL. This avoids an (uncontended) spin
lock/unlock which saves a couple of atomic operations in a common case.

The wake up passes the dentry that the wake up is for as the "key" and
the waiter's wake function only wakes processes waiting on the same
key. This means that when these global waitqueues are shared (which is
inevitable though unlikely to be frequent), a task will not be woken
prematurely.
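
For illustration only (not part of the patch): a simplified userspace
model of the lazy wait-queue selection described above. All names
(waitq, dentry_model, wait_table, slot_of, begin_wait, finish_lookup)
are hypothetical stand-ins, and slot_of() is an assumed multiplicative
pointer hash used in place of the kernel's hash_ptr(). Locking and real
sleeping are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WQ_BITS		8
#define WQ_COUNT	(1u << WQ_BITS)

struct waitq { int waiters; };

struct dentry_model {
	struct waitq *d_wait;	/* NULL until some thread has to wait */
	int in_lookup;
};

/* One shared table instead of an on-stack queue per lookup. */
static struct waitq wait_table[WQ_COUNT];

/* Assumed stand-in for hash_ptr(): keep the top WQ_BITS of a product. */
static unsigned int slot_of(const void *p)
{
	uint64_t v = (uint64_t)(uintptr_t)p;

	return (unsigned int)((v * 0x61C8864680B583EBull) >> (64 - WQ_BITS));
}

/* A second thread picks a queue only at the moment it must wait. */
static struct waitq *begin_wait(struct dentry_model *d)
{
	if (!d->d_wait)
		d->d_wait = &wait_table[slot_of(d)];
	d->d_wait->waiters++;
	return d->d_wait;
}

/* End of lookup: returns 1 if a wake-up had to be sent, 0 if skipped. */
static int finish_lookup(struct dentry_model *d)
{
	struct waitq *wq = d->d_wait;

	d->in_lookup = 0;
	d->d_wait = NULL;
	return wq != NULL;
}
```

In the uncontended case nobody ever calls begin_wait(), so
finish_lookup() sees a NULL ->d_wait and skips the wake-up entirely.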
Signed-off-by: NeilBrown
---
 Documentation/filesystems/porting.rst |  6 ++
 fs/afs/dir_silly.c                    |  4 +-
 fs/dcache.c                           | 82 ++++++++++++++++++++++-----
 fs/fuse/readdir.c                     |  3 +-
 fs/namei.c                            |  6 +-
 fs/nfs/dir.c                          |  7 +--
 fs/nfs/unlink.c                       |  3 +-
 fs/proc/base.c                        |  3 +-
 fs/proc/proc_sysctl.c                 |  3 +-
 fs/smb/client/readdir.c               |  3 +-
 include/linux/dcache.h                |  3 +-
 include/linux/nfs_xdr.h               |  1 -
 12 files changed, 84 insertions(+), 40 deletions(-)

diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 85f590254f07..96107c15e928 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1285,3 +1285,9 @@ rather than a VMA, as the VMA at this stage is not yet valid.
 The vm_area_desc provides the minimum required information for a filesystem
 to initialise state upon memory mapping of a file-backed region, and output
 parameters for the file system to set this state.
+---
+
+**mandatory**
+
+d_alloc_parallel() signature has changed - it no longer receives a
+waitqueue_head. It uses one from an internal table when needed.
diff --git a/fs/afs/dir_silly.c b/fs/afs/dir_silly.c index 0b80eb93fa40..ce76b3b30850 100644 --- a/fs/afs/dir_silly.c +++ b/fs/afs/dir_silly.c @@ -237,13 +237,11 @@ int afs_silly_iput(struct dentry *dentry, struct inod= e *inode) struct dentry *alias; int ret; =20 - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); - _enter("%p{%pd},%llx", dentry, dentry, vnode->fid.vnode); =20 down_read(&dvnode->rmdir_lock); =20 - alias =3D d_alloc_parallel(dentry->d_parent, &dentry->d_name, &wq); + alias =3D d_alloc_parallel(dentry->d_parent, &dentry->d_name); if (IS_ERR(alias)) { up_read(&dvnode->rmdir_lock); return 0; diff --git a/fs/dcache.c b/fs/dcache.c index 60046ae23d51..df9306c63581 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -2137,8 +2137,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct= inode *inode, return found; } if (d_in_lookup(dentry)) { - found =3D d_alloc_parallel(dentry->d_parent, name, - dentry->d_wait); + found =3D d_alloc_parallel(dentry->d_parent, name); if (IS_ERR(found) || !d_in_lookup(found)) { iput(inode); return found; @@ -2148,7 +2147,7 @@ struct dentry *d_add_ci(struct dentry *dentry, struct= inode *inode, if (!found) { iput(inode); return ERR_PTR(-ENOMEM); - }=20 + } } res =3D d_splice_alias(inode, found); if (res) { @@ -2505,6 +2504,49 @@ void d_rehash(struct dentry * entry) } EXPORT_SYMBOL(d_rehash); =20 +#define PAR_LOOKUP_WQ_BITS 8 +#define PAR_LOOKUP_WQS (1 << PAR_LOOKUP_WQ_BITS) +static wait_queue_head_t par_wait_table[PAR_LOOKUP_WQS] __cacheline_aligne= d; +static inline wait_queue_head_t *par_waitq(struct dentry *dentry) +{ + return &par_wait_table[hash_ptr(dentry, PAR_LOOKUP_WQ_BITS)]; +} + +static int __init par_wait_init(void) +{ + int i; + + for (i =3D 0; i < PAR_LOOKUP_WQS; i++) + init_waitqueue_head(&par_wait_table[i]); + return 0; +} + +struct par_wait_key { + struct dentry *de; + struct wait_queue_entry wqe; +}; + +static int d_wait_wake_fn(struct wait_queue_entry *wq_entry, + unsigned mode, int sync, void *key) +{ + struct 
par_wait_key *pwk =3D container_of(wq_entry, + struct par_wait_key, wqe); + if (pwk->de =3D=3D key) + return default_wake_function(wq_entry, mode, sync, key); + return 0; +} + +static inline void d_wake_waiters(struct wait_queue_head *d_wait, + struct dentry *dentry) +{ + /* ->d_wait is only set if some thread is actually waiting. + * If we find it is NULL - the common case - then there was no + * contention and there are no waiters to be woken. + */ + if (d_wait) + wake_up_key(d_wait, dentry); +} + static inline unsigned start_dir_add(struct inode *dir) { preempt_disable_nested(); @@ -2517,31 +2559,39 @@ static inline unsigned start_dir_add(struct inode *= dir) } =20 static inline void end_dir_add(struct inode *dir, unsigned int n, - wait_queue_head_t *d_wait) + wait_queue_head_t *d_wait, struct dentry *de) { smp_store_release(&dir->i_dir_seq, n + 2); preempt_enable_nested(); - if (wq_has_sleeper(d_wait)) - wake_up_all(d_wait); + d_wake_waiters(d_wait, de); } =20 static void d_wait_lookup(struct dentry *dentry) { if (d_in_lookup(dentry)) { - DECLARE_WAITQUEUE(wait, current); - add_wait_queue(dentry->d_wait, &wait); + struct par_wait_key wk =3D { + .de =3D dentry, + .wqe =3D { + .private =3D current, + .func =3D d_wait_wake_fn, + }, + }; + struct wait_queue_head *wq; + if (!dentry->d_wait) + dentry->d_wait =3D par_waitq(dentry); + wq =3D dentry->d_wait; + add_wait_queue(wq, &wk.wqe); do { set_current_state(TASK_UNINTERRUPTIBLE); spin_unlock(&dentry->d_lock); schedule(); spin_lock(&dentry->d_lock); } while (d_in_lookup(dentry)); + remove_wait_queue(wq, &wk.wqe); } } =20 -struct dentry *d_alloc_parallel(struct dentry *parent, - const struct qstr *name, - wait_queue_head_t *wq) +struct dentry *d_alloc_parallel(struct dentry *parent, const struct qstr *= name) { unsigned int hash =3D name->hash; struct hlist_bl_head *b =3D in_lookup_hash(parent, hash); @@ -2554,6 +2604,7 @@ struct dentry *d_alloc_parallel(struct dentry *parent, return ERR_PTR(-ENOMEM); =20 new->d_flags 
|=3D DCACHE_PAR_LOOKUP; + new->d_wait =3D NULL; spin_lock(&parent->d_lock); new->d_parent =3D dget_dlock(parent); hlist_add_head(&new->d_sib, &parent->d_children); @@ -2642,7 +2693,6 @@ struct dentry *d_alloc_parallel(struct dentry *parent, return dentry; } rcu_read_unlock(); - new->d_wait =3D wq; hlist_bl_add_head(&new->d_u.d_in_lookup_hash, b); hlist_bl_unlock(b); return new; @@ -2680,7 +2730,7 @@ static wait_queue_head_t *__d_lookup_unhash(struct de= ntry *dentry) void __d_lookup_unhash_wake(struct dentry *dentry) { spin_lock(&dentry->d_lock); - wake_up_all(__d_lookup_unhash(dentry)); + d_wake_waiters(__d_lookup_unhash(dentry), dentry); spin_unlock(&dentry->d_lock); } EXPORT_SYMBOL(__d_lookup_unhash_wake); @@ -2711,7 +2761,7 @@ static inline void __d_add(struct dentry *dentry, str= uct inode *inode, } __d_rehash(dentry); if (dir) - end_dir_add(dir, n, d_wait); + end_dir_add(dir, n, d_wait, dentry); spin_unlock(&dentry->d_lock); if (inode) spin_unlock(&inode->i_lock); @@ -2877,7 +2927,7 @@ static void __d_move(struct dentry *dentry, struct de= ntry *target, write_seqcount_end(&dentry->d_seq); =20 if (dir) - end_dir_add(dir, n, d_wait); + end_dir_add(dir, n, d_wait, target); =20 if (dentry->d_parent !=3D old_parent) spin_unlock(&dentry->d_parent->d_lock); @@ -3241,6 +3291,8 @@ static void __init dcache_init(void) =20 runtime_const_init(shift, d_hash_shift); runtime_const_init(ptr, dentry_hashtable); + + par_wait_init(); } =20 /* SLAB cache for __getname() consumers */ diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c index c2aae2eef086..f588252891af 100644 --- a/fs/fuse/readdir.c +++ b/fs/fuse/readdir.c @@ -160,7 +160,6 @@ static int fuse_direntplus_link(struct file *file, struct inode *dir =3D d_inode(parent); struct fuse_conn *fc; struct inode *inode; - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); int epoch; =20 if (!o->nodeid) { @@ -197,7 +196,7 @@ static int fuse_direntplus_link(struct file *file, dentry =3D d_lookup(parent, &name); if (!dentry) { retry: - dentry =3D 
d_alloc_parallel(parent, &name, &wq); + dentry =3D d_alloc_parallel(parent, &name); if (IS_ERR(dentry)) return PTR_ERR(dentry); } diff --git a/fs/namei.c b/fs/namei.c index 1d5fdcbe1828..7a2d72ee1af1 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1784,13 +1784,12 @@ static struct dentry *__lookup_slow(const struct qs= tr *name, { struct dentry *dentry, *old; struct inode *inode =3D dir->d_inode; - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); =20 /* Don't go there if it's already dead */ if (unlikely(IS_DEADDIR(inode))) return ERR_PTR(-ENOENT); again: - dentry =3D d_alloc_parallel(dir, name, &wq); + dentry =3D d_alloc_parallel(dir, name); if (IS_ERR(dentry)) return dentry; if (unlikely(!d_in_lookup(dentry))) { @@ -3618,7 +3617,6 @@ static struct dentry *lookup_open(struct nameidata *n= d, struct file *file, struct dentry *dentry; int error, create_error =3D 0; umode_t mode =3D op->mode; - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); =20 if (unlikely(IS_DEADDIR(dir_inode))) return ERR_PTR(-ENOENT); @@ -3627,7 +3625,7 @@ static struct dentry *lookup_open(struct nameidata *n= d, struct file *file, dentry =3D d_lookup(dir, &nd->last); for (;;) { if (!dentry) { - dentry =3D d_alloc_parallel(dir, &nd->last, &wq); + dentry =3D d_alloc_parallel(dir, &nd->last); if (IS_ERR(dentry)) return dentry; } diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index d81217923936..d00b7b2781fa 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -727,7 +727,6 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs= _entry *entry, unsigned long dir_verifier) { struct qstr filename =3D QSTR_INIT(entry->name, entry->len); - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); struct dentry *dentry; struct dentry *alias; struct inode *inode; @@ -756,7 +755,7 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs= _entry *entry, dentry =3D d_lookup(parent, &filename); again: if (!dentry) { - dentry =3D d_alloc_parallel(parent, &filename, &wq); + dentry =3D d_alloc_parallel(parent, &filename); if (IS_ERR(dentry)) return; } @@ -2060,7 
+2059,6 @@ int nfs_atomic_open(struct inode *dir, struct dentry = *dentry, struct file *file, unsigned open_flags, umode_t mode) { - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); struct nfs_open_context *ctx; struct dentry *res; struct iattr attr =3D { .ia_valid =3D ATTR_OPEN }; @@ -2115,8 +2113,7 @@ int nfs_atomic_open(struct inode *dir, struct dentry = *dentry, if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) { d_drop(dentry); switched =3D true; - dentry =3D d_alloc_parallel(dentry->d_parent, - &dentry->d_name, &wq); + dentry =3D d_alloc_parallel(dentry->d_parent, &dentry->d_name); if (IS_ERR(dentry)) return PTR_ERR(dentry); if (unlikely(!d_in_lookup(dentry))) diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c index b55467911648..894af85830fa 100644 --- a/fs/nfs/unlink.c +++ b/fs/nfs/unlink.c @@ -124,7 +124,7 @@ static int nfs_call_unlink(struct dentry *dentry, struc= t inode *inode, struct nf struct dentry *alias; =20 down_read_non_owner(&NFS_I(dir)->rmdir_sem); - alias =3D d_alloc_parallel(dentry->d_parent, &data->args.name, &data->wq); + alias =3D d_alloc_parallel(dentry->d_parent, &data->args.name); if (IS_ERR(alias)) { up_read_non_owner(&NFS_I(dir)->rmdir_sem); return 0; @@ -185,7 +185,6 @@ nfs_async_unlink(struct dentry *dentry, const struct qs= tr *name) =20 data->cred =3D get_current_cred(); data->res.dir_attr =3D &data->dir_attr; - init_waitqueue_head(&data->wq); =20 status =3D -EBUSY; spin_lock(&dentry->d_lock); diff --git a/fs/proc/base.c b/fs/proc/base.c index 62d35631ba8c..0b296c94000e 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2129,8 +2129,7 @@ bool proc_fill_cache(struct file *file, struct dir_co= ntext *ctx, =20 child =3D try_lookup_noperm(&qname, dir); if (!child) { - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); - child =3D d_alloc_parallel(dir, &qname, &wq); + child =3D d_alloc_parallel(dir, &qname); if (IS_ERR(child)) goto end_instantiate; if (d_in_lookup(child)) { diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 
49ab74e0bfde..04a382178c65 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -692,8 +692,7 @@ static bool proc_sys_fill_cache(struct file *file, =20 child =3D d_lookup(dir, &qname); if (!child) { - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); - child =3D d_alloc_parallel(dir, &qname, &wq); + child =3D d_alloc_parallel(dir, &qname); if (IS_ERR(child)) return false; if (d_in_lookup(child)) { diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c index 4e5460206397..5a92a1ad317d 100644 --- a/fs/smb/client/readdir.c +++ b/fs/smb/client/readdir.c @@ -74,7 +74,6 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *nam= e, struct cifs_sb_info *cifs_sb =3D CIFS_SB(sb); bool posix =3D cifs_sb_master_tcon(cifs_sb)->posix_extensions; bool reparse_need_reval =3D false; - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); int rc; =20 cifs_dbg(FYI, "%s: for %s\n", __func__, name->name); @@ -106,7 +105,7 @@ cifs_prime_dcache(struct dentry *parent, struct qstr *n= ame, (fattr->cf_flags & CIFS_FATTR_NEED_REVAL)) return; =20 - dentry =3D d_alloc_parallel(parent, name, &wq); + dentry =3D d_alloc_parallel(parent, name); } if (IS_ERR(dentry)) return; diff --git a/include/linux/dcache.h b/include/linux/dcache.h index cc3e1c1a3454..c1239af19d68 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -241,8 +241,7 @@ extern void d_delete(struct dentry *); /* allocate/de-allocate */ extern struct dentry * d_alloc(struct dentry *, const struct qstr *); extern struct dentry * d_alloc_anon(struct super_block *); -extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr= *, - wait_queue_head_t *); +extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr= *); extern struct dentry * d_splice_alias(struct inode *, struct dentry *); /* weird procfs mess; *NOT* exported */ extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *, diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index ac4bff6e9913..197c9b30dfdf 
100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1735,7 +1735,6 @@ struct nfs_unlinkdata { struct nfs_removeargs args; struct nfs_removeres res; struct dentry *dentry; - wait_queue_head_t wq; const struct cred *cred; struct nfs_fattr dir_attr; long timeout; --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C77F217D2; Fri, 22 Aug 2025 00:11:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821484; cv=none; b=HhU4EZ9QxaBJP42LhLL/79vtyGlnhOwOr7DjWv5nyjjQCtAI/fCfoEk2EFE06T4hKgY3aEumTSGNtTQvqxB1JNGXkKkjJKiigwU8LeZd5lOEt9MIgxIxyXpemAyzRo2+XDwjp6J0PIT27V4hLT0h4MLtR5h6MzocTLEiTVEWq3Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821484; c=relaxed/simple; bh=y082QnXc1Odi3N9LpWJbT1+h9NTlVBg0Oum3Z+maln4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SCkSVQo/64MQv07eKJzfn14PaW3L/3yIG3wOJzqTS862I7uBbis5bhru9PtWRxwnaD4IdUe8aozIscryYAIpOdXj4G/Db/7w0PC5UyIoskXA5x9lbgyUdTPkBM0BD//KCAjuQ5KTOU+oisOj0qW+lhEBTMVCjySQ5M8YxKANeMo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFNA-006nan-2D; Fri, 22 Aug 2025 00:11:13 +0000 From: NeilBrown 
To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 05/16] VFS: use d_alloc_parallel() in lookup_one_qstr_excl(). Date: Fri, 22 Aug 2025 10:00:23 +1000 Message-ID: <20250822000818.1086550-6-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" lookup_one_qstr_excl() is used for lookups prior to directory modifications, whether create, remove or rename. To prepare for allowing modification to happen in parallel, change lookup_one_qstr_excl() to use d_alloc_parallel(). As a result, ->lookup is now only ever called with a d_in_lookup() dentry. Consequently we can remove the d_in_lookup() check from d_add_ci() which is only used in ->lookup. If LOOKUP_EXCL or LOOKUP_RENAME_TARGET is passed, the caller must ensure d_lookup_done() is called at an appropriate time, and must not assume that it can test for positive or negative dentries without confirming that the dentry is no longer d_in_lookup() - unless it is filesystem code acting on itself and *knows* that ->lookup() always completes the lookup (currently true for all filesystems other than NFS). Signed-off-by: NeilBrown --- Documentation/filesystems/porting.rst | 12 +++++++++++ fs/dcache.c | 16 ++++---------- fs/namei.c | 30 ++++++++++++++++++--------- 3 files changed, 36 insertions(+), 22 deletions(-) diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesyst= ems/porting.rst index 96107c15e928..1d3c1e9b6cf3 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -1291,3 +1291,15 @@ parameters for the file system to set this state. 
=20 d_alloc_parallel() signature has changed - it no longer receives a waitqueue_head. It uses one from an internal table when needed. + +--- + +**mandatory** + +kern_path_create() and user_path_create() can return a d_in_lookup() +dentry as can lookup_one_qstr_excl() if passed "LOOKUP_CREATE|LOOKUP_EXCL" or +"LOOKUP_RENAME_TARGET". This can currently only happen if the target +filesystem is NFS. + +inode_operations.lookup() is now only ever called with a d_in_lookup() +dentry (i.e. DCACHE_PAR_LOOKUP will be set). diff --git a/fs/dcache.c b/fs/dcache.c index df9306c63581..034726ab058e 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -2136,18 +2136,10 @@ struct dentry *d_add_ci(struct dentry *dentry, stru= ct inode *inode, iput(inode); return found; } - if (d_in_lookup(dentry)) { - found =3D d_alloc_parallel(dentry->d_parent, name); - if (IS_ERR(found) || !d_in_lookup(found)) { - iput(inode); - return found; - } - } else { - found =3D d_alloc(dentry->d_parent, name); - if (!found) { - iput(inode); - return ERR_PTR(-ENOMEM); - } + found =3D d_alloc_parallel(dentry->d_parent, name); + if (IS_ERR(found) || !d_in_lookup(found)) { + iput(inode); + return found; } res =3D d_splice_alias(inode, found); if (res) { diff --git a/fs/namei.c b/fs/namei.c index 7a2d72ee1af1..b785bf7a9344 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1666,13 +1666,14 @@ static struct dentry *lookup_dcache(const struct qs= tr *name, } =20 /* - * Parent directory has inode locked exclusive. This is one - * and only case when ->lookup() gets called on non in-lookup - * dentries - as the matter of fact, this only gets called - * when directory is guaranteed to have no in-lookup children - * at all. - * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed. - * Will return -EEXIST if name is found and LOOKUP_EXCL was passed. + * Parent directory has inode locked. + * d_lookup_done() must be called before the dentry is dput() + * if LOOKUP_EXCL or LOOKUP_RENAME_TARGET is set.
+ * If the dentry is not d_in_lookup(): + * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't pass= ed. + * Will return -EEXIST if name is found and LOOKUP_EXCL was passed. + * If it is d_in_lookup() then these conditions can only be checked by the + * file system when carrying out the intent (create or rename). */ struct dentry *lookup_one_qstr_excl(const struct qstr *name, struct dentry *base, unsigned int flags) @@ -1690,18 +1691,27 @@ struct dentry *lookup_one_qstr_excl(const struct qs= tr *name, if (unlikely(IS_DEADDIR(dir))) return ERR_PTR(-ENOENT); =20 - dentry =3D d_alloc(base, name); - if (unlikely(!dentry)) - return ERR_PTR(-ENOMEM); + dentry =3D d_alloc_parallel(base, name); + if (unlikely(IS_ERR(dentry))) + return dentry; + if (unlikely(!d_in_lookup(dentry))) + /* Raced with another thread which did the lookup */ + goto found; =20 old =3D dir->i_op->lookup(dir, dentry, flags); if (unlikely(old)) { + d_lookup_done(dentry); dput(dentry); dentry =3D old; } found: if (IS_ERR(dentry)) return dentry; + if (d_in_lookup(dentry)) + /* We cannot check for errors - the caller will have to + * wait for any create-etc attempt to get relevant errors. 
+ */ + return dentry; if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) { dput(dentry); return ERR_PTR(-ENOENT); --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C8F2420EB; Fri, 22 Aug 2025 00:11:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821484; cv=none; b=mYiq6J0S2MeCDKatC51HNAgZrthkx7bFq10gR8sL0kau5lC2Hy6PfoN5RVDmmaZRi0l9jqgehI1kk19gDqkwEoWBsV0uDkDpD9hDrFjDsz1xZa6pnr2pco37555gQ2Idq585lIG0/xavsyixSfUOnbMvvX0zUeZ4bHDN5tg77wc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821484; c=relaxed/simple; bh=xqXtKlwajL92pljf4lb4kVN1sDC+aVFMEzKA1befkEc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GCwHMPOlMxEbbibQ8a8Ga1k90GSJCplhRlfPn0yod4IDXiBEpBVoTZE1I7qP9kWFNz7OwSYV8A+HqDp+o1hbEXbsT+lVEWWHeFdQZLj01DNc7TXPVJltnI3BRID6WGHFnvGuFpzZYxQzpZTthHv8e9gXq9fm1tDR6JdYAwxvmew= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFNA-006nar-IW; Fri, 22 Aug 2025 00:11:14 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 06/16] VFS: introduce 
start_dirop() Date: Fri, 22 Aug 2025 10:00:24 +1000 Message-ID: <20250822000818.1086550-7-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The fact that directory operations (create, remove, rename) are protected by a lock on the parent is known widely throughout the kernel. In order to change this - to locking the target dentry instead - it is best to centralise this knowledge so it can be changed in one place. This patch introduces start_dirop() which is local to fs/namei.c. It performs the required locking for create and remove. Rename will be handled separately. It is intended that this will be exported to the rest of the kernel using more focused helpers. Signed-off-by: NeilBrown --- fs/namei.c | 60 +++++++++++++++++++++++++++++++++++------------------- 1 file changed, 39 insertions(+), 21 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index b785bf7a9344..4f1eddaff63f 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2752,6 +2752,32 @@ static int filename_parentat(int dfd, struct filenam= e *name, return __filename_parentat(dfd, name, flags, parent, last, type, NULL); } =20 +/** + * start_dirop - begin a create or remove dirop, performing locking and lo= okup + * @parent: the dentry of the parent in which the operation will occur + * @name: a qstr holding the name within that parent + * @lookup_flags: intent and other lookup flags. + * + * The lookup is performed and necessary locks are taken so that, on succ= ess, + * the returned dentry can be operated on safely. + * The qstr must already have the hash value calculated. + * + * Returns: a locked dentry, or an error.
+ * + */ +static struct dentry *start_dirop(struct dentry *parent, struct qstr *name, + unsigned int lookup_flags) +{ + struct dentry *dentry; + struct inode *dir =3D d_inode(parent); + + inode_lock_nested(dir, I_MUTEX_PARENT); + dentry =3D lookup_one_qstr_excl(name, parent, lookup_flags); + if (IS_ERR(dentry)) + inode_unlock(dir); + return dentry; +} + /* does lookup, returns the object with parent locked */ static struct dentry *__kern_path_locked(int dfd, struct filename *name, s= truct path *path) { @@ -2765,12 +2791,10 @@ static struct dentry *__kern_path_locked(int dfd, s= truct filename *name, struct return ERR_PTR(error); if (unlikely(type !=3D LAST_NORM)) return ERR_PTR(-EINVAL); - inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT); - d =3D lookup_one_qstr_excl(&last, parent_path.dentry, 0); - if (IS_ERR(d)) { - inode_unlock(parent_path.dentry->d_inode); + d =3D start_dirop(parent_path.dentry, &last, 0); + if (IS_ERR(d)) return d; - } + path->dentry =3D no_free_ptr(parent_path.dentry); path->mnt =3D no_free_ptr(parent_path.mnt); return d; @@ -2789,12 +2813,10 @@ struct dentry *kern_path_locked_negative(const char= *name, struct path *path) return ERR_PTR(error); if (unlikely(type !=3D LAST_NORM)) return ERR_PTR(-EINVAL); - inode_lock_nested(parent_path.dentry->d_inode, I_MUTEX_PARENT); - d =3D lookup_one_qstr_excl(&last, parent_path.dentry, LOOKUP_CREATE); - if (IS_ERR(d)) { - inode_unlock(parent_path.dentry->d_inode); + d =3D start_dirop(parent_path.dentry, &last, LOOKUP_CREATE); + if (IS_ERR(d)) return d; - } + path->dentry =3D no_free_ptr(parent_path.dentry); path->mnt =3D no_free_ptr(parent_path.mnt); return d; @@ -4143,11 +4165,9 @@ static struct dentry *filename_create(int dfd, struc= t filename *name, */ if (last.name[last.len] && !want_dir) create_flags &=3D ~LOOKUP_CREATE; - inode_lock_nested(path->dentry->d_inode, I_MUTEX_PARENT); - dentry =3D lookup_one_qstr_excl(&last, path->dentry, - reval_flag | create_flags); + dentry =3D 
start_dirop(path->dentry, &last, reval_flag | create_flags); if (IS_ERR(dentry)) - goto unlock; + goto out_drop_write; =20 if (unlikely(error)) goto fail; @@ -4156,8 +4176,8 @@ static struct dentry *filename_create(int dfd, struct= filename *name, fail: dput(dentry); dentry =3D ERR_PTR(error); -unlock: inode_unlock(path->dentry->d_inode); +out_drop_write: if (!error) mnt_drop_write(path->mnt); out: @@ -4511,8 +4531,7 @@ int do_rmdir(int dfd, struct filename *name) if (error) goto exit2; =20 - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT); - dentry =3D lookup_one_qstr_excl(&last, path.dentry, lookup_flags); + dentry =3D start_dirop(path.dentry, &last, lookup_flags); error =3D PTR_ERR(dentry); if (IS_ERR(dentry)) goto exit3; @@ -4522,8 +4541,8 @@ int do_rmdir(int dfd, struct filename *name) error =3D vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry); exit4: dput(dentry); -exit3: inode_unlock(path.dentry->d_inode); +exit3: mnt_drop_write(path.mnt); exit2: path_put(&path); @@ -4640,8 +4659,7 @@ int do_unlinkat(int dfd, struct filename *name) if (error) goto exit2; retry_deleg: - inode_lock_nested(path.dentry->d_inode, I_MUTEX_PARENT); - dentry =3D lookup_one_qstr_excl(&last, path.dentry, lookup_flags); + dentry =3D start_dirop(path.dentry, &last, lookup_flags); error =3D PTR_ERR(dentry); if (!IS_ERR(dentry)) { =20 @@ -4657,8 +4675,8 @@ int do_unlinkat(int dfd, struct filename *name) dentry, &delegated_inode); exit3: dput(dentry); + inode_unlock(path.dentry->d_inode); } - inode_unlock(path.dentry->d_inode); if (inode) iput(inode); /* truncate the inode here */ inode =3D NULL; --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2CD8D28F4; Fri, 22 Aug 2025 00:11:22 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821485; cv=none; b=KbLpQnzGkMlCuyvaeKLs955Nv5/E1htCbe0mqnO+6aITf91Z81qwzJgNpObrepcXJbpNBYZNEmwZhnY5haNK4uXhGQSBkkuws3DTeCyGiV9V9s1b5cBy6ke8NgaVeU/5vCfS+KyEK7ahMyqPE8Z9ltDd2XnonA9nP3xAy0Ej1Ac= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821485; c=relaxed/simple; bh=Xm0u+m9SfyNz7nxDbcIx4KNkfWnzX+pGkAKYdilO22I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Wc3NJxa2J7jFhfqWqGINO0EkfI4fAjSEw4I0ES4aBMacOWuY9i8AQSr4QHTs9FXA2sUobtz3MiLRH+HqqoNcFu5j1tLJO8uBBeHOsIS6hjq5fOvAA4dqqv/jYTyWJioZTMffMh6QyEbe7XpQSCvuZj3qCYkvuUPG0hcgy0qBElU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFNB-006nav-13; Fri, 22 Aug 2025 00:11:14 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 07/16] VFS: introduce end_dirop() and end_dirop_mkdir() Date: Fri, 22 Aug 2025 10:00:25 +1000 Message-ID: <20250822000818.1086550-8-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; 
charset="utf-8" end_dirop() is the partner of start_dirop(). It drops the lock and releases the reference on the dentry. It *is* exported and can be used by all callers. As vfs_mkdir() drops the dentry on error we cannot use end_dirop() as that won't unlock when the dentry IS_ERR(). For those cases we have end_dirop_mkdir(). end_dirop() can always be called on the result of start_dirop(), but not after vfs_mkdir(). end_dirop_mkdir() can only be called on the result of start_dirop() if that was not an error, and can also be called on the result of vfs_mkdir(). When we change vfs_mkdir() to drop the lock when it drops the dentry, end_dirop_mkdir() can be discarded. Signed-off-by: NeilBrown --- fs/namei.c | 50 +++++++++++++++++++++++++++++++++++-------- include/linux/namei.h | 3 +++ 2 files changed, 44 insertions(+), 9 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index 4f1eddaff63f..8121550f20aa 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2778,6 +2778,43 @@ static struct dentry *start_dirop(struct dentry *par= ent, struct qstr *name, return dentry; } =20 +/** + * end_dirop - signal completion of a dirop + * @de: the dentry which was returned by start_dirop or similar. + * + * If de is an error, nothing happens. Otherwise any lock taken to + * protect the dentry is dropped and the dentry itself is released (dput()). + */ +void end_dirop(struct dentry *de) +{ + if (!IS_ERR(de)) { + inode_unlock(de->d_parent->d_inode); + dput(de); + } +} +EXPORT_SYMBOL(end_dirop); + +/** + * end_dirop_mkdir - signal completion of a dirop which could have been vf= s_mkdir + * @de: the dentry which was returned by start_dirop or similar. + * @parent: the parent in which the mkdir happened. + * + * Because vfs_mkdir() dput()s the dentry on failure, end_dirop() cannot be + * used with it. Instead this function must be used, and it must not be c= alled + * if the original lookup failed. + * + * If de is an error the parent is unlocked, else this behaves the same as + * end_dirop().
+ */ +void end_dirop_mkdir(struct dentry *de, struct dentry *parent) +{ + if (IS_ERR(de)) + inode_unlock(parent->d_inode); + else + end_dirop(de); +} +EXPORT_SYMBOL(end_dirop_mkdir); + /* does lookup, returns the object with parent locked */ static struct dentry *__kern_path_locked(int dfd, struct filename *name, s= truct path *path) { @@ -4174,9 +4211,8 @@ static struct dentry *filename_create(int dfd, struct= filename *name, =20 return dentry; fail: - dput(dentry); + end_dirop(dentry); dentry =3D ERR_PTR(error); - inode_unlock(path->dentry->d_inode); out_drop_write: if (!error) mnt_drop_write(path->mnt); @@ -4198,9 +4234,7 @@ EXPORT_SYMBOL(kern_path_create); =20 void done_path_create(struct path *path, struct dentry *dentry) { - if (!IS_ERR(dentry)) - dput(dentry); - inode_unlock(path->dentry->d_inode); + end_dirop_mkdir(dentry, path->dentry); mnt_drop_write(path->mnt); path_put(path); } @@ -4540,8 +4574,7 @@ int do_rmdir(int dfd, struct filename *name) goto exit4; error =3D vfs_rmdir(mnt_idmap(path.mnt), path.dentry->d_inode, dentry); exit4: - dput(dentry); - inode_unlock(path.dentry->d_inode); + end_dirop(dentry); exit3: mnt_drop_write(path.mnt); exit2: @@ -4674,8 +4707,7 @@ int do_unlinkat(int dfd, struct filename *name) error =3D vfs_unlink(mnt_idmap(path.mnt), path.dentry->d_inode, dentry, &delegated_inode); exit3: - dput(dentry); - inode_unlock(path.dentry->d_inode); + end_dirop(dentry); } if (inode) iput(inode); /* truncate the inode here */ diff --git a/include/linux/namei.h b/include/linux/namei.h index 5d085428e471..bd0cba118540 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -81,6 +81,9 @@ struct dentry *lookup_one_positive_unlocked(struct mnt_id= map *idmap, struct qstr *name, struct dentry *base); =20 +void end_dirop(struct dentry *de); +void end_dirop_mkdir(struct dentry *de, struct dentry *parent); + extern int follow_down_one(struct path *); extern int follow_down(struct path *path, unsigned int flags); extern int 
follow_up(struct path *); --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2CDC728F5; Fri, 22 Aug 2025 00:11:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821485; cv=none; b=FbUl9E7q2gNkr/xcOgsmjH/d0dnb+QSmnW2Cvka6Hwc2J/vYtX5X00ONlLAr5GNyy2nOyZQsbzKkGrwguMHR+HSbJTo7fMiMM3bvt+J55WmH8b9Jimw/uaDZi3gUUpW34t76rh7ZHbBFZMCDV5sBxE7t+cUUHzSyTOJ7h95ZPiA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821485; c=relaxed/simple; bh=wjv05qWFt0qPZ3AOnvrdkXePFJHOTSlN7tSG1qIL/+c=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oUJcC9d2Sc8A6xvjH7WlVd0Kxy/K3t6hyva6phN9eZ1vABoFAAQ3v73oWYrcoqIeg4lW5XnGLen6KbGXPILbq/uVsgY9YxA7uxtGA5ErqMrWjV4hyj8OT77QOnRubLCCBq2jO5ROzN0FmBlax8s4gPfeZ8fNOotUUEtpVGDKpGI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFNB-006nb1-Gx; Fri, 22 Aug 2025 00:11:15 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 08/16] VFS: implement simple_start_creating() with start_dirop() Date: Fri, 22 Aug 2025 10:00:26 +1000 Message-ID: 
<20250822000818.1086550-9-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" simple_start_creating() now uses start_dirop(), which further centralises locking rules. start_dirop() and lookup_noperm_common() are now available via fs/internal.h. Signed-off-by: NeilBrown --- fs/internal.h | 3 +++ fs/libfs.c | 36 +++++++++++++++++------------------- fs/namei.c | 6 +++--- 3 files changed, 23 insertions(+), 22 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index 38e8aab27bbd..baeaaf3747e3 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -67,6 +67,9 @@ int vfs_tmpfile(struct mnt_idmap *idmap, const struct path *parentpath, struct file *file, umode_t mode); struct dentry *d_hash_and_lookup(struct dentry *, struct qstr *); +struct dentry *start_dirop(struct dentry *parent, struct qstr *name, + unsigned int lookup_flags); +int lookup_noperm_common(struct qstr *qname, struct dentry *base); =20 /* * namespace.c diff --git a/fs/libfs.c b/fs/libfs.c index ce8c496a6940..63c1a4186206 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -2289,27 +2289,25 @@ void stashed_dentry_prune(struct dentry *dentry) cmpxchg(stashed, dentry, NULL); } =20 -/* parent must be held exclusive */ +/** + * simple_start_creating - prepare to create a given name + * @parent: directory in which to prepare to create the name + * @name: the name to be created + * + * Locks are taken and a lookup is performed prior to creating + * an object in a directory. No permission checking is performed. + * + * Returns: a negative dentry on which vfs_create() or similar may + * be attempted, or an error.
+ */ struct dentry *simple_start_creating(struct dentry *parent, const char *na= me) { - struct dentry *dentry; - struct inode *dir =3D d_inode(parent); + struct qstr qname =3D QSTR(name); + int err; =20 - inode_lock(dir); - if (unlikely(IS_DEADDIR(dir))) { - inode_unlock(dir); - return ERR_PTR(-ENOENT); - } - dentry =3D lookup_noperm(&QSTR(name), parent); - if (IS_ERR(dentry)) { - inode_unlock(dir); - return dentry; - } - if (dentry->d_inode) { - dput(dentry); - inode_unlock(dir); - return ERR_PTR(-EEXIST); - } - return dentry; + err =3D lookup_noperm_common(&qname, parent); + if (err) + return ERR_PTR(err); + return start_dirop(parent, &qname, LOOKUP_CREATE | LOOKUP_EXCL); } EXPORT_SYMBOL(simple_start_creating); diff --git a/fs/namei.c b/fs/namei.c index 8121550f20aa..c1e39c985f1f 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2765,8 +2765,8 @@ static int filename_parentat(int dfd, struct filename= *name, * Returns: a locked dentry, or an error. * */ -static struct dentry *start_dirop(struct dentry *parent, struct qstr *name, - unsigned int lookup_flags) +struct dentry *start_dirop(struct dentry *parent, struct qstr *name, + unsigned int lookup_flags) { struct dentry *dentry; struct inode *dir =3D d_inode(parent); @@ -2931,7 +2931,7 @@ int vfs_path_lookup(struct dentry *dentry, struct vfs= mount *mnt, } EXPORT_SYMBOL(vfs_path_lookup); =20 -static int lookup_noperm_common(struct qstr *qname, struct dentry *base) +int lookup_noperm_common(struct qstr *qname, struct dentry *base) { const char *name =3D qname->name; u32 len =3D qname->len; --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E911F22EE5; Fri, 22 Aug 2025 00:11:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821486; cv=none; b=PQBwp9koJMVd5rbrHr9qLm4pyKzh3/aPvxuUPMb2C2yhjUo1dlKbPMBDPJsBObGwYRos+S1rAwNUlXbkTfJIKoy6RyyKeSn+JMu5io8W3OuK4rkOzqnQiebvrJLRZ+74sVVLsOGAmSfkHo4pughIsOe4GiciJ0nq8uarJQ0aYeE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821486; c=relaxed/simple; bh=UHhqmYK4kA6URFwMqIaA9tYTTvYOGMHKirwwvzE7Fgo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rbjZCgl9Fo8Ho7epayGxxvAjgY7giTtREnbsWpKdab/BQAg3tT8x81NL53J9f5WE2YYIYLt6SpxtPD4hh5hoV6TAp/0wvnfdf2fTbV7BNU7sRyDrFowtzVAcioOo2XmYYrOJ7G5xpcF++uiodODM4g8Bl25qtJDGmZCe81nGh3Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFNB-006nb5-Vl; Fri, 22 Aug 2025 00:11:15 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 09/16] VFS: introduce simple_end_creating() and simple_failed_creating() Date: Fri, 22 Aug 2025 10:00:27 +1000 Message-ID: <20250822000818.1086550-10-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" These are partners of 
simple_start_creating(). On failure we don't keep a reference. On success we do. Use these where simple_start_creating() is used, in debugfs, tracefs, and rpcpipefs. Also rename start_creating, end_creating, failed_creating in debugfs to free up these generic names for more generic use. I put the declarations in namei.h to have access to end_dirop() but they really belong with the other simple_ function declarations. Possibly these should be moved out of fs.h into a separate libfs.h which could include namei.h. Signed-off-by: NeilBrown --- fs/debugfs/inode.c | 43 +++++++++++++++++++++---------------------- fs/tracefs/inode.c | 6 ++---- include/linux/namei.h | 16 ++++++++++++++++ net/sunrpc/rpc_pipe.c | 11 ++++------- 4 files changed, 43 insertions(+), 33 deletions(-) diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c index a0357b0cf362..9525618ccad1 100644 --- a/fs/debugfs/inode.c +++ b/fs/debugfs/inode.c @@ -353,7 +353,8 @@ struct dentry *debugfs_lookup(const char *name, struct = dentry *parent) } EXPORT_SYMBOL_GPL(debugfs_lookup); =20 -static struct dentry *start_creating(const char *name, struct dentry *pare= nt) +static struct dentry *debugfs_start_creating(const char *name, + struct dentry *parent) { struct dentry *dentry; int error; @@ -393,18 +394,16 @@ static struct dentry *start_creating(const char *name= , struct dentry *parent) return dentry; } =20 -static struct dentry *failed_creating(struct dentry *dentry) +static struct dentry *debugfs_failed_creating(struct dentry *dentry) { - inode_unlock(d_inode(dentry->d_parent)); - dput(dentry); + simple_failed_creating(dentry); simple_release_fs(&debugfs_mount, &debugfs_mount_count); return ERR_PTR(-ENOMEM); } =20 -static struct dentry *end_creating(struct dentry *dentry) +static struct dentry *debugfs_end_creating(struct dentry *dentry) { - inode_unlock(d_inode(dentry->d_parent)); - return dentry; + return simple_end_creating(dentry); } =20 static struct dentry *__debugfs_create_file(const char *name, umode_t
mode, @@ -419,13 +418,13 @@ static struct dentry *__debugfs_create_file(const cha= r *name, umode_t mode, if (!(mode & S_IFMT)) mode |=3D S_IFREG; BUG_ON(!S_ISREG(mode)); - dentry =3D start_creating(name, parent); + dentry =3D debugfs_start_creating(name, parent); =20 if (IS_ERR(dentry)) return dentry; =20 if (!(debugfs_allow & DEBUGFS_ALLOW_API)) { - failed_creating(dentry); + debugfs_failed_creating(dentry); return ERR_PTR(-EPERM); } =20 @@ -433,7 +432,7 @@ static struct dentry *__debugfs_create_file(const char = *name, umode_t mode, if (unlikely(!inode)) { pr_err("out of free dentries, can not create file '%s'\n", name); - return failed_creating(dentry); + return debugfs_failed_creating(dentry); } =20 inode->i_mode =3D mode; @@ -448,7 +447,7 @@ static struct dentry *__debugfs_create_file(const char = *name, umode_t mode, =20 d_instantiate(dentry, inode); fsnotify_create(d_inode(dentry->d_parent), dentry); - return end_creating(dentry); + return debugfs_end_creating(dentry); } =20 struct dentry *debugfs_create_file_full(const char *name, umode_t mode, @@ -568,14 +567,14 @@ EXPORT_SYMBOL_GPL(debugfs_create_file_size); */ struct dentry *debugfs_create_dir(const char *name, struct dentry *parent) { - struct dentry *dentry =3D start_creating(name, parent); + struct dentry *dentry =3D debugfs_start_creating(name, parent); struct inode *inode; =20 if (IS_ERR(dentry)) return dentry; =20 if (!(debugfs_allow & DEBUGFS_ALLOW_API)) { - failed_creating(dentry); + debugfs_failed_creating(dentry); return ERR_PTR(-EPERM); } =20 @@ -583,7 +582,7 @@ struct dentry *debugfs_create_dir(const char *name, str= uct dentry *parent) if (unlikely(!inode)) { pr_err("out of free dentries, can not create directory '%s'\n", name); - return failed_creating(dentry); + return debugfs_failed_creating(dentry); } =20 inode->i_mode =3D S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO; @@ -595,7 +594,7 @@ struct dentry *debugfs_create_dir(const char *name, str= uct dentry *parent) d_instantiate(dentry, inode); 
inc_nlink(d_inode(dentry->d_parent)); fsnotify_mkdir(d_inode(dentry->d_parent), dentry); - return end_creating(dentry); + return debugfs_end_creating(dentry); } EXPORT_SYMBOL_GPL(debugfs_create_dir); =20 @@ -615,14 +614,14 @@ struct dentry *debugfs_create_automount(const char *n= ame, debugfs_automount_t f, void *data) { - struct dentry *dentry =3D start_creating(name, parent); + struct dentry *dentry =3D debugfs_start_creating(name, parent); struct inode *inode; =20 if (IS_ERR(dentry)) return dentry; =20 if (!(debugfs_allow & DEBUGFS_ALLOW_API)) { - failed_creating(dentry); + debugfs_failed_creating(dentry); return ERR_PTR(-EPERM); } =20 @@ -630,7 +629,7 @@ struct dentry *debugfs_create_automount(const char *nam= e, if (unlikely(!inode)) { pr_err("out of free dentries, can not create automount '%s'\n", name); - return failed_creating(dentry); + return debugfs_failed_creating(dentry); } =20 make_empty_dir_inode(inode); @@ -642,7 +641,7 @@ struct dentry *debugfs_create_automount(const char *nam= e, d_instantiate(dentry, inode); inc_nlink(d_inode(dentry->d_parent)); fsnotify_mkdir(d_inode(dentry->d_parent), dentry); - return end_creating(dentry); + return debugfs_end_creating(dentry); } EXPORT_SYMBOL(debugfs_create_automount); =20 @@ -678,7 +677,7 @@ struct dentry *debugfs_create_symlink(const char *name,= struct dentry *parent, if (!link) return ERR_PTR(-ENOMEM); =20 - dentry =3D start_creating(name, parent); + dentry =3D debugfs_start_creating(name, parent); if (IS_ERR(dentry)) { kfree(link); return dentry; @@ -689,13 +688,13 @@ struct dentry *debugfs_create_symlink(const char *nam= e, struct dentry *parent, pr_err("out of free dentries, can not create symlink '%s'\n", name); kfree(link); - return failed_creating(dentry); + return debugfs_failed_creating(dentry); } inode->i_mode =3D S_IFLNK | S_IRWXUGO; inode->i_op =3D &debugfs_symlink_inode_operations; inode->i_link =3D link; d_instantiate(dentry, inode); - return end_creating(dentry); + return 
debugfs_end_creating(dentry); } EXPORT_SYMBOL_GPL(debugfs_create_symlink); =20 diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 0c023941a316..320d7f25024b 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -571,16 +571,14 @@ struct dentry *tracefs_start_creating(const char *nam= e, struct dentry *parent) =20 struct dentry *tracefs_failed_creating(struct dentry *dentry) { - inode_unlock(d_inode(dentry->d_parent)); - dput(dentry); + simple_failed_creating(dentry); simple_release_fs(&tracefs_mount, &tracefs_mount_count); return NULL; } =20 struct dentry *tracefs_end_creating(struct dentry *dentry) { - inode_unlock(d_inode(dentry->d_parent)); - return dentry; + return simple_end_creating(dentry); } =20 /* Find the inode that this will use for default */ diff --git a/include/linux/namei.h b/include/linux/namei.h index bd0cba118540..b1171aa7fb96 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -84,6 +84,22 @@ struct dentry *lookup_one_positive_unlocked(struct mnt_i= dmap *idmap, void end_dirop(struct dentry *de); void end_dirop_mkdir(struct dentry *de, struct dentry *parent); =20 +/* filesystems which use the dcache as backing store don't + * keep a reference after creating an object. 
+ */ +static inline struct dentry *simple_end_creating(struct dentry *dentry) +{ + dget(dentry); + end_dirop(dentry); + return dentry; +} + +/* On failure, we don't keep a reference */ +static inline void simple_failed_creating(struct dentry *dentry) +{ + end_dirop(dentry); +} + extern int follow_down_one(struct path *); extern int follow_down(struct path *path, unsigned int flags); extern int follow_up(struct path *); diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c index 0bd1df2ebb47..38c26909235d 100644 --- a/net/sunrpc/rpc_pipe.c +++ b/net/sunrpc/rpc_pipe.c @@ -536,8 +536,7 @@ static int rpc_new_file(struct dentry *parent, =20 inode =3D rpc_get_inode(dir->i_sb, S_IFREG | mode); if (unlikely(!inode)) { - dput(dentry); - inode_unlock(dir); + simple_failed_creating(dentry); return -ENOMEM; } inode->i_ino =3D iunique(dir->i_sb, 100); @@ -546,7 +545,7 @@ static int rpc_new_file(struct dentry *parent, rpc_inode_setowner(inode, private); d_instantiate(dentry, inode); fsnotify_create(dir, dentry); - inode_unlock(dir); + simple_end_creating(dentry); return 0; } =20 @@ -572,9 +571,8 @@ static struct dentry *rpc_new_dir(struct dentry *parent, inc_nlink(dir); d_instantiate(dentry, inode); fsnotify_mkdir(dir, dentry); - inode_unlock(dir); =20 - return dentry; + return simple_end_creating(dentry); } =20 static int rpc_populate(struct dentry *parent, @@ -669,9 +667,8 @@ int rpc_mkpipe_dentry(struct dentry *parent, const char= *name, rpci->pipe =3D pipe; rpc_inode_setowner(inode, private); d_instantiate(dentry, inode); - pipe->dentry =3D dentry; fsnotify_create(dir, dentry); - inode_unlock(dir); + pipe->dentry =3D simple_end_creating(dentry); return 0; =20 failed: --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E86A5225D6;
Fri, 22 Aug 2025 00:11:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821486; cv=none; b=Dw7n4fKJKVSX3HXITY4L5fJ5LAl8Cls6WzmvspIdlX5jluJoOYXhmHD/jC2HorqiBP0DhxefPpWiTbiPLr8HNkEYXrsPCsp49qpSL/ox+A35D+S5TQyKxjKDCImeRnFfliEt+RTt7qxdClpBD/pfz3ig8cLDv+x6Z0EC8d2K/QY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821486; c=relaxed/simple; bh=WxoFrYEdzCo87Szu1Wss2RIWhUxh1bJ27GAEEzx+OYE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YmVOETOfjqGem9y2azsJyzbsqGOtQ3BajfHggZr5LjnqHsfiwCNFM80KhPFienXPgiGuzHp6BIuvgjq8sqmTp+1xdHAqT71Xetw82DE7ntmBudYA4dwSycug13IKo5ssNwfTzow9ek4descSiTRVe+IVJooIIN8cV1cOnTphqyQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFNC-006nb9-HE; Fri, 22 Aug 2025 00:11:16 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 10/16] Use simple_start_creating() in various places. 
Date: Fri, 22 Aug 2025 10:00:28 +1000 Message-ID: <20250822000818.1086550-11-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" s390/hypfs, android/binder, binfmt_misc, devpts, nfsd, bpf, apparmor, selinux, security all create FS objects following the pattern of simple_start_creating(). This patch changes them all to use simple_start_creating() and to clean up with simple_failed_creating() or simple_end_creating(). Signed-off-by: NeilBrown --- arch/s390/hypfs/inode.c | 20 +++++------- drivers/android/binderfs.c | 53 +++++++------------------------- fs/binfmt_misc.c | 30 +++++++----------- fs/devpts/inode.c | 29 +++++++----------- fs/nfsd/nfsctl.c | 56 ++++++++++++++-------------------- kernel/bpf/inode.c | 14 ++++----- security/apparmor/apparmorfs.c | 37 ++++++---------------- security/inode.c | 19 +++--------- security/selinux/selinuxfs.c | 8 ++--- 9 files changed, 88 insertions(+), 178 deletions(-) diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c index 96409573c75d..00639f458068 100644 --- a/arch/s390/hypfs/inode.c +++ b/arch/s390/hypfs/inode.c @@ -341,17 +341,14 @@ static struct dentry *hypfs_create_file(struct dentry= *parent, const char *name, struct dentry *dentry; struct inode *inode; =20 - inode_lock(d_inode(parent)); - dentry =3D lookup_noperm(&QSTR(name), parent); - if (IS_ERR(dentry)) { - dentry =3D ERR_PTR(-ENOMEM); - goto fail; - } + dentry =3D simple_start_creating(parent, name); + if (IS_ERR(dentry)) + return ERR_PTR(-ENOMEM); + inode =3D hypfs_make_inode(parent->d_sb, mode); if (!inode) { - dput(dentry); - dentry =3D ERR_PTR(-ENOMEM); - goto fail; + simple_failed_creating(dentry); + return
ERR_PTR(-ENOMEM); } if (S_ISREG(mode)) { inode->i_fop =3D &hypfs_file_ops; @@ -367,10 +364,7 @@ static struct dentry *hypfs_create_file(struct dentry = *parent, const char *name, BUG(); inode->i_private =3D data; d_instantiate(dentry, inode); - dget(dentry); -fail: - inode_unlock(d_inode(parent)); - return dentry; + return simple_end_creating(dentry); } =20 struct dentry *hypfs_mkdir(struct dentry *parent, const char *name) diff --git a/drivers/android/binderfs.c b/drivers/android/binderfs.c index 0d9d95a7fb60..466e8c0007c3 100644 --- a/drivers/android/binderfs.c +++ b/drivers/android/binderfs.c @@ -181,28 +181,17 @@ static int binderfs_binder_device_create(struct inode= *ref_inode, } =20 root =3D sb->s_root; - inode_lock(d_inode(root)); =20 - /* look it up */ - dentry =3D lookup_noperm(&QSTR(name), root); + dentry =3D simple_start_creating(root, name); if (IS_ERR(dentry)) { - inode_unlock(d_inode(root)); ret =3D PTR_ERR(dentry); goto err; } =20 - if (d_really_is_positive(dentry)) { - /* already exists */ - dput(dentry); - inode_unlock(d_inode(root)); - ret =3D -EEXIST; - goto err; - } - inode->i_private =3D device; d_instantiate(dentry, inode); fsnotify_create(root->d_inode, dentry); - inode_unlock(d_inode(root)); + simple_end_creating(dentry); =20 binder_add_device(device); =20 @@ -482,19 +471,7 @@ static struct inode *binderfs_make_inode(struct super_= block *sb, int mode) static struct dentry *binderfs_create_dentry(struct dentry *parent, const char *name) { - struct dentry *dentry; - - dentry =3D lookup_noperm(&QSTR(name), parent); - if (IS_ERR(dentry)) - return dentry; - - /* Return error if the file/dir already exists. 
*/ - if (d_really_is_positive(dentry)) { - dput(dentry); - return ERR_PTR(-EEXIST); - } - - return dentry; + return simple_start_creating(parent, name); } =20 struct dentry *binderfs_create_file(struct dentry *parent, const char *nam= e, @@ -506,18 +483,16 @@ struct dentry *binderfs_create_file(struct dentry *pa= rent, const char *name, struct super_block *sb; =20 parent_inode =3D d_inode(parent); - inode_lock(parent_inode); =20 dentry =3D binderfs_create_dentry(parent, name); if (IS_ERR(dentry)) - goto out; + return dentry; =20 sb =3D parent_inode->i_sb; new_inode =3D binderfs_make_inode(sb, S_IFREG | 0444); if (!new_inode) { - dput(dentry); - dentry =3D ERR_PTR(-ENOMEM); - goto out; + simple_failed_creating(dentry); + return ERR_PTR(-ENOMEM); } =20 new_inode->i_fop =3D fops; @@ -525,9 +500,7 @@ struct dentry *binderfs_create_file(struct dentry *pare= nt, const char *name, d_instantiate(dentry, new_inode); fsnotify_create(parent_inode, dentry); =20 -out: - inode_unlock(parent_inode); - return dentry; + return simple_end_creating(dentry); } =20 static struct dentry *binderfs_create_dir(struct dentry *parent, @@ -538,18 +511,16 @@ static struct dentry *binderfs_create_dir(struct dent= ry *parent, struct super_block *sb; =20 parent_inode =3D d_inode(parent); - inode_lock(parent_inode); =20 dentry =3D binderfs_create_dentry(parent, name); if (IS_ERR(dentry)) - goto out; + return dentry; =20 sb =3D parent_inode->i_sb; new_inode =3D binderfs_make_inode(sb, S_IFDIR | 0755); if (!new_inode) { - dput(dentry); - dentry =3D ERR_PTR(-ENOMEM); - goto out; + simple_failed_creating(dentry); + return ERR_PTR(-ENOMEM); } =20 new_inode->i_fop =3D &simple_dir_operations; @@ -560,9 +531,7 @@ static struct dentry *binderfs_create_dir(struct dentry= *parent, inc_nlink(parent_inode); fsnotify_mkdir(parent_inode, dentry); =20 -out: - inode_unlock(parent_inode); - return dentry; + return simple_end_creating(dentry); } =20 static int binder_features_show(struct seq_file *m, void *unused) 
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index a839f960cd4a..dbe56f243174 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -796,28 +796,23 @@ static ssize_t bm_register_write(struct file *file, c= onst char __user *buffer, revert_creds(old_cred); if (IS_ERR(f)) { pr_notice("register: failed to install interpreter file %s\n", - e->interpreter); + e->interpreter); kfree(e); return PTR_ERR(f); } e->interp_file =3D f; } =20 - inode_lock(d_inode(root)); - dentry =3D lookup_noperm(&QSTR(e->name), root); + dentry =3D simple_start_creating(root, e->name); err =3D PTR_ERR(dentry); if (IS_ERR(dentry)) goto out; =20 - err =3D -EEXIST; - if (d_really_is_positive(dentry)) - goto out2; - inode =3D bm_get_inode(sb, S_IFREG | 0644); =20 err =3D -ENOMEM; if (!inode) - goto out2; + goto out; =20 refcount_set(&e->users, 1); e->dentry =3D dget(dentry); @@ -830,19 +825,16 @@ static ssize_t bm_register_write(struct file *file, c= onst char __user *buffer, list_add(&e->list, &misc->entries); write_unlock(&misc->entries_lock); =20 - err =3D 0; -out2: - dput(dentry); + simple_end_creating(dentry); + return count; + out: - inode_unlock(d_inode(root)); + simple_failed_creating(dentry); =20 - if (err) { - if (f) - filp_close(f, NULL); - kfree(e); - return err; - } - return count; + if (f) + filp_close(f, NULL); + kfree(e); + return err; } =20 static const struct file_operations bm_register_operations =3D { diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c index fdf22264a8e9..85c78e133dad 100644 --- a/fs/devpts/inode.c +++ b/fs/devpts/inode.c @@ -259,7 +259,6 @@ static int devpts_parse_param(struct fs_context *fc, st= ruct fs_parameter *param) static int mknod_ptmx(struct super_block *sb, struct fs_context *fc) { int mode; - int rc =3D -ENOMEM; struct dentry *dentry; struct inode *inode; struct dentry *root =3D sb->s_root; @@ -268,18 +267,15 @@ static int mknod_ptmx(struct super_block *sb, struct = fs_context *fc) kuid_t ptmx_uid =3D current_fsuid(); kgid_t ptmx_gid =3D 
current_fsgid(); =20 - inode_lock(d_inode(root)); - - /* If we have already created ptmx node, return */ - if (fsi->ptmx_dentry) { - rc =3D 0; - goto out; - } + dentry =3D simple_start_creating(root, "ptmx"); + if (IS_ERR(dentry)) { + if (dentry =3D=3D ERR_PTR(-EEXIST)) { + /* If we have already created ptmx node, return */ + return 0; + } =20 - dentry =3D d_alloc_name(root, "ptmx"); - if (!dentry) { pr_err("Unable to alloc dentry for ptmx node\n"); - goto out; + return -ENOMEM; } =20 /* @@ -288,8 +284,8 @@ static int mknod_ptmx(struct super_block *sb, struct fs= _context *fc) inode =3D new_inode(sb); if (!inode) { pr_err("Unable to alloc inode for ptmx node\n"); - dput(dentry); - goto out; + simple_failed_creating(dentry); + return -ENOMEM; } =20 inode->i_ino =3D 2; @@ -302,11 +298,8 @@ static int mknod_ptmx(struct super_block *sb, struct f= s_context *fc) =20 d_add(dentry, inode); =20 - fsi->ptmx_dentry =3D dentry; - rc =3D 0; -out: - inode_unlock(d_inode(root)); - return rc; + fsi->ptmx_dentry =3D simple_end_creating(dentry); + return 0; } =20 static void update_ptmx_mode(struct pts_fs_info *fsi) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index bc6b776fc657..e0aac224172c 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1147,24 +1147,21 @@ static int __nfsd_mkdir(struct inode *dir, struct d= entry *dentry, umode_t mode, =20 static struct dentry *nfsd_mkdir(struct dentry *parent, struct nfsdfs_clie= nt *ncl, char *name) { - struct inode *dir =3D parent->d_inode; struct dentry *dentry; - int ret =3D -ENOMEM; + int ret; + + dentry =3D simple_start_creating(parent, name); + if (IS_ERR(dentry)) + return dentry; =20 - inode_lock(dir); - dentry =3D d_alloc_name(parent, name); - if (!dentry) - goto out_err; ret =3D __nfsd_mkdir(d_inode(parent), dentry, S_IFDIR | 0600, ncl); if (ret) goto out_err; -out: - inode_unlock(dir); - return dentry; + return simple_end_creating(dentry); + out_err: - dput(dentry); - dentry =3D ERR_PTR(ret); - goto out; + 
simple_failed_creating(dentry); + return ERR_PTR(ret); } =20 #if IS_ENABLED(CONFIG_SUNRPC_GSS) @@ -1193,19 +1190,18 @@ static int __nfsd_symlink(struct inode *dir, struct= dentry *dentry, static void _nfsd_symlink(struct dentry *parent, const char *name, const char *content) { - struct inode *dir =3D parent->d_inode; struct dentry *dentry; int ret; =20 - inode_lock(dir); - dentry =3D d_alloc_name(parent, name); - if (!dentry) - goto out; + dentry =3D simple_start_creating(parent, name); + if (IS_ERR(dentry)) + return; ret =3D __nfsd_symlink(d_inode(parent), dentry, S_IFLNK | 0777, content); - if (ret) - dput(dentry); -out: - inode_unlock(dir); + if (ret) { + simple_failed_creating(dentry); + return; + } + simple_end_creating(dentry); } #else static inline void _nfsd_symlink(struct dentry *parent, const char *name, @@ -1250,30 +1246,24 @@ static int nfsdfs_create_files(struct dentry *root, struct dentry *dentry; int i; =20 - inode_lock(dir); for (i =3D 0; files->name && files->name[0]; i++, files++) { - dentry =3D d_alloc_name(root, files->name); - if (!dentry) - goto out; + dentry =3D simple_start_creating(root, files->name); + if (IS_ERR(dentry)) + return PTR_ERR(dentry); inode =3D nfsd_get_inode(d_inode(root)->i_sb, S_IFREG | files->mode); if (!inode) { - dput(dentry); - goto out; + simple_failed_creating(dentry); + return -ENOMEM; } kref_get(&ncl->cl_ref); inode->i_fop =3D files->ops; inode->i_private =3D ncl; d_add(dentry, inode); fsnotify_create(dir, dentry); - if (fdentries) - fdentries[i] =3D dentry; + fdentries[i] =3D simple_end_creating(dentry); } - inode_unlock(dir); return 0; -out: - inode_unlock(dir); - return -ENOMEM; } =20 /* on success, returns positive number unique to that client.
*/ diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 5c2e96b19392..ee50d1a38023 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -420,16 +420,14 @@ static int bpf_iter_link_pin_kernel(struct dentry *pa= rent, struct dentry *dentry; int ret; =20 - inode_lock(parent->d_inode); - dentry =3D lookup_noperm(&QSTR(name), parent); - if (IS_ERR(dentry)) { - inode_unlock(parent->d_inode); - return PTR_ERR(dentry); - } + dentry =3D simple_start_creating(parent, name); + if (IS_ERR(dentry)) + return PTR_ERR(dentry); + ret =3D bpf_mkobj_ops(dentry, mode, link, &bpf_link_iops, &bpf_iter_fops); - dput(dentry); - inode_unlock(parent->d_inode); + /* bpf_mkobj_ops took the ref if needed, so we dput() here */ + dput(simple_end_creating(dentry)); return ret; } =20 diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c index 391a586d0557..13260352198f 100644 --- a/security/apparmor/apparmorfs.c +++ b/security/apparmor/apparmorfs.c @@ -280,32 +280,20 @@ static struct dentry *aafs_create(const char *name, u= mode_t mode, if (error) return ERR_PTR(error); =20 - dir =3D d_inode(parent); - - inode_lock(dir); - dentry =3D lookup_noperm(&QSTR(name), parent); + dentry =3D simple_start_creating(parent, name); if (IS_ERR(dentry)) { error =3D PTR_ERR(dentry); - goto fail_lock; - } - - if (d_really_is_positive(dentry)) { - error =3D -EEXIST; - goto fail_dentry; + goto fail; } =20 error =3D __aafs_setup_d_inode(dir, dentry, mode, data, link, fops, iops); if (error) - goto fail_dentry; - inode_unlock(dir); - - return dentry; + goto fail; =20 -fail_dentry: - dput(dentry); + return simple_end_creating(dentry); =20 -fail_lock: - inode_unlock(dir); +fail: + simple_failed_creating(dentry); simple_release_fs(&aafs_mnt, &aafs_count); =20 return ERR_PTR(error); @@ -2567,8 +2555,7 @@ static int aa_mk_null_file(struct dentry *parent) if (error) return error; =20 - inode_lock(d_inode(parent)); - dentry =3D lookup_noperm(&QSTR(NULL_FILE_NAME), parent); + dentry =3D 
simple_start_creating(parent, NULL_FILE_NAME); if (IS_ERR(dentry)) { error =3D PTR_ERR(dentry); goto out; @@ -2576,7 +2563,8 @@ static int aa_mk_null_file(struct dentry *parent) inode =3D new_inode(parent->d_inode->i_sb); if (!inode) { error =3D -ENOMEM; - goto out1; + simple_failed_creating(dentry); + goto out; } =20 inode->i_ino =3D get_next_ino(); @@ -2588,18 +2576,13 @@ static int aa_mk_null_file(struct dentry *parent) aa_null.dentry =3D dget(dentry); aa_null.mnt =3D mntget(mount); =20 - error =3D 0; + simple_end_creating(dentry); =20 -out1: - dput(dentry); out: - inode_unlock(d_inode(parent)); simple_release_fs(&mount, &count); return error; } =20 - - static const char *policy_get_link(struct dentry *dentry, struct inode *inode, struct delayed_call *done) diff --git a/security/inode.c b/security/inode.c index 43382ef8896e..0d2fab99e71e 100644 --- a/security/inode.c +++ b/security/inode.c @@ -129,20 +129,14 @@ static struct dentry *securityfs_create_dentry(const = char *name, umode_t mode, =20 dir =3D d_inode(parent); =20 - inode_lock(dir); - dentry =3D lookup_noperm(&QSTR(name), parent); + dentry =3D simple_start_creating(parent, name); if (IS_ERR(dentry)) goto out; =20 - if (d_really_is_positive(dentry)) { - error =3D -EEXIST; - goto out1; - } - inode =3D new_inode(dir->i_sb); if (!inode) { error =3D -ENOMEM; - goto out1; + goto out; } =20 inode->i_ino =3D get_next_ino(); @@ -161,14 +155,11 @@ static struct dentry *securityfs_create_dentry(const = char *name, umode_t mode, inode->i_fop =3D fops; } d_instantiate(dentry, inode); - inode_unlock(dir); - return dentry; + return simple_end_creating(dentry); =20 -out1: - dput(dentry); - dentry =3D ERR_PTR(error); out: - inode_unlock(dir); + simple_failed_creating(dentry); + dentry =3D ERR_PTR(error); if (pinned) simple_release_fs(&mount, &mount_count); return dentry; diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c index 9aa1d03ab612..5aa1ad5be587 100644 --- a/security/selinux/selinuxfs.c +++ 
b/security/selinux/selinuxfs.c @@ -1949,15 +1949,16 @@ static const struct inode_operations swapover_dir_i= node_operations =3D { static struct dentry *sel_make_swapover_dir(struct super_block *sb, unsigned long *ino) { - struct dentry *dentry =3D d_alloc_name(sb->s_root, ".swapover"); + struct dentry *dentry; struct inode *inode; =20 + dentry =3D simple_start_creating(sb->s_root, ".swapover"); if (!dentry) return ERR_PTR(-ENOMEM); =20 inode =3D sel_make_inode(sb, S_IFDIR); if (!inode) { - dput(dentry); + simple_failed_creating(dentry); return ERR_PTR(-ENOMEM); } =20 @@ -1968,8 +1969,7 @@ static struct dentry *sel_make_swapover_dir(struct su= per_block *sb, inode_lock(sb->s_root->d_inode); d_add(dentry, inode); inc_nlink(sb->s_root->d_inode); - inode_unlock(sb->s_root->d_inode); - return dentry; + return simple_end_creating(dentry); } =20 #define NULL_FILE_NAME "null" --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A854B54723; Fri, 22 Aug 2025 00:11:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821487; cv=none; b=FLHfhk7H4XXreBWrdsr+brxteuLnI9kUEaeD0DRXbTNTSsgrR0c/hZV5Bcd+HZRRbpbNLwCTSNtylXIdoxk2CB7Nh2gkPgDXYVVr5ksym1ZzuqI8jLm3o1gqf/ULU4AHMRxleZAeO9ZdqLuAP1WPDwdc9PjpgK55svrx+ZDM4VY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821487; c=relaxed/simple; bh=X8yh33lItgx3dVAiLdahd/NXyVASV1tHoEAt3L1iMVo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lNW+W+tcccBZFyQYLMUUO/j1VV8IEMT14M8TJuVK3DL2ZyIeKEynNGj3PwciZDfxDCiYigAe8A3RjrHYbssrycs7K7XFaDyBWLIfnGlRODo0EvG26+v+dbxYhtRx4rLQqdYTZ48kFGbj4zoIrx7dCWuSc80favRHfQ5r6wzNSOY= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFND-006nbD-0t; Fri, 22 Aug 2025 00:11:16 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 11/16] VFS/nfsd/cachefiles: add start_creating() and end_creating() Date: Fri, 22 Aug 2025 10:00:29 +1000 Message-ID: <20250822000818.1086550-12-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" start_creating() is similar to simple_start_creating() but is not so simple. It takes a qstr for the name, includes permission checking, and does NOT report an error if the name already exists, returning a positive dentry instead. This is currently used by nfsd and cachefiles. Overlayfs might have a use for it too. end_creating() is called after the dentry has been used. Unlike simple_end_creating(), end_creating() drops the reference to the dentry as it is generally no longer needed. This is exactly end_dirop_mkdir(), but using that everywhere looks a bit odd... These calls help encapsulate locking rules so that directory locking can be changed.
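As a rough illustration of the reference counting described above (simple_end_creating() keeps a reference, end_creating() drops it), here is a small userspace model in C. It is not kernel code: every mock_* type and function is a hypothetical stand-in invented for this sketch, a boolean replaces the parent's inode lock, and NULL replaces ERR_PTR(-EEXIST).

```c
/* Userspace model only; NOT kernel code. Every mock_* name is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_dir {
	bool locked;		/* stands in for the parent's inode lock */
};

struct mock_dentry {
	struct mock_dir *parent;
	int refcount;
	bool positive;		/* true once an inode is attached */
};

/* Models end_dirop(): unlock the parent and drop one reference. */
void mock_end_dirop(struct mock_dentry *d)
{
	assert(d->parent->locked && d->refcount > 0);
	d->parent->locked = false;
	d->refcount--;
}

/* Models simple_start_creating(): an existing name is an error. */
struct mock_dentry *mock_simple_start_creating(struct mock_dir *dir,
					       struct mock_dentry *d)
{
	dir->locked = true;
	if (d->positive) {
		dir->locked = false;
		return NULL;	/* stands in for ERR_PTR(-EEXIST) */
	}
	d->refcount++;		/* the lookup reference */
	return d;
}

/* Models simple_end_creating(): take an extra ref, so one survives. */
struct mock_dentry *mock_simple_end_creating(struct mock_dentry *d)
{
	d->refcount++;		/* models dget() */
	mock_end_dirop(d);
	return d;
}

/* Models simple_failed_creating(): no reference is kept. */
void mock_simple_failed_creating(struct mock_dentry *d)
{
	mock_end_dirop(d);
}

/* Models start_creating(): an existing name is returned, not rejected. */
struct mock_dentry *mock_start_creating(struct mock_dir *dir,
					struct mock_dentry *d)
{
	dir->locked = true;
	d->refcount++;		/* the lookup reference */
	return d;
}

/* Models end_creating(): unlock and drop the lookup reference. */
void mock_end_creating(struct mock_dentry *d, struct mock_dir *dir)
{
	assert(d->parent == dir);
	mock_end_dirop(d);
}
```

In this model a successful mock_simple_start_creating()/mock_simple_end_creating() pair leaves the count one higher than it started, while mock_start_creating()/mock_end_creating() leaves it unchanged; in both cases the parent is unlocked afterwards.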
Signed-off-by: NeilBrown --- fs/cachefiles/namei.c | 33 +++++++++++++++------------------ fs/namei.c | 25 +++++++++++++++++++++++++ fs/nfsd/nfs3proc.c | 14 +++++--------- fs/nfsd/nfs4proc.c | 14 +++++--------- fs/nfsd/nfs4recover.c | 16 ++++++---------- fs/nfsd/nfsproc.c | 11 +++++------ fs/nfsd/vfs.c | 42 +++++++++++++++--------------------------- include/linux/namei.h | 18 ++++++++++++++++++ 8 files changed, 94 insertions(+), 79 deletions(-) diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index d1edb2ac3837..9af324473967 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -93,12 +93,11 @@ struct dentry *cachefiles_get_directory(struct cachefil= es_cache *cache, _enter(",,%s", dirname); =20 /* search the current directory for the element name */ - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT); =20 retry: ret =3D cachefiles_inject_read_error(); if (ret =3D=3D 0) - subdir =3D lookup_one(&nop_mnt_idmap, &QSTR(dirname), dir); + subdir =3D start_creating(&nop_mnt_idmap, dir, &QSTR(dirname)); else subdir =3D ERR_PTR(ret); trace_cachefiles_lookup(NULL, dir, subdir); @@ -141,7 +140,7 @@ struct dentry *cachefiles_get_directory(struct cachefil= es_cache *cache, trace_cachefiles_mkdir(dir, subdir); =20 if (unlikely(d_unhashed(subdir) || d_is_negative(subdir))) { - dput(subdir); + end_creating(subdir, dir); goto retry; } ASSERT(d_backing_inode(subdir)); @@ -154,7 +153,8 @@ struct dentry *cachefiles_get_directory(struct cachefil= es_cache *cache, =20 /* Tell rmdir() it's not allowed to delete the subdir */ inode_lock(d_inode(subdir)); - inode_unlock(d_inode(dir)); + dget(subdir); + end_creating(subdir, dir); =20 if (!__cachefiles_mark_inode_in_use(NULL, d_inode(subdir))) { pr_notice("cachefiles: Inode already in use: %pd (B=3D%lx)\n", @@ -196,14 +196,11 @@ struct dentry *cachefiles_get_directory(struct cachef= iles_cache *cache, return ERR_PTR(-EBUSY); =20 mkdir_error: - inode_unlock(d_inode(dir)); - if (!IS_ERR(subdir)) - dput(subdir); + 
end_creating(subdir, dir); pr_err("mkdir %s failed with error %d\n", dirname, ret); return ERR_PTR(ret); =20 lookup_error: - inode_unlock(d_inode(dir)); ret =3D PTR_ERR(subdir); pr_err("Lookup %s failed with error %d\n", dirname, ret); return ERR_PTR(ret); @@ -679,36 +676,37 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cach= e *cache, =20 _enter(",%pD", object->file); =20 - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT); ret =3D cachefiles_inject_read_error(); if (ret =3D=3D 0) - dentry =3D lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan); + dentry =3D start_creating(&nop_mnt_idmap, fan, &QSTR(object->d_name)); else dentry =3D ERR_PTR(ret); if (IS_ERR(dentry)) { trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry), cachefiles_trace_lookup_error); _debug("lookup fail %ld", PTR_ERR(dentry)); - goto out_unlock; + goto out; } =20 - if (!d_is_negative(dentry)) { + while (!d_is_negative(dentry)) { ret =3D cachefiles_unlink(volume->cache, object, fan, dentry, FSCACHE_OBJECT_IS_STALE); if (ret < 0) goto out_dput; =20 - dput(dentry); + end_creating(dentry, fan); + ret =3D cachefiles_inject_read_error(); if (ret =3D=3D 0) - dentry =3D lookup_one(&nop_mnt_idmap, &QSTR(object->d_name), fan); + dentry =3D start_creating(&nop_mnt_idmap, fan, + &QSTR(object->d_name)); else dentry =3D ERR_PTR(ret); if (IS_ERR(dentry)) { trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry), cachefiles_trace_lookup_error); _debug("lookup fail %ld", PTR_ERR(dentry)); - goto out_unlock; + goto out; } } =20 @@ -730,9 +728,8 @@ bool cachefiles_commit_tmpfile(struct cachefiles_cache = *cache, } =20 out_dput: - dput(dentry); -out_unlock: - inode_unlock(d_inode(fan)); + end_creating(dentry, fan); +out: _leave(" =3D %u", success); return success; } diff --git a/fs/namei.c b/fs/namei.c index c1e39c985f1f..407f3516b335 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3161,6 +3161,31 @@ struct dentry *lookup_noperm_positive_unlocked(struc= t qstr *name, } 
EXPORT_SYMBOL(lookup_noperm_positive_unlocked); =20 +/** + * start_creating - prepare to create a given name with permission checking + * @idmap - idmap of the mount + * @parent - directory in which to prepare to create the name + * @name - the name to be created + * + * Locks are taken and a lookup is performed prior to creating + * an object in a directory. Permission checking (MAY_EXEC) is performed + * against @idmap. + * + * If the name already exists, a positive dentry is returned. + * + * Returns: a negative or positive dentry, or an error. + */ +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *pare= nt, + struct qstr *name) +{ + int err =3D lookup_one_common(idmap, name, parent); + + if (err) + return ERR_PTR(err); + return start_dirop(parent, name, LOOKUP_CREATE); +} +EXPORT_SYMBOL(start_creating); + #ifdef CONFIG_UNIX98_PTYS int path_pts(struct path *path) { diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c index b6d03e1ef5f7..e2aac0def2cb 100644 --- a/fs/nfsd/nfs3proc.c +++ b/fs/nfsd/nfs3proc.c @@ -281,14 +281,11 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_= fh *fhp, if (host_err) return nfserrno(host_err); =20 - inode_lock_nested(inode, I_MUTEX_PARENT); - - child =3D lookup_one(&nop_mnt_idmap, - &QSTR_LEN(argp->name, argp->len), - parent); + child =3D start_creating(&nop_mnt_idmap, parent, + &QSTR_LEN(argp->name, argp->len)); if (IS_ERR(child)) { status =3D nfserrno(PTR_ERR(child)); - goto out; + goto out_write; } =20 if (d_really_is_negative(child)) { @@ -367,9 +364,8 @@ nfsd3_create_file(struct svc_rqst *rqstp, struct svc_fh= *fhp, status =3D nfsd_create_setattr(rqstp, fhp, resfhp, &attrs); =20 out: - inode_unlock(inode); - if (child && !IS_ERR(child)) - dput(child); + end_creating(child, parent); +out_write: fh_drop_write(fhp); return status; } diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 71b428efcbb5..35d48221072f 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -264,14 +264,11 @@
nfsd4_create_file(struct svc_rqst *rqstp, struct svc_= fh *fhp, if (is_create_with_attrs(open)) nfsd4_acl_to_attr(NF4REG, open->op_acl, &attrs); =20 - inode_lock_nested(inode, I_MUTEX_PARENT); - - child =3D lookup_one(&nop_mnt_idmap, - &QSTR_LEN(open->op_fname, open->op_fnamelen), - parent); + child =3D start_creating(&nop_mnt_idmap, parent, + &QSTR_LEN(open->op_fname, open->op_fnamelen)); if (IS_ERR(child)) { status =3D nfserrno(PTR_ERR(child)); - goto out; + goto out_write; } =20 if (d_really_is_negative(child)) { @@ -379,10 +376,9 @@ nfsd4_create_file(struct svc_rqst *rqstp, struct svc_f= h *fhp, if (attrs.na_aclerr) open->op_bmval[0] &=3D ~FATTR4_WORD0_ACL; out: - inode_unlock(inode); + end_creating(child, parent); nfsd_attrs_free(&attrs); - if (child && !IS_ERR(child)) - dput(child); +out_write: fh_drop_write(fhp); return status; } diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index 2231192ec33f..93b2a3e764db 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -216,13 +216,11 @@ nfsd4_create_clid_dir(struct nfs4_client *clp) goto out_creds; =20 dir =3D nn->rec_file->f_path.dentry; - /* lock the parent */ - inode_lock(d_inode(dir)); =20 - dentry =3D lookup_one(&nop_mnt_idmap, &QSTR(dname), dir); + dentry =3D start_creating(&nop_mnt_idmap, dir, &QSTR(dname)); if (IS_ERR(dentry)) { status =3D PTR_ERR(dentry); - goto out_unlock; + goto out; } if (d_really_is_positive(dentry)) /* @@ -233,15 +231,13 @@ nfsd4_create_clid_dir(struct nfs4_client *clp) * In the 4.0 case, we should never get here; but we may * as well be forgiving and just succeed silently. 
*/ - goto out_put; + goto out_end; dentry =3D vfs_mkdir(&nop_mnt_idmap, d_inode(dir), dentry, S_IRWXU); if (IS_ERR(dentry)) status =3D PTR_ERR(dentry); -out_put: - if (!status) - dput(dentry); -out_unlock: - inode_unlock(d_inode(dir)); +out_end: + end_creating(dentry, dir); +out: if (status =3D=3D 0) { if (nn->in_grace) __nfsd4_create_reclaim_record_grace(clp, dname, diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c index 8f71f5748c75..ee1b16e921fd 100644 --- a/fs/nfsd/nfsproc.c +++ b/fs/nfsd/nfsproc.c @@ -306,18 +306,16 @@ nfsd_proc_create(struct svc_rqst *rqstp) goto done; } =20 - inode_lock_nested(dirfhp->fh_dentry->d_inode, I_MUTEX_PARENT); - dchild =3D lookup_one(&nop_mnt_idmap, &QSTR_LEN(argp->name, argp->len), - dirfhp->fh_dentry); + dchild =3D start_creating(&nop_mnt_idmap, dirfhp->fh_dentry, + &QSTR_LEN(argp->name, argp->len)); if (IS_ERR(dchild)) { resp->status =3D nfserrno(PTR_ERR(dchild)); - goto out_unlock; + goto out_write; } fh_init(newfhp, NFS_FHSIZE); resp->status =3D fh_compose(newfhp, dirfhp->fh_export, dchild, dirfhp); if (!resp->status && d_really_is_negative(dchild)) resp->status =3D nfserr_noent; - dput(dchild); if (resp->status) { if (resp->status !=3D nfserr_noent) goto out_unlock; @@ -423,7 +421,8 @@ nfsd_proc_create(struct svc_rqst *rqstp) } =20 out_unlock: - inode_unlock(dirfhp->fh_dentry->d_inode); + end_creating(dchild, dirfhp->fh_dentry); +out_write: fh_drop_write(dirfhp); done: fh_put(dirfhp); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5f3e99f956ca..5c809cbc05fe 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1597,19 +1597,16 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *= fhp, if (host_err) return nfserrno(host_err); =20 - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT); - dchild =3D lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry); + dchild =3D start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen)); host_err =3D PTR_ERR(dchild); - if (IS_ERR(dchild)) { - err =3D nfserrno(host_err); - goto 
out_unlock; - } + if (IS_ERR(dchild)) + return nfserrno(host_err); + err =3D fh_compose(resfhp, fhp->fh_export, dchild, fhp); /* * We unconditionally drop our ref to dchild as fh_compose will have * already grabbed its own ref for it. */ - dput(dchild); if (err) goto out_unlock; err =3D fh_fill_pre_attrs(fhp); @@ -1618,7 +1615,7 @@ nfsd_create(struct svc_rqst *rqstp, struct svc_fh *fh= p, err =3D nfsd_create_locked(rqstp, fhp, attrs, type, rdev, resfhp); fh_fill_post_attrs(fhp); out_unlock: - inode_unlock(dentry->d_inode); + end_creating(dchild, dentry); return err; } =20 @@ -1704,11 +1701,9 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh *= fhp, } =20 dentry =3D fhp->fh_dentry; - inode_lock_nested(dentry->d_inode, I_MUTEX_PARENT); - dnew =3D lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry); + dnew =3D start_creating(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen)); if (IS_ERR(dnew)) { err =3D nfserrno(PTR_ERR(dnew)); - inode_unlock(dentry->d_inode); goto out_drop_write; } err =3D fh_fill_pre_attrs(fhp); @@ -1721,11 +1716,11 @@ nfsd_symlink(struct svc_rqst *rqstp, struct svc_fh = *fhp, nfsd_create_setattr(rqstp, fhp, resfhp, attrs); fh_fill_post_attrs(fhp); out_unlock: - inode_unlock(dentry->d_inode); + end_creating(dnew, dentry); if (!err) err =3D nfserrno(commit_metadata(fhp)); - dput(dnew); - if (err=3D=3D0) err =3D cerr; + if (err=3D=3D0) + err =3D cerr; out_drop_write: fh_drop_write(fhp); out: @@ -1780,32 +1775,31 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ff= hp, =20 ddir =3D ffhp->fh_dentry; dirp =3D d_inode(ddir); - inode_lock_nested(dirp, I_MUTEX_PARENT); + dnew =3D start_creating(&nop_mnt_idmap, ddir, &QSTR_LEN(name, len)); =20 - dnew =3D lookup_one(&nop_mnt_idmap, &QSTR_LEN(name, len), ddir); if (IS_ERR(dnew)) { host_err =3D PTR_ERR(dnew); - goto out_unlock; + goto out_drop_write; } =20 dold =3D tfhp->fh_dentry; =20 err =3D nfserr_noent; if (d_really_is_negative(dold)) - goto out_dput; + goto out_unlock; err =3D 
fh_fill_pre_attrs(ffhp); if (err !=3D nfs_ok) - goto out_dput; + goto out_unlock; host_err =3D vfs_link(dold, &nop_mnt_idmap, dirp, dnew, NULL); fh_fill_post_attrs(ffhp); - inode_unlock(dirp); +out_unlock: + end_creating(dnew, ddir); if (!host_err) { host_err =3D commit_metadata(ffhp); if (!host_err) host_err =3D commit_metadata(tfhp); } =20 - dput(dnew); out_drop_write: fh_drop_write(tfhp); if (host_err =3D=3D -EBUSY) { @@ -1820,12 +1814,6 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffh= p, } out: return err !=3D nfs_ok ? err : nfserrno(host_err); - -out_dput: - dput(dnew); -out_unlock: - inode_unlock(dirp); - goto out_drop_write; } =20 static void diff --git a/include/linux/namei.h b/include/linux/namei.h index b1171aa7fb96..7371f586e318 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -81,9 +81,27 @@ struct dentry *lookup_one_positive_unlocked(struct mnt_i= dmap *idmap, struct qstr *name, struct dentry *base); =20 +struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *pare= nt, + struct qstr *name); + void end_dirop(struct dentry *de); void end_dirop_mkdir(struct dentry *de, struct dentry *parent); =20 +/* end_creating - finish action started with start_creating + * @child - dentry returned by start_creating() + * @parent - dentry given to start_creating() + * + * Unlock and release the child. + * + * Unlike end_dirop() this can only be called if start_creating() succeede= d. + * It handles @child being an error as vfs_mkdir() might have converted t= he + * dentry to an error - in that case the parent still needs to be unlocked. + */ +static inline void end_creating(struct dentry *child, struct dentry *paren= t) +{ + end_dirop_mkdir(child, parent); +} + /* filesystems which use the dcache as backing store don't * keep a reference after creating an object.
*/ --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A7EA1537E9; Fri, 22 Aug 2025 00:11:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821486; cv=none; b=HZcy1yP7Dyw81U42y2nmSShcfuKkQ92cMgbKCvcWDPke/RHVfaPDZLeYfJhAWRuqQahzJwuEwMH+L6fWzjn4JK+i211S4vx/lZsoi/Q/RQw2qeG1u7nMV9VCHBp5J/U4dChouaEMg5yrmlJHe1lFimxIw0E7Tr7rrrsmpSsuaOs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821486; c=relaxed/simple; bh=YO4vYvbZq/SXdjaSJEeIzUMiMaZxEX3ks9vFUBdy8aA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QRU+ehvDoQtAcbjM+THaCnTzMDNA4Gi1x3LmxrtBynqdaTeq224phI43jlDU6Lh+Opr/o1ZIkf16oNwe5wuUoVXrgiyTG0uY9m0xPnnZIpKBKZmv1z+ojSIS5b4cPcb3QocJhHUe4KmZN9ipW2dPzN9uMhy+iwdG+5TVsaDcXeg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFND-006nbH-Fh; Fri, 22 Aug 2025 00:11:17 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 12/16] nfsd: move name lookup out of nfsd4_list_rec_dir() Date: Fri, 22 Aug 2025 10:00:30 +1000 Message-ID: 
<20250822000818.1086550-13-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" nfsd4_list_rec_dir() is called with two different callbacks. One of the callbacks uses vfs_rmdir() to remove the directory. The other doesn't use the dentry at all, just the name. As only one callback needs the dentry, this patch moves the lookup into that function. This prepares for changes to how directory operations are locked. Signed-off-by: NeilBrown --- fs/nfsd/nfs4recover.c | 50 +++++++++++++++++++++---------------------- 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index 93b2a3e764db..f65cf7ecea6d 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -254,7 +254,7 @@ nfsd4_create_clid_dir(struct nfs4_client *clp) nfs4_reset_creds(original_cred); } =20 -typedef int (recdir_func)(struct dentry *, struct dentry *, struct nfsd_ne= t *); +typedef int (recdir_func)(struct dentry *, char *, struct nfsd_net *); =20 struct name_list { char name[HEXDIR_LEN]; @@ -308,24 +308,14 @@ nfsd4_list_rec_dir(recdir_func *f, struct nfsd_net *n= n) } =20 status =3D iterate_dir(nn->rec_file, &ctx.ctx); - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT); =20 list_for_each_entry_safe(entry, tmp, &ctx.names, list) { - if (!status) { - struct dentry *dentry; - dentry =3D lookup_one(&nop_mnt_idmap, - &QSTR(entry->name), dir); - if (IS_ERR(dentry)) { - status =3D PTR_ERR(dentry); - break; - } - status =3D f(dir, dentry, nn); - dput(dentry); - } + if (!status) + status =3D f(dir, entry->name, nn); + list_del(&entry->list); kfree(entry); } - inode_unlock(d_inode(dir)); nfs4_reset_creds(original_cred); =20
list_for_each_entry_safe(entry, tmp, &ctx.names, list) { @@ -423,18 +413,19 @@ nfsd4_remove_clid_dir(struct nfs4_client *clp) } =20 static int -purge_old(struct dentry *parent, struct dentry *child, struct nfsd_net *nn) +purge_old(struct dentry *parent, char *cname, struct nfsd_net *nn) { int status; + struct dentry *child; struct xdr_netobj name; =20 - if (child->d_name.len !=3D HEXDIR_LEN - 1) { + if (strlen(cname) !=3D HEXDIR_LEN - 1) { printk("%s: illegal name %pd in recovery directory\n", __func__, child); /* Keep trying; maybe the others are OK: */ return 0; } - name.data =3D kmemdup_nul(child->d_name.name, child->d_name.len, GFP_KERN= EL); + name.data =3D kstrdup(cname, GFP_KERNEL); if (!name.data) { dprintk("%s: failed to allocate memory for name.data!\n", __func__); @@ -444,10 +435,17 @@ purge_old(struct dentry *parent, struct dentry *child= , struct nfsd_net *nn) if (nfs4_has_reclaimed_state(name, nn)) goto out_free; =20 - status =3D vfs_rmdir(&nop_mnt_idmap, d_inode(parent), child); - if (status) - printk("failed to remove client recovery directory %pd\n", - child); + inode_lock_nested(d_inode(parent), I_MUTEX_PARENT); + child =3D lookup_one(&nop_mnt_idmap, &QSTR(cname), parent); + if (!IS_ERR(child)) { + status =3D vfs_rmdir(&nop_mnt_idmap, d_inode(parent), child); + if (status) + printk("failed to remove client recovery directory %pd\n", + child); + dput(child); + } + inode_unlock(d_inode(parent)); + out_free: kfree(name.data); out: @@ -478,18 +476,18 @@ nfsd4_recdir_purge_old(struct nfsd_net *nn) } =20 static int -load_recdir(struct dentry *parent, struct dentry *child, struct nfsd_net *= nn) +load_recdir(struct dentry *parent, char *cname, struct nfsd_net *nn) { struct xdr_netobj name; struct xdr_netobj princhash =3D { .len =3D 0, .data =3D NULL }; =20 - if (child->d_name.len !=3D HEXDIR_LEN - 1) { - printk("%s: illegal name %pd in recovery directory\n", - __func__, child); + if (strlen(cname) !=3D HEXDIR_LEN - 1) { + printk("%s: illegal name %s in 
recovery directory\n", + __func__, cname); /* Keep trying; maybe the others are OK: */ return 0; } - name.data =3D kmemdup_nul(child->d_name.name, child->d_name.len, GFP_KERN= EL); + name.data =3D kstrdup(cname, GFP_KERNEL); if (!name.data) { dprintk("%s: failed to allocate memory for name.data!\n", __func__); --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A4B615539A; Fri, 22 Aug 2025 00:11:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821488; cv=none; b=bk6eX8GyiGSyYwLWgsheTW1OU22QmAByhPzGiniZbnEsiOEp/IE9jAemdqOuC1ohE51YlXFLU/CQTbzqdCo/6PCDiWYfCV8yoHpUZqOeQHV1IHA7bs1CVuWw9SVp50vnfqqFIJPRyHVfWkCHhtb7beyDtT4Yn7oECjfXKH+0haw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821488; c=relaxed/simple; bh=cLxOonLPhU4Ns4lppuaRfso35+0UkceNyrDxB0iS+uQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=neSzbm0GV8pc1Z8r2RxOuehrWJpMAq6g2Qe6b5dnW3gZl1lF+srzjj1Vpm4AC4ELamhFp/jGgyuzmN6I6IlH96luo6Mp6VKHKTH3COxiUhCy1/GihrYl/RIPiqFtGcswSTZVq/PkHcSdA9aq6XkrrbU1ufGxcgA6hOOZX1OqB14= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFND-006nbL-WB; Fri, 22 Aug 
2025 00:11:17 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 13/16] VFS/nfsd/cachefiles: introduce start_removing() Date: Fri, 22 Aug 2025 10:00:31 +1000 Message-ID: <20250822000818.1086550-14-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" start_removing() is similar to start_creating() but will only return a positive dentry and the expectation is that it will be removed. This is used by nfsd and cachefiles. They are changed to also use end_dirop() to terminate the action begun by start_removing(). Signed-off-by: NeilBrown --- fs/cachefiles/namei.c | 25 ++++++++++--------------- fs/namei.c | 27 +++++++++++++++++++++++++++ fs/nfsd/nfs4recover.c | 24 +++++++----------------- fs/nfsd/vfs.c | 26 ++++++++++---------------- include/linux/namei.h | 2 ++ 5 files changed, 56 insertions(+), 48 deletions(-) diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index 9af324473967..ddced50afb66 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -260,6 +260,7 @@ static int cachefiles_unlink(struct cachefiles_cache *c= ache, * - File backed objects are unlinked * - Directory backed objects are stuffed into the graveyard for userspace= to * delete + * On entry dir must be locked. It will be unlocked on exit.
*/ int cachefiles_bury_object(struct cachefiles_cache *cache, struct cachefiles_object *object, @@ -275,7 +276,8 @@ int cachefiles_bury_object(struct cachefiles_cache *cac= he, _enter(",'%pd','%pd'", dir, rep); =20 if (rep->d_parent !=3D dir) { - inode_unlock(d_inode(dir)); + dget(rep); + end_dirop(rep); _leave(" =3D -ESTALE"); return -ESTALE; } @@ -286,16 +288,16 @@ int cachefiles_bury_object(struct cachefiles_cache *c= ache, * by a file struct. */ ret =3D cachefiles_unlink(cache, object, dir, rep, why); - dput(rep); + end_dirop(rep); =20 - inode_unlock(d_inode(dir)); _leave(" =3D %d", ret); return ret; } =20 /* directories have to be moved to the graveyard */ _debug("move stale object to graveyard"); - inode_unlock(d_inode(dir)); + dget(rep); + end_dirop(rep); =20 try_again: /* first step is to make up a grave dentry in the graveyard */ @@ -745,26 +747,20 @@ static struct dentry *cachefiles_lookup_for_cull(stru= ct cachefiles_cache *cache, struct dentry *victim; int ret =3D -ENOENT; =20 - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT); + victim =3D start_removing(&nop_mnt_idmap, dir, &QSTR(filename)); =20 - victim =3D lookup_one(&nop_mnt_idmap, &QSTR(filename), dir); if (IS_ERR(victim)) goto lookup_error; - if (d_is_negative(victim)) - goto lookup_put; if (d_inode(victim)->i_flags & S_KERNEL_FILE) goto lookup_busy; return victim; =20 lookup_busy: ret =3D -EBUSY; -lookup_put: - inode_unlock(d_inode(dir)); - dput(victim); + end_dirop(victim); return ERR_PTR(ret); =20 lookup_error: - inode_unlock(d_inode(dir)); ret =3D PTR_ERR(victim); if (ret =3D=3D -ENOENT) return ERR_PTR(-ESTALE); /* Probably got retired by the netfs */ @@ -812,18 +808,17 @@ int cachefiles_cull(struct cachefiles_cache *cache, s= truct dentry *dir, =20 ret =3D cachefiles_bury_object(cache, NULL, dir, victim, FSCACHE_OBJECT_WAS_CULLED); + dput(victim); if (ret < 0) goto error; =20 fscache_count_culled(); - dput(victim); _leave(" =3D 0"); return 0; =20 error_unlock: - inode_unlock(d_inode(dir)); + 
end_dirop(victim); error: - dput(victim); if (ret =3D=3D -ENOENT) return -ESTALE; /* Probably got retired by the netfs */ =20 diff --git a/fs/namei.c b/fs/namei.c index 407f3516b335..27a99c276137 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3186,6 +3186,33 @@ struct dentry *start_creating(struct mnt_idmap *idma= p, struct dentry *parent, } EXPORT_SYMBOL(start_creating); =20 +/** + * start_removing - prepare to remove a given name with permission checking + * @idmap - idmap of the mount + * @parent - directory in which to find the name + * @name - the name to be removed + * + * Locks are taken and a lookup is performed prior to removing + * an object from a directory. Permission checking (MAY_EXEC) is performed + * against @idmap. + * + * If the name doesn't exist, an error is returned. + * + * end_dirop() should be called when removal is complete, or aborted. + * + * Returns: a positive dentry, or an error. + */ +struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *pare= nt, + struct qstr *name) +{ + int err =3D lookup_one_common(idmap, name, parent); + + if (err) + return ERR_PTR(err); + return start_dirop(parent, name, 0); +} +EXPORT_SYMBOL(start_removing); + #ifdef CONFIG_UNIX98_PTYS int path_pts(struct path *path) { diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index f65cf7ecea6d..8d4bb22db3b7 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -335,20 +335,12 @@ nfsd4_unlink_clid_dir(char *name, struct nfsd_net *nn) dprintk("NFSD: nfsd4_unlink_clid_dir.
name %s\n", name); =20 dir =3D nn->rec_file->f_path.dentry; - inode_lock_nested(d_inode(dir), I_MUTEX_PARENT); - dentry =3D lookup_one(&nop_mnt_idmap, &QSTR(name), dir); - if (IS_ERR(dentry)) { - status =3D PTR_ERR(dentry); - goto out_unlock; - } - status =3D -ENOENT; - if (d_really_is_negative(dentry)) - goto out; + dentry =3D start_removing(&nop_mnt_idmap, dir, &QSTR(name)); + if (IS_ERR(dentry)) + return PTR_ERR(dentry); + status =3D vfs_rmdir(&nop_mnt_idmap, d_inode(dir), dentry); -out: - dput(dentry); -out_unlock: - inode_unlock(d_inode(dir)); + end_dirop(dentry); return status; } =20 @@ -435,16 +427,14 @@ purge_old(struct dentry *parent, char *cname, struct = nfsd_net *nn) if (nfs4_has_reclaimed_state(name, nn)) goto out_free; =20 - inode_lock_nested(d_inode(parent), I_MUTEX_PARENT); - child =3D lookup_one(&nop_mnt_idmap, &QSTR(cname), parent); + child =3D start_removing(&nop_mnt_idmap, parent, &QSTR(cname)); if (!IS_ERR(child)) { status =3D vfs_rmdir(&nop_mnt_idmap, d_inode(parent), child); if (status) printk("failed to remove client recovery directory %pd\n", child); - dput(child); + end_dirop(child); } - inode_unlock(d_inode(parent)); =20 out_free: kfree(name.data); diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c index 5c809cbc05fe..5bdd068dbdd7 100644 --- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -2013,7 +2013,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fh= p, int type, { struct dentry *dentry, *rdentry; struct inode *dirp; - struct inode *rinode; + struct inode *rinode =3D NULL; __be32 err; int host_err; =20 @@ -2032,24 +2032,21 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *= fhp, int type, =20 dentry =3D fhp->fh_dentry; dirp =3D d_inode(dentry); - inode_lock_nested(dirp, I_MUTEX_PARENT); =20 - rdentry =3D lookup_one(&nop_mnt_idmap, &QSTR_LEN(fname, flen), dentry); + rdentry =3D start_removing(&nop_mnt_idmap, dentry, &QSTR_LEN(fname, flen)= ); + host_err =3D PTR_ERR(rdentry); if (IS_ERR(rdentry)) - goto out_unlock; + goto out_drop_write; =20 
- if (d_really_is_negative(rdentry)) { - dput(rdentry); - host_err =3D -ENOENT; - goto out_unlock; - } - rinode =3D d_inode(rdentry); err =3D fh_fill_pre_attrs(fhp); if (err !=3D nfs_ok) goto out_unlock; =20 + rinode =3D d_inode(rdentry); + /* Prevent truncation until after locks dropped */ ihold(rinode); + if (!type) type =3D d_inode(rdentry)->i_mode & S_IFMT; =20 @@ -2071,10 +2068,10 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *= fhp, int type, } fh_fill_post_attrs(fhp); =20 - inode_unlock(dirp); - if (!host_err) +out_unlock: + end_dirop(rdentry); + if (!err && !host_err) host_err =3D commit_metadata(fhp); - dput(rdentry); iput(rinode); /* truncate the inode here */ =20 out_drop_write: @@ -2092,9 +2089,6 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fh= p, int type, } out: return err !=3D nfs_ok ? err : nfserrno(host_err); -out_unlock: - inode_unlock(dirp); - goto out_drop_write; } =20 /* diff --git a/include/linux/namei.h b/include/linux/namei.h index 7371f586e318..5feb92b84d84 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -83,6 +83,8 @@ struct dentry *lookup_one_positive_unlocked(struct mnt_id= map *idmap, =20 struct dentry *start_creating(struct mnt_idmap *idmap, struct dentry *pare= nt, struct qstr *name); +struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *pare= nt, + struct qstr *name); =20 void end_dirop(struct dentry *de); void end_dirop_mkdir(struct dentry *de, struct dentry *parent); --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C130156230; Fri, 22 Aug 2025 00:11:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821488; 
cv=none; b=Zmke0E1F3SdYdNrfioxtikplH3xBiCAM+Rtw9FfYRFxJc5cKR9toqLrFk9goKOZd1loapACMjQPam1i/9BAB5dx9MnpjtB8ZSk87+brJmEnQiuzE54hyakzpCOHlR+c3uyfeFXMrwE0Hv7/37JB90nwBS1MdPJJPiQbzid17cOI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821488; c=relaxed/simple; bh=l6xkNp1lPiYCT5DKAQ0xm8yzAbNf3BW5XyqA/xo4kDI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=T5bJHjNNm+4rsmQY5wIi9JfrQCSCVl4sG2clHGL6dx5dp1iy369V7c9m/4kxZpq7JdgpCK+WZYf/An86cXshpi7Tj7E3w2TogvGYND2er0u0bFmbErtvUaoTEm5kEkMbbArE2nF03aCE3/mzZJVV/JEMGrECsOX2f4POXG7V62E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFNE-006nbW-Eq; Fri, 22 Aug 2025 00:11:18 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 14/16] VFS: introduce start_creating_noperm() and start_removing_noperm() Date: Fri, 22 Aug 2025 10:00:32 +1000 Message-ID: <20250822000818.1086550-15-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" xfs, fuse, ipc/mqueue need variants of start_creating or start_removing which do not check permissions. 
This patch adds _noperm versions of these functions. Signed-off-by: NeilBrown --- fs/fuse/dir.c | 19 +++++++--------- fs/namei.c | 48 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/orphanage.c | 11 ++++----- include/linux/namei.h | 2 ++ ipc/mqueue.c | 7 ++---- 5 files changed, 64 insertions(+), 23 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 2d817d7cab26..043896278380 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1403,27 +1403,25 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, = u64 parent_nodeid, if (!parent) return -ENOENT; =20 - inode_lock_nested(parent, I_MUTEX_PARENT); if (!S_ISDIR(parent->i_mode)) - goto unlock; + goto put_parent; =20 err =3D -ENOENT; dir =3D d_find_alias(parent); if (!dir) - goto unlock; + goto put_parent; =20 - name->hash =3D full_name_hash(dir, name->name, name->len); - entry =3D d_lookup(dir, name); + entry =3D start_removing_noperm(dir, name); dput(dir); - if (!entry) - goto unlock; + if (IS_ERR(entry)) + goto put_parent; =20 fuse_dir_changed(parent); if (!(flags & FUSE_EXPIRE_ONLY)) d_invalidate(entry); fuse_invalidate_entry_cache(entry); =20 - if (child_nodeid !=3D 0 && d_really_is_positive(entry)) { + if (child_nodeid !=3D 0) { inode_lock(d_inode(entry)); if (get_node_id(d_inode(entry)) !=3D child_nodeid) { err =3D -ENOENT; @@ -1451,10 +1449,9 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u= 64 parent_nodeid, } else { err =3D 0; } - dput(entry); =20 - unlock: - inode_unlock(parent); + end_dirop(entry); + put_parent: iput(parent); return err; } diff --git a/fs/namei.c b/fs/namei.c index 27a99c276137..34895487045e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3213,6 +3213,54 @@ struct dentry *start_removing(struct mnt_idmap *idma= p, struct dentry *parent, } EXPORT_SYMBOL(start_removing); =20 +/** + * start_creating_noperm - prepare to create a given name without permissi= on checking + * @parent - directory in which to prepare to create the name + * @name - the name to be created + * + * Locks 
are taken and a lookup is performed prior to creating + * an object in a directory. + * + * If the name already exists, a positive dentry is returned. + * + * Returns: a negative or positive dentry, or an error. + */ +struct dentry *start_creating_noperm(struct dentry *parent, + struct qstr *name) +{ + int err =3D lookup_noperm_common(name, parent); + + if (err) + return ERR_PTR(err); + return start_dirop(parent, name, LOOKUP_CREATE); +} +EXPORT_SYMBOL(start_creating_noperm); + +/** + * start_removing_noperm - prepare to remove a given name without permissi= on checking + * @parent - directory in which to find the name + * @name - the name to be removed + * + * Locks are taken and a lookup is performed prior to removing + * an object from a directory. + * + * If the name doesn't exist, an error is returned. + * + * end_dirop() should be called when removal is complete, or aborted. + * + * Returns: a positive dentry, or an error. + */ +struct dentry *start_removing_noperm(struct dentry *parent, + struct qstr *name) +{ + int err =3D lookup_noperm_common(name, parent); + + if (err) + return ERR_PTR(err); + return start_dirop(parent, name, 0); +} +EXPORT_SYMBOL(start_removing_noperm); + #ifdef CONFIG_UNIX98_PTYS int path_pts(struct path *path) { diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c index 9c12cb844231..aa2f747039e8 100644 --- a/fs/xfs/scrub/orphanage.c +++ b/fs/xfs/scrub/orphanage.c @@ -152,11 +152,10 @@ xrep_orphanage_create( } =20 /* Try to find the orphanage directory. 
*/ - inode_lock_nested(root_inode, I_MUTEX_PARENT); - orphanage_dentry =3D lookup_noperm(&QSTR(ORPHANAGE), root_dentry); + orphanage_dentry =3D start_creating_noperm(root_dentry, &QSTR(ORPHANAGE)); if (IS_ERR(orphanage_dentry)) { error =3D PTR_ERR(orphanage_dentry); - goto out_unlock_root; + goto out_dput_root; } =20 /* @@ -170,7 +169,7 @@ xrep_orphanage_create( orphanage_dentry, 0750); error =3D PTR_ERR(orphanage_dentry); if (IS_ERR(orphanage_dentry)) - goto out_unlock_root; + goto out_dput_orphanage; } =20 /* Not a directory? Bail out. */ @@ -200,9 +199,7 @@ xrep_orphanage_create( sc->orphanage_ilock_flags =3D 0; =20 out_dput_orphanage: - dput(orphanage_dentry); -out_unlock_root: - inode_unlock(VFS_I(sc->mp->m_rootip)); + end_dirop_mkdir(orphanage_dentry, root_dentry); out_dput_root: dput(root_dentry); out: diff --git a/include/linux/namei.h b/include/linux/namei.h index 5feb92b84d84..d765a23c87e4 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -85,6 +85,8 @@ struct dentry *start_creating(struct mnt_idmap *idmap, st= ruct dentry *parent, struct qstr *name); struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *pare= nt, struct qstr *name); +struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *n= ame); +struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *n= ame); =20 void end_dirop(struct dentry *de); void end_dirop_mkdir(struct dentry *de, struct dentry *parent); diff --git a/ipc/mqueue.c b/ipc/mqueue.c index 093551fe66a7..407ef49f2dbc 100644 --- a/ipc/mqueue.c +++ b/ipc/mqueue.c @@ -913,13 +913,11 @@ static int do_mq_open(const char __user *u_name, int = oflag, umode_t mode, goto out_putname; =20 ro =3D mnt_want_write(mnt); /* we'll drop it in any case */ - inode_lock(d_inode(root)); - path.dentry =3D lookup_noperm(&QSTR(name->name), root); + path.dentry =3D start_creating_noperm(root, &QSTR(name->name)); if (IS_ERR(path.dentry)) { error =3D PTR_ERR(path.dentry); goto out_putfd; } - 
path.mnt =3D mntget(mnt); error =3D prepare_open(path.dentry, oflag, ro, mode, name, attr); if (!error) { struct file *file =3D dentry_open(&path, oflag, current_cred()); @@ -928,13 +926,12 @@ static int do_mq_open(const char __user *u_name, int = oflag, umode_t mode, else error =3D PTR_ERR(file); } - path_put(&path); out_putfd: if (error) { put_unused_fd(fd); fd =3D error; } - inode_unlock(d_inode(root)); + end_dirop(path.dentry); if (!ro) mnt_drop_write(mnt); out_putname: --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1EAB1EC2; Fri, 22 Aug 2025 00:11:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821483; cv=none; b=gPO+HHGSdyuwBkVIP2YJHxPLBLc3Wve9iTlpOvbV078uGWs6cSYhx+HjC0n58cgOdEjpj4xlZcVGJnwdxvQ7Y+4DNpy6nIGUuAOGzqlhThOjceVGgOK4r+fE4sSjQg4bNLMrNeKvxrVf8pHr/xFN3vraioKmEbBxUhnE8IhBJHo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821483; c=relaxed/simple; bh=vLoGSo14hf6U2j542yrjveQQMBm85/0J1om6WKkhYl8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fm264Xm32gdTaRcxlsIWbxLnMpeGWOTyHNq4owkJP9aRczWPuw3kg/Xw8cESVIHrJPvxFshRXVF17PHJsMZOkzyG8GrCNKOjd6i9xH5eIswOepULcBqm3doWq3Onysq5vzrx28lLKn8ZwltHCV80s6/Zi88dD0m/RmQLyy1Mpi0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 
196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFNE-006nbf-UF; Fri, 22 Aug 2025 00:11:18 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 15/16] VFS: introduce start_removing_dentry() Date: Fri, 22 Aug 2025 10:00:33 +1000 Message-ID: <20250822000818.1086550-16-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" start_removing_dentry() is similar to start_removing() but instead of providing a name for lookup, the target dentry is given. start_removing_dentry() locks the parent and then checks that the parent is not dead and that the dentry is still hashed and still in that parent. If so, it takes a reference and returns the dentry with the parent locked, so that end_dirop() can be used to finish the operation. Otherwise it drops the lock and returns -EINVAL. This is used in cachefiles, overlayfs, smb/server and apparmor. There will be other users including ecryptfs. 
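To illustrate the calling convention, here is a small userspace model in C. It is only a sketch: struct dentry, the lock_depth counter, the ERR_PTR-style helpers and the model_ prefix are simplified stand-ins invented for this example, not the kernel definitions, and the IS_DEADDIR() check is left out. It assumes, as the converted callers in this patch appear to, that end_dirop() does nothing when handed an error pointer.

```c
/* Userspace model only: none of these definitions are the kernel's. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dentry {
	struct dentry *d_parent;
	bool hashed;     /* stand-in for !d_unhashed() */
	int refcount;    /* stand-in for the dentry reference count */
	int lock_depth;  /* stand-in for the directory inode lock (parents only) */
};

/* Minimal error-pointer helpers, modelled on the kernel's ERR_PTR family. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(intptr_t err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Lock the parent, validate the child, and return a counted reference. */
struct dentry *model_start_removing_dentry(struct dentry *parent,
					   struct dentry *child)
{
	parent->lock_depth++;                 /* models inode_lock_nested() */
	if (child->d_parent != parent || !child->hashed) {
		parent->lock_depth--;         /* models inode_unlock() */
		return ERR_PTR(-EINVAL);
	}
	child->refcount++;                    /* models dget() */
	return child;
}

/* Undo what the start function did; assumed to ignore error pointers. */
void model_end_dirop(struct dentry *de)
{
	if (IS_ERR(de))
		return;                       /* nothing was locked or referenced */
	assert(de->d_parent->lock_depth > 0); /* the parent must still be locked */
	de->d_parent->lock_depth--;
	de->refcount--;
}
```

With that contract a caller can pair the two calls unconditionally and only guard the removal itself with IS_ERR(), which is the shape the cachefiles conversions below use.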
Signed-off-by: NeilBrown --- fs/cachefiles/interface.c | 14 +++++++++----- fs/cachefiles/namei.c | 22 ++++++++++++---------- fs/cachefiles/volume.c | 10 +++++++--- fs/namei.c | 29 +++++++++++++++++++++++++++++ fs/overlayfs/dir.c | 10 ++++------ fs/overlayfs/readdir.c | 8 ++++---- fs/smb/server/vfs.c | 27 ++++----------------------- include/linux/namei.h | 2 ++ security/apparmor/apparmorfs.c | 8 ++++---- 9 files changed, 75 insertions(+), 55 deletions(-) diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 3e63cfe15874..763d7d55b1f9 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include "internal.h" @@ -428,11 +429,14 @@ static bool cachefiles_invalidate_cookie(struct fscac= he_cookie *cookie) if (!old_tmpfile) { struct cachefiles_volume *volume =3D object->volume; struct dentry *fan =3D volume->fanout[(u8)cookie->key_hash]; - - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT); - cachefiles_bury_object(volume->cache, object, fan, - old_file->f_path.dentry, - FSCACHE_OBJECT_INVALIDATED); + struct dentry *obj; + + obj =3D start_removing_dentry(fan, old_file->f_path.dentry); + if (!IS_ERR(obj)) + cachefiles_bury_object(volume->cache, object, + fan, obj, + FSCACHE_OBJECT_INVALIDATED); + end_dirop(obj); } fput(old_file); } diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c index ddced50afb66..cc6dccd606ea 100644 --- a/fs/cachefiles/namei.c +++ b/fs/cachefiles/namei.c @@ -424,13 +424,12 @@ int cachefiles_delete_object(struct cachefiles_object= *object, =20 _enter(",OBJ%x{%pD}", object->debug_id, object->file); =20 - /* Stop the dentry being negated if it's only pinned by a file struct. 
*/ - dget(dentry); - - inode_lock_nested(d_backing_inode(fan), I_MUTEX_PARENT); - ret =3D cachefiles_unlink(volume->cache, object, fan, dentry, why); - inode_unlock(d_backing_inode(fan)); - dput(dentry); + dentry =3D start_removing_dentry(fan, dentry); + if (IS_ERR(dentry)) + ret =3D PTR_ERR(dentry); + else + ret =3D cachefiles_unlink(volume->cache, object, fan, dentry, why); + end_dirop(dentry); return ret; } =20 @@ -643,9 +642,12 @@ bool cachefiles_look_up_object(struct cachefiles_objec= t *object) =20 if (!d_is_reg(dentry)) { pr_err("%pd is not a file\n", dentry); - inode_lock_nested(d_inode(fan), I_MUTEX_PARENT); - ret =3D cachefiles_bury_object(volume->cache, object, fan, dentry, - FSCACHE_OBJECT_IS_WEIRD); + struct dentry *de =3D start_removing_dentry(fan, dentry); + if (!IS_ERR(de)) + ret =3D cachefiles_bury_object(volume->cache, object, + fan, de, + FSCACHE_OBJECT_IS_WEIRD); + end_dirop(de); dput(dentry); if (ret < 0) return false; diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c index 781aac4ef274..8c29f3db3fae 100644 --- a/fs/cachefiles/volume.c +++ b/fs/cachefiles/volume.c @@ -7,6 +7,7 @@ =20 #include #include +#include #include "internal.h" #include =20 @@ -58,9 +59,12 @@ void cachefiles_acquire_volume(struct fscache_volume *vc= ookie) if (ret < 0) { if (ret !=3D -ESTALE) goto error_dir; - inode_lock_nested(d_inode(cache->store), I_MUTEX_PARENT); - cachefiles_bury_object(cache, NULL, cache->store, vdentry, - FSCACHE_VOLUME_IS_WEIRD); + vdentry =3D start_removing_dentry(cache->store, vdentry); + if (!IS_ERR(vdentry)) + cachefiles_bury_object(cache, NULL, cache->store, + vdentry, + FSCACHE_VOLUME_IS_WEIRD); + end_dirop(vdentry); cachefiles_put_directory(volume->dentry); cond_resched(); goto retry; diff --git a/fs/namei.c b/fs/namei.c index 34895487045e..af56bc39c4d5 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3261,6 +3261,35 @@ struct dentry *start_removing_noperm(struct dentry *= parent, } EXPORT_SYMBOL(start_removing_noperm); =20 +/** + * 
start_removing_dentry - prepare to remove a given dentry + * @parent - directory from which dentry should be removed + * @child - the dentry to be removed + * + * A lock is taken to protect the dentry against other dirops and + * the validity of the dentry is checked: correct parent and still hashed. + * + * If the dentry is valid a reference is taken and returned. If not + * an error is returned. + * + * end_dirop() should be called when removal is complete, or aborted. + * + * Returns: the valid dentry, or an error. + */ +struct dentry *start_removing_dentry(struct dentry *parent, + struct dentry *child) +{ + inode_lock_nested(parent->d_inode, I_MUTEX_PARENT); + if (unlikely(IS_DEADDIR(parent->d_inode) || + child->d_parent !=3D parent || + d_unhashed(child))) { + inode_unlock(parent->d_inode); + return ERR_PTR(-EINVAL); + } + return dget(child); +} +EXPORT_SYMBOL(start_removing_dentry); + #ifdef CONFIG_UNIX98_PTYS int path_pts(struct path *path) { diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index 70b8687dc45e..b8f0d409e841 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -47,14 +47,12 @@ static int ovl_cleanup_locked(struct ovl_fs *ofs, struc= t inode *wdir, int ovl_cleanup(struct ovl_fs *ofs, struct dentry *workdir, struct dentry *wdentry) { - int err; - - err =3D ovl_parent_lock(workdir, wdentry); - if (err) - return err; + wdentry =3D start_removing_dentry(workdir, wdentry); + if (IS_ERR(wdentry)) + return PTR_ERR(wdentry); =20 ovl_cleanup_locked(ofs, workdir->d_inode, wdentry); - ovl_parent_unlock(workdir); + end_dirop(wdentry); =20 return 0; } diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c index b65cdfce31ce..20348be4b98f 100644 --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -1158,11 +1158,11 @@ int ovl_workdir_cleanup(struct ovl_fs *ofs, struct = dentry *parent, if (!d_is_dir(dentry) || level > 1) return ovl_cleanup(ofs, parent, dentry); =20 - err =3D ovl_parent_lock(parent, dentry); - if (err) - return err; + 
dentry =3D start_removing_dentry(parent, dentry); + if (IS_ERR(dentry)) + return PTR_ERR(dentry); err =3D ovl_do_rmdir(ofs, parent->d_inode, dentry); - ovl_parent_unlock(parent); + end_dirop(dentry); if (err) { struct path path =3D { .mnt =3D mnt, .dentry =3D dentry }; =20 diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c index 07739055ac9f..04030a204d5a 100644 --- a/fs/smb/server/vfs.c +++ b/fs/smb/server/vfs.c @@ -48,24 +48,6 @@ static void ksmbd_vfs_inherit_owner(struct ksmbd_work *w= ork, i_uid_write(inode, i_uid_read(parent_inode)); } =20 -/** - * ksmbd_vfs_lock_parent() - lock parent dentry if it is stable - * @parent: parent dentry - * @child: child dentry - * - * Returns: %0 on success, %-ENOENT if the parent dentry is not stable - */ -int ksmbd_vfs_lock_parent(struct dentry *parent, struct dentry *child) -{ - inode_lock_nested(d_inode(parent), I_MUTEX_PARENT); - if (child->d_parent !=3D parent) { - inode_unlock(d_inode(parent)); - return -ENOENT; - } - - return 0; -} - static int ksmbd_vfs_path_lookup(struct ksmbd_share_config *share_conf, char *pathname, unsigned int flags, struct path *path, bool do_lock) @@ -1083,18 +1065,17 @@ int ksmbd_vfs_unlink(struct file *filp) return err; =20 dir =3D dget_parent(dentry); - err =3D ksmbd_vfs_lock_parent(dir, dentry); - if (err) + dentry =3D start_removing_dentry(dir, dentry); + err =3D PTR_ERR(dentry); + if (IS_ERR(dentry)) goto out; - dget(dentry); =20 if (S_ISDIR(d_inode(dentry)->i_mode)) err =3D vfs_rmdir(idmap, d_inode(dir), dentry); else err =3D vfs_unlink(idmap, d_inode(dir), dentry, NULL); =20 - dput(dentry); - inode_unlock(d_inode(dir)); + end_dirop(dentry); if (err) ksmbd_debug(VFS, "failed to delete, err %d\n", err); out: diff --git a/include/linux/namei.h b/include/linux/namei.h index d765a23c87e4..b89be0ac5e87 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -87,6 +87,8 @@ struct dentry *start_removing(struct mnt_idmap *idmap, st= ruct dentry *parent, struct qstr *name); struct 
dentry *start_creating_noperm(struct dentry *parent, struct qstr *n= ame); struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *n= ame); +struct dentry *start_removing_dentry(struct dentry *parent, + struct dentry *child); =20 void end_dirop(struct dentry *de); void end_dirop_mkdir(struct dentry *de, struct dentry *parent); diff --git a/security/apparmor/apparmorfs.c b/security/apparmor/apparmorfs.c index 13260352198f..f33fc19c99a7 100644 --- a/security/apparmor/apparmorfs.c +++ b/security/apparmor/apparmorfs.c @@ -343,17 +343,17 @@ static void aafs_remove(struct dentry *dentry) if (!dentry || IS_ERR(dentry)) return; =20 + /* ->d_parent is stable as rename is not supported */ dir =3D d_inode(dentry->d_parent); - inode_lock(dir); - if (simple_positive(dentry)) { + dentry =3D start_removing_dentry(dentry->d_parent, dentry); + if (!IS_ERR(dentry) && simple_positive(dentry)) { if (d_is_dir(dentry)) simple_rmdir(dir, dentry); else simple_unlink(dir, dentry); d_delete(dentry); - dput(dentry); } - inode_unlock(dir); + end_dirop(dentry); simple_release_fs(&aafs_mnt, &aafs_count); } =20 --=20 2.50.0.107.gf914562f5916.dirty From nobody Sat Oct 4 00:28:02 2025 Received: from neil.brown.name (neil.brown.name [103.29.64.221]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 94866139E; Fri, 22 Aug 2025 00:11:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.29.64.221 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821483; cv=none; b=B02kxAEKWH7iJvUSJPg5UREgzqapPAVQTJm2SYxn4g3PsVCd4AMf+OPPEHVIFVyGhXhDwJbc3tDm4adnZdfZR70obiT5TCDL3E+jvPG2NWK8ypoKc+zIKX2FBVhorWsf+vG4kiTUAQO+x3DzLGDajvZkG7ELn3DdjnR7zdsCUwQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1755821483; c=relaxed/simple; bh=ULv7ZyF5fWdjhfgSI2qkGQb6DUzG8NgxC77a+4v4w64=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=X43bFH/IYfYuMytqR1FvjzThPgujLKS3rw7Qjc0f7nIyvAnNOPldqubvRoYDPqMzJ/F+aRhKalxZ537WOuYFHz60Cuf3dGbEL6TVDYuTC27aXvSAv5K7X03itJRHFPbVUj6h2EbXWVZWbkmrPAUJl3bjrX1Nz0q0H2swtbBggKc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name; spf=pass smtp.mailfrom=neil.brown.name; arc=none smtp.client-ip=103.29.64.221 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=brown.name Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=neil.brown.name Received: from 196.186.233.220.static.exetel.com.au ([220.233.186.196] helo=home.neil.brown.name) by neil.brown.name with esmtp (Exim 4.95) (envelope-from ) id 1upFNF-006nbk-Ef; Fri, 22 Aug 2025 00:11:19 +0000 From: NeilBrown To: Alexander Viro , Christian Brauner Cc: Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 16/16] VFS: add start_creating_killable() and start_removing_killable() Date: Fri, 22 Aug 2025 10:00:34 +1000 Message-ID: <20250822000818.1086550-17-neil@brown.name> X-Mailer: git-send-email 2.50.0.107.gf914562f5916.dirty In-Reply-To: <20250822000818.1086550-1-neil@brown.name> References: <20250822000818.1086550-1-neil@brown.name> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" These are similar to start_creating() and start_removing(), but allow a fatal signal to abort waiting for the lock. They are used in btrfs for subvol creation and removal, and will have a role in overlayfs too. 
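The difference between the two waiting modes can be sketched with a tiny userspace model in C. Everything in it is hypothetical and exists only for illustration: struct model_dir, enum model_wait and the model_ helpers are not kernel API, and the model's end function takes the directory explicitly, unlike end_dirop(). It also assumes the caller would have had to wait for the lock, which is the only case where the two modes differ.

```c
/* Userspace model only: every type and name here is an invented stand-in. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum model_wait { MODEL_WAIT_NORMAL, MODEL_WAIT_KILLABLE };

struct model_dir {
	bool locked;               /* stand-in for holding dir->i_rwsem for write */
	bool fatal_signal_pending; /* stand-in for a fatal signal seen while waiting */
	int entry;                 /* stand-in for the dentry found by the lookup */
};

/* Minimal error-pointer helpers, modelled on the kernel's ERR_PTR family. */
#define MODEL_MAX_ERRNO 4095
static inline void *model_err_ptr(intptr_t err) { return (void *)err; }
static inline long model_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline bool model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/*
 * A killable waiter gives up with -EINTR when a fatal signal is pending.
 * A normal waiter keeps waiting, modelled here as simply getting the lock.
 */
void *model_start_dirop(struct model_dir *dir, enum model_wait mode)
{
	assert(!dir->locked);
	if (mode == MODEL_WAIT_KILLABLE && dir->fatal_signal_pending)
		return model_err_ptr(-EINTR); /* the lock was never taken */
	dir->locked = true;
	return &dir->entry;                   /* lookup result, lock held */
}

/* Release the lock unless the start function failed. */
void model_end_dirop(struct model_dir *dir, void *de)
{
	if (model_is_err(de))
		return;                       /* nothing to undo */
	dir->locked = false;
}
```

The point of the sketch is that a failed killable start leaves nothing locked, so the caller only has to propagate the error.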
Signed-off-by: NeilBrown --- fs/btrfs/ioctl.c | 43 +++++++---------------- fs/namei.c | 80 +++++++++++++++++++++++++++++++++++++++++-- include/linux/namei.h | 6 ++++ 3 files changed, 95 insertions(+), 34 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 7e13de2bdcbf..20febcf25aea 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -880,8 +880,6 @@ static inline int btrfs_may_create(struct mnt_idmap *id= map, { if (d_really_is_positive(child)) return -EEXIST; - if (IS_DEADDIR(dir)) - return -ENOENT; if (!fsuidgid_has_mapping(dir->i_sb, idmap)) return -EOVERFLOW; return inode_permission(idmap, dir, MAY_WRITE | MAY_EXEC); @@ -904,14 +902,9 @@ static noinline int btrfs_mksubvol(struct dentry *pare= nt, struct fscrypt_str name_str =3D FSTR_INIT((char *)qname->name, qname->len= ); int ret; =20 - ret =3D down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT); - if (ret =3D=3D -EINTR) - return ret; - - dentry =3D lookup_one(idmap, qname, parent); - ret =3D PTR_ERR(dentry); + dentry =3D start_creating_killable(idmap, parent, qname); if (IS_ERR(dentry)) - goto out_unlock; + return PTR_ERR(dentry); =20 ret =3D btrfs_may_create(idmap, dir, dentry); if (ret) @@ -940,9 +933,7 @@ static noinline int btrfs_mksubvol(struct dentry *paren= t, out_up_read: up_read(&fs_info->subvol_sem); out_dput: - dput(dentry); -out_unlock: - btrfs_inode_unlock(BTRFS_I(dir), 0); + end_creating(dentry, parent); return ret; } =20 @@ -2417,18 +2408,10 @@ static noinline int btrfs_ioctl_snap_destroy(struct= file *file, goto free_subvol_name; } =20 - ret =3D down_write_killable_nested(&dir->i_rwsem, I_MUTEX_PARENT); - if (ret =3D=3D -EINTR) - goto free_subvol_name; - dentry =3D lookup_one(idmap, &QSTR(subvol_name), parent); + dentry =3D start_removing_killable(idmap, parent, &QSTR(subvol_name)); if (IS_ERR(dentry)) { ret =3D PTR_ERR(dentry); - goto out_unlock_dir; - } - - if (d_really_is_negative(dentry)) { - ret =3D -ENOENT; - goto out_dput; + goto out_end_dirop; } =20 inode =3D 
d_inode(dentry); @@ -2449,7 +2432,7 @@ static noinline int btrfs_ioctl_snap_destroy(struct f= ile *file, */ ret =3D -EPERM; if (!btrfs_test_opt(fs_info, USER_SUBVOL_RM_ALLOWED)) - goto out_dput; + goto out_end_dirop; =20 /* * Do not allow deletion if the parent dir is the same @@ -2460,21 +2443,21 @@ static noinline int btrfs_ioctl_snap_destroy(struct= file *file, */ ret =3D -EINVAL; if (root =3D=3D dest) - goto out_dput; + goto out_end_dirop; =20 ret =3D inode_permission(idmap, inode, MAY_WRITE | MAY_EXEC); if (ret) - goto out_dput; + goto out_end_dirop; } =20 /* check if subvolume may be deleted by a user */ ret =3D btrfs_may_delete(idmap, dir, dentry, 1); if (ret) - goto out_dput; + goto out_end_dirop; =20 if (btrfs_ino(BTRFS_I(inode)) !=3D BTRFS_FIRST_FREE_OBJECTID) { ret =3D -EINVAL; - goto out_dput; + goto out_end_dirop; } =20 btrfs_inode_lock(BTRFS_I(inode), 0); @@ -2483,10 +2466,8 @@ static noinline int btrfs_ioctl_snap_destroy(struct = file *file, if (!ret) d_delete_notify(dir, dentry); =20 -out_dput: - dput(dentry); -out_unlock_dir: - btrfs_inode_unlock(BTRFS_I(dir), 0); +out_end_dirop: + end_dirop(dentry); free_subvol_name: kfree(subvol_name_ptr); free_parent: diff --git a/fs/namei.c b/fs/namei.c index af56bc39c4d5..5b40f025ecc5 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2765,19 +2765,33 @@ static int filename_parentat(int dfd, struct filena= me *name, * Returns: a locked dentry, or an error. 
* */ -struct dentry *start_dirop(struct dentry *parent, struct qstr *name, - unsigned int lookup_flags) +static struct dentry *__start_dirop(struct dentry *parent, struct qstr *na= me, + unsigned int lookup_flags, + unsigned int state) { struct dentry *dentry; struct inode *dir =3D d_inode(parent); =20 - inode_lock_nested(dir, I_MUTEX_PARENT); + if (state =3D=3D TASK_KILLABLE) { + int ret =3D down_write_killable_nested(&dir->i_rwsem, + I_MUTEX_PARENT); + if (ret) + return ERR_PTR(ret); + } else { + inode_lock_nested(dir, I_MUTEX_PARENT); + } dentry =3D lookup_one_qstr_excl(name, parent, lookup_flags); if (IS_ERR(dentry)) inode_unlock(dir); return dentry; } =20 +struct dentry *start_dirop(struct dentry *parent, struct qstr *name, + unsigned int lookup_flags) +{ + return __start_dirop(parent, name, lookup_flags, TASK_NORMAL); +} + /** * end_dirop - signal completion of a dirop * @de - the dentry which was returned by start_dirop or similar. @@ -3213,6 +3227,66 @@ struct dentry *start_removing(struct mnt_idmap *idma= p, struct dentry *parent, } EXPORT_SYMBOL(start_removing); =20 +/** + * start_creating_killable - prepare to create a given name with permissio= n checking + * @idmap - idmap of the mount + * @parent - directory in which to prepare to create the name + * @name - the name to be created + * + * Locks are taken and a lookup is performed prior to creating + * an object in a directory. Permission checking (MAY_EXEC) is performed + * against @idmap. + * + * If the name already exists, a positive dentry is returned. + * + * If a fatal signal is received or was already pending, the function aborts + * with -EINTR. + * + * Returns: a negative or positive dentry, or an error. 
+ */ +struct dentry *start_creating_killable(struct mnt_idmap *idmap, + struct dentry *parent, + struct qstr *name) +{ + int err =3D lookup_one_common(idmap, name, parent); + + if (err) + return ERR_PTR(err); + return __start_dirop(parent, name, LOOKUP_CREATE, TASK_KILLABLE); +} +EXPORT_SYMBOL(start_creating_killable); + +/** + * start_removing_killable - prepare to remove a given name with permissio= n checking + * @idmap - idmap of the mount + * @parent - directory in which to find the name + * @name - the name to be removed + * + * Locks are taken and a lookup is performed prior to removing + * an object from a directory. Permission checking (MAY_EXEC) is performed + * against @idmap. + * + * If the name doesn't exist, an error is returned. + * + * end_dirop() should be called when removal is complete, or aborted. + * + * If a fatal signal is received or was already pending, the function aborts + * with -EINTR. + * + * Returns: a positive dentry, or an error. + */ +struct dentry *start_removing_killable(struct mnt_idmap *idmap, + struct dentry *parent, + struct qstr *name) +{ + int err =3D lookup_one_common(idmap, name, parent); + + if (err) + return ERR_PTR(err); + return __start_dirop(parent, name, 0, TASK_KILLABLE); +} +EXPORT_SYMBOL(start_removing_killable); + /** * start_creating_noperm - prepare to create a given name without permissi= on checking * @parent - directory in which to prepare to create the name diff --git a/include/linux/namei.h b/include/linux/namei.h index b89be0ac5e87..11b8d410b5eb 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -85,6 +85,12 @@ struct dentry *start_creating(struct mnt_idmap *idmap, s= truct dentry *parent, struct qstr *name); struct dentry *start_removing(struct mnt_idmap *idmap, struct dentry *pare= nt, struct qstr *name); +struct dentry *start_creating_killable(struct mnt_idmap *idmap, + struct dentry *parent, + struct qstr *name); +struct dentry *start_removing_killable(struct mnt_idmap *idmap, + struct 
dentry *parent, + struct qstr *name); struct dentry *start_creating_noperm(struct dentry *parent, struct qstr *n= ame); struct dentry *start_removing_noperm(struct dentry *parent, struct qstr *n= ame); struct dentry *start_removing_dentry(struct dentry *parent, --=20 2.50.0.107.gf914562f5916.dirty