From: Jori Koolstra
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Alexander Viro, Christian Brauner, Jeff Layton, Chuck Lever, Arnd Bergmann, Shuah Khan, Greg Kroah-Hartman, "H.
 Peter Anvin", Jan Kara, Alexander Aring
Cc: Peter Zijlstra, Oleg Nesterov, Andrey Albershteyn, Jiri Olsa, Mathieu Desnoyers, Thomas Weißschuh, Namhyung Kim, Arnaldo Carvalho de Melo, Aleksa Sarai, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, cmirabil@redhat.com, Jori Koolstra, "Masami Hiramatsu (Google)"
Subject: [RFC PATCH 1/2] vfs: syscalls: add mkdirat_fd()
Date: Tue, 31 Mar 2026 19:19:58 +0200
Message-ID: <20260331172011.3512876-2-jkoolstra@xs4all.nl>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260331172011.3512876-1-jkoolstra@xs4all.nl>
References: <20260331172011.3512876-1-jkoolstra@xs4all.nl>
X-Mailing-List: linux-kernel@vger.kernel.org

Currently there is no way to race-freely create and open a directory.
For regular files we have open(O_CREAT) for creating a new file inode,
and returning a pinning fd to it. The lack of such functionality for
directories means that when populating a directory tree there's always
a race involved: the inodes first need to be created, and then opened
to adjust their permissions/ownership/labels/timestamps/acls/xattrs/...,
but in the time window between the creation and the opening they might
be replaced by something else.

Addressing this race without proper APIs is possible (by immediately
fstat()ing what was opened, to verify that it has the right inode type),
but difficult to get right. Hence, mkdirat_fd() that creates a directory
and returns an O_DIRECTORY fd is useful.

This feature idea (and description) is taken from the UAPI group:
https://github.com/uapi-group/kernel-features?tab=readme-ov-file#race-free-creation-and-opening-of-non-file-inodes

Signed-off-by: Jori Koolstra
---
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 fs/internal.h                          |  1 +
 fs/namei.c                             | 26 ++++++++++++++++++++++++--
 include/linux/fcntl.h                  |  2 ++
 include/linux/syscalls.h               |  2 ++
 include/uapi/asm-generic/fcntl.h       |  3 +++
 include/uapi/asm-generic/unistd.h      |  5 ++++-
 scripts/syscall.tbl                    |  1 +
 8 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 524155d655da..dda920c26941 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -396,6 +396,7 @@
 469	common	file_setattr		sys_file_setattr
 470	common	listns			sys_listns
 471	common	rseq_slice_yield	sys_rseq_slice_yield
+472	common	mkdirat_fd		sys_mkdirat_fd
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/fs/internal.h b/fs/internal.h
index cbc384a1aa09..2885a3e4ebdd 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -58,6 +58,7 @@ int filename_unlinkat(int dfd, struct filename *name);
 int may_linkat(struct mnt_idmap *idmap, const struct path *link);
 int filename_renameat2(int olddfd, struct filename *oldname, int newdfd,
 		 struct filename *newname, unsigned int flags);
+int filename_mkdirat_fd(int dfd, struct filename *name, umode_t mode, unsigned int flags);
 int filename_mkdirat(int dfd, struct filename *name, umode_t mode);
 int filename_mknodat(int dfd, struct filename *name, umode_t mode, unsigned int dev);
 int filename_symlinkat(struct filename *from, int newdfd, struct filename *to);
diff --git a/fs/namei.c b/fs/namei.c
index 1eb9db055292..93252937983e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5256,6 +5256,11 @@ struct dentry *vfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
 EXPORT_SYMBOL(vfs_mkdir);
 
 int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
+{
+	return filename_mkdirat_fd(dfd, name, mode, 0);
+}
+
+int filename_mkdirat_fd(int dfd, struct filename *name, umode_t mode, unsigned int flags)
 {
 	struct dentry *dentry;
 	struct path path;
@@ -5263,7 +5268,7 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
 	unsigned int lookup_flags = LOOKUP_DIRECTORY;
 	struct delegated_inode delegated_inode = { };
 
-retry:
+start:
 	dentry = filename_create(dfd, name, &path, lookup_flags);
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
@@ -5276,7 +5281,6 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
 		if (IS_ERR(dentry))
 			error = PTR_ERR(dentry);
 	}
-	end_creating_path(&path, dentry);
 	if (is_delegated(&delegated_inode)) {
 		error = break_deleg_wait(&delegated_inode);
 		if (!error)
@@ -5286,7 +5290,25 @@ int filename_mkdirat(int dfd, struct filename *name, umode_t mode)
 		lookup_flags |= LOOKUP_REVAL;
 		goto retry;
 	}
+
+	if (!error && (flags & MKDIRAT_FD_NEED_FD)) {
+		struct path new_path = { .mnt = path.mnt, .dentry = dentry };
+		error = FD_ADD(0, dentry_open(&new_path, O_DIRECTORY, current_cred()));
+	}
+	end_creating_path(&path, dentry);
 	return error;
+retry:
+	end_creating_path(&path, dentry);
+	goto start;
+}
+
+SYSCALL_DEFINE4(mkdirat_fd, int, dfd, const char __user *, pathname, umode_t, mode,
+		unsigned int, flags)
+{
+	CLASS(filename, name)(pathname);
+	if (flags & ~VALID_MKDIRAT_FD_FLAGS)
+		return -EINVAL;
+	return filename_mkdirat_fd(dfd, name, mode, flags | MKDIRAT_FD_NEED_FD);
 }
 
 SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index a332e79b3207..d2f0fdb82847 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -25,6 +25,8 @@
 #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
 #endif
 
+#define VALID_MKDIRAT_FD_FLAGS (MKDIRAT_FD_NEED_FD)
+
 #if BITS_PER_LONG == 32
 #define IS_GETLK32(cmd)		((cmd) == F_GETLK)
 #define IS_SETLK32(cmd)		((cmd) == F_SETLK)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 02bd6ddb6278..52e7f09d5525 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -999,6 +999,8 @@ asmlinkage long sys_lsm_get_self_attr(unsigned int attr, struct lsm_ctx __user *
 asmlinkage long sys_lsm_set_self_attr(unsigned int attr, struct lsm_ctx __user *ctx,
 				      u32 size, u32 flags);
 asmlinkage long sys_lsm_list_modules(u64 __user *ids, u32 __user *size, u32 flags);
+asmlinkage long sys_mkdirat_fd(int dfd, const char __user *pathname, umode_t mode,
+			       unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 613475285643..621458bf1fbf 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -95,6 +95,9 @@
 #define O_NDELAY	O_NONBLOCK
 #endif
 
+/* Flags for mkdirat_fd */
+#define MKDIRAT_FD_NEED_FD	0x01
+
 #define F_DUPFD		0	/* dup */
 #define F_GETFD		1	/* get close_on_exec */
 #define F_SETFD		2	/* set/clear close_on_exec */
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a627acc8fb5f..5bae1029f5d9 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -863,8 +863,11 @@ __SYSCALL(__NR_listns, sys_listns)
 #define __NR_rseq_slice_yield 471
 __SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
 
+#define __NR_mkdirat_fd 472
+__SYSCALL(__NR_mkdirat_fd, sys_mkdirat_fd)
+
 #undef __NR_syscalls
-#define __NR_syscalls 472
+#define __NR_syscalls 473
 
 /*
  * 32 bit systems traditionally used different
diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl
index 7a42b32b6577..db3bd97d4a1a 100644
--- a/scripts/syscall.tbl
+++ b/scripts/syscall.tbl
@@ -412,3 +412,4 @@
 469	common	file_setattr		sys_file_setattr
 470	common	listns			sys_listns
 471	common	rseq_slice_yield	sys_rseq_slice_yield
+472	common	mkdirat_fd		sys_mkdirat_fd
-- 
2.53.0
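
For illustration only, here is a rough userspace sketch of how a caller might use the proposed syscall and fall back to the fstat()-based workaround mentioned in the commit message. It is not part of the patch. The helper names mkdir_and_open() and mkdir_then_open_checked() are hypothetical, the syscall number 472 is simply the value from the x86-64 table in this RFC (it is not in any released kernel and may change), and no libc wrapper is assumed to exist. Note that the fallback only verifies the inode type, so it narrows the race window rather than closing it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: number taken from this RFC's syscall table; not a stable ABI. */
#ifndef __NR_mkdirat_fd
#define __NR_mkdirat_fd 472
#endif

/*
 * Workaround without the new syscall: create, open, then verify.
 * Between mkdirat() and openat() the name can still be replaced by a
 * different directory; the check below only rejects non-directories.
 */
static int mkdir_then_open_checked(int dfd, const char *path, mode_t mode)
{
	struct stat st;
	int fd, saved;

	if (mkdirat(dfd, path, mode) < 0)
		return -1;

	/* O_NOFOLLOW refuses a symlink swapped in after mkdirat(). */
	fd = openat(dfd, path, O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
	if (fd < 0)
		return -1;

	if (fstat(fd, &st) < 0) {
		saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}
	if (!S_ISDIR(st.st_mode)) {
		close(fd);
		errno = ENOTDIR;
		return -1;
	}
	return fd;
}

/*
 * Hypothetical helper: returns an fd for the new directory, or -1 with
 * errno set. With try_new_syscall != 0 it first attempts mkdirat_fd()
 * as proposed in this RFC and falls back only on ENOSYS.
 */
int mkdir_and_open(int dfd, const char *path, mode_t mode, int try_new_syscall)
{
	if (try_new_syscall) {
		/* flags = 0: the RFC's syscall sets MKDIRAT_FD_NEED_FD itself. */
		long fd = syscall(__NR_mkdirat_fd, dfd, path, mode, 0u);

		if (fd >= 0 || errno != ENOSYS)
			return (int)fd;
	}
	return mkdir_then_open_checked(dfd, path, mode);
}
```

A caller would use it like mkdir_and_open(parent_fd, "sub", 0700, 1) and then apply fchown(), fchmod(), fsetxattr() and similar calls to the returned fd. As posted, the RFC passes 0 as the fd flags to FD_ADD(), so the fd from the new syscall would not be close-on-exec, unlike the one from the fallback path above.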