From: Christian Brauner
Date: Fri, 25 Apr 2025 10:11:32 +0200
Subject: [PATCH v2 3/4] pidfs: get rid of __pidfd_prepare()
Message-Id: <20250425-work-pidfs-net-v2-3-450a19461e75@kernel.org>
References: <20250425-work-pidfs-net-v2-0-450a19461e75@kernel.org>
In-Reply-To: <20250425-work-pidfs-net-v2-0-450a19461e75@kernel.org>
To: Oleg Nesterov, Kuniyuki Iwashima, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 netdev@vger.kernel.org, David Rheinsberg, Jan Kara,
 Alexander Mikhalitsyn, Luca Boccassi, Lennart Poettering,
 Daan De Meyer, Mike Yuan, Christian Brauner

Fold it into pidfd_prepare() and rename PIDFD_CLONE to PIDFD_STALE to
indicate that the passed pid might not have task linkage and no explicit
check for that should be performed.

Reviewed-by: Oleg Nesterov
Signed-off-by: Christian Brauner
---
 fs/pidfs.c                 | 22 +++++++-----
 include/linux/pid.h        |  2 +-
 include/uapi/linux/pidfd.h |  2 +-
 kernel/fork.c              | 83 ++++++++++++++++------------------------------
 4 files changed, 44 insertions(+), 65 deletions(-)

diff --git a/fs/pidfs.c b/fs/pidfs.c
index 308792d4b11a..0afaffd5a18a 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -768,7 +768,7 @@ static inline bool pidfs_pid_valid(struct pid *pid, const struct path *path,
 {
 	enum pid_type type;
 
-	if (flags & PIDFD_CLONE)
+	if (flags & PIDFD_STALE)
 		return true;
 
 	/*
@@ -777,10 +777,14 @@ static inline bool pidfs_pid_valid(struct pid *pid, const struct path *path,
 	 * pidfd has been allocated perform another check that the pid
 	 * is still alive. If it is exit information is available even
 	 * if the task gets reaped before the pidfd is returned to
-	 * userspace. The only exception is PIDFD_CLONE where no task
-	 * linkage has been established for @pid yet and the kernel is
-	 * in the middle of process creation so there's nothing for
-	 * pidfs to miss.
+	 * userspace. The only exception are indicated by PIDFD_STALE:
+	 *
+	 * (1) The kernel is in the middle of task creation and thus no
+	 *     task linkage has been established yet.
+	 * (2) The caller knows @pid has been registered in pidfs at a
+	 *     time when the task was still alive.
+	 *
+	 * In both cases exit information will have been reported.
 	 */
 	if (flags & PIDFD_THREAD)
 		type = PIDTYPE_PID;
@@ -874,11 +878,11 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
 	int ret;
 
 	/*
-	 * Ensure that PIDFD_CLONE can be passed as a flag without
+	 * Ensure that PIDFD_STALE can be passed as a flag without
 	 * overloading other uapi pidfd flags.
 	 */
-	BUILD_BUG_ON(PIDFD_CLONE == PIDFD_THREAD);
-	BUILD_BUG_ON(PIDFD_CLONE == PIDFD_NONBLOCK);
+	BUILD_BUG_ON(PIDFD_STALE == PIDFD_THREAD);
+	BUILD_BUG_ON(PIDFD_STALE == PIDFD_NONBLOCK);
 
 	ret = path_from_stashed(&pid->stashed, pidfs_mnt, get_pid(pid), &path);
 	if (ret < 0)
@@ -887,7 +891,7 @@ struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
 	if (!pidfs_pid_valid(pid, &path, flags))
 		return ERR_PTR(-ESRCH);
 
-	flags &= ~PIDFD_CLONE;
+	flags &= ~PIDFD_STALE;
 	pidfd_file = dentry_open(&path, flags, current_cred());
 	/* Raise PIDFD_THREAD explicitly as do_dentry_open() strips it. */
 	if (!IS_ERR(pidfd_file))
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 311ecebd7d56..453ae6d8a68d 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -77,7 +77,7 @@ struct file;
 struct pid *pidfd_pid(const struct file *file);
 struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
 struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
-int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
+int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret_file);
 void do_notify_pidfd(struct task_struct *task);
 
 static inline struct pid *get_pid(struct pid *pid)
diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
index 2970ef44655a..8c1511edd0e9 100644
--- a/include/uapi/linux/pidfd.h
+++ b/include/uapi/linux/pidfd.h
@@ -12,7 +12,7 @@
 #define PIDFD_THREAD O_EXCL
 #ifdef __KERNEL__
 #include
-#define PIDFD_CLONE CLONE_PIDFD
+#define PIDFD_STALE CLONE_PIDFD
 #endif
 
 /* Flags for pidfd_send_signal(). */
diff --git a/kernel/fork.c b/kernel/fork.c
index f7403e1fb0d4..1d95f4dae327 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2035,55 +2035,11 @@ static inline void rcu_copy_process(struct task_struct *p)
 #endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
 }
 
-/**
- * __pidfd_prepare - allocate a new pidfd_file and reserve a pidfd
- * @pid:   the struct pid for which to create a pidfd
- * @flags: flags of the new @pidfd
- * @ret:   Where to return the file for the pidfd.
- *
- * Allocate a new file that stashes @pid and reserve a new pidfd number in the
- * caller's file descriptor table. The pidfd is reserved but not installed yet.
- *
- * The helper doesn't perform checks on @pid which makes it useful for pidfds
- * created via CLONE_PIDFD where @pid has no task attached when the pidfd and
- * pidfd file are prepared.
- *
- * If this function returns successfully the caller is responsible to either
- * call fd_install() passing the returned pidfd and pidfd file as arguments in
- * order to install the pidfd into its file descriptor table or they must use
- * put_unused_fd() and fput() on the returned pidfd and pidfd file
- * respectively.
- *
- * This function is useful when a pidfd must already be reserved but there
- * might still be points of failure afterwards and the caller wants to ensure
- * that no pidfd is leaked into its file descriptor table.
- *
- * Return: On success, a reserved pidfd is returned from the function and a new
- *         pidfd file is returned in the last argument to the function. On
- *         error, a negative error code is returned from the function and the
- *         last argument remains unchanged.
- */
-static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
-{
-	struct file *pidfd_file;
-
-	CLASS(get_unused_fd, pidfd)(O_CLOEXEC);
-	if (pidfd < 0)
-		return pidfd;
-
-	pidfd_file = pidfs_alloc_file(pid, flags | O_RDWR);
-	if (IS_ERR(pidfd_file))
-		return PTR_ERR(pidfd_file);
-
-	*ret = pidfd_file;
-	return take_fd(pidfd);
-}
-
 /**
  * pidfd_prepare - allocate a new pidfd_file and reserve a pidfd
  * @pid:   the struct pid for which to create a pidfd
  * @flags: flags of the new @pidfd
- * @ret:   Where to return the pidfd.
+ * @ret_file: return the new pidfs file
  *
  * Allocate a new file that stashes @pid and reserve a new pidfd number in the
  * caller's file descriptor table. The pidfd is reserved but not installed yet.
@@ -2106,16 +2062,26 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  *         error, a negative error code is returned from the function and the
  *         last argument remains unchanged.
  */
-int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
+int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret_file)
 {
+	struct file *pidfs_file;
+
 	/*
-	 * While holding the pidfd waitqueue lock removing the task
-	 * linkage for the thread-group leader pid (PIDTYPE_TGID) isn't
-	 * possible. Thus, if there's still task linkage for PIDTYPE_PID
-	 * not having thread-group leader linkage for the pid means it
-	 * wasn't a thread-group leader in the first place.
+	 * PIDFD_STALE is only allowed to be passed if the caller knows
+	 * that @pid is already registered in pidfs and thus
+	 * PIDFD_INFO_EXIT information is guaranteed to be available.
 	 */
-	scoped_guard(spinlock_irq, &pid->wait_pidfd.lock) {
+	if (!(flags & PIDFD_STALE)) {
+		/*
+		 * While holding the pidfd waitqueue lock removing the
+		 * task linkage for the thread-group leader pid
+		 * (PIDTYPE_TGID) isn't possible. Thus, if there's still
+		 * task linkage for PIDTYPE_PID not having thread-group
+		 * leader linkage for the pid means it wasn't a
+		 * thread-group leader in the first place.
+		 */
+		guard(spinlock_irq)(&pid->wait_pidfd.lock);
+
 		/* Task has already been reaped. */
 		if (!pid_has_task(pid, PIDTYPE_PID))
 			return -ESRCH;
@@ -2128,7 +2094,16 @@ int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
 			return -ENOENT;
 	}
 
-	return __pidfd_prepare(pid, flags, ret);
+	CLASS(get_unused_fd, pidfd)(O_CLOEXEC);
+	if (pidfd < 0)
+		return pidfd;
+
+	pidfs_file = pidfs_alloc_file(pid, flags | O_RDWR);
+	if (IS_ERR(pidfs_file))
+		return PTR_ERR(pidfs_file);
+
+	*ret_file = pidfs_file;
+	return take_fd(pidfd);
 }
 
 static void __delayed_free_task(struct rcu_head *rhp)
@@ -2477,7 +2452,7 @@ __latent_entropy struct task_struct *copy_process(
 		 * Note that no task has been attached to @pid yet indicate
 		 * that via CLONE_PIDFD.
 		 */
-		retval = __pidfd_prepare(pid, flags | PIDFD_CLONE, &pidfile);
+		retval = pidfd_prepare(pid, flags | PIDFD_STALE, &pidfile);
 		if (retval < 0)
 			goto bad_fork_free_pid;
 		pidfd = retval;
-- 
2.47.2