From: Tycho Andersen <tandersen@netflix.com>
We can get EBADF from __pidfd_fget() if a task is currently exiting, which
might be confusing. Let's check PF_EXITING, and just report ESRCH if so.
I chose PF_EXITING, because it is set in exit_signals(), which is called
before exit_files(). Since ->exit_status is mostly set after exit_files()
in exit_notify(), using that still leaves a window open for the race.
Signed-off-by: Tycho Andersen <tandersen@netflix.com>
---
kernel/pid.c | 2 +-
.../selftests/pidfd/pidfd_getfd_test.c | 31 ++++++++++++++++++-
2 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/kernel/pid.c b/kernel/pid.c
index de0bf2f8d18b..db8731f0ee45 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -688,7 +688,7 @@ static int pidfd_getfd(struct pid *pid, int fd)
 	int ret;
 
 	task = get_pid_task(pid, PIDTYPE_PID);
-	if (!task)
+	if (!task || task->flags & PF_EXITING)
 		return -ESRCH;
 
 	file = __pidfd_fget(task, fd);
diff --git a/tools/testing/selftests/pidfd/pidfd_getfd_test.c b/tools/testing/selftests/pidfd/pidfd_getfd_test.c
index 0930e2411dfb..cd51d547b751 100644
--- a/tools/testing/selftests/pidfd/pidfd_getfd_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_getfd_test.c
@@ -5,6 +5,7 @@
#include <fcntl.h>
#include <limits.h>
#include <linux/types.h>
+#include <poll.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
@@ -129,6 +130,7 @@ FIXTURE(child)
 	 * When it is closed, the child will exit.
 	 */
 	int sk;
+	bool ignore_child_result;
};
FIXTURE_SETUP(child)
@@ -165,10 +167,14 @@ FIXTURE_SETUP(child)
FIXTURE_TEARDOWN(child)
{
+	int ret;
+
 	EXPECT_EQ(0, close(self->pidfd));
 	EXPECT_EQ(0, close(self->sk));
-	EXPECT_EQ(0, wait_for_pid(self->pid));
+	ret = wait_for_pid(self->pid);
+	if (!self->ignore_child_result)
+		EXPECT_EQ(0, ret);
}
TEST_F(child, disable_ptrace)
@@ -235,6 +241,29 @@ TEST(flags_set)
EXPECT_EQ(errno, EINVAL);
}
+TEST_F(child, no_strange_EBADF)
+{
+	struct pollfd fds;
+
+	self->ignore_child_result = true;
+
+	fds.fd = self->pidfd;
+	fds.events = POLLIN;
+
+	ASSERT_EQ(kill(self->pid, SIGKILL), 0);
+	ASSERT_EQ(poll(&fds, 1, 5000), 1);
+
+	/*
+	 * It used to be that pidfd_getfd() could race with the exiting thread
+	 * between exit_files() and release_task(), and get a non-null task
+	 * with a NULL files struct, and you'd get EBADF, which was slightly
+	 * confusing.
+	 */
+	errno = 0;
+	EXPECT_EQ(sys_pidfd_getfd(self->pidfd, self->remote_fd, 0), -1);
+	EXPECT_EQ(errno, ESRCH);
+}
+
#if __NR_pidfd_getfd == -1
int main(void)
{
base-commit: 082d11c164aef02e51bcd9c7cbf1554a8e42d9b5
--
2.34.1
On 02/06, Tycho Andersen wrote:
>
> From: Tycho Andersen <tandersen@netflix.com>
>
> We can get EBADF from __pidfd_fget() if a task is currently exiting, which
> might be confusing.

agreed, because EBADF looks as if the "fd" argument was wrong,

> Let's check PF_EXITING, and just report ESRCH if so.

agreed, we can pretend that the task has already exited,

But:

> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -688,7 +688,7 @@ static int pidfd_getfd(struct pid *pid, int fd)
>  	int ret;
>
>  	task = get_pid_task(pid, PIDTYPE_PID);
> -	if (!task)
> +	if (!task || task->flags & PF_EXITING)
>  		return -ESRCH;

This looks racy. Suppose that pidfd_getfd() races with the exiting task.

It is possible that this task sets PF_EXITING and does exit_files()
after the "task->flags & PF_EXITING" check above and before pidfd_getfd()
does __pidfd_fget(), in this case pidfd_getfd() still returns the same
EBADF we want to avoid.

Perhaps we can change pidfd_getfd() to do

	if (IS_ERR(file))
		return (task->flags & PF_EXITING) ? -ESRCH : PTR_ERR(file);

instead?

This needs a comment to explain the PF_EXITING check. And perhaps another
comment to explain that we can't miss PF_EXITING if the target task has
already passed exit_files, both exit_files() and fget_task() take the same
task_lock(task).

What do you think?

Oleg.
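The ordering problem raised in this reply can be replayed in a small userspace model. This is only a sketch and not kernel code: struct model_task, MODEL_PF_EXITING, model_do_exit(), model_fget() and the racing_exit parameter are hypothetical stand-ins, and a single thread replays the interleaving instead of actually racing. It contrasts testing the flag before the lookup (the ordering in the posted patch) with testing it only after the lookup has failed (the ordering suggested above). The real argument additionally relies on exit_files() and fget_task() taking the same task_lock(task), which this model does not reproduce.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PF_EXITING task flag. */
#define MODEL_PF_EXITING 0x4u

/* Hypothetical, heavily simplified model of a task and its fd table. */
struct model_task {
	unsigned int flags;
	const int *files;	/* NULL once the modelled exit_files() has run */
	int nfiles;
};

/* Model of the exit path: the flag is set first, then the files go away. */
static void model_do_exit(struct model_task *t)
{
	t->flags |= MODEL_PF_EXITING;	/* stands in for exit_signals() */
	t->files = NULL;		/* stands in for exit_files() */
}

/* Model of the lookup: -EBADF for a missing table or an out-of-range fd. */
static int model_fget(const struct model_task *t, int fd)
{
	if (!t->files || fd < 0 || fd >= t->nfiles)
		return -EBADF;
	return t->files[fd];
}

/*
 * Flag tested before the lookup. A non-zero racing_exit makes the target
 * exit in the window between the test and the lookup.
 */
static int getfd_check_before(struct model_task *t, int fd, int racing_exit)
{
	if (t->flags & MODEL_PF_EXITING)
		return -ESRCH;
	if (racing_exit)
		model_do_exit(t);
	return model_fget(t, fd);
}

/* Flag tested only after the lookup has failed. */
static int getfd_check_after(struct model_task *t, int fd, int racing_exit)
{
	int ret;

	if (racing_exit)
		model_do_exit(t);
	ret = model_fget(t, fd);
	if (ret < 0)
		return (t->flags & MODEL_PF_EXITING) ? -ESRCH : ret;
	return ret;
}
```

In this model, an exit that lands inside the window makes getfd_check_before() return -EBADF, while getfd_check_after() returns -ESRCH for the same interleaving and still returns -EBADF for a genuinely bad fd on a live task.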
Sorry for noise, forgot to mention...

On 02/06, Oleg Nesterov wrote:
>
> On 02/06, Tycho Andersen wrote:
> >
> > From: Tycho Andersen <tandersen@netflix.com>
> >
> > We can get EBADF from __pidfd_fget() if a task is currently exiting, which
> > might be confusing.
>
> agreed, because EBADF looks as if the "fd" argument was wrong,
>
> > Let's check PF_EXITING, and just report ESRCH if so.
>
> agreed, we can pretend that the task has already exited,
>
> But:
>
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -688,7 +688,7 @@ static int pidfd_getfd(struct pid *pid, int fd)
> >  	int ret;
> >
> >  	task = get_pid_task(pid, PIDTYPE_PID);
> > -	if (!task)
> > +	if (!task || task->flags & PF_EXITING)
> >  		return -ESRCH;
>
> This looks racy. Suppose that pidfd_getfd() races with the exiting task.
>
> It is possible that this task sets PF_EXITING and does exit_files()
> after the "task->flags & PF_EXITING" check above and before pidfd_getfd()
> does __pidfd_fget(), in this case pidfd_getfd() still returns the same
> EBADF we want to avoid.
>
> Perhaps we can change pidfd_getfd() to do
>
> 	if (IS_ERR(file))
> 		return (task->flags & PF_EXITING) ? -ESRCH : PTR_ERR(file);

Or we can check task->files != NULL rather than PF_EXITING.

To me this looks even better, but looks more confusing without a comment.
OTOH, imo this needs a comment anyway ;)

>
> instead?
>
> This needs a comment to explain the PF_EXITING check. And perhaps another
> comment to explain that we can't miss PF_EXITING if the target task has
> already passed exit_files, both exit_files() and fget_task() take the same
> task_lock(task).
>
> What do you think?
>
> Oleg.
On Tue, Feb 06, 2024 at 07:06:07PM +0100, Oleg Nesterov wrote:
> Or we can check task->files != NULL rather than PF_EXITING.
>
> To me this looks even better, but looks more confusing without a comment.
> OTOH, imo this needs a comment anyway ;)

I thought about this, but I didn't really understand the null check in
exit_files(); if it can really be called more than once, are there
other cases where task->files == NULL that we really should report
EBADF?

Tycho
On 02/06, Tycho Andersen wrote:
> On Tue, Feb 06, 2024 at 07:06:07PM +0100, Oleg Nesterov wrote:
> > Or we can check task->files != NULL rather than PF_EXITING.
> >
> > To me this looks even better, but looks more confusing without a comment.
> > OTOH, imo this needs a comment anyway ;)
>
> I thought about this, but I didn't really understand the null check in
> exit_files();

I guess task->files can be NULL at least if it was cloned with
kernel_clone_args->no_files == T

> if it can really be called more than once,

I don't think this is possible. Well, unless the exiting task hits a
BUG() after exit_files() and calls do_exit() recursively.

> are there
> other cases where task->files == NULL that we really should report
> EBADF?

I don't think so... If nothing else, sys_close() dereferences current->files
without any checks, so I think task->files == NULL is simply impossible if
this task is a userspace process/thread until it exits.

But Tycho, I won't insist. If you prefer to check PF_EXITING, I am fine.

Oleg.
On Tue, Feb 06, 2024 at 08:25:54PM +0100, Oleg Nesterov wrote:
> On 02/06, Tycho Andersen wrote:
> >
> > On Tue, Feb 06, 2024 at 07:06:07PM +0100, Oleg Nesterov wrote:
> > > Or we can check task->files != NULL rather than PF_EXITING.
> > >
> > > To me this looks even better, but looks more confusing without a comment.
> > > OTOH, imo this needs a comment anyway ;)
> >
> > I thought about this, but I didn't really understand the null check in
> > exit_files();
>
> I guess task->files can be NULL at least if it was cloned with
> kernel_clone_args->no_files == T

Won't this give false positives for vhost workers which do set
->no_files but are user workers? IOW, return -ESRCH even though -EBADF
would be correct in this scenario?
On 02/07, Christian Brauner wrote:
>
> On Tue, Feb 06, 2024 at 08:25:54PM +0100, Oleg Nesterov wrote:
> > On 02/06, Tycho Andersen wrote:
> > >
> > > On Tue, Feb 06, 2024 at 07:06:07PM +0100, Oleg Nesterov wrote:
> > > > Or we can check task->files != NULL rather than PF_EXITING.
> > > >
> > > > To me this looks even better, but looks more confusing without a comment.
> > > > OTOH, imo this needs a comment anyway ;)
> > >
> > > I thought about this, but I didn't really understand the null check in
> > > exit_files();
> >
> > I guess task->files can be NULL at least if it was cloned with
> > kernel_clone_args->no_files == T
>
> Won't this give false positives for vhost workers which do set
> ->no_files but are user workers? IOW, return -ESRCH even though -EBADF
> would be correct in this scenario?

OK, agreed. Let's check PF_EXITING or exit_state.

Oleg.
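The false positive conceded here can be illustrated with another small userspace model. Again this is only a sketch under stated assumptions: struct model_task, MODEL_PF_EXITING, err_by_files() and err_by_exiting() are hypothetical names, and a NULL files pointer stands in for a task that never had a file table (the ->no_files case mentioned above) as well as for one that has already dropped it on exit.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PF_EXITING task flag. */
#define MODEL_PF_EXITING 0x4u

/* Hypothetical, heavily simplified task model. */
struct model_task {
	unsigned int flags;
	const void *files;	/* NULL: no file table at all */
};

/* Classify a failed lookup by looking at ->files only. */
static int err_by_files(const struct model_task *t)
{
	return t->files ? -EBADF : -ESRCH;
}

/* Classify a failed lookup by looking at the exiting flag only. */
static int err_by_exiting(const struct model_task *t)
{
	return (t->flags & MODEL_PF_EXITING) ? -ESRCH : -EBADF;
}
```

For a live task that has no file table, err_by_files() reports -ESRCH even though the task exists, while err_by_exiting() reports -EBADF; for an exiting task both report -ESRCH.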
On Tue, Feb 06, 2024 at 08:25:54PM +0100, Oleg Nesterov wrote:
> But Tycho, I won't insist. If you prefer to check PF_EXITING, I am fine.

Looks like we raced, I sent a v2 with PF_EXITING, mostly because I
didn't want to run into weird things I didn't understand. I'm happy to
fix it up to check ->files if that's what you prefer Christian?

Tycho
On Tue, Feb 06, 2024 at 06:37:22PM +0100, Oleg Nesterov wrote:
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -688,7 +688,7 @@ static int pidfd_getfd(struct pid *pid, int fd)
> >  	int ret;
> >
> >  	task = get_pid_task(pid, PIDTYPE_PID);
> > -	if (!task)
> > +	if (!task || task->flags & PF_EXITING)
> >  		return -ESRCH;
>
> This looks racy. Suppose that pidfd_getfd() races with the exiting task.
>
> It is possible that this task sets PF_EXITING and does exit_files()
> after the "task->flags & PF_EXITING" check above and before pidfd_getfd()
> does __pidfd_fget(), in this case pidfd_getfd() still returns the same
> EBADF we want to avoid.
>
> Perhaps we can change pidfd_getfd() to do
>
> 	if (IS_ERR(file))
> 		return (task->flags & PF_EXITING) ? -ESRCH : PTR_ERR(file);
>
> instead?
>
> This needs a comment to explain the PF_EXITING check. And perhaps another
> comment to explain that we can't miss PF_EXITING if the target task has
> already passed exit_files, both exit_files() and fget_task() take the same
> task_lock(task).
>
> What do you think?

Yes, you're absolutely right. Let me resend.

Tycho