fs/internal.h | 2 +- fs/namei.c | 52 +++++++++++++++++++++++++++++------- fs/open.c | 2 +- include/linux/fcntl.h | 2 ++ include/uapi/linux/openat2.h | 3 +++ 5 files changed, 50 insertions(+), 11 deletions(-)
This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall. It is needed to perform an open operation with the creds that were in effect when the dir_fd was opened. This allows the process to pre-open some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged user, while still retaining the possibility to open/create files within the pre-opened directory set. Changes in v2: - capture full struct cred instead of just fsuid/fsgid. Suggested by Stefan Metzmacher <metze@samba.org> CC: Stefan Metzmacher <metze@samba.org> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andy Lutomirski <luto@kernel.org> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Jeff Layton <jlayton@kernel.org> CC: Chuck Lever <chuck.lever@oracle.com> CC: Alexander Aring <alex.aring@gmail.com> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-api@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Christian Göttsche <cgzones@googlemail.com> Stas Sergeev (2): fs: reorganize path_openat() openat2: add OA2_INHERIT_CRED flag fs/internal.h | 2 +- fs/namei.c | 52 +++++++++++++++++++++++++++++------- fs/open.c | 2 +- include/linux/fcntl.h | 2 ++ include/uapi/linux/openat2.h | 3 +++ 5 files changed, 50 insertions(+), 11 deletions(-) -- 2.44.0
> On Apr 23, 2024, at 4:02 AM, Stas Sergeev <stsp2@yandex.ru> wrote: > > This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall. > It is needed to perform an open operation with the creds that were in > effect when the dir_fd was opened. This allows the process to pre-open > some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged > user, while still retaining the possibility to open/create files within > the pre-opened directory set. I like the concept, as it’s a sort of move toward a capability system. But I think that making a dirfd into this sort of capability would need to be much more explicit. Right now, any program could do this entirely by accident, and applying OA2_INHERIT_CRED to an fd fished out of /proc seems hazardous. So perhaps if an open file description for a directory could have something like FMODE_CRED, and if OA2_INHERIT_CRED also blocked .., magic links, symlinks to anywhere above the dirfd (or maybe all symlinks) and absolute path lookups, then this would be okay. Also, there are lots of ways that f_cred could be relevant: fsuid/fsgid, effective capabilities and security labels. And it gets more complex if this ever gets extended to support connecting or sending to a socket or if someone opens a device node. Does CAP_SYS_ADMIN carry over? > > Changes in v2: > - capture full struct cred instead of just fsuid/fsgid. > Suggested by Stefan Metzmacher <metze@samba.org> > > CC: Stefan Metzmacher <metze@samba.org> > CC: Eric Biederman <ebiederm@xmission.com> > CC: Alexander Viro <viro@zeniv.linux.org.uk> > CC: Andy Lutomirski <luto@kernel.org> > CC: Christian Brauner <brauner@kernel.org> > CC: Jan Kara <jack@suse.cz> > CC: Jeff Layton <jlayton@kernel.org> > CC: Chuck Lever <chuck.lever@oracle.com> > CC: Alexander Aring <alex.aring@gmail.com> > CC: linux-fsdevel@vger.kernel.org > CC: linux-kernel@vger.kernel.org > CC: linux-api@vger.kernel.org > CC: Paolo Bonzini <pbonzini@redhat.com> > CC: Christian Göttsche <cgzones@googlemail.com> > > Stas Sergeev (2): > fs: reorganize path_openat() > openat2: add OA2_INHERIT_CRED flag > > fs/internal.h | 2 +- > fs/namei.c | 52 +++++++++++++++++++++++++++++------- > fs/open.c | 2 +- > include/linux/fcntl.h | 2 ++ > include/uapi/linux/openat2.h | 3 +++ > 5 files changed, 50 insertions(+), 11 deletions(-) > > -- > 2.44.0 > >
23.04.2024 19:44, Andy Lutomirski пишет: >> On Apr 23, 2024, at 4:02 AM, Stas Sergeev <stsp2@yandex.ru> wrote: >> >> This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall. >> It is needed to perform an open operation with the creds that were in >> effect when the dir_fd was opened. This allows the process to pre-open >> some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged >> user, while still retaining the possibility to open/create files within >> the pre-opened directory set. > I like the concept, as it’s a sort of move toward a capability system. But I think that making a dirfd into this sort of capability would need to be much more explicit. Right now, any program could do this entirely by accident, and applying OA2_INHERIT_CRED to an fd fished out of /proc seems hazardous. While I still don't quite understand the threat of /proc symlinks, I posted v4 which disallows them. > So perhaps if an open file description for a directory could have something like FMODE_CRED, and if OA2_INHERIT_CRED also blocked .., magic links, symlinks to anywhere above the dirfd (or maybe all symlinks) and absolute path lookups, then this would be okay. So I think this all is now done. > Also, there are lots of ways that f_cred could be relevant: fsuid/fsgid, effective capabilities and security labels. And it gets more complex if this ever gets extended to support connecting or sending to a socket or if someone opens a device node. Does CAP_SYS_ADMIN carry over? Not any more, nothin is carried but fsuid, fsgid, groupinfo. Thank you.
On Wed, Apr 24, 2024 at 3:57 AM stsp <stsp2@yandex.ru> wrote: > > 23.04.2024 19:44, Andy Lutomirski пишет: > >> On Apr 23, 2024, at 4:02 AM, Stas Sergeev <stsp2@yandex.ru> wrote: > >> > >> This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall. > >> It is needed to perform an open operation with the creds that were in > >> effect when the dir_fd was opened. This allows the process to pre-open > >> some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged > >> user, while still retaining the possibility to open/create files within > >> the pre-opened directory set. > > I like the concept, as it’s a sort of move toward a capability system. But I think that making a dirfd into this sort of capability would need to be much more explicit. Right now, any program could do this entirely by accident, and applying OA2_INHERIT_CRED to an fd fished out of /proc seems hazardous. > > While I still don't quite understand > the threat of /proc symlinks, I posted > v4 which disallows them. > I like that, but you're blocking it the wrong way. My concern is that someone does dfd = open("/proc/PID/fd/3") and then openat(dfd, ..., OA2_INHERIT_CRED); IIRC open("/proc/PID/fd/3") is extremely magical and returns the _same open file description_ (struct file) as PID's fd 3. > > So perhaps if an open file description for a directory could have something like FMODE_CRED, and if OA2_INHERIT_CRED also blocked .., magic links, symlinks to anywhere above the dirfd (or maybe all symlinks) and absolute path lookups, then this would be okay. > > So I think this all is now done. But you missed the FMODE_CRED part! So here's the problem: right now, in current Linux, a dirfd pointing to a directory that you can open anyway doesn't convey any new powers. So, if I'm a regular program, and I do open("/etc", O_PATH), I get an fd. And if I get an fd pointing at /etc from somewhere else, I get the same thing (possibly with different f_cred, but f_cred is largely a hack to restrict things that would otherwise be insecure because they were designed a bit wrong from the beginning). But, with your patch, these fds suddenly convey a very strong privilege: that of their f_cred *over the entire subtree to which they refer*. And you can attack it using exactly your intended use case: if any program opens a dirfd and then drops privileges, well, oops, it didn't actually fully drop privilege. So I think that, if this whole concept has any chance of working well, it needs to be opt-in *at the time of the original open*. So a privilege-carrying open would be an entirely new option like O_CAPTURE_CREDS or FMODE_CREDS. And OA2_INHERIT_CREDS is rejected if the dirfd doesn't have that special mode.
25.04.2024 03:43, Andy Lutomirski пишет: > But you missed the FMODE_CRED part! OK, I thought its not needed if fd is limited to the one created by the same process. But your explanation is quite clear on that its needed anyway, or otherwise the unsuspecting process doesn't fully drop his privs. Thank you for explaining that bit. Which leaves just one question: is such an opt-in enough or not? Viro points it may not be enough, but doesn't explain why exactly. Maybe we need such an opt-in, and it should be dropped on exec() and on passing via unix fd? I don't know what additional restrictions are needed, as Viro didn't clarify that part, but the opt-in is needed for sure.
On Wed, Apr 24, 2024 at 05:43:02PM -0700, Andy Lutomirski wrote: > I like that, but you're blocking it the wrong way. My concern is that > someone does dfd = open("/proc/PID/fd/3") and then openat(dfd, ..., > OA2_INHERIT_CRED); IIRC open("/proc/PID/fd/3") is extremely magical > and returns the _same open file description_ (struct file) as PID's fd > 3. No, it doesn't. We could implement that, but if we do that'll be *not* a part of procfs and it's going to be limited to current task only. There are two different variants of /dev/fd/* semantics - one is "opening /dev/fd/42 is an equivalent of dup(42)", another is "opening /dev/fd/42 is an equivalent of opening the same fs object that is currently accessed via descriptor 42". Linux is doing the latter, and we can't switch - that would break a lot of userland software, including a lot of scripts. I'm not saying I like the series, but this particular objection is bogus - open via procfs symlinks is *not* an equivalent of dup() and that is not going to change.
On Wed, Apr 24, 2024 at 6:44 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Wed, Apr 24, 2024 at 05:43:02PM -0700, Andy Lutomirski wrote: > > > I like that, but you're blocking it the wrong way. My concern is that > > someone does dfd = open("/proc/PID/fd/3") and then openat(dfd, ..., > > OA2_INHERIT_CRED); IIRC open("/proc/PID/fd/3") is extremely magical > > and returns the _same open file description_ (struct file) as PID's fd > > 3. > > No, it doesn't. We could implement that, but if we do that'll be > *not* a part of procfs and it's going to be limited to current task > only. Egads -- why would we want to implement that? In the apparently incorrect model in my head, Linux's behavior was ridiculous and only made sense for some historical reason. But I wonder why I thought that. Diving a tiny bit down the rabbit hole, I have a copy of TLPI on my bookshelf, and I bought it quite a long time ago and read a bunch of it when I got it, and my copy is *wrong*! Section 5.11 has the behavior of /dev/fd very clearly documented as working like dup(). And here it is: erratum 107. Whoopsies! https://man7.org/tlpi/errata/index.html Anyway, I retract that particular objection to the series. But I wouldn't be shocked if one can break a normal modern systemd using the patchset -- systemd does all kinds of fun things involving passing file descriptors around. --Andy
23.04.2024 19:44, Andy Lutomirski пишет: > Also, there are lots of ways that f_cred could be relevant: fsuid/fsgid, effective capabilities and security labels. And it gets more complex if this ever gets extended to support connecting or sending to a socket or if someone opens a device node. Does CAP_SYS_ADMIN carry over? I posted a v3 where I only override fsuid, fsgid and group_info. Capabilities and whatever else are not overridden to avoid security risks. Does this address your concern? Note that I think your other concerns are already addressed, I just added a bit more of a description now.
23.04.2024 19:44, Andy Lutomirski пишет: >> On Apr 23, 2024, at 4:02 AM, Stas Sergeev <stsp2@yandex.ru> wrote: >> >> This patch-set implements the OA2_INHERIT_CRED flag for openat2() syscall. >> It is needed to perform an open operation with the creds that were in >> effect when the dir_fd was opened. This allows the process to pre-open >> some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged >> user, while still retaining the possibility to open/create files within >> the pre-opened directory set. > I like the concept, as it’s a sort of move toward a capability system. But I think that making a dirfd into this sort of capability would need to be much more explicit. Right now, any program could do this entirely by accident, and applying OA2_INHERIT_CRED to an fd fished out of /proc seems hazardous. Could you please clarify this a bit? I think if you open someone else's fd via /proc, then your current creds would be stored in a struct file, rather than the creds of the process from which you open. I don't think creds can be stolen that way, as I suppose opening via proc is similar to open via some symlink. Or is there some magic going on so that the process's creds can actually be leaked this way? > So perhaps if an open file description for a directory could have something like FMODE_CRED, and if OA2_INHERIT_CRED also blocked .., magic links, symlinks to anywhere above the dirfd (or maybe all symlinks) and absolute path lookups, then this would be okay. This is already there. My fault is that I put a short description in a cover letter and a long description in a patch itself. I should probably swap them, as you only read the cover letter. My patch takes care about possible escape scenarios by working only with restricted lookup modes: RESOLVE_BENEATH, RESOLVE_IN_ROOT. I made sure that symlinks leading outside of a sandbox, are rejected. Also note that my patch requires O_CLOEXEC on a dir_fd to avoid the cred leakage over exec syscall. > Also, there are lots of ways that f_cred could be relevant: fsuid/fsgid, effective capabilities and security labels. And it gets more complex if this ever gets extended to support connecting or sending to a socket or if someone opens a device node. Does CAP_SYS_ADMIN carry over? Hmm, I don't know. The v1 version of a patch only changed fsuid/fsgid. Stefan Metzmacher suggested to use the full creds instead, but I haven't considered the implications of that change just yet. So if using the full creds is too much because it can carry things like CAP_SYS_ADMIN, then should I get back to v1 and only change fsuid/fsgid?
© 2016 - 2024 Red Hat, Inc.