This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall. It is needed to perform an open operation with the creds that were in effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW flag. This allows the process to pre-open some dirs and switch eUID (and other UIDs/GIDs) to the less-privileged user, while still retaining the possibility to open/create files within the pre-opened directory set. The sand-boxing is security-oriented: symlinks leading outside of a sand-box are rejected. /proc magic links are rejected. fds opened with O_CRED_ALLOW are always closed on exec() and cannot be passed via unix socket. The more detailed description (including security considerations) is available in the log messages of individual patches. Changes in v6: - it appears open flags bit 23 is already taken on parisc, and bit 24 is taken on alpha. Move O_CRED_ALLOW to bit 25. - added selftests for both O_CRED_ALLOW and O_CRED_INHERIT additions Changes in v5: - rename OA2_INHERIT_CRED to OA2_CRED_INHERIT - add an "opt-in" flag O_CRED_ALLOW as was suggested by many reviewers - stop using 64bit types, as suggested by Christian Brauner <brauner@kernel.org> - add BUILD_BUG_ON() for VALID_OPENAT2_FLAGS, based on Christian Brauner's comments - fixed problems reported by patch-testing bot - made O_CRED_ALLOW fds not passable via unix sockets and exec(), based on Christian Brauner's comments Changes in v4: - add optimizations suggested by David Laight <David.Laight@ACULAB.COM> - move security checks to build_open_flags() - force RESOLVE_NO_MAGICLINKS as suggested by Andy Lutomirski <luto@kernel.org> Changes in v3: - partially revert v2 changes to avoid overriding capabilities. Only the bare minimum is overridden: fsuid, fsgid and group_info. Document the fact the full cred override is unwanted, as it may represent an unneeded security risk. Changes in v2: - capture full struct cred instead of just fsuid/fsgid. Suggested by Stefan Metzmacher <metze@samba.org> CC: Stefan Metzmacher <metze@samba.org> CC: Eric Biederman <ebiederm@xmission.com> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: Andy Lutomirski <luto@kernel.org> CC: Christian Brauner <brauner@kernel.org> CC: Jan Kara <jack@suse.cz> CC: Jeff Layton <jlayton@kernel.org> CC: Chuck Lever <chuck.lever@oracle.com> CC: Alexander Aring <alex.aring@gmail.com> CC: David Laight <David.Laight@ACULAB.COM> CC: linux-fsdevel@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-api@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Christian Göttsche <cgzones@googlemail.com> -- 2.44.0
On Sat, Apr 27, 2024 at 1:24 PM Stas Sergeev <stsp2@yandex.ru> wrote: > This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall. > It is needed to perform an open operation with the creds that were in > effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW > flag. This allows the process to pre-open some dirs and switch eUID > (and other UIDs/GIDs) to the less-privileged user, while still retaining > the possibility to open/create files within the pre-opened directory set. As Andy Lutomirski mentioned before, Linux already has Landlock (https://docs.kernel.org/userspace-api/landlock.html) for unprivileged filesystem sandboxing. What benefits does OA2_CRED_INHERIT have compared to Landlock?
21.05.2024 22:01, Jann Horn пишет: > On Sat, Apr 27, 2024 at 1:24 PM Stas Sergeev <stsp2@yandex.ru> wrote: >> This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall. >> It is needed to perform an open operation with the creds that were in >> effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW >> flag. This allows the process to pre-open some dirs and switch eUID >> (and other UIDs/GIDs) to the less-privileged user, while still retaining >> the possibility to open/create files within the pre-opened directory set. > As Andy Lutomirski mentioned before, Linux already has Landlock > (https://docs.kernel.org/userspace-api/landlock.html) for unprivileged > filesystem sandboxing. What benefits does OA2_CRED_INHERIT have > compared to Landlock? The idea is different. OA2_CRED_INHERIT was supposed to give you an additional access (to what you can't access otherwise, after a priv drop), while landlock allows you to explicitly restrict an access. OA2_CRED_INHERIT more answered with idmapped mounts rather than the landlock, but idmapped mounts are not fully unpriv'd.
On 2024-04-27, Stas Sergeev <stsp2@yandex.ru> wrote: > This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall. > It is needed to perform an open operation with the creds that were in > effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW > flag. This allows the process to pre-open some dirs and switch eUID > (and other UIDs/GIDs) to the less-privileged user, while still retaining > the possibility to open/create files within the pre-opened directory set. > > The sand-boxing is security-oriented: symlinks leading outside of a > sand-box are rejected. /proc magic links are rejected. fds opened with > O_CRED_ALLOW are always closed on exec() and cannot be passed via unix > socket. > The more detailed description (including security considerations) > is available in the log messages of individual patches. (I meant to reply last week but I couldn't get my mail server to send mail...) It seems to me that this can already be implemented using MOUNT_ATTR_IDMAP, without creating a new form of credential overriding within the filesystem (and with such a deceptively simple implementation...) If you are a privileged process which plans to change users, you can create a detached tree with a user mapping that gives that user access to only that tree. This is far more effective at restricting possible attacks because id-mapped mounts don't override credentials during VFS operations (meaning that if you miss something, you have a big problem), instead they only affect uid-related operations within the filesystem for that mount. Since this implementation does no inherit CAP_DAC_OVERRIDE, being able to rewrite uid/gids is all you need. A new attack I just thought of while writing this mail is that because there is no RESOLVE_NO_XDEV requirement, it should be possible for the process to get an arbitrary write primitive by creating a new userns+mountns and then bind-mounting / underneath the directory. Since O_CRED_INHERIT uses override_creds, it doesn't care about whether something about the O_CRED_ALLOW directory changed afterwards. Yes, you can "just fix this" by adding a RESOLVE_NO_XDEV requirement too, but given that there have been 2-3 security issues with this design found already, it makes me feel really uneasy. Using id-mapped mounts avoids this issue because the new mount will not have the id-mapping applied and thus there is no security issue. -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH <https://www.cyphar.com/>
07.05.2024 10:50, Aleksa Sarai пишет: > If you are a privileged process which plans to change users, Not privileged at all. But I think what you say is still possible with userns? > A new attack I just thought of while writing this mail is that because > there is no RESOLVE_NO_XDEV requirement, it should be possible for the > process to get an arbitrary write primitive by creating a new > userns+mountns and then bind-mounting / underneath the directory. Doesn't this need a write perm to a directory? In his case this is not a threat, because you are not supposed to have a write perm to that dir. OA2_CRED_INHERIT is the only way to write.
On 2024-05-07, stsp <stsp2@yandex.ru> wrote: > 07.05.2024 10:50, Aleksa Sarai пишет: > > If you are a privileged process which plans to change users, > > Not privileged at all. But I think what you say is still possible with > userns? It is possible to configure MOUNT_ATTR_IDMAP in a user namespace but there are some restrictions that I suspect will make this complicated. If you try to do something with a regular filesystem you'll probably run into issues because you won't have CAP_SYS_ADMIN in the super block's userns. But you could probably do it with tmpfs. > > A new attack I just thought of while writing this mail is that because > > there is no RESOLVE_NO_XDEV requirement, it should be possible for the > > process to get an arbitrary write primitive by creating a new > > userns+mountns and then bind-mounting / underneath the directory. > Doesn't this need a write perm to a > directory? In his case this is not a threat, > because you are not supposed to have a > write perm to that dir. OA2_CRED_INHERIT > is the only way to write. No, bind-mounts don't require write permission. As long as you can resolve the target path you can bind-mount on top of it, so if there's a subdirectory you can bind-mount / underneath (and if there is only a file you can bind-mount any file you want to access/overwrite instead). There are restrictions on mounting through /proc/self/fd/... but they don't apply here (all files opened by a process doing setns/unshare have their vfsmounts updated to be from the new mount namespace, meaning you can do mounts through them with /proc/self/fd/... without issue.) -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH <https://www.cyphar.com/>
07.05.2024 14:58, Aleksa Sarai пишет: > On 2024-05-07, stsp <stsp2@yandex.ru> wrote: >> 07.05.2024 10:50, Aleksa Sarai пишет: >>> If you are a privileged process which plans to change users, >> Not privileged at all. But I think what you say is still possible with >> userns? > It is possible to configure MOUNT_ATTR_IDMAP in a user namespace but > there are some restrictions that I suspect will make this complicated. > If you try to do something with a regular filesystem you'll probably run > into issues because you won't have CAP_SYS_ADMIN in the super block's > userns. But you could probably do it with tmpfs. Then its likely not a replacement for my proposal, as I really don't need that on tmpfs. Perhaps right now I can use the helper process and an rpc as a replacement. This is much more work and is slower, but more or less can approximate my original design decision quite precisely. Another disadvantage of an rpc approach is that the fds I get from the helper process, can not be trusted, as in this case kernel doesn't guarantee the fd actually refers to the resource I requested. I've seen a few OSes where rpc is checked by a trusted entity to avoid such problem. >>> A new attack I just thought of while writing this mail is that because >>> there is no RESOLVE_NO_XDEV requirement, it should be possible for the >>> process to get an arbitrary write primitive by creating a new >>> userns+mountns and then bind-mounting / underneath the directory. >> Doesn't this need a write perm to a >> directory? In his case this is not a threat, >> because you are not supposed to have a >> write perm to that dir. OA2_CRED_INHERIT >> is the only way to write. > No, bind-mounts don't require write permission. Oh, isn't this a problem by itself? Yes, in this case my patch needs to avoid RESOLVE_NO_XDEV, but I find this a harsh restriction. Maybe the bind mount was done before a priv drop? Then it is fully legitimate. Anyway, I don't know if I should work on it or not, as there seem to be no indication of a possible acceptance.
© 2016 - 2025 Red Hat, Inc.