[v6] implement OA2_CRED_INHERIT flag for openat2()

[PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()

Posted by Stas Sergeev 1 year, 9 months ago

This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall.
It is needed to perform an open operation with the creds that were in
effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW
flag. This allows the process to pre-open some dirs and switch eUID
(and other UIDs/GIDs) to the less-privileged user, while still retaining
the possibility to open/create files within the pre-opened directory set.

The sand-boxing is security-oriented: symlinks leading outside of a
sand-box are rejected. /proc magic links are rejected. fds opened with
O_CRED_ALLOW are always closed on exec() and cannot be passed via unix
socket.
The more detailed description (including security considerations)
is available in the log messages of individual patches.
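
For illustration only, the calling sequence described above might look
roughly like the sketch below. Neither flag exists in mainline kernels:
O_CRED_ALLOW is set to bit 25 as stated in the v6 changelog, while the
numeric value of OA2_CRED_INHERIT and the helper names preopen_dir() and
open_with_dir_creds() are placeholders assumed for this example. On an
unpatched kernel the openat2() call is expected to fail (typically with
EINVAL).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/openat2.h>   /* struct open_how */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Proposed flags, NOT in mainline. Values are illustrative only. */
#ifndef O_CRED_ALLOW
#define O_CRED_ALLOW     (1U << 25)    /* bit 25, per the v6 changelog */
#endif
#ifndef OA2_CRED_INHERIT
#define OA2_CRED_INHERIT (1ULL << 28)  /* placeholder value (assumption) */
#endif

/* Step 1, before dropping privileges: pre-open a directory and opt in
 * to later credential inheritance. Returns an fd or -1. */
static int preopen_dir(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC | O_CRED_ALLOW);
}

/* Step 2, after the privilege drop: open `name` relative to dirfd,
 * asking the kernel to use the creds captured when dirfd was opened.
 * Returns an fd or -1 with errno set. */
static int open_with_dir_creds(int dirfd, const char *name, int flags)
{
    struct open_how how;

    assert(name != NULL);
    memset(&how, 0, sizeof(how));
    how.flags = (unsigned long long)(unsigned int)flags | OA2_CRED_INHERIT;
    return (int)syscall(SYS_openat2, dirfd, name, &how, sizeof(how));
}
```

A caller would run preopen_dir() while still privileged, switch UIDs/GIDs,
and then use open_with_dir_creds() for every access inside the directory.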

Changes in v6:
- it appears open flags bit 23 is already taken on parisc, and bit 24
  is taken on alpha. Move O_CRED_ALLOW to bit 25.
- added selftests for both O_CRED_ALLOW and OA2_CRED_INHERIT additions

Changes in v5:
- rename OA2_INHERIT_CRED to OA2_CRED_INHERIT
- add an "opt-in" flag O_CRED_ALLOW as was suggested by many reviewers
- stop using 64bit types, as suggested by
  Christian Brauner <brauner@kernel.org>
- add BUILD_BUG_ON() for VALID_OPENAT2_FLAGS, based on Christian Brauner's
  comments
- fixed problems reported by patch-testing bot
- made O_CRED_ALLOW fds not passable via unix sockets and exec(),
  based on Christian Brauner's comments

Changes in v4:
- add optimizations suggested by David Laight <David.Laight@ACULAB.COM>
- move security checks to build_open_flags()
- force RESOLVE_NO_MAGICLINKS as suggested by Andy Lutomirski <luto@kernel.org>

Changes in v3:
- partially revert v2 changes to avoid overriding capabilities.
  Only the bare minimum is overridden: fsuid, fsgid and group_info.
  Document the fact the full cred override is unwanted, as it may
  represent an unneeded security risk.

Changes in v2:
- capture full struct cred instead of just fsuid/fsgid.
  Suggested by Stefan Metzmacher <metze@samba.org>

CC: Stefan Metzmacher <metze@samba.org>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Andy Lutomirski <luto@kernel.org>
CC: Christian Brauner <brauner@kernel.org>
CC: Jan Kara <jack@suse.cz>
CC: Jeff Layton <jlayton@kernel.org>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Alexander Aring <alex.aring@gmail.com>
CC: David Laight <David.Laight@ACULAB.COM>
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Christian Göttsche <cgzones@googlemail.com>

-- 
2.44.0

Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()

Posted by Jann Horn 1 year, 8 months ago

On Sat, Apr 27, 2024 at 1:24 PM Stas Sergeev <stsp2@yandex.ru> wrote:
> This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall.
> It is needed to perform an open operation with the creds that were in
> effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW
> flag. This allows the process to pre-open some dirs and switch eUID
> (and other UIDs/GIDs) to the less-privileged user, while still retaining
> the possibility to open/create files within the pre-opened directory set.

As Andy Lutomirski mentioned before, Linux already has Landlock
(https://docs.kernel.org/userspace-api/landlock.html) for unprivileged
filesystem sandboxing. What benefits does OA2_CRED_INHERIT have
compared to Landlock?

Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()

Posted by stsp 1 year, 8 months ago

21.05.2024 22:01, Jann Horn wrote:
> On Sat, Apr 27, 2024 at 1:24 PM Stas Sergeev <stsp2@yandex.ru> wrote:
>> This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall.
>> It is needed to perform an open operation with the creds that were in
>> effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW
>> flag. This allows the process to pre-open some dirs and switch eUID
>> (and other UIDs/GIDs) to the less-privileged user, while still retaining
>> the possibility to open/create files within the pre-opened directory set.
> As Andy Lutomirski mentioned before, Linux already has Landlock
> (https://docs.kernel.org/userspace-api/landlock.html) for unprivileged
> filesystem sandboxing. What benefits does OA2_CRED_INHERIT have
> compared to Landlock?

The idea is different.
OA2_CRED_INHERIT was supposed to give you additional access (to what
you can't otherwise access after a priv drop), while Landlock lets
you explicitly restrict access. OA2_CRED_INHERIT is closer to
idmapped mounts than to Landlock, but idmapped mounts are
not fully unpriv'd.

Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()

Posted by Aleksa Sarai 1 year, 9 months ago

On 2024-04-27, Stas Sergeev <stsp2@yandex.ru> wrote:
> This patch-set implements the OA2_CRED_INHERIT flag for openat2() syscall.
> It is needed to perform an open operation with the creds that were in
> effect when the dir_fd was opened, if the dir was opened with O_CRED_ALLOW
> flag. This allows the process to pre-open some dirs and switch eUID
> (and other UIDs/GIDs) to the less-privileged user, while still retaining
> the possibility to open/create files within the pre-opened directory set.
> 
> The sand-boxing is security-oriented: symlinks leading outside of a
> sand-box are rejected. /proc magic links are rejected. fds opened with
> O_CRED_ALLOW are always closed on exec() and cannot be passed via unix
> socket.
> The more detailed description (including security considerations)
> is available in the log messages of individual patches.

(I meant to reply last week but I couldn't get my mail server to send
mail...)

It seems to me that this can already be implemented using
MOUNT_ATTR_IDMAP, without creating a new form of credential overriding
within the filesystem (and with such a deceptively simple
implementation...)

If you are a privileged process which plans to change users, you can
create a detached tree with a user mapping that gives that user access
to only that tree. This is far more effective at restricting possible
attacks because id-mapped mounts don't override credentials during VFS
operations (meaning that if you miss something, you have a big problem),
instead they only affect uid-related operations within the filesystem
for that mount. Since this implementation does not inherit
CAP_DAC_OVERRIDE, being able to rewrite uid/gids is all you need.
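
A minimal sketch of the detached-tree approach, assuming kernel headers
new enough to provide open_tree() and mount_setattr() (roughly 5.12 or
later) and a caller that already holds an fd for a user namespace
carrying the desired mapping. The helper name idmapped_clone() is made
up for this example, and the raw syscall() interface is used so that no
particular libc wrapper is assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/mount.h>     /* struct mount_attr, MOUNT_ATTR_IDMAP, OPEN_TREE_* */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Clone `path` into a detached mount and attach the ID mapping of the
 * user namespace referred to by `userns_fd` (for example an fd opened
 * on /proc/<pid>/ns/user). Only the top-level mount is cloned here.
 * Returns the detached-tree fd, or -1 with errno set.
 */
static int idmapped_clone(const char *path, int userns_fd)
{
    struct mount_attr attr;
    int tree_fd;

    assert(path != NULL);

    /* Detached copy of the mount; not visible in any mount namespace. */
    tree_fd = (int)syscall(SYS_open_tree, AT_FDCWD, path,
                           OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
    if (tree_fd < 0)
        return -1;

    memset(&attr, 0, sizeof(attr));
    attr.attr_set = MOUNT_ATTR_IDMAP;
    attr.userns_fd = (unsigned long long)userns_fd;

    /* Apply the ID mapping to the detached mount itself. */
    if (syscall(SYS_mount_setattr, tree_fd, "", AT_EMPTY_PATH,
                &attr, sizeof(attr)) < 0) {
        int saved = errno;

        close(tree_fd);
        errno = saved;
        return -1;
    }
    return tree_fd;
}
```

The returned fd can then serve as the dirfd for openat()-style calls after
the privilege drop. Both syscalls need CAP_SYS_ADMIN in the relevant user
namespace, so an unprivileged caller should expect -1 here.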

A new attack I just thought of while writing this mail is that because
there is no RESOLVE_NO_XDEV requirement, it should be possible for the
process to get an arbitrary write primitive by creating a new
userns+mountns and then bind-mounting / underneath the directory. Since
OA2_CRED_INHERIT uses override_creds, it doesn't care about whether
something about the O_CRED_ALLOW directory changed afterwards. Yes, you
can "just fix this" by adding a RESOLVE_NO_XDEV requirement too, but
given that there have been 2-3 security issues with this design found
already, it makes me feel really uneasy. Using id-mapped mounts avoids
this issue because the new mount will not have the id-mapping applied
and thus there is no security issue.
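
For reference, the existing openat2() resolve flags can already confine a
lookup in the way discussed here. The sketch below is only an
illustration (the helper name open_confined() is made up): it combines
RESOLVE_BENEATH, RESOLVE_NO_MAGICLINKS and RESOLVE_NO_XDEV, so ".."
escapes, absolute paths, /proc magic links and mount-point crossings
under dirfd are all refused by the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/openat2.h>   /* struct open_how, RESOLVE_* */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open an existing file strictly below dirfd. how.mode is left at 0,
 * so this helper is not meant for O_CREAT. Returns an fd, or -1 with
 * errno set (for example EXDEV when the path leaves dirfd or crosses
 * a mount point).
 */
static int open_confined(int dirfd, const char *path, int flags)
{
    struct open_how how;

    assert(path != NULL);
    memset(&how, 0, sizeof(how));
    how.flags = (unsigned long long)(unsigned int)flags;
    how.resolve = RESOLVE_BENEATH | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_XDEV;
    return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}
```

With RESOLVE_NO_XDEV in place, a bind mount placed underneath the
directory would make the lookup fail instead of being followed.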

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()

Posted by stsp 1 year, 9 months ago

07.05.2024 10:50, Aleksa Sarai wrote:
> If you are a privileged process which plans to change users,

Not privileged at all.
But I think what you say is still possible
with userns?


> A new attack I just thought of while writing this mail is that because
> there is no RESOLVE_NO_XDEV requirement, it should be possible for the
> process to get an arbitrary write primitive by creating a new
> userns+mountns and then bind-mounting / underneath the directory.
Doesn't this need a write perm to a
directory? In this case this is not a threat,
because you are not supposed to have a
write perm to that dir. OA2_CRED_INHERIT
is the only way to write.

Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()

Posted by Aleksa Sarai 1 year, 9 months ago

On 2024-05-07, stsp <stsp2@yandex.ru> wrote:
> 07.05.2024 10:50, Aleksa Sarai wrote:
> > If you are a privileged process which plans to change users,
> 
> Not privileged at all. But I think what you say is still possible with
> userns?

It is possible to configure MOUNT_ATTR_IDMAP in a user namespace but
there are some restrictions that I suspect will make this complicated.
If you try to do something with a regular filesystem you'll probably run
into issues because you won't have CAP_SYS_ADMIN in the super block's
userns. But you could probably do it with tmpfs.

> > A new attack I just thought of while writing this mail is that because
> > there is no RESOLVE_NO_XDEV requirement, it should be possible for the
> > process to get an arbitrary write primitive by creating a new
> > userns+mountns and then bind-mounting / underneath the directory.
> Doesn't this need a write perm to a
> directory? In this case this is not a threat,
> because you are not supposed to have a
> write perm to that dir. OA2_CRED_INHERIT
> is the only way to write.

No, bind-mounts don't require write permission. As long as you can
resolve the target path you can bind-mount on top of it, so if there's a
subdirectory you can bind-mount / underneath (and if there is only a
file you can bind-mount any file you want to access/overwrite instead).

There are restrictions on mounting through /proc/self/fd/... but they
don't apply here (all files opened by a process doing setns/unshare have
their vfsmounts updated to be from the new mount namespace, meaning you
can do mounts through them with /proc/self/fd/... without issue.)

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Re: [PATCH v6 0/3] implement OA2_CRED_INHERIT flag for openat2()

Posted by stsp 1 year, 9 months ago

07.05.2024 14:58, Aleksa Sarai wrote:
> On 2024-05-07, stsp <stsp2@yandex.ru> wrote:
>> 07.05.2024 10:50, Aleksa Sarai wrote:
>>> If you are a privileged process which plans to change users,
>> Not privileged at all. But I think what you say is still possible with
>> userns?
> It is possible to configure MOUNT_ATTR_IDMAP in a user namespace but
> there are some restrictions that I suspect will make this complicated.
> If you try to do something with a regular filesystem you'll probably run
> into issues because you won't have CAP_SYS_ADMIN in the super block's
> userns. But you could probably do it with tmpfs.

Then it's likely not a replacement for
my proposal, as I really don't need that
on tmpfs.
Perhaps right now I can use a helper
process and an rpc as a replacement.
This is much more work and is slower,
but it can approximate my original
design quite closely.
Another disadvantage of the rpc approach
is that the fds I get from the helper
process cannot be trusted, as in this
case the kernel doesn't guarantee that the fd
actually refers to the resource I requested.
I've seen a few OSes where rpc is checked
by a trusted entity to avoid such a problem.
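
A rough sketch of the fd-passing half of such a helper, using SCM_RIGHTS
over a connected AF_UNIX socket. The names send_fd(), recv_fd() and
same_file() are made up for this example, and the fstat() comparison is
only a partial sanity check the receiver could apply when it has a
reference fd to compare against. It does not provide the kernel-level
guarantee discussed above.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>

/* Helper side: send one fd plus a dummy byte. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Client side: receive one fd. Returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Returns 1 if both fds refer to the same inode on the same device. */
static int same_file(int a, int b)
{
    struct stat sa, sb;

    if (fstat(a, &sa) < 0 || fstat(b, &sb) < 0)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```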

>>> A new attack I just thought of while writing this mail is that because
>>> there is no RESOLVE_NO_XDEV requirement, it should be possible for the
>>> process to get an arbitrary write primitive by creating a new
>>> userns+mountns and then bind-mounting / underneath the directory.
>> Doesn't this need a write perm to a
>> directory? In this case this is not a threat,
>> because you are not supposed to have a
>> write perm to that dir. OA2_CRED_INHERIT
>> is the only way to write.
> No, bind-mounts don't require write permission.

Oh, isn't this a problem by itself?
Yes, in this case my patch needs to
require RESOLVE_NO_XDEV, but I find this a harsh restriction. Maybe the
bind mount was done before a priv drop? Then it is fully legitimate.
Anyway, I don't know if I should work on it or not, as there seems to be
no indication of a possible acceptance.