[v1] Introduce user namespace capabilities

[PATCH 0/3] Introduce user namespace capabilities

Posted by Jonathan Calmels 1 year, 9 months ago

It's that time of the year again where we debate security settings for user
namespaces ;)

I’ve been experimenting with different approaches to address the gripe
around user namespaces being used as attack vectors.
After invaluable feedback from Serge and Christian offline, this is what I
came up with.

There are obviously a lot of things we could do differently but I feel this
is the right balance between functionality, simplicity and security. This
also serves as a good foundation and could always be extended if the need
arises in the future.

Notes:

- Adding a new capability set is far from ideal, but trying to reuse the
  existing capability framework was deemed both impractical and
  questionable security-wise, so here we are.

- We might want to add new capabilities for some of the checks instead of
  reusing CAP_SETPCAP every time. Serge mentioned something like
  CAP_SYS_LIMIT?

- In the last patch, we could decide to have stronger requirements and
  perform checks inside cap_capable() in case we want to retroactively
  prevent capabilities in old namespaces. This might be an overreach though,
  so I left it out.

  I'm also not fond of the ulong logic for setting the sysctl parameter. On
  the other hand, the usermodhelper code always uses two u32s, which makes it
  very confusing to set in userspace.


Jonathan Calmels (3):
  capabilities: user namespace capabilities
  capabilities: add securebit for strict userns caps
  capabilities: add cap userns sysctl mask

 fs/proc/array.c                 |  9 ++++
 include/linux/cred.h            |  3 ++
 include/linux/securebits.h      |  1 +
 include/linux/user_namespace.h  |  7 +++
 include/uapi/linux/prctl.h      |  7 +++
 include/uapi/linux/securebits.h | 11 ++++-
 kernel/cred.c                   |  3 ++
 kernel/sysctl.c                 | 10 ++++
 kernel/umh.c                    | 16 +++++++
 kernel/user_namespace.c         | 83 ++++++++++++++++++++++++++++++---
 security/commoncap.c            | 59 +++++++++++++++++++++++
 security/keys/process_keys.c    |  3 ++
 12 files changed, 204 insertions(+), 8 deletions(-)

-- 
2.45.0

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Casey Schaufler 1 year, 9 months ago

On 5/16/2024 2:22 AM, Jonathan Calmels wrote:
> It's that time of the year again where we debate security settings for user
> namespaces ;)
>
> I’ve been experimenting with different approaches to address the gripe
> around user namespaces being used as attack vectors.
> After invaluable feedback from Serge and Christian offline, this is what I
> came up with.
>
> There are obviously a lot of things we could do differently but I feel this
> is the right balance between functionality, simplicity and security. This
> also serves as a good foundation and could always be extended if the need
> arises in the future.
>
> Notes:
>
> - Adding a new capability set is far from ideal, but trying to reuse the
>   existing capability framework was deemed both impractical and
>   questionable security-wise, so here we are.

I suggest that adding a capability set for user namespaces is a bad idea:
	- It is in no way obvious what problem it solves
	- It is not obvious how it solves any problem
	- The capability mechanism has not been popular, and relying on a
	  community (e.g. container developers) to embrace it based on this
	  enhancement is a recipe for failure
	- Capabilities are already more complicated than modern developers
	  want to deal with. Adding another, special purpose set, is going
	  to make them even more difficult to use.

> - We might want to add new capabilities for some of the checks instead of
>   reusing CAP_SETPCAP every time. Serge mentioned something like
>   CAP_SYS_LIMIT?
>
> - In the last patch, we could decide to have stronger requirements and
>   perform checks inside cap_capable() in case we want to retroactively
>   prevent capabilities in old namespaces, this might be an overreach though
>   so I left it out.
>
>   I'm also not fond of the ulong logic for setting the sysctl parameter, on
>   the other hand, the usermodhelper code always uses two u32s which makes it
>   very confusing to set in userspace.
>
>
> Jonathan Calmels (3):
>   capabilities: user namespace capabilities
>   capabilities: add securebit for strict userns caps
>   capabilities: add cap userns sysctl mask
>
>  fs/proc/array.c                 |  9 ++++
>  include/linux/cred.h            |  3 ++
>  include/linux/securebits.h      |  1 +
>  include/linux/user_namespace.h  |  7 +++
>  include/uapi/linux/prctl.h      |  7 +++
>  include/uapi/linux/securebits.h | 11 ++++-
>  kernel/cred.c                   |  3 ++
>  kernel/sysctl.c                 | 10 ++++
>  kernel/umh.c                    | 16 +++++++
>  kernel/user_namespace.c         | 83 ++++++++++++++++++++++++++++++---
>  security/commoncap.c            | 59 +++++++++++++++++++++++
>  security/keys/process_keys.c    |  3 ++
>  12 files changed, 204 insertions(+), 8 deletions(-)
>

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jarkko Sakkinen 1 year, 9 months ago

On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> I suggest that adding a capability set for user namespaces is a bad idea:
> 	- It is in no way obvious what problem it solves
> 	- It is not obvious how it solves any problem
> 	- The capability mechanism has not been popular, and relying on a
> 	  community (e.g. container developers) to embrace it based on this
> 	  enhancement is a recipe for failure
> 	- Capabilities are already more complicated than modern developers
> 	  want to deal with. Adding another, special purpose set, is going
> 	  to make them even more difficult to use.

What about Inh, Prm, Eff, Bnd and Amb is not dead obvious to you? ;-)
One UNs cannot hurt...

I'm not following containers that much, but weren't seccomp profiles
supposed to be the silver bullet?

BR, Jarkko

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jarkko Sakkinen 1 year, 9 months ago

On Thu May 16, 2024 at 10:29 PM EEST, Jarkko Sakkinen wrote:
> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> > I suggest that adding a capability set for user namespaces is a bad idea:
> > 	- It is in no way obvious what problem it solves
> > 	- It is not obvious how it solves any problem
> > 	- The capability mechanism has not been popular, and relying on a
> > 	  community (e.g. container developers) to embrace it based on this
> > 	  enhancement is a recipe for failure
> > 	- Capabilities are already more complicated than modern developers
> > 	  want to deal with. Adding another, special purpose set, is going
> > 	  to make them even more difficult to use.
>
> What Inh, Prm, Eff, Bnd and Amb is not dead obvious to you? ;-)
> One UNs cannot hurt...
>
> I'm not following containers that much but didn't seccomp profiles
> supposed to be the silver bullet?

Also, I think the Kata Containers style of doing containers is pretty
solid. I've heard that some video streaming service, at least in the recent
past, launched a VM per stream, so it's not like VMs cannot be made to
scale, I guess.

BR, Jarkko

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jarkko Sakkinen 1 year, 9 months ago

On Thu May 16, 2024 at 10:31 PM EEST, Jarkko Sakkinen wrote:
> On Thu May 16, 2024 at 10:29 PM EEST, Jarkko Sakkinen wrote:
> > On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> > > I suggest that adding a capability set for user namespaces is a bad idea:
> > > 	- It is in no way obvious what problem it solves
> > > 	- It is not obvious how it solves any problem
> > > 	- The capability mechanism has not been popular, and relying on a
> > > 	  community (e.g. container developers) to embrace it based on this
> > > 	  enhancement is a recipe for failure
> > > 	- Capabilities are already more complicated than modern developers
> > > 	  want to deal with. Adding another, special purpose set, is going
> > > 	  to make them even more difficult to use.
> >
> > What Inh, Prm, Eff, Bnd and Amb is not dead obvious to you? ;-)
> > One UNs cannot hurt...
> >
> > I'm not following containers that much but didn't seccomp profiles
> > supposed to be the silver bullet?
>
> Also, I think Kata Containers style way of doing containers is pretty
> solid. I've heard that some video streaming service at least in recent
> past did launch VM per stream so it's not like VM's cannot be made to
> scale I guess.

Sorry for multiple responses but this actually nails the key question:
who will use this? Even if this would work out somehow, is there someone
who will actually use this, and not one of the few other, more robust
solutions available? I mean, is it worth the time to maintain it if there
are no potential users for the feature?

In addition to "show me the code", there is always also "show me the payload".

BR, Jarkko

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jonathan Calmels 1 year, 9 months ago

> > > On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> > > > I suggest that adding a capability set for user namespaces is a bad idea:
> > > > 	- It is in no way obvious what problem it solves
> > > > 	- It is not obvious how it solves any problem
> > > > 	- The capability mechanism has not been popular, and relying on a
> > > > 	  community (e.g. container developers) to embrace it based on this
> > > > 	  enhancement is a recipe for failure
> > > > 	- Capabilities are already more complicated than modern developers
> > > > 	  want to deal with. Adding another, special purpose set, is going
> > > > 	  to make them even more difficult to use.

Sorry if the commit wasn't clear enough. Basically:

- Today user namespaces grant full capabilities.
  This behavior is often abused to attack various kernel subsystems.
  Only option is to disable them altogether which breaks a lot of
  userspace stuff.
  This goes against the least privilege principle.

- It adds a new capability set.
  This set dictates what capabilities are granted in namespaces (instead
  of always getting full caps).
  This brings namespaces in line with the rest of the system, user
  namespaces are no more "special".
  They now work the same way as say a transition to root does with
  inheritable caps.

- This isn't intended to be used by end users per se (although they could).
  This would be used at the same places where existing capabilities are
  used today (e.g. init system, pam, container runtime, browser
  sandbox), or by system administrators.

To give you some ideas of things you could do:

# E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH
echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
echo "!cap_net_admin alice" >> /etc/security/capability.conf

# E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE
systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
            -p SecureBits=userns-strict-caps \
            /usr/bin/dockerd

# E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits
# Prevent users from ever gaining it
sysctl -w cap_bound_userns_mask=0x1fffffdffff
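
For anyone wanting to double check a mask like the one above, here is a
rough sketch of how it could be derived. It assumes 41 capabilities (bits 0
to 40, so a full mask of 0x1ffffffffff) and CAP_SYS_RAWIO being bit 17 as in
include/uapi/linux/capability.h. mask_without is just a hypothetical helper
name, not something from the patches.

```shell
#!/bin/sh
# Sketch only: build a capability mask with selected capabilities cleared.
# Assumption: 41 capabilities (bits 0..40); the real count depends on the
# kernel's CAP_LAST_CAP.
NCAPS=41

# mask_without CAPNUM...
# Print the full mask in hex with the given capability bit numbers cleared.
mask_without() {
    mask=$(( (1 << $NCAPS) - 1 ))
    for cap in "$@"; do
        mask=$(( $mask & ~(1 << $cap) ))
    done
    printf '0x%x\n' "$mask"
}

CAP_SYS_RAWIO=17   # value from include/uapi/linux/capability.h

mask_without "$CAP_SYS_RAWIO"   # prints 0x1fffffdffff
```

Passing several bit numbers clears them all at once, e.g.
mask_without 12 17 would clear both CAP_NET_ADMIN and CAP_SYS_RAWIO.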

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Casey Schaufler 1 year, 9 months ago

On 5/17/2024 4:42 AM, Jonathan Calmels wrote:
>>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
>>>>> I suggest that adding a capability set for user namespaces is a bad idea:
>>>>> 	- It is in no way obvious what problem it solves
>>>>> 	- It is not obvious how it solves any problem
>>>>> 	- The capability mechanism has not been popular, and relying on a
>>>>> 	  community (e.g. container developers) to embrace it based on this
>>>>> 	  enhancement is a recipe for failure
>>>>> 	- Capabilities are already more complicated than modern developers
>>>>> 	  want to deal with. Adding another, special purpose set, is going
>>>>> 	  to make them even more difficult to use.
> Sorry if the commit wasn't clear enough.

While, as others have pointed out, the commit description left
much to be desired, that isn't the biggest problem with the change
you're proposing.

>  Basically:
>
> - Today user namespaces grant full capabilities.

Of course they do. I have been following the use of capabilities
in Linux since before they were implemented. The uptake has been
disappointing in all use cases.

>   This behavior is often abused to attack various kernel subsystems.

Yes. The problems of a single, all powerful root privilege scheme are
well documented.

>   Only option

Hardly.

>  is to disable them altogether which breaks a lot of
>   userspace stuff.

Updating userspace components to behave properly in a capabilities
environment has never been a popular activity, but is the right way
to address this issue. And before you start on the "no one can do that,
it's too hard", I'll point out that multiple UNIX systems supported
rootless, all capabilities based systems back in the day. 

>   This goes against the least privilege principle.

If you're going to run userspace that *requires* privilege, you have
to have a way to *allow* privilege. If the userspace insists on a root
based privilege model, you're stuck supporting it. Regardless of your
principles.

>
> - It adds a new capability set.

Which is a really, really bad idea. The equation for calculating effective
privilege is already more complicated than userspace developers are generally
willing to put up with.

>   This set dictates what capabilities are granted in namespaces (instead
>   of always getting full caps).

I would not expect container developers to be eager to learn how to use
this facility.

>   This brings namespaces in line with the rest of the system, user
>   namespaces are no more "special".

I'm sorry, but this makes no sense to me whatsoever. You want to introduce
a capability set explicitly for namespaces in order to make them less
special? Maybe I'm just old and cranky.

>   They now work the same way as say a transition to root does with
>   inheritable caps.

That needs some explanation.

>
> - This isn't intended to be used by end users per se (although they could).
>   This would be used at the same places where existing capabalities are
>   used today (e.g. init system, pam, container runtime, browser
>   sandbox), or by system administrators.

I understand that. It is for containers. Containers are not kernel entities.

>
> To give you some ideas of things you could do:
>
> # E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH
> echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
> echo "!cap_net_admin alice" >> /etc/security/capability.conf.
>
> # E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE
> systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
>             -p SecureBits=userns-strict-caps \
>             /usr/bin/dockerd
>
> # E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits
> # Prevent users from ever gaining it
> sysctl -w cap_bound_userns_mask=0x1fffffdffff

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Serge Hallyn 1 year, 9 months ago

On Fri, May 17, 2024 at 10:53:24AM -0700, Casey Schaufler wrote:
> On 5/17/2024 4:42 AM, Jonathan Calmels wrote:
> >>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> >>>>> I suggest that adding a capability set for user namespaces is a bad idea:
> >>>>> 	- It is in no way obvious what problem it solves
> >>>>> 	- It is not obvious how it solves any problem
> >>>>> 	- The capability mechanism has not been popular, and relying on a
> >>>>> 	  community (e.g. container developers) to embrace it based on this
> >>>>> 	  enhancement is a recipe for failure
> >>>>> 	- Capabilities are already more complicated than modern developers
> >>>>> 	  want to deal with. Adding another, special purpose set, is going
> >>>>> 	  to make them even more difficult to use.
> > Sorry if the commit wasn't clear enough.
> 
> While, as others have pointed out, the commit description left
> much to be desired, that isn't the biggest problem with the change
> you're proposing.
> 
> >  Basically:
> >
> > - Today user namespaces grant full capabilities.
> 
> Of course they do. I have been following the use of capabilities
> in Linux since before they were implemented. The uptake has been
> disappointing in all use cases.
> 
> >   This behavior is often abused to attack various kernel subsystems.
> 
> Yes. The problems of a single, all powerful root privilege scheme are
> well documented.
> 
> >   Only option
> 
> Hardly.
> 
> >  is to disable them altogether which breaks a lot of
> >   userspace stuff.
> 
> Updating userspace components to behave properly in a capabilities
> environment has never been a popular activity, but is the right way
> to address this issue. And before you start on the "no one can do that,
> it's too hard", I'll point out that multiple UNIX systems supported
> rootless, all capabilities based systems back in the day. 
> 
> >   This goes against the least privilege principle.
> 
> If you're going to run userspace that *requires* privilege, you have
> to have a way to *allow* privilege. If the userspace insists on a root
> based privilege model, you're stuck supporting it. Regardless of your
> principles.

Casey,

I might be wrong, but I think you're misreading this patchset.  It is not
about limiting capabilities in the init user ns at all.  It's about limiting
the capabilities which a process in a child userns can get.

Any unprivileged task can create a new userns, and get a process with
all capabilities in that namespace.  Always.  User namespaces were a
great success in that we can do this without any resulting privilege
against host owned resources.  The unaddressed issue is the expanded
kernel code surface area.

You say, above, (quoting out of place here)

> Updating userspace components to behave properly in a capabilities
> environment has never been a popular activity, but is the right way
> to address this issue. And before you start on the "no one can do that,
> it's too hard", I'll point out that multiple UNIX systems supported

He's not saying no one can do that.  He's saying, correctly, that the
kernel currently offers no way for userspace to do this limiting.  His
patchset offers two ways: one system wide capability mask (which applies
only to non-initial user namespaces) and one per-process inherited one
which - yay - userspace can use to limit what its children will be
able to get if they unshare a user namespace.

> > - It adds a new capability set.
> 
> Which is a really, really bad idea. The equation for calculating effective
> privilege is already more complicated than userspace developers are generally
> willing to put up with.

This is somewhat true, but I think the semantics of what is proposed here are
about as straightforward as you could hope for, and you can basically reason
about them completely independently of the other sets.  Only when reasoning
about the correctness of this code do you need to consider the other sets.  Not
when administering a system.

If you want root in a child user namespace to not have CAP_MAC_ADMIN, you drop
it from your pU.  Simple as that.
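
As a rough illustration of that reasoning (a toy model only, not the kernel
code): assume the capabilities granted in a new user namespace are simply pU
intersected with the system wide mask. The helper names below are made up,
and the way the two limits combine is an assumption for the sake of the
example, as is the count of 41 capabilities.

```shell
#!/bin/sh
# Toy model, NOT the patch logic: treat the capabilities available in a
# new user namespace as (pU AND system-wide mask). This combination rule
# is an assumption made for illustration.
CAP_MAC_ADMIN=33              # value from include/uapi/linux/capability.h
FULL=$(( (1 << 41) - 1 ))     # assumes 41 capabilities (bits 0..40)

# drop_cap SET CAPNUM: print SET in hex with one capability bit cleared.
drop_cap() { printf '0x%x\n' $(( $1 & ~(1 << $2) )); }

# has_cap SET CAPNUM: succeed (exit 0) if the capability bit is set.
has_cap() { [ $(( ($1 >> $2) & 1 )) -eq 1 ]; }

# userns_caps PU SYSMASK: print the modeled set for a new user namespace.
userns_caps() { printf '0x%x\n' $(( $1 & $2 )); }

pU=$(drop_cap "$FULL" "$CAP_MAC_ADMIN")
granted=$(userns_caps "$pU" "$FULL")   # system-wide mask left fully open

if has_cap "$granted" "$CAP_MAC_ADMIN"; then
    echo "CAP_MAC_ADMIN available"
else
    echo "CAP_MAC_ADMIN dropped"       # this branch is taken
fi
```

The point of the model is that the administrator only has to think about
pU (and optionally the system wide mask), not about the other sets.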

> >   This set dictates what capabilities are granted in namespaces (instead
> >   of always getting full caps).
> 
> I would not expect container developers to be eager to learn how to use
> this facility.

I'm a container developer, and I'm excited about it :)

> >   This brings namespaces in line with the rest of the system, user
> >   namespaces are no more "special".
> 
> I'm sorry, but this makes no sense to me whatsoever. You want to introduce
> a capability set explicitly for namespaces in order to make them less
> special?

Yes, exactly.

> Maybe I'm just old and cranky.

That's fine.

> >   They now work the same way as say a transition to root does with
> >   inheritable caps.
> 
> That needs some explanation.
> 
> >
> > - This isn't intended to be used by end users per se (although they could).
> >   This would be used at the same places where existing capabalities are
> >   used today (e.g. init system, pam, container runtime, browser
> >   sandbox), or by system administrators.
> 
> I understand that. It is for containers. Containers are not kernel entities.

User namespaces are.

This patch set provides userspace a way of limiting the kernel code exposed
to untrusted children, which currently does not exist.

> > To give you some ideas of things you could do:
> >
> > # E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH
> > echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
> > echo "!cap_net_admin alice" >> /etc/security/capability.conf.
> >
> > # E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE
> > systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
> >             -p SecureBits=userns-strict-caps \
> >             /usr/bin/dockerd
> >
> > # E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits
> > # Prevent users from ever gaining it
> > sysctl -w cap_bound_userns_mask=0x1fffffdffff

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by John Johansen 1 year, 8 months ago

On 5/18/24 05:20, Serge Hallyn wrote:
> On Fri, May 17, 2024 at 10:53:24AM -0700, Casey Schaufler wrote:
>> On 5/17/2024 4:42 AM, Jonathan Calmels wrote:
>>>>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
>>>>>>> I suggest that adding a capability set for user namespaces is a bad idea:
>>>>>>> 	- It is in no way obvious what problem it solves
>>>>>>> 	- It is not obvious how it solves any problem
>>>>>>> 	- The capability mechanism has not been popular, and relying on a
>>>>>>> 	  community (e.g. container developers) to embrace it based on this
>>>>>>> 	  enhancement is a recipe for failure
>>>>>>> 	- Capabilities are already more complicated than modern developers
>>>>>>> 	  want to deal with. Adding another, special purpose set, is going
>>>>>>> 	  to make them even more difficult to use.
>>> Sorry if the commit wasn't clear enough.
>>
>> While, as others have pointed out, the commit description left
>> much to be desired, that isn't the biggest problem with the change
>> you're proposing.
>>
>>>   Basically:
>>>
>>> - Today user namespaces grant full capabilities.
>>
>> Of course they do. I have been following the use of capabilities
>> in Linux since before they were implemented. The uptake has been
>> disappointing in all use cases.
>>
>>>    This behavior is often abused to attack various kernel subsystems.
>>
>> Yes. The problems of a single, all powerful root privilege scheme are
>> well documented.
>>
>>>    Only option
>>
>> Hardly.
>>
>>>   is to disable them altogether which breaks a lot of
>>>    userspace stuff.
>>
>> Updating userspace components to behave properly in a capabilities
>> environment has never been a popular activity, but is the right way
>> to address this issue. And before you start on the "no one can do that,
>> it's too hard", I'll point out that multiple UNIX systems supported
>> rootless, all capabilities based systems back in the day.
>>
>>>    This goes against the least privilege principle.
>>
>> If you're going to run userspace that *requires* privilege, you have
>> to have a way to *allow* privilege. If the userspace insists on a root
>> based privilege model, you're stuck supporting it. Regardless of your
>> principles.
> 
> Casey,
> 
> I might be wrong, but I think you're misreading this patchset.  It is not
> about limiting capabilities in the init user ns at all.  It's about limiting
> the capabilities which a process in a child userns can get.
> 
> Any unprivileged task can create a new userns, and get a process with
> all capabilities in that namespace.  Always.  User namespaces were a
> great success in that we can do this without any resulting privilege
> against host owned resources.  The unaddressed issue is the expanded
> kernel code surface area.
> 
> You say, above, (quoting out of place here)
> 
>> Updating userspace components to behave properly in a capabilities
>> environment has never been a popular activity, but is the right way
>> to address this issue. And before you start on the "no one can do that,
>> it's too hard", I'll point out that multiple UNIX systems supported
> 
> He's not saying no one can do that.  He's saying, correctly, that the
> kernel currently offers no way for userspace to do this limiting.  His
> patchset offers two ways: one system wide capability mask (which applies
> only to non-initial user namespaces) and on per-process inherited one
> which - yay - userspace can use to limit what its children will be
> able to get if they unshare a user namespace.
> 
>>> - It adds a new capability set.
>>
>> Which is a really, really bad idea. The equation for calculating effective
>> privilege is already more complicated than userspace developers are generally
>> willing to put up with.
> 
> This is somewhat true, but I think the semantics of what is proposed here are
> about as straightforward as you could hope for, and you can basically reason
> about them completely independently of the other sets.  Only when reasoning
> about the correctness of this code do you need to consider the other sets.  Not
> when administering a system.
> 
> If you want root in a child user namespace to not have CAP_MAC_ADMIN, you drop
> it from your pU.  Simple as that.
> 
>>>    This set dictates what capabilities are granted in namespaces (instead
>>>    of always getting full caps).
>>
>> I would not expect container developers to be eager to learn how to use
>> this facility.
> 
> I'm a container developer, and I'm excited about it :)
> 
>>>    This brings namespaces in line with the rest of the system, user
>>>    namespaces are no more "special".
>>
>> I'm sorry, but this makes no sense to me whatsoever. You want to introduce
>> a capability set explicitly for namespaces in order to make them less
>> special?
> 
> Yes, exactly.
> 
>> Maybe I'm just old and cranky.
> 
> That's fine.
> 
>>>    They now work the same way as say a transition to root does with
>>>    inheritable caps.
>>
>> That needs some explanation.
>>
>>>
>>> - This isn't intended to be used by end users per se (although they could).
>>>    This would be used at the same places where existing capabalities are
>>>    used today (e.g. init system, pam, container runtime, browser
>>>    sandbox), or by system administrators.
>>
>> I understand that. It is for containers. Containers are not kernel entities.
> 
> User namespaces are.
> 
> This patch set provides userspace a way of limiting the kernel code exposed
> to untrusted children, which currently does not exist.
> 
Theoretically, yes. I am worried that in practice the existing utils still
allow untrusted code to access user namespaces.

In practice we have found that we need to allow a different set of capabilities
when bwrap is called from flatpak than when called on its own etc. We see the
same pattern with unshare and other utilities around launching applications
in user namespaces.

In practice at the distro level I don't see this approach actually helping,
because we have so many uses that require exposing close to the full
capability set in multiple utilities that are required by many different
applications.

To be clear this doesn't stop distros from doing something more, but is it
worth the added complexity if in practice it can't be used effectively?
I really don't have the answer.

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Casey Schaufler 1 year, 9 months ago

On 5/18/2024 5:20 AM, Serge Hallyn wrote:
> On Fri, May 17, 2024 at 10:53:24AM -0700, Casey Schaufler wrote:
>> On 5/17/2024 4:42 AM, Jonathan Calmels wrote:
>>>>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
>>>>>>> I suggest that adding a capability set for user namespaces is a bad idea:
>>>>>>> 	- It is in no way obvious what problem it solves
>>>>>>> 	- It is not obvious how it solves any problem
>>>>>>> 	- The capability mechanism has not been popular, and relying on a
>>>>>>> 	  community (e.g. container developers) to embrace it based on this
>>>>>>> 	  enhancement is a recipe for failure
>>>>>>> 	- Capabilities are already more complicated than modern developers
>>>>>>> 	  want to deal with. Adding another, special purpose set, is going
>>>>>>> 	  to make them even more difficult to use.
>>> Sorry if the commit wasn't clear enough.
>> While, as others have pointed out, the commit description left
>> much to be desired, that isn't the biggest problem with the change
>> you're proposing.
>>
>>>  Basically:
>>>
>>> - Today user namespaces grant full capabilities.
>> Of course they do. I have been following the use of capabilities
>> in Linux since before they were implemented. The uptake has been
>> disappointing in all use cases.
>>
>>>   This behavior is often abused to attack various kernel subsystems.
>> Yes. The problems of a single, all powerful root privilege scheme are
>> well documented.
>>
>>>   Only option
>> Hardly.
>>
>>>  is to disable them altogether which breaks a lot of
>>>   userspace stuff.
>> Updating userspace components to behave properly in a capabilities
>> environment has never been a popular activity, but is the right way
>> to address this issue. And before you start on the "no one can do that,
>> it's too hard", I'll point out that multiple UNIX systems supported
>> rootless, all capabilities based systems back in the day. 
>>
>>>   This goes against the least privilege principle.
>> If you're going to run userspace that *requires* privilege, you have
>> to have a way to *allow* privilege. If the userspace insists on a root
>> based privilege model, you're stuck supporting it. Regardless of your
>> principles.
> Casey,
>
> I might be wrong, but I think you're misreading this patchset.  It is not
> about limiting capabilities in the init user ns at all.  It's about limiting
> the capabilities which a process in a child userns can get.

I do understand that. My objection is not to the intent, but to the approach.
Adding a capability set to the general mechanism in support of a limited, specific
use case seems wrong to me. I would rather see a mechanism in userns to limit
the capabilities in a user namespace than a mechanism in capabilities that is
specific to user namespaces.

> Any unprivileged task can create a new userns, and get a process with
> all capabilities in that namespace.  Always.  User namespaces were a
> great success in that we can do this without any resulting privilege
> against host owned resources.  The unaddressed issue is the expanded
> kernel code surface area.

An option to clone() then, to limit the capabilities available?
I honestly can't recall if that has been suggested elsewhere, and
apologize if it's already been dismissed as a stoopid idea.

>
> You say, above, (quoting out of place here)
>
>> Updating userspace components to behave properly in a capabilities
>> environment has never been a popular activity, but is the right way
>> to address this issue. And before you start on the "no one can do that,
>> it's too hard", I'll point out that multiple UNIX systems supported
> He's not saying no one can do that.  He's saying, correctly, that the
> kernel currently offers no way for userspace to do this limiting.  His
> patchset offers two ways: one system wide capability mask (which applies
> only to non-initial user namespaces) and on per-process inherited one
> which - yay - userspace can use to limit what its children will be
> able to get if they unshare a user namespace.
>
>>> - It adds a new capability set.
>> Which is a really, really bad idea. The equation for calculating effective
>> privilege is already more complicated than userspace developers are generally
>> willing to put up with.
> This is somewhat true, but I think the semantics of what is proposed here are
> about as straightforward as you could hope for, and you can basically reason
> about them completely independently of the other sets.  Only when reasoning
> about the correctness of this code do you need to consider the other sets.  Not
> when administering a system.
>
> If you want root in a child user namespace to not have CAP_MAC_ADMIN, you drop
> it from your pU.  Simple as that.
>
>>>   This set dictates what capabilities are granted in namespaces (instead
>>>   of always getting full caps).
>> I would not expect container developers to be eager to learn how to use
>> this facility.
> I'm a container developer, and I'm excited about it :)

OK, well, I'm wrong. It's happened before and will happen again.

>
>>>   This brings namespaces in line with the rest of the system, user
>>>   namespaces are no more "special".
>> I'm sorry, but this makes no sense to me whatsoever. You want to introduce
>> a capability set explicitly for namespaces in order to make them less
>> special?
> Yes, exactly.

Hmm. I can't say I buy that. It makes a whole lot more sense to me to
change userns than to change capabilities.

>
>> Maybe I'm just old and cranky.
> That's fine.
>
>>>   They now work the same way as say a transition to root does with
>>>   inheritable caps.
>> That needs some explanation.
>>
>>> - This isn't intended to be used by end users per se (although they could).
>>>   This would be used at the same places where existing capabilities are
>>>   used today (e.g. init system, pam, container runtime, browser
>>>   sandbox), or by system administrators.
>> I understand that. It is for containers. Containers are not kernel entities.
> User namespaces are.
>
> This patch set provides userspace a way of limiting the kernel code exposed
> to untrusted children, which currently does not exist.

Yes, I understand. I would rather see a change to userns in support of a userns
specific need than a change to capabilities for a userns specific need.

>>> To give you some ideas of things you could do:
>>>
>>> # E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH
>>> echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
>>> echo "!cap_net_admin alice" >> /etc/security/capability.conf
>>>
>>> # E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE
>>> systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
>>>             -p SecureBits=userns-strict-caps \
>>>             /usr/bin/dockerd
>>>
>>> # E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits
>>> # Prevent users from ever gaining it
>>> sysctl -w cap_bound_userns_mask=0x1fffffdffff
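For readers checking the sysctl value quoted above: it is the full capability mask with the CAP_SYS_RAWIO bit cleared. A minimal sketch of that arithmetic, assuming CAP_SYS_RAWIO is capability number 17 and that the last capability is number 40 (both taken from include/uapi/linux/capability.h on recent kernels); the helper names are made up for illustration.

```python
# Capability numbers as defined in include/uapi/linux/capability.h.
CAP_SYS_RAWIO = 17
CAP_LAST_CAP = 40  # assumption: CAP_CHECKPOINT_RESTORE is the last capability


def full_mask(last_cap: int = CAP_LAST_CAP) -> int:
    """Return a mask with one bit set for every capability 0..last_cap."""
    return (1 << (last_cap + 1)) - 1


def drop(mask: int, cap: int) -> int:
    """Return mask with the bit for capability number `cap` cleared."""
    return mask & ~(1 << cap)


# Hypothetical helper usage: everything except CAP_SYS_RAWIO.
mask = drop(full_mask(), CAP_SYS_RAWIO)
print(hex(mask))  # prints 0x1fffffdffff
```

The same two helpers can be used to build a mask that drops several capabilities by calling drop() once per capability number.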

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jonathan Calmels 1 year, 9 months ago

On Sun, May 19, 2024 at 10:03:29AM GMT, Casey Schaufler wrote:
> I do understand that. My objection is not to the intent, but to the approach.
> Adding a capability set to the general mechanism in support of a limited, specific
> use case seems wrong to me. I would rather see a mechanism in userns to limit
> the capabilities in a user namespace than a mechanism in capabilities that is
> specific to user namespaces.

> An option to clone() then, to limit the capabilities available?
> I honestly can't recall if that has been suggested elsewhere, and
> apologize if it's already been dismissed as a stoopid idea.

No and you're right, this would also make sense. This was considered as
well as things like ioctl_ns() (basically introducing the concept of
capabilities in the user_namespace struct). I also considered reusing
the existing sets with various schemes to no avail.

The main issue with this approach is that you have to consider how this is
going to be used. This ties into the other thread we've had with John
and Eric.
Basically, we're coming from a model where things are wide open and
we're trying to tighten things down.

Quoting John here:

> We are starting from a different posture here. Where applications have
> assumed that user namespaces were safe and no measures were needed.
> Tools like unshare and bwrap if set to allow user namespaces in their
> fcaps will allow exploits a trivial by-pass.

We can't really expect userspace to patch every single userns callsite
and opt-in this new security mechanism.
You said it well yourself:

> Capabilities are already more complicated than modern developers
> want to deal with.

Moreover, policies are not necessarily enforced at said callsites. Take
for example a service like systemd-machined, or a PAM session. Those
need to be able to place restrictions on any processes spawned under
them.

If we do this in clone() (or similar), we'll also need to come up with
inheritance rules, being able to query capabilities, etc.
At this point we're just reinventing capability sets.

Finally the nice thing about having it as a capability set, is that we
can easily define rules between them. Patch 2 is a good example of this.
It constrains the userns set to the bounding set of a task. Thus,
requiring minimal/no change to userspace, and helping with adoption.
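
To illustrate the kind of rule between sets mentioned above, here is a toy model of the constraint as described in this message. It is not the code from patch 2: the function name and the `strict` flag (standing in for the proposed securebit) are assumptions made for this sketch only.

```python
# Capability numbers as defined in include/uapi/linux/capability.h.
CAP_DAC_OVERRIDE = 1
CAP_LAST_CAP = 40  # assumption: last capability on recent kernels

FULL = (1 << (CAP_LAST_CAP + 1)) - 1


def userns_caps_on_unshare(userns_set: int, bounding_set: int, strict: bool) -> int:
    """Toy model: capabilities a task would get in a new user namespace.

    With `strict` set (hypothetical stand-in for the proposed securebit),
    the userns set is constrained to the task's bounding set.
    """
    if strict:
        return userns_set & bounding_set
    return userns_set


# A service started with CAP_DAC_OVERRIDE removed from its bounding set.
bounding = FULL & ~(1 << CAP_DAC_OVERRIDE)

relaxed = userns_caps_on_unshare(FULL, bounding, strict=False)
strict = userns_caps_on_unshare(FULL, bounding, strict=True)

print(bool(relaxed & (1 << CAP_DAC_OVERRIDE)))  # True: still granted in the userns
print(bool(strict & (1 << CAP_DAC_OVERRIDE)))   # False: bounded away
```

In this model an existing bounding-set policy carries over to user namespaces without the service itself being modified, which is the adoption argument made above.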

> Yes, I understand. I would rather see a change to userns in support of a userns
> specific need than a change to capabilities for a userns specific need.

Valid point, but at the end of the day, those are really just tasks'
capabilities. The unshare() just happens to trigger specific rules when it
comes to the tasks' creds. This isn't so different than the other sets
and their specific rules for execve() or UID 0.

This could also be reframed as:

Why would setting capabilities on tasks in a userns be so different than
tasks outside of it?

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jonathan Calmels 1 year, 9 months ago

On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote:
> Of course they do. I have been following the use of capabilities
> in Linux since before they were implemented. The uptake has been
> disappointing in all use cases.

Why "Of course"?
What if they should not get *all* privileges?

> Yes. The problems of a single, all powerful root privilege scheme are
> well documented.

That's my point, it doesn't have to be this way.

> Hardly.

Maybe I'm missing something, then.
How do I restrict my users from gaining say CAP_NET_ADMIN in their
userns today?

> If you're going to run userspace that *requires* privilege, you have
> to have a way to *allow* privilege. If the userspace insists on a root
> based privilege model, you're stuck supporting it. Regardless of your
> principles.

I want *some* privileges, not *all* of them.

> Which is a really, really bad idea. The equation for calculating effective
> privilege is already more complicated than userspace developers are generally
> willing to put up with.

This is generally true, but this set is way more straightforward than
the other sets, it's always:

pU = pP = pE = X

If you look at the patch, there is no transition logic or anything
complicated, it's just a set of caps being inherited.

> I would not expect container developers to be eager to learn how to use
> this facility.

And they probably wouldn't.
For most use cases it's going to be enforced through system policies
(init, pam, etc). Other than that, usage won't change, you will run your
usual `docker run --cap-add ...` to get caps, except now it works in
userns.

> I'm sorry, but this makes no sense to me whatsoever. You want to introduce
> a capability set explicitly for namespaces in order to make them less
> special? Maybe I'm just old and cranky.
> 
> >   They now work the same way as say a transition to root does with
> >   inheritable caps.
> 
> That needs some explanation.

From man capabilities(7):

In  order  to  mirror traditional UNIX semantics, the kernel performs
special treatment of file capabilities when a process with UID 0 (root)
executes a program [...]

Thus,  when [...] a process whose real and effective UIDs are
zero execve(2)s a program, the calculation of the process's new
permitted capabilities simplifies to:

   P'(permitted)   = P(inheritable) | P(bounding)

   P'(effective)   = P'(permitted)

So, the same way a root process is bounded by its inheritable set when
it execs, a "rootless" process is bounded by its userns set when it
unshares.
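
The two rules can be put side by side in a small model. This is only a sketch of the formulas quoted in this message (the capabilities(7) simplification for a root execve() and the proposed pU = pP = pE = X for unshare), not kernel code; the function names and the example sets are made up for illustration.

```python
# Capability numbers as defined in include/uapi/linux/capability.h.
CAP_NET_ADMIN = 12
CAP_MAC_ADMIN = 33
CAP_LAST_CAP = 40  # assumption: last capability on recent kernels

FULL = (1 << (CAP_LAST_CAP + 1)) - 1


def root_exec(inheritable: int, bounding: int) -> dict:
    """capabilities(7), as quoted above, for a UID 0 process calling execve():
    P'(permitted) = P(inheritable) | P(bounding), P'(effective) = P'(permitted).
    """
    permitted = inheritable | bounding
    return {"permitted": permitted, "effective": permitted}


def unshare_userns(userns_set: int) -> dict:
    """Proposed rule as stated in this thread: pU = pP = pE = X."""
    x = userns_set
    return {"userns": x, "permitted": x, "effective": x}


# Drop CAP_MAC_ADMIN from the userns set before unsharing.
pU = FULL & ~(1 << CAP_MAC_ADMIN)
creds = unshare_userns(pU)

print(bool(creds["effective"] & (1 << CAP_MAC_ADMIN)))  # False
print(bool(creds["effective"] & (1 << CAP_NET_ADMIN)))  # True
```

The point of the comparison is that the unshare rule has a single input set and no transition logic, whereas the execve() rule already combines two sets.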

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jarkko Sakkinen 1 year, 9 months ago

On Fri May 17, 2024 at 10:11 PM EEST, Jonathan Calmels wrote:
> On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote:
> > Of course they do. I have been following the use of capabilities
> > in Linux since before they were implemented. The uptake has been
> > disappointing in all use cases.
>
> Why "Of course"?
> What if they should not get *all* privileges?

They do the job given a real-world workload and stress test.

Here the problem is based on a theory and an experiment.

Even a formal model does not necessarily map all "unknown unknowns".

BR, Jarkko

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jarkko Sakkinen 1 year, 9 months ago

On Sat May 18, 2024 at 2:08 PM EEST, Jarkko Sakkinen wrote:
> On Fri May 17, 2024 at 10:11 PM EEST, Jonathan Calmels wrote:
> > On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote:
> > > Of course they do. I have been following the use of capabilities
> > > in Linux since before they were implemented. The uptake has been
> > > disappointing in all use cases.
> >
> > Why "Of course"?
> > What if they should not get *all* privileges?
>
> They do the job given a real-world workload and stress test.
>
> Here the problem is based on a theory and an experiment.
>
> Even a formal model does not necessarily map all "unknown unknowns".

So this was like the worst "sales pitch" ever:

1. The cover letter starts with the idea of having to argue about name
spaces, and have fun while doing that ;-) We all have our own ways to
entertain ourselves but "name space duels" are not my thing. Why not
just start with why we all want this instead? Maybe we don't want it
then. Maybe this is just useless spam given the angle presented?
2. There's shitloads of computer science and set theory but nothing
that would make common sense. You need to build a more understandable
model. There's zero "gist" in this work.

Maybe this does make sense but the story around it sucks so far.

BR, Jarkko

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jarkko Sakkinen 1 year, 9 months ago

On Sat May 18, 2024 at 2:17 PM EEST, Jarkko Sakkinen wrote:
> On Sat May 18, 2024 at 2:08 PM EEST, Jarkko Sakkinen wrote:
> > On Fri May 17, 2024 at 10:11 PM EEST, Jonathan Calmels wrote:
> > > On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote:
> > > > Of course they do. I have been following the use of capabilities
> > > > in Linux since before they were implemented. The uptake has been
> > > > disappointing in all use cases.
> > >
> > > Why "Of course"?
> > > What if they should not get *all* privileges?
> >
> > They do the job given a real-world workload and stress test.
> >
> > Here the problem is based on a theory and an experiment.
> >
> > Even a formal model does not necessarily map all "unknown unknowns".
>
> So this was like the worst "sales pitch" ever:
>
> 1. The cover letter starts with the idea of having to argue about name
> spaces, and have fun while doing that ;-) We all have our own ways to
> entertain ourselves but "name space duels" are not my thing. Why not
> just start with why we all want this instead? Maybe we don't want it
> then. Maybe this is just useless spam given the angle presented?
> 2. There's shitloads of computer science and set theory but nothing
> that would make common sense. You need to build more understandable 
> model. There's zero "gist" in this work.
>
> Maybe this does make sense but the story around it sucks so far.

One tip: I think this is wrong forum to present namespace ideas in the
first place. It would be probably better to talk about this with e.g.
systemd or podman developers, and similar groups. There's zero evidence
of the usefulness. Then when you go that route and come back with actual
users, things click much more easily. Now this is all in the void.

BR, Jarkko

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by John Johansen 1 year, 8 months ago

On 5/18/24 04:21, Jarkko Sakkinen wrote:
> On Sat May 18, 2024 at 2:17 PM EEST, Jarkko Sakkinen wrote:
>> On Sat May 18, 2024 at 2:08 PM EEST, Jarkko Sakkinen wrote:
>>> On Fri May 17, 2024 at 10:11 PM EEST, Jonathan Calmels wrote:
>>>> On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote:
>>>>> Of course they do. I have been following the use of capabilities
>>>>> in Linux since before they were implemented. The uptake has been
>>>>> disappointing in all use cases.
>>>>
>>>> Why "Of course"?
>>>> What if they should not get *all* privileges?
>>>
>>> They do the job given a real-world workload and stress test.
>>>
>>> Here the problem is based on a theory and an experiment.
>>>
>>> Even a formal model does not necessarily map all "unknown unknowns".
>>
>> So this was like the worst "sales pitch" ever:
>>
>> 1. The cover letter starts with the idea of having to argue about name
>> spaces, and have fun while doing that ;-) We all have our own ways to
>> entertain ourselves but "name space duels" are not my thing. Why not
>> just start with why we all want this instead? Maybe we don't want it
>> then. Maybe this is just useless spam given the angle presented?
>> 2. There's shitloads of computer science and set theory but nothing
>> that would make common sense. You need to build more understandable
>> model. There's zero "gist" in this work.
>>
>> Maybe this does make sense but the story around it sucks so far.
> 
> One tip: I think this is wrong forum to present namespace ideas in the
> first place. It would be probably better to talk about this with e.g.
> systemd or podman developers, and similar groups. There's zero evidence
> of the usefulness. Then when you go that route and come back with actual
> users, things click much more easily. Now this is all in the void.
> 
> BR, Jarkko

Jarkko,

this is very much the right forum. User namespaces exist today. This
is a discussion around trying to reduce the exposed kernel surface
that is being used to attack the kernel.

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jarkko Sakkinen 1 year, 8 months ago

On Tue May 21, 2024 at 4:57 PM EEST, John Johansen wrote:
> > One tip: I think this is wrong forum to present namespace ideas in the
> > first place. It would be probably better to talk about this with e.g.
> > systemd or podman developers, and similar groups. There's zero evidence
> > of the usefulness. Then when you go that route and come back with actual
> > users, things click much more easily. Now this is all in the void.
> > 
> > BR, Jarkko
>
> Jarkko,
>
> this is very much the right forum. User namespaces exist today. This
> is a discussion around trying to reduce the exposed kernel surface
> that is being used to attack the kernel.

Agreed, that was harsh way to put it. What I mean is that if this
feature was included, would it be enabled by distributions?

This user base part or potential user space part is not very well
described in the cover letter. I.e. "motivation" to put it short.

I mean the technical details are really in detail in this patch set but
it would help to digest them if there was some even rough description
how this would be deployed.

If the motivation should be obvious, then it is beyond me, and thus
would be nice if that obvious thing was stated that everyone else gets.

E.g. I like to sometimes just test quite alien patch sets for the sake
of learning and fun (or not so fun, depends) but this patch set does not
deliver enough information to do anything at all.

Hope this clears a bit where I stand. IMHO a good patch set should bring
the details to the specialists on the topic but also have some wider
audience motivational stuff in order to make clear where it fits in this
world :-)

BR, Jarkko

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by John Johansen 1 year, 8 months ago

On 5/21/24 07:12, Jarkko Sakkinen wrote:
> On Tue May 21, 2024 at 4:57 PM EEST, John Johansen wrote:
>>> One tip: I think this is wrong forum to present namespace ideas in the
>>> first place. It would be probably better to talk about this with e.g.
>>> systemd or podman developers, and similar groups. There's zero evidence
>>> of the usefulness. Then when you go that route and come back with actual
>>> users, things click much more easily. Now this is all in the void.
>>>
>>> BR, Jarkko
>>
>> Jarkko,
>>
>> this is very much the right forum. User namespaces exist today. This
>> is a discussion around trying to reduce the exposed kernel surface
>> that is being used to attack the kernel.
> 
> Agreed, that was harsh way to put it. What I mean is that if this
> feature was included, would it be enabled by distributions?
> 
Enabled, maybe? It requires the debian distros to make sure their
packaging supports xattrs correctly. It should be good but it isn't
well exercised. It also requires the work to set these on multiple
applications. From experience we are talking 100s.

It will break out of repo applications, and require an extra step for
users to enable. Ubuntu is already breaking these but for many of the
more popular ones they are shipping profiles so the users don't have
to take an extra step. Things like appimages remain broken and will
require an approach similar to the Mac with unverified software
downloaded from the internet.

Nor does this fix the bwrap, unshare, ... use case. Which means the
distro is going to have to continue shipping an alternate solution
that covers those. For Ubuntu atm this is just an extra point of
friction but I expect we would still end up enabling it to tick the
checkbox at some point if it goes into the upstream kernel.

> This user base part or potential user space part is not very well
> described in the cover letter. I.e. "motivation" to put it short.
> 
yes the cover letter needs work

> I mean the technical details are really in detail in this patch set but
> it would help to digest them if there was some even rough description
> how this would be deployed.
> 
yes

> If the motivation should be obvious, then it is beyond me, and thus
> would be nice if that obvious thing was stated that everyone else gets.
> 
sure. The cover letter will get updated with this. Seeing as I have been
dealing with this a lot lately. It comes down to user namespaces allow
unprivileged code to access kernel surface area that is usually protected
behind capabilities. This has been leveraged as part of the exploit chain
in the majority of kernel exploits we are seeing.

> E.g. I like to sometimes just test quite alien patch sets for the sake
> of learning and fun (or not so fun, depends) but this patch set does not
> deliver enough information to do anything at all.
> 
Understood, I am playing devil's advocate here. It's not that I don't see
value in the proposal, but that I am not sure I see enough value with
the current situation, where so much code has been written around the
assumption that unprivileged user namespaces are safe. Trying to fix
the situation without breaking everything is complicated.

> Hope this clears a bit where I stand. IMHO a good patch set should bring
> the details to the specialists on the topic but also have some wider
> audience motivational stuff in order to make clear where it fits in this
> world :-)
> 
> BR, Jarkko
>

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jonathan Calmels 1 year, 8 months ago

On Tue, May 21, 2024 at 07:45:20AM GMT, John Johansen wrote:
> On 5/21/24 07:12, Jarkko Sakkinen wrote:
> > On Tue May 21, 2024 at 4:57 PM EEST, John Johansen wrote:
> > > > One tip: I think this is wrong forum to present namespace ideas in the
> > > > first place. It would be probably better to talk about this with e.g.
> > > > systemd or podman developers, and similar groups. There's zero evidence
> > > > of the usefulness. Then when you go that route and come back with actual
> > > > users, things click much more easily. Now this is all in the void.
> > > > 
> > > > BR, Jarkko
> > > 
> > > Jarkko,
> > > 
> > > this is very much the right forum. User namespaces exist today. This
> > > is a discussion around trying to reduce the exposed kernel surface
> > > that is being used to attack the kernel.
> > 
> > Agreed, that was harsh way to put it. What I mean is that if this
> > feature was included, would it be enabled by distributions?
> > 
> Enabled, maybe? It requires the debian distros to make sure their
> packaging supports xattrs correctly. It should be good but it isn't
> well exercised. It also requires the work to set these on multiple
> applications. From experience we are talking 100s.
> 
> It will break out of repo applications, and require an extra step for
> users to enable. Ubuntu is already breaking these but for many of the
> more popular ones they are shipping profiles so the users don't have
> to take an extra step. Things like appimages remain broken and will
> require an approach similar to the Mac with unverified software
> downloaded from the internet.
> 
> Nor does this fix the bwrap, unshare, ... use case. Which means the
> distro is going to have to continue shipping an alternate solution
> that covers those. For Ubuntu atm this is just an extra point of
> friction but I expect we would still end up enabling it to tick the
> checkbox at some point if it goes into the upstream kernel.

I'm not sure I understand your point here and how this relates to xattrs.
This patchset has nothing to do with file capabilities. The userns
capability set is purely a process based capability set and in no way
influenced by file attributes.

> > This user base part or potential user space part is not very well
> > described in the cover letter. I.e. "motivation" to put it short.
> > 
> yes the cover letter needs work

Yes, it's been mentioned several times already.
While not in the cover letter, the motivation is stated in the first
patch and provides several references to past discussions on the topic.

This is nothing new, this subject has been contentious for years now and
discussed over and over on these lists (Eric would know :)). As
mentioned in the patch also, this recently warranted the inclusion of
new LSM hooks.

But again, I wrongfully assumed that this problem was well understood
and still relatively fresh, that's my bad.

> > I mean the technical details are really in detail in this patch set but
> > it would help to digest them if there was some even rough description
> > how this would be deployed.
> > 
> yes

Yes, this was purposefully left out so as not to influence any specific
implementation. There is a mention of where this could be done (i.e.
init, pam), but at the end of the day, this is going to depend on each
use case.
Having said that, since it appears to be confusing, maybe we could add
some of the examples I sent out in this thread or the other ones.

I want to reiterate that this is a generic capability set, this is not
magic switch you turn on to secure the whole system.
Its implementation is going to vary across environments and it is going
to be dictated by your threat model.

For example, John's threat model of securing a multi-user Ubuntu Desktop
is going to be very different than say securing a server where all the
userspace is fixed and known.
The former might require additional integration with the LSM subsystem.
Thankfully, this patch should synergize well with it.

Fundamentally, and at its core, it's very simple. Serge put it nicely:

> If you want root in a child user namespace to not have CAP_MAC_ADMIN,
> you drop it from your pU.  Simple as that.

From there, you can imagine any integration you want in userspace and
ways to enforce your own policies.

TLDR, this is a first step towards empowering userspace with control
over capabilities granted by a userns. At present, the kernel does not
offer ways to do this. By itself, it is not a comprehensive solution
designed to thwart threat actors. However, it gives userspace the option
to do so.

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by John Johansen 1 year, 8 months ago

On 5/21/24 17:45, Jonathan Calmels wrote:
> On Tue, May 21, 2024 at 07:45:20AM GMT, John Johansen wrote:
>> On 5/21/24 07:12, Jarkko Sakkinen wrote:
>>> On Tue May 21, 2024 at 4:57 PM EEST, John Johansen wrote:
>>>>> One tip: I think this is wrong forum to present namespace ideas in the
>>>>> first place. It would be probably better to talk about this with e.g.
>>>>> systemd or podman developers, and similar groups. There's zero evidence
>>>>> of the usefulness. Then when you go that route and come back with actual
>>>>> users, things click much more easily. Now this is all in the void.
>>>>>
>>>>> BR, Jarkko
>>>>
>>>> Jarkko,
>>>>
>>>> this is very much the right forum. User namespaces exist today. This
>>>> is a discussion around trying to reduce the exposed kernel surface
>>>> that is being used to attack the kernel.
>>>
>>> Agreed, that was harsh way to put it. What I mean is that if this
>>> feature was included, would it be enabled by distributions?
>>>
>> Enabled, maybe? It requires the debian distros to make sure their
>> packaging supports xattrs correctly. It should be good but it isn't
>> well exercised. It also requires the work to set these on multiple
>> applications. From experience we are talking 100s.
>>
>> It will break out of repo applications, and require an extra step for
>> users to enable. Ubuntu is already breaking these but for many of the
>> more popular ones they are shipping profiles so the users don't have
>> to take an extra step. Things like appimages remain broken and will
>> require an approach similar to the Mac with unverified software
>> downloaded from the internet.
>>
>> Nor does this fix the bwrap, unshare, ... use case. Which means the
>> distro is going to have to continue shipping an alternate solution
>> that covers those. For Ubuntu atm this is just an extra point of
>> friction but I expect we would still end up enabling it to tick the
>> checkbox at some point if it goes into the upstream kernel.
> 
> I'm not sure I understand your point here and how this relates to xattrs.
> This patchset has nothing to do with file capabilities. The userns
> capability set is purely a process based capability set and in no way
> influenced by file attributes.
> 

Oops, sorry, the fcaps bit is crossing over from a side discussion.

>>> This user base part or potential user space part is not very well
>>> described in the cover letter. I.e. "motivation" to put it short.
>>>
>> yes the cover letter needs work
> 
> Yes, it's been mentioned several times already.
> While not in the cover letter, the motivation is stated in the first
> patch and provides several references to past discussions on the topic.
> 
> This is nothing new, this subject has been contentious for years now and
> discussed over and over on these lists (Eric would know :)). As
> mentioned in the patch also, this recently warranted the inclusion of
> new LSM hooks.
> 
> But again, I wrongfully assumed that this problem was well understood
> and still relatively fresh, that's my bad.
> 
>>> I mean the technical details are really in detail in this patch set but
>>> it would help to digest them if there was some even rough description
>>> how this would be deployed.
>>>
>> yes
> 
> Yes, this was purposefully left out so as not to influence any specific
> implementation. There is a mention of where this could be done (i.e.
> init, pam), but at the end of the day, this is going to depend on each
> use case.
> Having said that, since it appears to be confusing, maybe we could add
> some of the examples I sent out in this thread or the other ones.
> 
examples would help, especially for people not too familiar with this.


> I want to reiterate that this is a generic capability set, this is not
> magic switch you turn on to secure the whole system.
> Its implementation is going to vary across environments and it is going
> to be dictated by your threat model.
> 
yeah

> For example, John's threat model of securing a multi-user Ubuntu Desktop
> is going to be very different than say securing a server where all the
> userspace is fixed and known.
> The former might require additional integration with the LSM subsystem.
> Thankfully, this patch should synergize well with it.
> 

hrmmm, maybe, I will be happy if they just don't end up complicating
each other

> Fundamentally, and at its core, it's very simple. Serge put it nicely:
> 

yes it is, and yet it still worries me a great deal. I have some of
the same worries as Casey, and also worry that people will take this
as a solution for all use cases, without understanding the issues.

On the other hand walking back the current state of unprivileged use of
user namespaces is a huge issue. Having another approach also pushing
will actually be helpful in some ways.

>> If you want root in a child user namespace to not have CAP_MAC_ADMIN,
>> you drop it from your pU.  Simple as that.
> 
>  From there, you can imagine any integration you want in userspace and
> ways to enforce your own policies.
> 
> TLDR, this is a first step towards empowering userspace with control
> over capabilities granted by a userns. At present, the kernel does not
> offer ways to do this. By itself, it is not a comprehensive solution

yep

> designed to thwart threat actors. However, it gives userspace the option
> to do so.

again, I don't believe the capabilities system is actually capable of
doing this, it covers some of the use cases. To be fair the LSM doesn't
cover everything either, there are current use cases that just aren't safe,
you either break them or allow them and accept the risks. It relies on
people understanding threat models, and sadly I have grown quite
grumpy about that topic.

Anyways I will try to finish up my review of the code this weekend.

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Paul Moore 1 year, 9 months ago

On Thu, May 16, 2024 at 5:21 AM Jonathan Calmels <jcalmels@3xx0.net> wrote:
>
> It's that time of the year again where we debate security settings for user
> namespaces ;)
>
> I’ve been experimenting with different approaches to address the gripe
> around user namespaces being used as attack vectors.
> After invaluable feedback from Serge and Christian offline, this is what I
> came up with.

As Serge is the capabilities maintainer it would be good to hear his
thoughts on-list about this proposal.

> There are obviously a lot of things we could do differently but I feel this
> is the right balance between functionality, simplicity and security. This
> also serves as a good foundation and could always be extended if the need
> arises in the future.
>
> Notes:
>
> - Adding a new capability set is far from ideal, but trying to reuse the
>   existing capability framework was deemed both impractical and
>   questionable security-wise, so here we are.
>
> - We might want to add new capabilities for some of the checks instead of
>   reusing CAP_SETPCAP every time. Serge mentioned something like
>   CAP_SYS_LIMIT?
>
> - In the last patch, we could decide to have stronger requirements and
>   perform checks inside cap_capable() in case we want to retroactively
>   prevent capabilities in old namespaces, this might be an overreach though
>   so I left it out.
>
>   I'm also not fond of the ulong logic for setting the sysctl parameter, on
>   the other hand, the usermodhelper code always uses two u32s which makes it
>   very confusing to set in userspace.
>
>
> Jonathan Calmels (3):
>   capabilities: user namespace capabilities
>   capabilities: add securebit for strict userns caps
>   capabilities: add cap userns sysctl mask
>
>  fs/proc/array.c                 |  9 ++++
>  include/linux/cred.h            |  3 ++
>  include/linux/securebits.h      |  1 +
>  include/linux/user_namespace.h  |  7 +++
>  include/uapi/linux/prctl.h      |  7 +++
>  include/uapi/linux/securebits.h | 11 ++++-
>  kernel/cred.c                   |  3 ++
>  kernel/sysctl.c                 | 10 ++++
>  kernel/umh.c                    | 16 +++++++
>  kernel/user_namespace.c         | 83 ++++++++++++++++++++++++++++++---
>  security/commoncap.c            | 59 +++++++++++++++++++++++
>  security/keys/process_keys.c    |  3 ++
>  12 files changed, 204 insertions(+), 8 deletions(-)

-- 
paul-moore.com

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jarkko Sakkinen 1 year, 9 months ago

On Thu May 16, 2024 at 7:23 PM EEST, Paul Moore wrote:
> On Thu, May 16, 2024 at 5:21 AM Jonathan Calmels <jcalmels@3xx0.net> wrote:
> >
> > It's that time of the year again where we debate security settings for user
> > namespaces ;)
> >
> > I’ve been experimenting with different approaches to address the gripe
> > around user namespaces being used as attack vectors.
> > After invaluable feedback from Serge and Christian offline, this is what I
> > came up with.
>
> As Serge is the capabilities maintainer it would be good to hear his
> thoughts on-list about this proposal.

Also it would make sense to make this a bit more digestible to a
wider group of maintainers, i.e. a better introduction to the topic
instead of a huge list of references (no bandwidth to read them all).

This is exactly the kind of patch set that you ignore unless you are
already active in exactly this domain.

I think that could bring in more genuinely useful feedback.

BR, Jarkko

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Ben Boeckel 1 year, 9 months ago

On Thu, May 16, 2024 at 02:22:02 -0700, Jonathan Calmels wrote:
> Jonathan Calmels (3):
>   capabilities: user namespace capabilities
>   capabilities: add securebit for strict userns caps
>   capabilities: add cap userns sysctl mask
> 
>  fs/proc/array.c                 |  9 ++++
>  include/linux/cred.h            |  3 ++
>  include/linux/securebits.h      |  1 +
>  include/linux/user_namespace.h  |  7 +++
>  include/uapi/linux/prctl.h      |  7 +++
>  include/uapi/linux/securebits.h | 11 ++++-
>  kernel/cred.c                   |  3 ++
>  kernel/sysctl.c                 | 10 ++++
>  kernel/umh.c                    | 16 +++++++
>  kernel/user_namespace.c         | 83 ++++++++++++++++++++++++++++++---
>  security/commoncap.c            | 59 +++++++++++++++++++++++
>  security/keys/process_keys.c    |  3 ++
>  12 files changed, 204 insertions(+), 8 deletions(-)

I note a lack of any changes to `Documentation/` which seems quite
glaring for something with such a userspace visibility aspect to it.

--Ben

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jarkko Sakkinen 1 year, 9 months ago

On Thu May 16, 2024 at 4:30 PM EEST, Ben Boeckel wrote:
> On Thu, May 16, 2024 at 02:22:02 -0700, Jonathan Calmels wrote:
> > Jonathan Calmels (3):
> >   capabilities: user namespace capabilities
> >   capabilities: add securebit for strict userns caps
> >   capabilities: add cap userns sysctl mask
> > 
> >  fs/proc/array.c                 |  9 ++++
> >  include/linux/cred.h            |  3 ++
> >  include/linux/securebits.h      |  1 +
> >  include/linux/user_namespace.h  |  7 +++
> >  include/uapi/linux/prctl.h      |  7 +++
> >  include/uapi/linux/securebits.h | 11 ++++-
> >  kernel/cred.c                   |  3 ++
> >  kernel/sysctl.c                 | 10 ++++
> >  kernel/umh.c                    | 16 +++++++
> >  kernel/user_namespace.c         | 83 ++++++++++++++++++++++++++++++---
> >  security/commoncap.c            | 59 +++++++++++++++++++++++
> >  security/keys/process_keys.c    |  3 ++
> >  12 files changed, 204 insertions(+), 8 deletions(-)
>
> I note a lack of any changes to `Documentation/` which seems quite
> glaring for something with such a userspace visibility aspect to it.
>
> --Ben

Yeah, also in the cover letter it would be nice to recap what a
bounding set is. I had to look that up (to recall what it is), and
then got bored reading the rest :-)

The cover letter is not exactly a nutshell summary tbh, but maybe
its content would be better placed in Documentation/

BR, Jarkko

Re: [PATCH 0/3] Introduce user namespace capabilities

Posted by Jonathan Calmels 1 year, 9 months ago

On Thu, May 16, 2024 at 04:36:07PM GMT, Jarkko Sakkinen wrote:
> On Thu May 16, 2024 at 4:30 PM EEST, Ben Boeckel wrote:
> > I note a lack of any changes to `Documentation/` which seems quite
> > glaring for something with such a userspace visibility aspect to it.
> >
> > --Ben
> 
> Yeah, also in the cover letter it would be nice to recap what a
> bounding set is. I had to look that up (to recall what it is), and
> then got bored reading the rest :-)

Thanks for reminding me, I actually meant to do it, just forgot.
Having said that, `Documentation/security/credentials.rst` is not the
best documentation when it comes to capabilities. I will definitely add
a few more lines in there, but it's probably not what you're looking for.

capabilities(7) is where everything is explained, I should have
mentioned it. I could try to summarize the existing sets, but honestly I
will probably do a worse job than the man page.

I do plan to update the man page though if it comes to that.
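
For readers who want a quick refresher on the existing sets without
reading all of capabilities(7), here is a minimal sketch of how they can
be inspected from userspace. It assumes only the `Cap*` fields that
/proc/<pid>/status already exposes today (CapInh, CapPrm, CapEff, CapBnd,
CapAmb); the helper names and the sample text are hypothetical, and
nothing here reflects what this series adds.

```python
# Bit numbers from include/uapi/linux/capability.h (small subset, for illustration).
CAP_BITS = {"CAP_CHOWN": 0, "CAP_SETPCAP": 8, "CAP_SYS_ADMIN": 21}


def parse_cap_sets(status_text):
    """Return {field name: integer mask} for the Cap* lines of a status dump."""
    sets = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.startswith("Cap"):
            # Each mask is printed as a hexadecimal number without a 0x prefix.
            sets[name] = int(value.strip(), 16)
    return sets


def has_cap(mask, cap_name):
    """True if the capability's bit is set in the given mask."""
    return bool((mask >> CAP_BITS[cap_name]) & 1)


# Hypothetical sample, shaped like the output for an unprivileged process:
# nothing permitted or effective, but a full bounding set.
SAMPLE = (
    "Name:\tcat\n"
    "CapInh:\t0000000000000000\n"
    "CapPrm:\t0000000000000000\n"
    "CapEff:\t0000000000000000\n"
    "CapBnd:\t000001ffffffffff\n"
    "CapAmb:\t0000000000000000\n"
)

sets = parse_cap_sets(SAMPLE)
print(has_cap(sets["CapBnd"], "CAP_SYS_ADMIN"))  # bounding set still allows it
print(has_cap(sets["CapEff"], "CAP_SYS_ADMIN"))  # but it is not currently held
```

On a live system the same function can be fed the contents of
/proc/self/status. The bounding set is only an upper limit on what a
process can gain, which is why CapBnd can be full while CapEff is empty.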