It's that time of the year again where we debate security settings for user
namespaces ;)

I’ve been experimenting with different approaches to address the gripe
around user namespaces being used as attack vectors.
After invaluable feedback from Serge and Christian offline, this is what I
came up with.

There are obviously a lot of things we could do differently but I feel this
is the right balance between functionality, simplicity and security. This
also serves as a good foundation and could always be extended if the need
arises in the future.

Notes:

- Adding a new capability set is far from ideal, but trying to reuse the
  existing capability framework was deemed both impractical and
  questionable security-wise, so here we are.

- We might want to add new capabilities for some of the checks instead of
  reusing CAP_SETPCAP every time. Serge mentioned something like
  CAP_SYS_LIMIT?

- In the last patch, we could decide to have stronger requirements and
  perform checks inside cap_capable() in case we want to retroactively
  prevent capabilities in old namespaces, this might be an overreach though
  so I left it out.

  I'm also not fond of the ulong logic for setting the sysctl parameter, on
  the other hand, the usermodhelper code always uses two u32s which makes it
  very confusing to set in userspace.


Jonathan Calmels (3):
  capabilities: user namespace capabilities
  capabilities: add securebit for strict userns caps
  capabilities: add cap userns sysctl mask

 fs/proc/array.c                 |  9 ++++
 include/linux/cred.h            |  3 ++
 include/linux/securebits.h      |  1 +
 include/linux/user_namespace.h  |  7 +++
 include/uapi/linux/prctl.h      |  7 +++
 include/uapi/linux/securebits.h | 11 ++++-
 kernel/cred.c                   |  3 ++
 kernel/sysctl.c                 | 10 ++++
 kernel/umh.c                    | 16 +++++++
 kernel/user_namespace.c         | 83 ++++++++++++++++++++++++++++++---
 security/commoncap.c            | 59 +++++++++++++++++++++++
 security/keys/process_keys.c    |  3 ++
 12 files changed, 204 insertions(+), 8 deletions(-)

--
2.45.0
On 5/16/2024 2:22 AM, Jonathan Calmels wrote:
> It's that time of the year again where we debate security settings for user
> namespaces ;)
>
> I’ve been experimenting with different approaches to address the gripe
> around user namespaces being used as attack vectors.
> After invaluable feedback from Serge and Christian offline, this is what I
> came up with.
>
> There are obviously a lot of things we could do differently but I feel this
> is the right balance between functionality, simplicity and security. This
> also serves as a good foundation and could always be extended if the need
> arises in the future.
>
> Notes:
>
> - Adding a new capability set is far from ideal, but trying to reuse the
>   existing capability framework was deemed both impractical and
>   questionable security-wise, so here we are.

I suggest that adding a capability set for user namespaces is a bad idea:
- It is in no way obvious what problem it solves
- It is not obvious how it solves any problem
- The capability mechanism has not been popular, and relying on a
  community (e.g. container developers) to embrace it based on this
  enhancement is a recipe for failure
- Capabilities are already more complicated than modern developers
  want to deal with. Adding another, special purpose set, is going
  to make them even more difficult to use.

> - We might want to add new capabilities for some of the checks instead of
>   reusing CAP_SETPCAP every time. Serge mentioned something like
>   CAP_SYS_LIMIT?
>
> - In the last patch, we could decide to have stronger requirements and
>   perform checks inside cap_capable() in case we want to retroactively
>   prevent capabilities in old namespaces, this might be an overreach though
>   so I left it out.
>
>   I'm also not fond of the ulong logic for setting the sysctl parameter, on
>   the other hand, the usermodhelper code always uses two u32s which makes it
>   very confusing to set in userspace.
>
>
> Jonathan Calmels (3):
>   capabilities: user namespace capabilities
>   capabilities: add securebit for strict userns caps
>   capabilities: add cap userns sysctl mask
>
>  fs/proc/array.c                 |  9 ++++
>  include/linux/cred.h            |  3 ++
>  include/linux/securebits.h      |  1 +
>  include/linux/user_namespace.h  |  7 +++
>  include/uapi/linux/prctl.h      |  7 +++
>  include/uapi/linux/securebits.h | 11 ++++-
>  kernel/cred.c                   |  3 ++
>  kernel/sysctl.c                 | 10 ++++
>  kernel/umh.c                    | 16 +++++++
>  kernel/user_namespace.c         | 83 ++++++++++++++++++++++++++++++---
>  security/commoncap.c            | 59 +++++++++++++++++++++++
>  security/keys/process_keys.c    |  3 ++
>  12 files changed, 204 insertions(+), 8 deletions(-)
>
On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> I suggest that adding a capability set for user namespaces is a bad idea:
> - It is in no way obvious what problem it solves
> - It is not obvious how it solves any problem
> - The capability mechanism has not been popular, and relying on a
>   community (e.g. container developers) to embrace it based on this
>   enhancement is a recipe for failure
> - Capabilities are already more complicated than modern developers
>   want to deal with. Adding another, special purpose set, is going
>   to make them even more difficult to use.

What Inh, Prm, Eff, Bnd and Amb is not dead obvious to you? ;-)
One UNs cannot hurt...

I'm not following containers that much but didn't seccomp profiles
supposed to be the silver bullet?

BR, Jarkko
On Thu May 16, 2024 at 10:29 PM EEST, Jarkko Sakkinen wrote:
> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> > I suggest that adding a capability set for user namespaces is a bad idea:
> > - It is in no way obvious what problem it solves
> > - It is not obvious how it solves any problem
> > - The capability mechanism has not been popular, and relying on a
> >   community (e.g. container developers) to embrace it based on this
> >   enhancement is a recipe for failure
> > - Capabilities are already more complicated than modern developers
> >   want to deal with. Adding another, special purpose set, is going
> >   to make them even more difficult to use.
>
> What Inh, Prm, Eff, Bnd and Amb is not dead obvious to you? ;-)
> One UNs cannot hurt...
>
> I'm not following containers that much but didn't seccomp profiles
> supposed to be the silver bullet?

Also, I think Kata Containers style way of doing containers is pretty
solid. I've heard that some video streaming service at least in recent
past did launch VM per stream so it's not like VM's cannot be made to
scale I guess.

BR, Jarkko
On Thu May 16, 2024 at 10:31 PM EEST, Jarkko Sakkinen wrote:
> On Thu May 16, 2024 at 10:29 PM EEST, Jarkko Sakkinen wrote:
> > On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> > > I suggest that adding a capability set for user namespaces is a bad idea:
> > > - It is in no way obvious what problem it solves
> > > - It is not obvious how it solves any problem
> > > - The capability mechanism has not been popular, and relying on a
> > >   community (e.g. container developers) to embrace it based on this
> > >   enhancement is a recipe for failure
> > > - Capabilities are already more complicated than modern developers
> > >   want to deal with. Adding another, special purpose set, is going
> > >   to make them even more difficult to use.
> >
> > What Inh, Prm, Eff, Bnd and Amb is not dead obvious to you? ;-)
> > One UNs cannot hurt...
> >
> > I'm not following containers that much but didn't seccomp profiles
> > supposed to be the silver bullet?
>
> Also, I think Kata Containers style way of doing containers is pretty
> solid. I've heard that some video streaming service at least in recent
> past did launch VM per stream so it's not like VM's cannot be made to
> scale I guess.

Sorry for the multiple responses, but this actually nails the key
question: who will use this?

Even if this would work out somehow, is there someone who will actually
use this rather than one of the few other, more robust solutions
available? I mean, is it worth the time to maintain it if there are no
potential users for the feature?

In addition to "show me the code", there is always also "show me the
payload".

BR, Jarkko
> > > On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> > > > I suggest that adding a capability set for user namespaces is a bad idea:
> > > > - It is in no way obvious what problem it solves
> > > > - It is not obvious how it solves any problem
> > > > - The capability mechanism has not been popular, and relying on a
> > > > community (e.g. container developers) to embrace it based on this
> > > > enhancement is a recipe for failure
> > > > - Capabilities are already more complicated than modern developers
> > > > want to deal with. Adding another, special purpose set, is going
> > > > to make them even more difficult to use.

Sorry if the commit wasn't clear enough.
Basically:

- Today user namespaces grant full capabilities.
  This behavior is often abused to attack various kernel subsystems.
  Only option is to disable them altogether which breaks a lot of
  userspace stuff.
  This goes against the least privilege principle.

- It adds a new capability set.
  This set dictates what capabilities are granted in namespaces (instead
  of always getting full caps).
  This brings namespaces in line with the rest of the system, user
  namespaces are no more "special".
  They now work the same way as say a transition to root does with
  inheritable caps.

- This isn't intended to be used by end users per se (although they could).
  This would be used at the same places where existing capabilities are
  used today (e.g. init system, pam, container runtime, browser
  sandbox), or by system administrators.

To give you some ideas of things you could do:

# E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH
echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
echo "!cap_net_admin alice" >> /etc/security/capability.conf

# E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE
systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
            -p SecureBits=userns-strict-caps \
            /usr/bin/dockerd

# E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits
# Prevent users from ever gaining it
sysctl -w cap_bound_userns_mask=0x1fffffdffff
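
For what it's worth, the mask in the last example does not have to be typed
by hand. Below is a minimal sketch of how it could be derived, assuming
CAP_SYS_RAWIO is bit 17 and CAP_LAST_CAP is 40 as defined in
include/uapi/linux/capability.h on current kernels. The helper name
userns_mask_without is hypothetical and not something the patch set
provides; only the cap_bound_userns_mask sysctl comes from the series.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch set): print a value for the
# proposed cap_bound_userns_mask sysctl with a single capability cleared.
# Assumption: capability numbers follow include/uapi/linux/capability.h,
# where CAP_LAST_CAP is 40 and CAP_SYS_RAWIO is 17 on current kernels.
CAP_LAST_CAP=40
CAP_SYS_RAWIO=17

userns_mask_without() {
    # All valid capability bits set: bits 0..CAP_LAST_CAP.
    full=$(( (1 << (CAP_LAST_CAP + 1)) - 1 ))
    # Clear the bit of the capability number passed as $1.
    printf '0x%x\n' $(( full & ~(1 << $1) ))
}

# Prints the mask used in the sysctl example above.
userns_mask_without "$CAP_SYS_RAWIO"
```

The result could then be passed along as
sysctl -w cap_bound_userns_mask="$(userns_mask_without 17)", assuming the
sysctl keeps the name and hex format used in this series.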
On 5/17/2024 4:42 AM, Jonathan Calmels wrote:
>>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
>>>>> I suggest that adding a capability set for user namespaces is a bad idea:
>>>>> - It is in no way obvious what problem it solves
>>>>> - It is not obvious how it solves any problem
>>>>> - The capability mechanism has not been popular, and relying on a
>>>>>   community (e.g. container developers) to embrace it based on this
>>>>>   enhancement is a recipe for failure
>>>>> - Capabilities are already more complicated than modern developers
>>>>>   want to deal with. Adding another, special purpose set, is going
>>>>>   to make them even more difficult to use.
> Sorry if the commit wasn't clear enough.

While, as others have pointed out, the commit description left
much to be desired, that isn't the biggest problem with the change
you're proposing.

> Basically:
>
> - Today user namespaces grant full capabilities.

Of course they do. I have been following the use of capabilities
in Linux since before they were implemented. The uptake has been
disappointing in all use cases.

>   This behavior is often abused to attack various kernel subsystems.

Yes. The problems of a single, all powerful root privilege scheme are
well documented.

>   Only option

Hardly.

>   is to disable them altogether which breaks a lot of
>   userspace stuff.

Updating userspace components to behave properly in a capabilities
environment has never been a popular activity, but is the right way
to address this issue. And before you start on the "no one can do that,
it's too hard", I'll point out that multiple UNIX systems supported
rootless, all capabilities based systems back in the day.

>   This goes against the least privilege principle.

If you're going to run userspace that *requires* privilege, you have
to have a way to *allow* privilege. If the userspace insists on a root
based privilege model, you're stuck supporting it. Regardless of your
principles.

>
> - It adds a new capability set.

Which is a really, really bad idea. The equation for calculating effective
privilege is already more complicated than userspace developers are generally
willing to put up with.

>   This set dictates what capabilities are granted in namespaces (instead
>   of always getting full caps).

I would not expect container developers to be eager to learn how to use
this facility.

>   This brings namespaces in line with the rest of the system, user
>   namespaces are no more "special".

I'm sorry, but this makes no sense to me whatsoever. You want to introduce
a capability set explicitly for namespaces in order to make them less
special?
Maybe I'm just old and cranky.

>   They now work the same way as say a transition to root does with
>   inheritable caps.

That needs some explanation.

>
> - This isn't intended to be used by end users per se (although they could).
>   This would be used at the same places where existing capabalities are
>   used today (e.g. init system, pam, container runtime, browser
>   sandbox), or by system administrators.

I understand that. It is for containers. Containers are not kernel entities.

>
> To give you some ideas of things you could do:
>
> # E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH
> echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
> echo "!cap_net_admin alice" >> /etc/security/capability.conf.
>
> # E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE
> systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
>             -p SecureBits=userns-strict-caps \
>             /usr/bin/dockerd
>
> # E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits
> # Prevent users from ever gaining it
> sysctl -w cap_bound_userns_mask=0x1fffffdffff
On Fri, May 17, 2024 at 10:53:24AM -0700, Casey Schaufler wrote:
> On 5/17/2024 4:42 AM, Jonathan Calmels wrote:
> >>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
> >>>>> I suggest that adding a capability set for user namespaces is a bad idea:
> >>>>> - It is in no way obvious what problem it solves
> >>>>> - It is not obvious how it solves any problem
> >>>>> - The capability mechanism has not been popular, and relying on a
> >>>>>   community (e.g. container developers) to embrace it based on this
> >>>>>   enhancement is a recipe for failure
> >>>>> - Capabilities are already more complicated than modern developers
> >>>>>   want to deal with. Adding another, special purpose set, is going
> >>>>>   to make them even more difficult to use.
> > Sorry if the commit wasn't clear enough.
>
> While, as others have pointed out, the commit description left
> much to be desired, that isn't the biggest problem with the change
> you're proposing.
>
> > Basically:
> >
> > - Today user namespaces grant full capabilities.
>
> Of course they do. I have been following the use of capabilities
> in Linux since before they were implemented. The uptake has been
> disappointing in all use cases.
>
> >   This behavior is often abused to attack various kernel subsystems.
>
> Yes. The problems of a single, all powerful root privilege scheme are
> well documented.
>
> >   Only option
>
> Hardly.
>
> >   is to disable them altogether which breaks a lot of
> >   userspace stuff.
>
> Updating userspace components to behave properly in a capabilities
> environment has never been a popular activity, but is the right way
> to address this issue. And before you start on the "no one can do that,
> it's too hard", I'll point out that multiple UNIX systems supported
> rootless, all capabilities based systems back in the day.
>
> >   This goes against the least privilege principle.
>
> If you're going to run userspace that *requires* privilege, you have
> to have a way to *allow* privilege. If the userspace insists on a root
> based privilege model, you're stuck supporting it. Regardless of your
> principles.

Casey,

I might be wrong, but I think you're misreading this patchset. It is not
about limiting capabilities in the init user ns at all. It's about limiting
the capabilities which a process in a child userns can get.

Any unprivileged task can create a new userns, and get a process with
all capabilities in that namespace. Always. User namespaces were a
great success in that we can do this without any resulting privilege
against host owned resources. The unaddressed issue is the expanded
kernel code surface area.

You say, above, (quoting out of place here)

> Updating userspace components to behave properly in a capabilities
> environment has never been a popular activity, but is the right way
> to address this issue. And before you start on the "no one can do that,
> it's too hard", I'll point out that multiple UNIX systems supported

He's not saying no one can do that. He's saying, correctly, that the
kernel currently offers no way for userspace to do this limiting. His
patchset offers two ways: one system wide capability mask (which applies
only to non-initial user namespaces) and one per-process inherited one
which - yay - userspace can use to limit what its children will be
able to get if they unshare a user namespace.

> > - It adds a new capability set.
>
> Which is a really, really bad idea. The equation for calculating effective
> privilege is already more complicated than userspace developers are generally
> willing to put up with.

This is somewhat true, but I think the semantics of what is proposed here are
about as straightforward as you could hope for, and you can basically reason
about them completely independently of the other sets. Only when reasoning
about the correctness of this code do you need to consider the other sets. Not
when administering a system.

If you want root in a child user namespace to not have CAP_MAC_ADMIN, you drop
it from your pU. Simple as that.

> >   This set dictates what capabilities are granted in namespaces (instead
> >   of always getting full caps).
>
> I would not expect container developers to be eager to learn how to use
> this facility.

I'm a container developer, and I'm excited about it :)

> >   This brings namespaces in line with the rest of the system, user
> >   namespaces are no more "special".
>
> I'm sorry, but this makes no sense to me whatsoever. You want to introduce
> a capability set explicitly for namespaces in order to make them less
> special?

Yes, exactly.

> Maybe I'm just old and cranky.

That's fine.

> >   They now work the same way as say a transition to root does with
> >   inheritable caps.
>
> That needs some explanation.
>
> >
> > - This isn't intended to be used by end users per se (although they could).
> >   This would be used at the same places where existing capabalities are
> >   used today (e.g. init system, pam, container runtime, browser
> >   sandbox), or by system administrators.
>
> I understand that. It is for containers. Containers are not kernel entities.

User namespaces are.

This patch set provides userspace a way of limiting the kernel code exposed
to untrusted children, which currently does not exist.

> > To give you some ideas of things you could do:
> >
> > # E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH
> > echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
> > echo "!cap_net_admin alice" >> /etc/security/capability.conf.
> >
> > # E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE
> > systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
> >             -p SecureBits=userns-strict-caps \
> >             /usr/bin/dockerd
> >
> > # E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits
> > # Prevent users from ever gaining it
> > sysctl -w cap_bound_userns_mask=0x1fffffdffff
On 5/18/24 05:20, Serge Hallyn wrote:
> On Fri, May 17, 2024 at 10:53:24AM -0700, Casey Schaufler wrote:
>> On 5/17/2024 4:42 AM, Jonathan Calmels wrote:
>>>>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
>>>>>>> I suggest that adding a capability set for user namespaces is a bad idea:
>>>>>>> - It is in no way obvious what problem it solves
>>>>>>> - It is not obvious how it solves any problem
>>>>>>> - The capability mechanism has not been popular, and relying on a
>>>>>>>   community (e.g. container developers) to embrace it based on this
>>>>>>>   enhancement is a recipe for failure
>>>>>>> - Capabilities are already more complicated than modern developers
>>>>>>>   want to deal with. Adding another, special purpose set, is going
>>>>>>>   to make them even more difficult to use.
>>> Sorry if the commit wasn't clear enough.
>>
>> While, as others have pointed out, the commit description left
>> much to be desired, that isn't the biggest problem with the change
>> you're proposing.
>>
>>> Basically:
>>>
>>> - Today user namespaces grant full capabilities.
>>
>> Of course they do. I have been following the use of capabilities
>> in Linux since before they were implemented. The uptake has been
>> disappointing in all use cases.
>>
>>>   This behavior is often abused to attack various kernel subsystems.
>>
>> Yes. The problems of a single, all powerful root privilege scheme are
>> well documented.
>>
>>>   Only option
>>
>> Hardly.
>>
>>>   is to disable them altogether which breaks a lot of
>>>   userspace stuff.
>>
>> Updating userspace components to behave properly in a capabilities
>> environment has never been a popular activity, but is the right way
>> to address this issue. And before you start on the "no one can do that,
>> it's too hard", I'll point out that multiple UNIX systems supported
>> rootless, all capabilities based systems back in the day.
>>
>>>   This goes against the least privilege principle.
>>
>> If you're going to run userspace that *requires* privilege, you have
>> to have a way to *allow* privilege. If the userspace insists on a root
>> based privilege model, you're stuck supporting it. Regardless of your
>> principles.
>
> Casey,
>
> I might be wrong, but I think you're misreading this patchset. It is not
> about limiting capabilities in the init user ns at all. It's about limiting
> the capabilities which a process in a child userns can get.
>
> Any unprivileged task can create a new userns, and get a process with
> all capabilities in that namespace. Always. User namespaces were a
> great success in that we can do this without any resulting privilege
> against host owned resources. The unaddressed issue is the expanded
> kernel code surface area.
>
> You say, above, (quoting out of place here)
>
>> Updating userspace components to behave properly in a capabilities
>> environment has never been a popular activity, but is the right way
>> to address this issue. And before you start on the "no one can do that,
>> it's too hard", I'll point out that multiple UNIX systems supported
>
> He's not saying no one can do that. He's saying, correctly, that the
> kernel currently offers no way for userspace to do this limiting. His
> patchset offers two ways: one system wide capability mask (which applies
> only to non-initial user namespaces) and on per-process inherited one
> which - yay - userspace can use to limit what its children will be
> able to get if they unshare a user namespace.
>
>>> - It adds a new capability set.
>>
>> Which is a really, really bad idea. The equation for calculating effective
>> privilege is already more complicated than userspace developers are generally
>> willing to put up with.
>
> This is somewhat true, but I think the semantics of what is proposed here are
> about as straightforward as you could hope for, and you can basically reason
> about them completely independently of the other sets. Only when reasoning
> about the correctness of this code do you need to consider the other sets. Not
> when administering a system.
>
> If you want root in a child user namespace to not have CAP_MAC_ADMIN, you drop
> it from your pU. Simple as that.
>
>>>   This set dictates what capabilities are granted in namespaces (instead
>>>   of always getting full caps).
>>
>> I would not expect container developers to be eager to learn how to use
>> this facility.
>
> I'm a container developer, and I'm excited about it :)
>
>>>   This brings namespaces in line with the rest of the system, user
>>>   namespaces are no more "special".
>>
>> I'm sorry, but this makes no sense to me whatsoever. You want to introduce
>> a capability set explicitly for namespaces in order to make them less
>> special?
>
> Yes, exactly.
>
>> Maybe I'm just old and cranky.
>
> That's fine.
>
>>>   They now work the same way as say a transition to root does with
>>>   inheritable caps.
>>
>> That needs some explanation.
>>
>>>
>>> - This isn't intended to be used by end users per se (although they could).
>>>   This would be used at the same places where existing capabalities are
>>>   used today (e.g. init system, pam, container runtime, browser
>>>   sandbox), or by system administrators.
>>
>> I understand that. It is for containers. Containers are not kernel entities.
>
> User namespaces are.
>
> This patch set provides userspace a way of limiting the kernel code exposed
> to untrusted children, which currently does not exist.
>

Theoretically. I am worried that in practice the existing utils allow
untrusted code to still access user namespaces. In practice we have found
that we need to allow a different set of capabilities when bwrap is called
from flatpak than when called on its own etc. We see the same pattern
with unshare and other utilities around launching applications in user
namespaces.

In practice, at the distro level, I don't see this approach actually
helping, because we have so many uses that require exposing close to the
full capabilities set in multiple utilities that are required by many
different applications.

To be clear, this doesn't stop distros from doing something more, but is
it worth the added complexity if in practice it can't be used
effectively? I really don't have the answer.
On 5/18/2024 5:20 AM, Serge Hallyn wrote: > On Fri, May 17, 2024 at 10:53:24AM -0700, Casey Schaufler wrote: >> On 5/17/2024 4:42 AM, Jonathan Calmels wrote: >>>>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote: >>>>>>> I suggest that adding a capability set for user namespaces is a bad idea: >>>>>>> - It is in no way obvious what problem it solves >>>>>>> - It is not obvious how it solves any problem >>>>>>> - The capability mechanism has not been popular, and relying on a >>>>>>> community (e.g. container developers) to embrace it based on this >>>>>>> enhancement is a recipe for failure >>>>>>> - Capabilities are already more complicated than modern developers >>>>>>> want to deal with. Adding another, special purpose set, is going >>>>>>> to make them even more difficult to use. >>> Sorry if the commit wasn't clear enough. >> While, as others have pointed out, the commit description left >> much to be desired, that isn't the biggest problem with the change >> you're proposing. >> >>> Basically: >>> >>> - Today user namespaces grant full capabilities. >> Of course they do. I have been following the use of capabilities >> in Linux since before they were implemented. The uptake has been >> disappointing in all use cases. >> >>> This behavior is often abused to attack various kernel subsystems. >> Yes. The problems of a single, all powerful root privilege scheme are >> well documented. >> >>> Only option >> Hardly. >> >>> is to disable them altogether which breaks a lot of >>> userspace stuff. >> Updating userspace components to behave properly in a capabilities >> environment has never been a popular activity, but is the right way >> to address this issue. And before you start on the "no one can do that, >> it's too hard", I'll point out that multiple UNIX systems supported >> rootless, all capabilities based systems back in the day. >> >>> This goes against the least privilege principle. 
>> If you're going to run userspace that *requires* privilege, you have >> to have a way to *allow* privilege. If the userspace insists on a root >> based privilege model, you're stuck supporting it. Regardless of your >> principles. > Casey, > > I might be wrong, but I think you're misreading this patchset. It is not > about limiting capabilities in the init user ns at all. It's about limiting > the capabilities which a process in a child userns can get. I do understand that. My objection is not to the intent, but to the approach. Adding a capability set to the general mechanism in support of a limited, specific use case seems wrong to me. I would rather see a mechanism in userns to limit the capabilities in a user namespace than a mechanism in capabilities that is specific to user namespaces. > Any unprivileged task can create a new userns, and get a process with > all capabilities in that namespace. Always. User namespaces were a > great success in that we can do this without any resulting privilege > against host owned resources. The unaddressed issue is the expanded > kernel code surface area. An option to clone() then, to limit the capabilities available? I honestly can't recall if that has been suggested elsewhere, and apologize if it's already been dismissed as a stoopid idea. > > You say, above, (quoting out of place here) > >> Updating userspace components to behave properly in a capabilities >> environment has never been a popular activity, but is the right way >> to address this issue. And before you start on the "no one can do that, >> it's too hard", I'll point out that multiple UNIX systems supported > He's not saying no one can do that. He's saying, correctly, that the > kernel currently offers no way for userspace to do this limiting. 
His > patchset offers two ways: one system wide capability mask (which applies > only to non-initial user namespaces) and on per-process inherited one > which - yay - userspace can use to limit what its children will be > able to get if they unshare a user namespace. > >>> - It adds a new capability set. >> Which is a really, really bad idea. The equation for calculating effective >> privilege is already more complicated than userspace developers are generally >> willing to put up with. > This is somewhat true, but I think the semantics of what is proposed here are > about as straightforward as you could hope for, and you can basically reason > about them completely independently of the other sets. Only when reasoning > about the correctness of this code do you need to consider the other sets. Not > when administering a system. > > If you want root in a child user namespace to not have CAP_MAC_ADMIN, you drop > it from your pU. Simple as that. > >>> This set dictates what capabilities are granted in namespaces (instead >>> of always getting full caps). >> I would not expect container developers to be eager to learn how to use >> this facility. > I'm a container developer, and I'm excited about it :) OK, well, I'm wrong. It's happened before and will happen again. > >>> This brings namespaces in line with the rest of the system, user >>> namespaces are no more "special". >> I'm sorry, but this makes no sense to me whatsoever. You want to introduce >> a capability set explicitly for namespaces in order to make them less >> special? > Yes, exactly. Hmm. I can't say I buy that. It makes a whole lot more sense to me to change userns than to change capabilities. > >> Maybe I'm just old and cranky. > That's fine. > >>> They now work the same way as say a transition to root does with >>> inheritable caps. >> That needs some explanation. >> >>> - This isn't intended to be used by end users per se (although they could). 
>>> This would be used at the same places where existing capabalities are >>> used today (e.g. init system, pam, container runtime, browser >>> sandbox), or by system administrators. >> I understand that. It is for containers. Containers are not kernel entities. > User namespaces are. > > This patch set provides userspace a way of limiting the kernel code exposed > to untrusted children, which currently does not exist. Yes, I understand. I would rather see a change to userns in support of a userns specific need than a change to capabilities for a userns specific need. >>> To give you some ideas of things you could do: >>> >>> # E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH >>> echo "auth optional pam_cap.so" >> /etc/pam.d/sshd >>> echo "!cap_net_admin alice" >> /etc/security/capability.conf. >>> >>> # E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE >>> systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \ >>> -p SecureBits=userns-strict-caps \ >>> /usr/bin/dockerd >>> >>> # E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits >>> # Prevent users from ever gaining it >>> sysctl -w cap_bound_userns_mask=0x1fffffdffff
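For readers wondering where a value like 0x1fffffdffff in the sysctl example comes from, here is a minimal sketch of the arithmetic. It assumes the capability numbers from include/uapi/linux/capability.h (CAP_SYS_RAWIO is 17, and the highest capability is 40, CAP_CHECKPOINT_RESTORE, on kernels 5.9 and later); the helper names full_mask() and drop() are made up for illustration and are not part of the patch set.

```python
# Capability numbers as defined in include/uapi/linux/capability.h.
CAP_SYS_RAWIO = 17
CAP_LAST_CAP = 40  # CAP_CHECKPOINT_RESTORE; assumes a 5.9+ kernel


def full_mask(last_cap: int = CAP_LAST_CAP) -> int:
    """Bitmask with one bit set for every capability 0..last_cap."""
    return (1 << (last_cap + 1)) - 1


def drop(mask: int, *caps: int) -> int:
    """Clear the bit of each given capability number (hypothetical helper)."""
    for cap in caps:
        mask &= ~(1 << cap)
    return mask


mask = drop(full_mask(), CAP_SYS_RAWIO)
print(hex(mask))  # 0x1fffffdffff
```

The same helper can clear several capabilities at once, e.g. drop(full_mask(), 17, 12) to also remove CAP_NET_ADMIN.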
On Sun, May 19, 2024 at 10:03:29AM GMT, Casey Schaufler wrote: > I do understand that. My objection is not to the intent, but to the approach. > Adding a capability set to the general mechanism in support of a limited, specific > use case seems wrong to me. I would rather see a mechanism in userns to limit > the capabilities in a user namespace than a mechanism in capabilities that is > specific to user namespaces. > An option to clone() then, to limit the capabilities available? > I honestly can't recall if that has been suggested elsewhere, and > apologize if it's already been dismissed as a stoopid idea. No, and you're right, this would also make sense. This was considered, as well as things like ioctl_ns() (basically introducing the concept of capabilities in the user_namespace struct). I also considered reusing the existing sets with various schemes, to no avail. The main issue with this approach is that you have to consider how this is going to be used. This ties into the other thread we've had with John and Eric. Basically, we're coming from a model where things are wide open and we're trying to tighten things down. Quoting John here: > We are starting from a different posture here. Where applications have > assumed that user namespaces were safe and no measures were needed. > Tools like unshare and bwrap if set to allow user namespaces in their > fcaps will allow exploits a trivial bypass. We can't really expect userspace to patch every single userns callsite and opt in to this new security mechanism. You said it well yourself: > Capabilities are already more complicated than modern developers > want to deal with. Moreover, policies are not necessarily enforced at said callsites. Take for example a service like systemd-machined, or a PAM session. Those need to be able to place restrictions on any processes spawned under them. If we do this in clone() (or similar), we'll also need to come up with inheritance rules, a way to query capabilities, etc.
At this point we're just reinventing capability sets. Finally, the nice thing about having it as a capability set is that we can easily define rules between the sets. Patch 2 is a good example of this: it constrains the userns set to the bounding set of a task, thus requiring minimal or no change to userspace and helping with adoption. > Yes, I understand. I would rather see a change to userns in support of a userns > specific need than a change to capabilities for a userns specific need. Valid point, but at the end of the day, those are really just tasks' capabilities. The unshare() just happens to trigger specific rules when it comes to the tasks' creds. This isn't so different from the other sets and their specific rules for execve() or UID 0. This could also be reframed as: Why would setting capabilities on tasks in a userns be so different from setting them on tasks outside of it?
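To illustrate the kind of rule between sets mentioned for patch 2, here is a rough sketch in Python. It only models the one sentence above ("constrains the userns set to the bounding set of a task") as a bitwise intersection; when exactly the kernel would apply it, and the function name constrain_userns_set(), are assumptions for illustration and are not taken from the patch.

```python
CAP_DAC_OVERRIDE = 1  # from include/uapi/linux/capability.h
ALL_CAPS = (1 << 41) - 1  # capabilities 0..40, assuming a 5.9+ kernel


def constrain_userns_set(p_userns: int, p_bounding: int, strict: bool) -> int:
    """Model of the assumed strict rule: pU can never exceed the bounding set."""
    return p_userns & p_bounding if strict else p_userns


# A service started with CAP_DAC_OVERRIDE removed from its bounding set.
bounding = ALL_CAPS & ~(1 << CAP_DAC_OVERRIDE)

strict_pu = constrain_userns_set(ALL_CAPS, bounding, strict=True)
legacy_pu = constrain_userns_set(ALL_CAPS, bounding, strict=False)

print(bool(strict_pu & (1 << CAP_DAC_OVERRIDE)))  # False
print(bool(legacy_pu & (1 << CAP_DAC_OVERRIDE)))  # True
```

Under this reading, existing tooling that already manages the bounding set would not need to learn about the new set, which is the adoption argument being made.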
On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote: > Of course they do. I have been following the use of capabilities > in Linux since before they were implemented. The uptake has been > disappointing in all use cases. Why "Of course"? What if they should not get *all* privileges? > Yes. The problems of a single, all powerful root privilege scheme are > well documented. That's my point, it doesn't have to be this way. > Hardly. Maybe I'm missing something, then. How do I restrict my users from gaining, say, CAP_NET_ADMIN in their userns today? > If you're going to run userspace that *requires* privilege, you have > to have a way to *allow* privilege. If the userspace insists on a root > based privilege model, you're stuck supporting it. Regardless of your > principles. I want *some* privileges, not *all* of them. > Which is a really, really bad idea. The equation for calculating effective > privilege is already more complicated than userspace developers are generally > willing to put up with. This is generally true, but this set is way more straightforward than the other sets; it is always: pU = pP = pE = X. If you look at the patch, there is no transition logic or anything complicated, it's just a set of caps being inherited. > I would not expect container developers to be eager to learn how to use > this facility. And they probably wouldn't. For most use cases it's going to be enforced through system policies (init, pam, etc). Other than that, usage won't change, you will run your usual `docker run --cap-add ...` to get caps, except now it works in userns. > I'm sorry, but this makes no sense to me whatsoever. You want to introduce > a capability set explicitly for namespaces in order to make them less > special? Maybe I'm just old and cranky. > > > They now work the same way as say a transition to root does with > > inheritable caps. > > That needs some explanation.
From man capabilities(7):

  In order to mirror traditional UNIX semantics, the kernel performs special treatment of file capabilities when a process with UID 0 (root) executes a program [...] Thus, when [...] a process whose real and effective UIDs are zero execve(2)s a program, the calculation of the process's new permitted capabilities simplifies to:

    P'(permitted) = P(inheritable) | P(bounding)
    P'(effective) = P'(permitted)

So, the same way a root process is bounded by its inheritable set when it execs, a "rootless" process is bounded by its userns set when it unshares.
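The analogy can be made concrete with a small model. This is only a sketch: root_execve() implements the two capabilities(7) equations quoted in this mail, and unshare_userns() implements the "pU = pP = pE" description given earlier in this thread. The Creds class, the function names, and the choice to leave every other set untouched are assumptions for illustration, not the kernel's actual cred handling.

```python
from dataclasses import dataclass, replace

CAP_NET_ADMIN = 12  # from include/uapi/linux/capability.h
ALL_CAPS = (1 << 41) - 1  # capabilities 0..40, assuming a 5.9+ kernel


@dataclass(frozen=True)
class Creds:
    """Toy model of a task's capability sets as integer bitmasks."""
    permitted: int
    effective: int
    inheritable: int
    bounding: int
    userns: int  # the proposed pU set


def root_execve(c: Creds) -> Creds:
    """capabilities(7), UID 0 case: P'(permitted) = P(inheritable) | P(bounding)."""
    new_permitted = c.inheritable | c.bounding
    return replace(c, permitted=new_permitted, effective=new_permitted)


def unshare_userns(c: Creds) -> Creds:
    """Proposed rule as described in the thread: pP = pE = pU in the new userns."""
    return replace(c, permitted=c.userns, effective=c.userns)


# An unprivileged task whose pU has had CAP_NET_ADMIN dropped.
task = Creds(permitted=0, effective=0, inheritable=0,
             bounding=ALL_CAPS, userns=ALL_CAPS & ~(1 << CAP_NET_ADMIN))
child = unshare_userns(task)
print(bool(child.effective & (1 << CAP_NET_ADMIN)))  # False
```

In both functions the result depends on a single precomputed set rather than on a multi-term equation, which is the point being made about the new set being simple to reason about.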
On Fri May 17, 2024 at 10:11 PM EEST, Jonathan Calmels wrote: > On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote: > > Of course they do. I have been following the use of capabilities > > in Linux since before they were implemented. The uptake has been > > disappointing in all use cases. > > Why "Of course"? > What if they should not get *all* privileges? They do the job given a real-world workload and stress test. Here the problem is based on a theory and an experiment. Even a formal model does not necessarily map all "unknown unknowns". BR, Jarkko
On Sat May 18, 2024 at 2:08 PM EEST, Jarkko Sakkinen wrote: > On Fri May 17, 2024 at 10:11 PM EEST, Jonathan Calmels wrote: > > On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote: > > > Of course they do. I have been following the use of capabilities > > > in Linux since before they were implemented. The uptake has been > > > disappointing in all use cases. > > > > Why "Of course"? > > What if they should not get *all* privileges? > > They do the job given a real-world workload and stress test. > > Here the problem is based on a theory and an experiment. > > Even a formal model does not necessarily map all "unknown unknowns". So this was like the worst "sales pitch" ever: 1. The cover letter starts with the idea of having to argue about name spaces, and have fun while doing that ;-) We all have our own ways to entertain ourselves but "name space duels" are not my thing. Why not just start with why we all want this instead? Maybe we don't want it then. Maybe this is just useless spam given the angle presented? 2. There's shitloads of computer science and set theory but nothing that would make common sense. You need to build more understandable model. There's zero "gist" in this work. Maybe this does make sense but the story around it sucks so far. BR, Jarkko
On Sat May 18, 2024 at 2:17 PM EEST, Jarkko Sakkinen wrote: > On Sat May 18, 2024 at 2:08 PM EEST, Jarkko Sakkinen wrote: > > On Fri May 17, 2024 at 10:11 PM EEST, Jonathan Calmels wrote: > > > On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote: > > > > Of course they do. I have been following the use of capabilities > > > > in Linux since before they were implemented. The uptake has been > > > > disappointing in all use cases. > > > > > > Why "Of course"? > > > What if they should not get *all* privileges? > > > > They do the job given a real-world workload and stress test. > > > > Here the problem is based on a theory and an experiment. > > > > Even a formal model does not necessarily map all "unknown unknowns". > > So this was like the worst "sales pitch" ever: > > 1. The cover letter starts with the idea of having to argue about name > spaces, and have fun while doing that ;-) We all have our own ways to > entertain ourselves but "name space duels" are not my thing. Why not > just start with why we all want this instead? Maybe we don't want it > then. Maybe this is just useless spam given the angle presented? > 2. There's shitloads of computer science and set theory but nothing > that would make common sense. You need to build more understandable > model. There's zero "gist" in this work. > > Maybe this does make sense but the story around it sucks so far. One tip: I think this is wrong forum to present namespace ideas in the first place. It would be probably better to talk about this with e.g. systemd or podman developers, and similar groups. There's zero evidence of the usefulness. Then when you go that route and come back with actual users, things click much more easily. Now this is all in the void. BR, Jarkko
On 5/18/24 04:21, Jarkko Sakkinen wrote: > On Sat May 18, 2024 at 2:17 PM EEST, Jarkko Sakkinen wrote: >> On Sat May 18, 2024 at 2:08 PM EEST, Jarkko Sakkinen wrote: >>> On Fri May 17, 2024 at 10:11 PM EEST, Jonathan Calmels wrote: >>>> On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote: >>>>> Of course they do. I have been following the use of capabilities >>>>> in Linux since before they were implemented. The uptake has been >>>>> disappointing in all use cases. >>>> >>>> Why "Of course"? >>>> What if they should not get *all* privileges? >>> >>> They do the job given a real-world workload and stress test. >>> >>> Here the problem is based on a theory and an experiment. >>> >>> Even a formal model does not necessarily map all "unknown unknowns". >> >> So this was like the worst "sales pitch" ever: >> >> 1. The cover letter starts with the idea of having to argue about name >> spaces, and have fun while doing that ;-) We all have our own ways to >> entertain ourselves but "name space duels" are not my thing. Why not >> just start with why we all want this instead? Maybe we don't want it >> then. Maybe this is just useless spam given the angle presented? >> 2. There's shitloads of computer science and set theory but nothing >> that would make common sense. You need to build more understandable >> model. There's zero "gist" in this work. >> >> Maybe this does make sense but the story around it sucks so far. > > One tip: I think this is wrong forum to present namespace ideas in the > first place. It would be probably better to talk about this with e.g. > systemd or podman developers, and similar groups. There's zero evidence > of the usefulness. Then when you go that route and come back with actual > users, things click much more easily. Now this is all in the void. > > BR, Jarkko Jarkko, this is very much the right forum. User namespaces exist today. 
This is a discussion around trying to reduce the exposed kernel surface that is being used to attack the kernel.
On Tue May 21, 2024 at 4:57 PM EEST, John Johansen wrote: > > One tip: I think this is wrong forum to present namespace ideas in the > > first place. It would be probably better to talk about this with e.g. > > systemd or podman developers, and similar groups. There's zero evidence > > of the usefulness. Then when you go that route and come back with actual > > users, things click much more easily. Now this is all in the void. > > > > BR, Jarkko > > Jarkko, > > this is very much the right forum. User namespaces exist today. This > is a discussion around trying to reduce the exposed kernel surface > that is being used to attack the kernel. Agreed, that was harsh way to put it. What I mean is that if this feature was included, would it be enabled by distributions? This user base part or potential user space part is not very well described in the cover letter. I.e. "motivation" to put it short. I mean the technical details are really in detail in this patch set but it would help to digest them if there was some even rough description how this would be deployed. If the motivation should be obvious, then it is beyond me, and thus would be nice if that obvious thing was stated that everyone else gets. E.g. I like to sometimes just test quite alien patch sets for the sake of learning and fun (or not so fun, depends) but this patch set does not deliver enough information to do anything at all. Hope this clears a bit where I stand. IMHO a good patch set should bring the details to the specialists on the topic but also have some wider audience motivational stuff in order to make clear where it fits in this world :-) BR, Jarkko
On 5/21/24 07:12, Jarkko Sakkinen wrote: > On Tue May 21, 2024 at 4:57 PM EEST, John Johansen wrote: >>> One tip: I think this is wrong forum to present namespace ideas in the >>> first place. It would be probably better to talk about this with e.g. >>> systemd or podman developers, and similar groups. There's zero evidence >>> of the usefulness. Then when you go that route and come back with actual >>> users, things click much more easily. Now this is all in the void. >>> >>> BR, Jarkko >> >> Jarkko, >> >> this is very much the right forum. User namespaces exist today. This >> is a discussion around trying to reduce the exposed kernel surface >> that is being used to attack the kernel. > > Agreed, that was harsh way to put it. What I mean is that if this > feature was included, would it be enabled by distributions? > Enabled, maybe? It requires the debian distros to make sure their packaging supports xattrs correctly. It should be good but it isn't well exercised. It also requires the work to set these on multiple applications. From experience we are talking 100s. It will break out of repo applications, and require an extra step for users to enable. Ubuntu is already breaking these but for many, of the more popular ones they are shipping profiles so the users don't have to take an extra step. Things like appimages remain broken and wil require an approach similar to the Mac with unverified software downloaded from the internet. Nor does this fix the bwrap, unshare, ... use case. Which means the distro is going to have to continue shipping an alternate solution that covers those. For Ubuntu atm this is just an extra point of friction but I expect we would still end up enabling it to tick the checkbox at some point if it goes into the upstream kernel. > This user base part or potential user space part is not very well > described in the cover letter. I.e. "motivation" to put it short. 
> Yes, the cover letter needs work. > I mean the technical details are really in detail in this patch set but > it would help to digest them if there was some even rough description > how this would be deployed. > Yes. > If the motivation should be obvious, then it is beyond me, and thus > would be nice if that obvious thing was stated that everyone else gets. > Sure. The cover letter will get updated with this, seeing as I have been dealing with this a lot lately. It comes down to this: user namespaces allow unprivileged code to access kernel surface area that is usually protected behind capabilities. This has been leveraged as part of the exploit chain in the majority of kernel exploits we are seeing. > E.g. I like to sometimes just test quite alien patch sets for the sake > of learning and fun (or not so fun, depends) but this patch set does not > deliver enough information to do anything at all. > Understood. I am playing devil's advocate here. It's not that I don't see value in the proposal, but that I am not sure I see enough value in the current situation, where so much code has been written around the assumption that unprivileged user namespaces are safe. Trying to fix the situation without breaking everything is complicated. > Hope this clears a bit where I stand. IMHO a good patch set should bring > the details to the specialists on the topic but also have some wider > audience motivational stuff in order to make clear where it fits in this > world :-) > > BR, Jarkko >
On Tue, May 21, 2024 at 07:45:20AM GMT, John Johansen wrote: > On 5/21/24 07:12, Jarkko Sakkinen wrote: > > On Tue May 21, 2024 at 4:57 PM EEST, John Johansen wrote: > > > > One tip: I think this is wrong forum to present namespace ideas in the > > > > first place. It would be probably better to talk about this with e.g. > > > > systemd or podman developers, and similar groups. There's zero evidence > > > > of the usefulness. Then when you go that route and come back with actual > > > > users, things click much more easily. Now this is all in the void. > > > > > > > > BR, Jarkko > > > > > > Jarkko, > > > > > > this is very much the right forum. User namespaces exist today. This > > > is a discussion around trying to reduce the exposed kernel surface > > > that is being used to attack the kernel. > > > > Agreed, that was harsh way to put it. What I mean is that if this > > feature was included, would it be enabled by distributions? > > > Enabled, maybe? It requires the debian distros to make sure their > packaging supports xattrs correctly. It should be good but it isn't > well exercised. It also requires the work to set these on multiple > applications. From experience we are talking 100s. > > It will break out of repo applications, and require an extra step for > users to enable. Ubuntu is already breaking these but for many, of the > more popular ones they are shipping profiles so the users don't have > to take an extra step. Things like appimages remain broken and wil > require an approach similar to the Mac with unverified software > downloaded from the internet. > > Nor does this fix the bwrap, unshare, ... use case. Which means the > distro is going to have to continue shipping an alternate solution > that covers those. For Ubuntu atm this is just an extra point of > friction but I expect we would still end up enabling it to tick the > checkbox at some point if it goes into the upstream kernel. 
I'm not sure I understand your point here and how this relates to xattrs. This patchset has nothing to do with file capabilities. The userns capability set is purely a process based capability set and in no way influenced by file attributes. > > This user base part or potential user space part is not very well > > described in the cover letter. I.e. "motivation" to put it short. > > > yes the cover letter needs work Yes, it's been mentioned several times already. While not in the cover letter, the motivation is stated in the first patch and provides several references to past discussions on the topic. This is nothing new, this subject has been contentious for years now and discussed over and over on these lists (Eric would know :)). As mentioned in the patch also, this recently warranted the inclusion of new LSM hooks. But again, I wrongfully assumed that this problem was well understood and still relatively fresh, that's my bad. > > I mean the technical details are really in detail in this patch set but > > it would help to digest them if there was some even rough description > > how this would be deployed. > > > yes Yes, this was purposefully left out so as not to influence any specific implementation. There is a mention of where this could be done (i.e. init, pam), but at the end of the day, this is going to depend on each use case. Having said that, since it appears to be confusing, maybe we could add some of the examples I sent out in this thread or the other ones. I want to reiterate that this is a generic capability set, this is not magic switch you turn on to secure the whole system. Its implementation is going to vary across environments and it is going to be dictated by your threat model. For example, John's threat model of securing a multi-user Ubuntu Desktop is going to be very different than say securing a server where all the userspace is fixed and known. The former might require additional integration with the LSM subsystem. 
Thankfully, this patch should synergize well with it. Fundamentally, and at its core, it's very simple. Serge put it nicely: > If you want root in a child user namespace to not have CAP_MAC_ADMIN, > you drop it from your pU. Simple as that. From there, you can imagine any integration you want in userspace and ways to enforce your own policies. TLDR, this is a first step towards empowering userspace with control over capabilities granted by a userns. At present, the kernel does not offer ways to do this. By itself, it is not a comprehensive solution designed to thwart threat actors. However, it gives userspace the option to do so.
On 5/21/24 17:45, Jonathan Calmels wrote: > On Tue, May 21, 2024 at 07:45:20AM GMT, John Johansen wrote: >> On 5/21/24 07:12, Jarkko Sakkinen wrote: >>> On Tue May 21, 2024 at 4:57 PM EEST, John Johansen wrote: >>>>> One tip: I think this is wrong forum to present namespace ideas in the >>>>> first place. It would be probably better to talk about this with e.g. >>>>> systemd or podman developers, and similar groups. There's zero evidence >>>>> of the usefulness. Then when you go that route and come back with actual >>>>> users, things click much more easily. Now this is all in the void. >>>>> >>>>> BR, Jarkko >>>> >>>> Jarkko, >>>> >>>> this is very much the right forum. User namespaces exist today. This >>>> is a discussion around trying to reduce the exposed kernel surface >>>> that is being used to attack the kernel. >>> >>> Agreed, that was harsh way to put it. What I mean is that if this >>> feature was included, would it be enabled by distributions? >>> >> Enabled, maybe? It requires the debian distros to make sure their >> packaging supports xattrs correctly. It should be good but it isn't >> well exercised. It also requires the work to set these on multiple >> applications. From experience we are talking 100s. >> >> It will break out of repo applications, and require an extra step for >> users to enable. Ubuntu is already breaking these but for many, of the >> more popular ones they are shipping profiles so the users don't have >> to take an extra step. Things like appimages remain broken and wil >> require an approach similar to the Mac with unverified software >> downloaded from the internet. >> >> Nor does this fix the bwrap, unshare, ... use case. Which means the >> distro is going to have to continue shipping an alternate solution >> that covers those. For Ubuntu atm this is just an extra point of >> friction but I expect we would still end up enabling it to tick the >> checkbox at some point if it goes into the upstream kernel. 
> > I'm not sure I understand your point here and how this relates to xattrs. > This patchset has nothing to do with file capabilities. The userns > capability set is purely a process based capability set and in no way > influenced by file attributes. > Oopps sorry the fcaps bit is crossing over a side discussion. >>> This user base part or potential user space part is not very well >>> described in the cover letter. I.e. "motivation" to put it short. >>> >> yes the cover letter needs work > > Yes, it's been mentioned several times already. > While not in the cover letter, the motivation is stated in the first > patch and provides several references to past discussions on the topic. > > This is nothing new, this subject has been contentious for years now and > discussed over and over on these lists (Eric would know :)). As > mentioned in the patch also, this recently warranted the inclusion of > new LSM hooks. > > But again, I wrongfully assumed that this problem was well understood > and still relatively fresh, that's my bad. > >>> I mean the technical details are really in detail in this patch set but >>> it would help to digest them if there was some even rough description >>> how this would be deployed. >>> >> yes > > Yes, this was purposefully left out so as not to influence any specific > implementation. There is a mention of where this could be done (i.e. > init, pam), but at the end of the day, this is going to depend on each > use case. > Having said that, since it appears to be confusing, maybe we could add > some of the examples I sent out in this thread or the other ones. > examples would help, especially for people not too familiar with this. > I want to reiterate that this is a generic capability set, this is not > magic switch you turn on to secure the whole system. > Its implementation is going to vary across environments and it is going > to be dictated by your threat model. 
> yeah > For example, John's threat model of securing a multi-user Ubuntu Desktop > is going to be very different than say securing a server where all the > userspace is fixed and known. > The former might require additional integration with the LSM subsystem. > Thankfully, this patch should synergize well with it. > hrmmm, maybe, I will be happy if they just don't end up complicating each other > Fundamentally, and at its core, it's very simple. Serge put it nicely: > yes it is, and yet it still worries me a great deal. I have some of the same worries as Casey, and also worry that people will take this as a solution for all use cases, without understanding the issues. On the other hand walking back the current state of unprivileged use of user namespaces is a huge issue. Having another approach also pushing will actually be helpful in some ways. >> If you want root in a child user namespace to not have CAP_MAC_ADMIN, >> you drop it from your pU. Simple as that. > > From there, you can imagine any integration you want in userspace and > ways to enforce your own policies. > > TLDR, this is a first step towards empowering userspace with control > over capabilities granted by a userns. At present, the kernel does not > offer ways to do this. By itself, it is not a comprehensive solution yep > designed to thwart threat actors. However, it gives userspace the option > to do so. again, I don't believe the capabilities system is actually capable of doing this, it covers some of the use cases. To be fair the LSM doesn't cover everything either, there are current use cases that just aren't safe, you either break them or allow them and accept the risks. It relies on people understanding threat models, and sadly I have become grown quite grumpy about that topic. Anyways I will try to finish up my review of the code this weekend.
On Thu, May 16, 2024 at 5:21 AM Jonathan Calmels <jcalmels@3xx0.net> wrote: > > It's that time of the year again where we debate security settings for user > namespaces ;) > > I’ve been experimenting with different approaches to address the gripe > around user namespaces being used as attack vectors. > After invaluable feedback from Serge and Christian offline, this is what I > came up with. As Serge is the capabilities maintainer it would be good to hear his thoughts on-list about this proposal. > There are obviously a lot of things we could do differently but I feel this > is the right balance between functionality, simplicity and security. This > also serves as a good foundation and could always be extended if the need > arises in the future. > > Notes: > > - Adding a new capability set is far from ideal, but trying to reuse the > existing capability framework was deemed both impractical and > questionable security-wise, so here we are. > > - We might want to add new capabilities for some of the checks instead of > reusing CAP_SETPCAP every time. Serge mentioned something like > CAP_SYS_LIMIT? > > - In the last patch, we could decide to have stronger requirements and > perform checks inside cap_capable() in case we want to retroactively > prevent capabilities in old namespaces, this might be an overreach though > so I left it out. > > I'm also not fond of the ulong logic for setting the sysctl parameter, on > the other hand, the usermodhelper code always uses two u32s which makes it > very confusing to set in userspace. 
> > > Jonathan Calmels (3): > capabilities: user namespace capabilities > capabilities: add securebit for strict userns caps > capabilities: add cap userns sysctl mask > > fs/proc/array.c | 9 ++++ > include/linux/cred.h | 3 ++ > include/linux/securebits.h | 1 + > include/linux/user_namespace.h | 7 +++ > include/uapi/linux/prctl.h | 7 +++ > include/uapi/linux/securebits.h | 11 ++++- > kernel/cred.c | 3 ++ > kernel/sysctl.c | 10 ++++ > kernel/umh.c | 16 +++++++ > kernel/user_namespace.c | 83 ++++++++++++++++++++++++++++++--- > security/commoncap.c | 59 +++++++++++++++++++++++ > security/keys/process_keys.c | 3 ++ > 12 files changed, 204 insertions(+), 8 deletions(-) -- paul-moore.com
On Thu May 16, 2024 at 7:23 PM EEST, Paul Moore wrote: > On Thu, May 16, 2024 at 5:21 AM Jonathan Calmels <jcalmels@3xx0.net> wrote: > > > > It's that time of the year again where we debate security settings for user > > namespaces ;) > > > > I’ve been experimenting with different approaches to address the gripe > > around user namespaces being used as attack vectors. > > After invaluable feedback from Serge and Christian offline, this is what I > > came up with. > > As Serge is the capabilities maintainer it would be good to hear his > thoughts on-list about this proposal. Also it would make sense to make this just a bit more digestible to a wider group of maintainers, i.e. a better introduction to the topic instead of huge list of references (no bandwidth to read them all). This is exactly kind of patch set that makes you ignore it unless you are pro-active exactly in this domain. I think this could bring more actually useful feedback. BR, Jarkko
On Thu, May 16, 2024 at 02:22:02 -0700, Jonathan Calmels wrote: > Jonathan Calmels (3): > capabilities: user namespace capabilities > capabilities: add securebit for strict userns caps > capabilities: add cap userns sysctl mask > > fs/proc/array.c | 9 ++++ > include/linux/cred.h | 3 ++ > include/linux/securebits.h | 1 + > include/linux/user_namespace.h | 7 +++ > include/uapi/linux/prctl.h | 7 +++ > include/uapi/linux/securebits.h | 11 ++++- > kernel/cred.c | 3 ++ > kernel/sysctl.c | 10 ++++ > kernel/umh.c | 16 +++++++ > kernel/user_namespace.c | 83 ++++++++++++++++++++++++++++++--- > security/commoncap.c | 59 +++++++++++++++++++++++ > security/keys/process_keys.c | 3 ++ > 12 files changed, 204 insertions(+), 8 deletions(-) I note a lack of any changes to `Documentation/` which seems quite glaring for something with such a userspace visibility aspect to it. --Ben
On Thu May 16, 2024 at 4:30 PM EEST, Ben Boeckel wrote: > On Thu, May 16, 2024 at 02:22:02 -0700, Jonathan Calmels wrote: > > Jonathan Calmels (3): > > capabilities: user namespace capabilities > > capabilities: add securebit for strict userns caps > > capabilities: add cap userns sysctl mask > > > > fs/proc/array.c | 9 ++++ > > include/linux/cred.h | 3 ++ > > include/linux/securebits.h | 1 + > > include/linux/user_namespace.h | 7 +++ > > include/uapi/linux/prctl.h | 7 +++ > > include/uapi/linux/securebits.h | 11 ++++- > > kernel/cred.c | 3 ++ > > kernel/sysctl.c | 10 ++++ > > kernel/umh.c | 16 +++++++ > > kernel/user_namespace.c | 83 ++++++++++++++++++++++++++++++--- > > security/commoncap.c | 59 +++++++++++++++++++++++ > > security/keys/process_keys.c | 3 ++ > > 12 files changed, 204 insertions(+), 8 deletions(-) > > I note a lack of any changes to `Documentation/` which seems quite > glaring for something with such a userspace visibility aspect to it. > > --Ben Yeah, also in cover letter it would be nice to refresh what is a bounding set. I had to xref that (recalled what it is), and then got bored reading the rest :-) Not exactly in the nutshell cover letter tbh, but maybe the content in that would be better put to Documentation/ BR, Jarkko
On Thu, May 16, 2024 at 04:36:07PM GMT, Jarkko Sakkinen wrote: > On Thu May 16, 2024 at 4:30 PM EEST, Ben Boeckel wrote: > > I note a lack of any changes to `Documentation/` which seems quite > > glaring for something with such a userspace visibility aspect to it. > > > > --Ben > > Yeah, also in cover letter it would be nice to refresh what is > a bounding set. I had to xref that (recalled what it is), and > then got bored reading the rest :-) Thanks for reminding me, I actually meant to do it, just forgot. Having said that, `Documentation/security/credentials.rst` is not the best documentation when it comes to capabilities. I will definitely add a few more lines in there, but it's probably not what you're looking for. capabilities(7) is where everything is explained; I should have mentioned it. I could try to summarize the existing sets, but honestly I will probably do a worse job than the man page. I do plan to update the man page though if it comes to that.