[PATCH v2 24/33] user: support ns lookup

Christian Brauner posted 33 patches 2 weeks, 6 days ago
[PATCH v2 24/33] user: support ns lookup
Posted by Christian Brauner 2 weeks, 6 days ago
Support the generic ns lookup infrastructure to support file handles for
namespaces.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 kernel/user_namespace.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 98f4fe84d039..ade5b6806c5c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -21,6 +21,7 @@
 #include <linux/fs_struct.h>
 #include <linux/bsearch.h>
 #include <linux/sort.h>
+#include <linux/nstree.h>
 
 static struct kmem_cache *user_ns_cachep __ro_after_init;
 static DEFINE_MUTEX(userns_state_mutex);
@@ -158,6 +159,7 @@ int create_user_ns(struct cred *new)
 		goto fail_keyring;
 
 	set_cred_user_ns(new, ns);
+	ns_tree_add(ns);
 	return 0;
 fail_keyring:
 #ifdef CONFIG_PERSISTENT_KEYRINGS
@@ -200,6 +202,7 @@ static void free_user_ns(struct work_struct *work)
 	do {
 		struct ucounts *ucounts = ns->ucounts;
 		parent = ns->parent;
+		ns_tree_remove(ns);
 		if (ns->gid_map.nr_extents > UID_GID_MAP_MAX_BASE_EXTENTS) {
 			kfree(ns->gid_map.forward);
 			kfree(ns->gid_map.reverse);
@@ -218,7 +221,8 @@ static void free_user_ns(struct work_struct *work)
 		retire_userns_sysctls(ns);
 		key_free_user_ns(ns);
 		ns_free_inum(&ns->ns);
-		kmem_cache_free(user_ns_cachep, ns);
+		/* Concurrent nstree traversal depends on a grace period. */
+		kfree_rcu(ns, ns.ns_rcu);
 		dec_user_namespaces(ucounts);
 		ns = parent;
 	} while (refcount_dec_and_test(&parent->ns.count));
@@ -1412,6 +1416,7 @@ const struct proc_ns_operations userns_operations = {
 static __init int user_namespaces_init(void)
 {
 	user_ns_cachep = KMEM_CACHE(user_namespace, SLAB_PANIC | SLAB_ACCOUNT);
+	ns_tree_add(&init_user_ns);
 	return 0;
 }
 subsys_initcall(user_namespaces_init);

-- 
2.47.3
Re: [PATCH v2 24/33] user: support ns lookup
Posted by Jan Kara 2 weeks, 3 days ago
On Fri 12-09-25 13:52:47, Christian Brauner wrote:
> Support the generic ns lookup infrastructure to support file handles for
> namespaces.
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>
...
> @@ -200,6 +202,7 @@ static void free_user_ns(struct work_struct *work)
>  	do {
>  		struct ucounts *ucounts = ns->ucounts;
>  		parent = ns->parent;
> +		ns_tree_remove(ns);
>  		if (ns->gid_map.nr_extents > UID_GID_MAP_MAX_BASE_EXTENTS) {
>  			kfree(ns->gid_map.forward);
>  			kfree(ns->gid_map.reverse);
> @@ -218,7 +221,8 @@ static void free_user_ns(struct work_struct *work)
>  		retire_userns_sysctls(ns);
>  		key_free_user_ns(ns);
>  		ns_free_inum(&ns->ns);
> -		kmem_cache_free(user_ns_cachep, ns);
> +		/* Concurrent nstree traversal depends on a grace period. */
> +		kfree_rcu(ns, ns.ns_rcu);

So this is correct for now but it's a bit of a landmine. A lot of stuff
that ns references is kfreed before the RCU expires. Thus if you lookup ns
using id, then even if you're under RCU protection you have to be very
careful about what you can and cannot dereference. IMHO this deserves a
careful documentation at least or, preferably, split free_user_ns() into
pre and post-RCU period parts...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
Re: [PATCH v2 24/33] user: support ns lookup
Posted by Christian Brauner 2 weeks, 3 days ago
On Mon, Sep 15, 2025 at 02:11:55PM +0200, Jan Kara wrote:
> On Fri 12-09-25 13:52:47, Christian Brauner wrote:
> > Support the generic ns lookup infrastructure to support file handles for
> > namespaces.
> > 
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> ...
> > @@ -200,6 +202,7 @@ static void free_user_ns(struct work_struct *work)
> >  	do {
> >  		struct ucounts *ucounts = ns->ucounts;
> >  		parent = ns->parent;
> > +		ns_tree_remove(ns);
> >  		if (ns->gid_map.nr_extents > UID_GID_MAP_MAX_BASE_EXTENTS) {
> >  			kfree(ns->gid_map.forward);
> >  			kfree(ns->gid_map.reverse);
> > @@ -218,7 +221,8 @@ static void free_user_ns(struct work_struct *work)
> >  		retire_userns_sysctls(ns);
> >  		key_free_user_ns(ns);
> >  		ns_free_inum(&ns->ns);
> > -		kmem_cache_free(user_ns_cachep, ns);
> > +		/* Concurrent nstree traversal depends on a grace period. */
> > +		kfree_rcu(ns, ns.ns_rcu);
> 
> So this is correct for now but it's a bit of a landmine. A lot of stuff
> that ns references is kfreed before the RCU expires. Thus if you lookup ns
> using id, then even if you're under RCU protection you have to be very
> careful about what you can and cannot dereference. IMHO this deserves a
> careful documentation at least or, preferably, split free_user_ns() into
> pre and post-RCU period parts...

Right, the thing is that you cannot touch anything in any namespace
structure without having an actual reference to it. IOW, the only thing
that's valid under rcu is to access the reference count. That's the only
guarantee that the _generic_ infrastructure gives _and_ expects. IOW, if
one can get a live reference (inc_not_zero) that thing better be valid.

Individual namespace implementers may ofc provide additional guarantees
but they are not transparent to the generic infrastructure.

Otherwise I fully agree.
Re: [PATCH v2 24/33] user: support ns lookup
Posted by Jan Kara 2 weeks, 3 days ago
On Mon 15-09-25 15:54:26, Christian Brauner wrote:
> On Mon, Sep 15, 2025 at 02:11:55PM +0200, Jan Kara wrote:
> > On Fri 12-09-25 13:52:47, Christian Brauner wrote:
> > > Support the generic ns lookup infrastructure to support file handles for
> > > namespaces.
> > > 
> > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > ...
> > > @@ -200,6 +202,7 @@ static void free_user_ns(struct work_struct *work)
> > >  	do {
> > >  		struct ucounts *ucounts = ns->ucounts;
> > >  		parent = ns->parent;
> > > +		ns_tree_remove(ns);
> > >  		if (ns->gid_map.nr_extents > UID_GID_MAP_MAX_BASE_EXTENTS) {
> > >  			kfree(ns->gid_map.forward);
> > >  			kfree(ns->gid_map.reverse);
> > > @@ -218,7 +221,8 @@ static void free_user_ns(struct work_struct *work)
> > >  		retire_userns_sysctls(ns);
> > >  		key_free_user_ns(ns);
> > >  		ns_free_inum(&ns->ns);
> > > -		kmem_cache_free(user_ns_cachep, ns);
> > > +		/* Concurrent nstree traversal depends on a grace period. */
> > > +		kfree_rcu(ns, ns.ns_rcu);
> > 
> > So this is correct for now but it's a bit of a landmine. A lot of stuff
> > that ns references is kfreed before the RCU expires. Thus if you lookup ns
> > using id, then even if you're under RCU protection you have to be very
> > careful about what you can and cannot dereference. IMHO this deserves a
> > careful documentation at least or, preferably, split free_user_ns() into
> > pre and post-RCU period parts...
> 
> Right, the thing is that you cannot touch anything in any namespace
> structure without having an actual reference to it. IOW, the only thing
> that's valid under rcu is to access the reference count. That's the only
> guarantee that the _generic_ infrastructure gives _and_ expects. IOW, if
> one can get a live reference (inc_not_zero) that thing better be valid.
> 
> Individual namespace implementers may ofc provide additional guarantees
> but they are not transparent to the generic infrastructure.
> 
> Otherwise I fully agree.

I guess fair enough for this patch set so feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

but longer term we might need to revisit this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR