[PATCH net-next v3] netns: optimize netns cleaning by batching unhash_nsid calls
Posted by Qiliang Yuan 1 week, 3 days ago
Currently, unhash_nsid() scans the entire net_namespace_list for each
netns in a destruction batch during cleanup_net(). This leads to
O(M_batch * N_system * M_nsids) complexity, where M_batch is the
destruction batch size, N_system is the total number of namespaces,
and M_nsids is the number of IDs in each IDR.

Reduce the complexity to O(N_system * M_nsids) by introducing an
'is_dying' flag to mark namespaces being destroyed. This allows
unhash_nsid() to perform a single-pass traversal over the system's
namespaces. In this pass, for each survivor namespace, iterate
through its netns_ids and remove any mappings that point to a marked
namespace, effectively eliminating the M_batch multiplier.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
---
v3:
 - Update target tree to net-next.
 - Post as a new thread instead of a reply.
v2:
 - Move 'is_dying' setting to __put_net() to eliminate the O(M_batch) loop.
 - Remove redundant initializations in preinit_net().
v1:
 - Initial implementation of batch unhash_nsid().

 include/net/net_namespace.h |  1 +
 net/core/net_namespace.c    | 46 ++++++++++++++++++++++++++-----------
 2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index cb664f6e3558..bd1acc6056ac 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -69,6 +69,7 @@ struct net {
 
 	unsigned int		dev_base_seq;	/* protected by rtnl_mutex */
 	u32			ifindex;
+	bool			is_dying;
 
 	spinlock_t		nsid_lock;
 	atomic_t		fnhe_genid;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index a6e6a964a287..50fdd4f9bb3b 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -624,9 +624,10 @@ void net_ns_get_ownership(const struct net *net, kuid_t *uid, kgid_t *gid)
 }
 EXPORT_SYMBOL_GPL(net_ns_get_ownership);
 
-static void unhash_nsid(struct net *net, struct net *last)
+static void unhash_nsid(struct net *last)
 {
 	struct net *tmp;
+
 	/* This function is only called from cleanup_net() work,
 	 * and this work is the only process, that may delete
 	 * a net from net_namespace_list. So, when the below
@@ -636,20 +637,34 @@ static void unhash_nsid(struct net *net, struct net *last)
 	for_each_net(tmp) {
 		int id;
 
-		spin_lock(&tmp->nsid_lock);
-		id = __peernet2id(tmp, net);
-		if (id >= 0)
-			idr_remove(&tmp->netns_ids, id);
-		spin_unlock(&tmp->nsid_lock);
-		if (id >= 0)
-			rtnl_net_notifyid(tmp, RTM_DELNSID, id, 0, NULL,
-					  GFP_KERNEL);
+		for (id = 0; ; id++) {
+			struct net *peer;
+			bool dying;
+
+			rcu_read_lock();
+			peer = idr_get_next(&tmp->netns_ids, &id);
+			dying = peer && peer->is_dying;
+			rcu_read_unlock();
+
+			if (!peer)
+				break;
+			if (!dying)
+				continue;
+
+			spin_lock(&tmp->nsid_lock);
+			if (idr_find(&tmp->netns_ids, id) == peer)
+				idr_remove(&tmp->netns_ids, id);
+			else
+				peer = NULL;
+			spin_unlock(&tmp->nsid_lock);
+
+			if (peer)
+				rtnl_net_notifyid(tmp, RTM_DELNSID, id, 0,
+						  NULL, GFP_KERNEL);
+		}
 		if (tmp == last)
 			break;
 	}
-	spin_lock(&net->nsid_lock);
-	idr_destroy(&net->netns_ids);
-	spin_unlock(&net->nsid_lock);
 }
 
 static LLIST_HEAD(cleanup_list);
@@ -688,8 +703,12 @@ static void cleanup_net(struct work_struct *work)
 	last = list_last_entry(&net_namespace_list, struct net, list);
 	up_write(&net_rwsem);
 
+	unhash_nsid(last);
+
 	llist_for_each_entry(net, net_kill_list, cleanup_list) {
-		unhash_nsid(net, last);
+		spin_lock(&net->nsid_lock);
+		idr_destroy(&net->netns_ids);
+		spin_unlock(&net->nsid_lock);
 		list_add_tail(&net->exit_list, &net_exit_list);
 	}
 
@@ -739,6 +758,7 @@ static DECLARE_WORK(net_cleanup_work, cleanup_net);
 void __put_net(struct net *net)
 {
 	ref_tracker_dir_exit(&net->refcnt_tracker);
+	net->is_dying = true;
 	/* Cleanup the network namespace in process context */
 	if (llist_add(&net->cleanup_list, &cleanup_list))
 		queue_work(netns_wq, &net_cleanup_work);
-- 
2.51.0
Re: [PATCH net-next v3] netns: optimize netns cleaning by batching unhash_nsid calls
Posted by Kuniyuki Iwashima 1 week, 3 days ago
On Tue, Jan 27, 2026 at 5:22 PM Qiliang Yuan <realwujing@gmail.com> wrote:
>
> Currently, unhash_nsid() scans the entire net_namespace_list for each
> netns in a destruction batch during cleanup_net(). This leads to
> O(M_batch * N_system * M_nsids) complexity, where M_batch is the
> destruction batch size, N_system is the total number of namespaces,
> and M_nsids is the number of IDs in each IDR.
>
> Reduce the complexity to O(N_system * M_nsids) by introducing an
> 'is_dying' flag to mark namespaces being destroyed. This allows
> unhash_nsid() to perform a single-pass traversal over the system's
> namespaces. In this pass, for each survivor namespace, iterate
> through its netns_ids and remove any mappings that point to a marked
> namespace, effectively eliminating the M_batch multiplier.
>
> Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>

Why two SOBs with the same person ?


> ---
> v3:
>  - Update target tree to net-next.
>  - Post as a new thread instead of a reply.
> v2:
>  - Move 'is_dying' setting to __put_net() to eliminate the O(M_batch) loop.
>  - Remove redundant initializations in preinit_net().
> v1:
>  - Initial implementation of batch unhash_nsid().
>
>  include/net/net_namespace.h |  1 +
>  net/core/net_namespace.c    | 46 ++++++++++++++++++++++++++-----------
>  2 files changed, 34 insertions(+), 13 deletions(-)
>
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index cb664f6e3558..bd1acc6056ac 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -69,6 +69,7 @@ struct net {
>
>         unsigned int            dev_base_seq;   /* protected by rtnl_mutex */
>         u32                     ifindex;
> +       bool                    is_dying;
>
>         spinlock_t              nsid_lock;
>         atomic_t                fnhe_genid;
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index a6e6a964a287..50fdd4f9bb3b 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -624,9 +624,10 @@ void net_ns_get_ownership(const struct net *net, kuid_t *uid, kgid_t *gid)
>  }
>  EXPORT_SYMBOL_GPL(net_ns_get_ownership);
>
> -static void unhash_nsid(struct net *net, struct net *last)
> +static void unhash_nsid(struct net *last)
>  {
>         struct net *tmp;
> +
>         /* This function is only called from cleanup_net() work,
>          * and this work is the only process, that may delete
>          * a net from net_namespace_list. So, when the below
> @@ -636,20 +637,34 @@ static void unhash_nsid(struct net *net, struct net *last)
>         for_each_net(tmp) {
>                 int id;
>
> -               spin_lock(&tmp->nsid_lock);
> -               id = __peernet2id(tmp, net);
> -               if (id >= 0)
> -                       idr_remove(&tmp->netns_ids, id);
> -               spin_unlock(&tmp->nsid_lock);
> -               if (id >= 0)
> -                       rtnl_net_notifyid(tmp, RTM_DELNSID, id, 0, NULL,
> -                                         GFP_KERNEL);
> +               for (id = 0; ; id++) {

Doesn't this rather slow down in a common case where
init_net has ids for other netns since it is never dismantled ?


> +                       struct net *peer;
> +                       bool dying;
> +
> +                       rcu_read_lock();
> +                       peer = idr_get_next(&tmp->netns_ids, &id);
> +                       dying = peer && peer->is_dying;
> +                       rcu_read_unlock();
> +
> +                       if (!peer)
> +                               break;
> +                       if (!dying)
> +                               continue;
> +
> +                       spin_lock(&tmp->nsid_lock);
> +                       if (idr_find(&tmp->netns_ids, id) == peer)
> +                               idr_remove(&tmp->netns_ids, id);
> +                       else
> +                               peer = NULL;
> +                       spin_unlock(&tmp->nsid_lock);
> +
> +                       if (peer)
> +                               rtnl_net_notifyid(tmp, RTM_DELNSID, id, 0,
> +                                                 NULL, GFP_KERNEL);
> +               }
>                 if (tmp == last)
>                         break;
>         }
> -       spin_lock(&net->nsid_lock);
> -       idr_destroy(&net->netns_ids);
> -       spin_unlock(&net->nsid_lock);
>  }
>
>  static LLIST_HEAD(cleanup_list);
> @@ -688,8 +703,12 @@ static void cleanup_net(struct work_struct *work)
>         last = list_last_entry(&net_namespace_list, struct net, list);
>         up_write(&net_rwsem);
>
> +       unhash_nsid(last);
> +
>         llist_for_each_entry(net, net_kill_list, cleanup_list) {
> -               unhash_nsid(net, last);
> +               spin_lock(&net->nsid_lock);

This lock can be removed.


> +               idr_destroy(&net->netns_ids);
> +               spin_unlock(&net->nsid_lock);
>                 list_add_tail(&net->exit_list, &net_exit_list);
>         }
>
> @@ -739,6 +758,7 @@ static DECLARE_WORK(net_cleanup_work, cleanup_net);
>  void __put_net(struct net *net)
>  {
>         ref_tracker_dir_exit(&net->refcnt_tracker);
> +       net->is_dying = true;
>         /* Cleanup the network namespace in process context */
>         if (llist_add(&net->cleanup_list, &cleanup_list))
>                 queue_work(netns_wq, &net_cleanup_work);
> --
> 2.51.0
>
Re: [PATCH net-next v3] netns: optimize netns cleaning by batching unhash_nsid calls
Posted by Qiliang Yuan 1 week, 3 days ago
Hi Kuniyuki,

On Tue, Jan 27, 2026 at 6:05 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Tue, Jan 27, 2026 at 5:22 PM Qiliang Yuan <realwujing@gmail.com> wrote:
> >
> > Currently, unhash_nsid() scans the entire net_namespace_list for each
> > netns in a destruction batch during cleanup_net(). This leads to
> > O(M_batch * N_system * M_nsids) complexity, where M_batch is the
> > destruction batch size, N_system is the total number of namespaces,
> > and M_nsids is the number of IDs in each IDR.
> >
> > Reduce the complexity to O(N_system * M_nsids) by introducing an
> > 'is_dying' flag to mark namespaces being destroyed. This allows
> > unhash_nsid() to perform a single-pass traversal over the system's
> > namespaces. In this pass, for each survivor namespace, iterate
> > through its netns_ids and remove any mappings that point to a marked
> > namespace, effectively eliminating the M_batch multiplier.
> >
> > Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> > Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
>
> Why two SOBs with the same person ?

- Signed-off-by: Qiliang Yuan <realwujing@gmail.com> (Personal email)
- Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> (Work email)

My work email often has trouble receiving external mailing list replies, 
so I've included both to ensure I don't miss any feedback and to 
properly attribute the work. The v8 version should have everything 
matching correctly now.

> > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> > index a6e6a964a287..50fdd4f9bb3b 100644
> > --- a/net/core/net_namespace.c
> > +++ b/net/core/net_namespace.c
> > @@ -624,9 +624,10 @@ void net_ns_get_ownership(const struct net *net, kuid_t *uid, kgid_t *gid)
> >  }
> >  EXPORT_SYMBOL_GPL(net_ns_get_ownership);
> >
> > -static void unhash_nsid(struct net *net, struct net *last)
> > +static void unhash_nsid(struct net *last)
> >  {
> >         struct net *tmp;
> > +
> >         /* This function is only called from cleanup_net() work,
> >          * and this work is the only process, that may delete
> >          * a net from net_namespace_list. So, when the below
> > @@ -636,20 +637,34 @@ static void unhash_nsid(struct net *net, struct net *last)
> >         for_each_net(tmp) {
> >                 int id;
> >
> > -               spin_lock(&tmp->nsid_lock);
> > -               id = __peernet2id(tmp, net);
> > -               if (id >= 0)
> > -                       idr_remove(&tmp->netns_ids, id);
> > -               spin_unlock(&tmp->nsid_lock);
> > -               if (id >= 0)
> > -                       rtnl_net_notifyid(tmp, RTM_DELNSID, id, 0, NULL,
> > -                                         GFP_KERNEL);
> > +               for (id = 0; ; id++) {
>
> Doesn't this rather slow down in a common case where
> init_net has ids for other netns since it is never dismantled ?

Yes, you're right. In the original code, we only scanned 'tmp' for the specific 'net'
which was being killed. Now we are scanning all IDs in 'tmp' to find any dying 
peers. 

If 'tmp' (like init_net) has many long-lived netns IDs, we end up iterating through 
them even if none of them are dying.

To address this and avoid the overhead, I can use idr_for_each() with a callback 
to find and collect dying IDs, or keep the O(M_batch) outer loop but optimize the 
inner part if it's truly problematic. 

However, given that this is the cleanup path, I thought the batching benefit 
(N_system vs M_batch * N_system) would outweigh the per-netns IDR scan. 

I'll switch to a more efficient iteration or use idr_for_each() to handle this
gracefully in v4.

Thanks,
Qiliang
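
For illustration, here is a rough sketch of the idr_for_each() direction
mentioned above. It is not the actual v4 patch: the helper collect_dying_nsid,
the context struct and its fixed-size id buffer are made up purely for this
sketch, and it assumes 'is_dying' is set only for namespaces in the current
destruction batch. The callback only collects ids, so idr_remove() happens
after the walk but still under nsid_lock, while rtnl_net_notifyid() (which uses
GFP_KERNEL) runs after the lock is dropped:

/* Sketch only: collect the ids that map to dying peers. */
struct dying_nsid_ctx {
        int     ids[16];        /* arbitrary batch size for the sketch */
        int     count;
};

static int collect_dying_nsid(int id, void *p, void *data)
{
        struct net *peer = p;
        struct dying_nsid_ctx *ctx = data;

        if (!peer->is_dying)
                return 0;
        if (ctx->count == ARRAY_SIZE(ctx->ids))
                return 1;       /* buffer full: stop, the caller rescans */
        ctx->ids[ctx->count++] = id;
        return 0;
}

static void unhash_nsid(struct net *last)
{
        struct net *tmp;

        for_each_net(tmp) {
                struct dying_nsid_ctx ctx;
                int i, more;

                do {
                        ctx.count = 0;

                        spin_lock(&tmp->nsid_lock);
                        more = idr_for_each(&tmp->netns_ids,
                                            collect_dying_nsid, &ctx);
                        /* Remove collected ids under the lock, outside the walk */
                        for (i = 0; i < ctx.count; i++)
                                idr_remove(&tmp->netns_ids, ctx.ids[i]);
                        spin_unlock(&tmp->nsid_lock);

                        /* Notify after dropping nsid_lock, as before */
                        for (i = 0; i < ctx.count; i++)
                                rtnl_net_notifyid(tmp, RTM_DELNSID,
                                                  ctx.ids[i], 0, NULL,
                                                  GFP_KERNEL);
                } while (more);

                if (tmp == last)
                        break;
        }
}
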
Re: [PATCH net-next v3] netns: optimize netns cleaning by batching unhash_nsid calls
Posted by Kuniyuki Iwashima 1 week, 2 days ago
On Wed, Jan 28, 2026 at 4:19 AM Qiliang Yuan <realwujing@gmail.com> wrote:
>
> Hi Kuniyuki,
>
> On Tue, Jan 27, 2026 at 6:05 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> >
> > On Tue, Jan 27, 2026 at 5:22 PM Qiliang Yuan <realwujing@gmail.com> wrote:
> > >
> > > Currently, unhash_nsid() scans the entire net_namespace_list for each
> > > netns in a destruction batch during cleanup_net(). This leads to
> > > O(M_batch * N_system * M_nsids) complexity, where M_batch is the
> > > destruction batch size, N_system is the total number of namespaces,
> > > and M_nsids is the number of IDs in each IDR.
> > >
> > > Reduce the complexity to O(N_system * M_nsids) by introducing an
> > > 'is_dying' flag to mark namespaces being destroyed. This allows
> > > unhash_nsid() to perform a single-pass traversal over the system's
> > > namespaces. In this pass, for each survivor namespace, iterate
> > > through its netns_ids and remove any mappings that point to a marked
> > > namespace, effectively eliminating the M_batch multiplier.
> > >
> > > Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
> > > Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
> >
> > Why two SOBs with the same person ?
>
> - Signed-off-by: Qiliang Yuan <realwujing@gmail.com> (Personal email)
> - Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> (Work email)
>
> My work email often has trouble receiving external mailing list replies,
> so I've included both to ensure I don't miss any feedback and to
> properly attribute the work. The v8 version should have everything
> matching correctly now.

You can just CC two of them and keep one SOB.


>
> > > diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> > > index a6e6a964a287..50fdd4f9bb3b 100644
> > > --- a/net/core/net_namespace.c
> > > +++ b/net/core/net_namespace.c
> > > @@ -624,9 +624,10 @@ void net_ns_get_ownership(const struct net *net, kuid_t *uid, kgid_t *gid)
> > >  }
> > >  EXPORT_SYMBOL_GPL(net_ns_get_ownership);
> > >
> > > -static void unhash_nsid(struct net *net, struct net *last)
> > > +static void unhash_nsid(struct net *last)
> > >  {
> > >         struct net *tmp;
> > > +
> > >         /* This function is only called from cleanup_net() work,
> > >          * and this work is the only process, that may delete
> > >          * a net from net_namespace_list. So, when the below
> > > @@ -636,20 +637,34 @@ static void unhash_nsid(struct net *net, struct net *last)
> > >         for_each_net(tmp) {
> > >                 int id;
> > >
> > > -               spin_lock(&tmp->nsid_lock);
> > > -               id = __peernet2id(tmp, net);
> > > -               if (id >= 0)
> > > -                       idr_remove(&tmp->netns_ids, id);
> > > -               spin_unlock(&tmp->nsid_lock);
> > > -               if (id >= 0)
> > > -                       rtnl_net_notifyid(tmp, RTM_DELNSID, id, 0, NULL,
> > > -                                         GFP_KERNEL);
> > > +               for (id = 0; ; id++) {
> >
> > Doesn't this rather slow down in a common case where
> > init_net has ids for other netns since it is never dismantled ?
>
> Yes, you're right. In the original code, we only scanned 'tmp' for the specific 'net'
> which was being killed. Now we are scanning all IDs in 'tmp' to find any dying
> peers.
>
> If 'tmp' (like init_net) has many long-lived netns IDs, we end up iterating through
> them even if none of them are dying.
>
> To address this and avoid the overhead, I can use idr_for_each() with a callback
> to find and collect dying IDs,

idr_for_each() sounds better to me.

If we replace list_del_rcu(&net->list); with
list_del_init_rcu(&net->list);, we can check net->list.pprev
instead of adding dying_net, which is a bit racy since
idr_for_each() could return a net which would have been
processed in the next cleanup_net() invocation.


> or keep the O(M_batch) outer loop but optimize the
> inner part if it's truly problematic.
>
> However, given that this is the cleanup path, I thought the batching benefit
> (N_system vs M_batch * N_system) would outweigh the per-netns IDR scan.
>
> I'll switch to a more efficient iteration or use idr_for_each() to handle this
> gracefully in v4.
>
> Thanks,
> Qiliang
Re: [PATCH net-next v3] netns: optimize netns cleaning by batching unhash_nsid calls
Posted by Qiliang Yuan 1 week, 1 day ago
Hi Kuniyuki,

Thank you for your valuable feedback!

On Wed, 28 Jan 2026 09:13:59 -0800, Kuniyuki Iwashima <kuniyu@google.com> wrote:
> idr_for_each() sounds better to me.

I have integrated this in v4. It indeed makes the IDR traversal much more 
idiomatic and efficient.

> If we replace list_del_rcu(&net->list); with
> list_del_init_rcu(&net->list);, we can check net->list.pprev
> instead of adding dying_net, which is a bit racy since
> idr_for_each() could return a net which would have been
> processed in the next cleanup_net() invocation.

To resolve this race, I've moved the setting of 'is_dying = true' inside 
cleanup_net() while still holding the net_rwsem write lock. This ensures 
all namespaces in the current kill_list are marked before we release the 
lock and perform the batch unhashing.

Regarding list_del_init_rcu(): since it is not a standard API, I evaluated
using list_del_rcu() followed by INIT_LIST_HEAD(). However, resetting 
list pointers is generally unsafe for RCU readers (e.g., in for_each_net_rcu), 
as it could cause them to enter an infinite loop. Using the 'is_dying' 
boolean under the existing lock seems to be the safest and simplest approach.

I've also cleaned up the redundant nsid_lock and the duplicate Signed-off-by 
tags as you suggested.

The complexity is now O(N_system * M_nsids), effectively eliminating the
M_batch multiplier.

I've just sent out the v4 patch. Looking forward to your thoughts.

Thanks,
Qiliang
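
For reference, a rough sketch of the cleanup_net() ordering described in this
reply: mark the batch as dying while net_rwsem is held for writing, run a
single unhash_nsid() pass, then destroy each netns_ids IDR without taking
nsid_lock. This is an illustration of the idea rather than the actual v4
patch; the pernet_ops handling and exit callbacks are elided:

static void cleanup_net(struct work_struct *work)
{
        struct net *net, *last;
        struct llist_node *net_kill_list;
        LIST_HEAD(net_exit_list);

        /* Atomically snapshot the list of killed network namespaces */
        net_kill_list = llist_del_all(&cleanup_list);

        down_write(&net_rwsem);
        llist_for_each_entry(net, net_kill_list, cleanup_list) {
                /* Mark the whole batch as dying before any survivor is
                 * scanned, so a single unhash_nsid() pass sees them all.
                 */
                net->is_dying = true;
                list_del_rcu(&net->list);
        }
        /* Cache the last netns so the survivor scan stops at a fixed point */
        last = list_last_entry(&net_namespace_list, struct net, list);
        up_write(&net_rwsem);

        unhash_nsid(last);

        llist_for_each_entry(net, net_kill_list, cleanup_list) {
                /* No peer can reach this netns via netns_ids any more, so
                 * nsid_lock is not needed around idr_destroy() here.
                 */
                idr_destroy(&net->netns_ids);
                list_add_tail(&net->exit_list, &net_exit_list);
        }

        /* ... pernet pre_exit/exit callbacks, synchronize_rcu(), etc.
         * continue as before ...
         */
}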