netfilter: fix NULL ops race in iptable lazy init

[PATCH 0/2] netfilter: fix NULL ops race in iptable lazy init

Posted by Tristan Madani 1 month, 2 weeks ago

From: Tristan Madani <tristan@talencesecurity.com>

ipt_register_table() and ip6t_register_table() call xt_register_table(),
which adds the new table to the per-netns list, making it visible to
other code paths.  Only afterwards do they allocate the per-net copy of
hook ops via kmemdup_array().  This leaves a window where the table is
findable via xt_find_table() but has ops=NULL.

If cleanup_net runs during this window (racing namespace teardown against
lazy table init), ipt_unregister_table_pre_exit() /
ip6t_unregister_table_pre_exit() finds the table and passes the NULL ops
pointer to nf_unregister_net_hooks(), causing a general protection fault.

Fix both ip_tables.c and ip6_tables.c by moving the ops allocation
before xt_register_table(), so the table is never in the list with a
NULL ops pointer.

Tristan Madani (2):
  netfilter: ip_tables: allocate hook ops before making table visible
  netfilter: ip6_tables: allocate hook ops before making table visible

 net/ipv4/netfilter/ip_tables.c  | 31 ++++++++++++++++---------------
 net/ipv6/netfilter/ip6_tables.c | 28 ++++++++++++++++------------
 2 files changed, 32 insertions(+), 27 deletions(-)

-- 
2.47.3

Re: [PATCH 0/2] netfilter: fix NULL ops race in iptable lazy init

Posted by Tristan Madani 1 month, 2 weeks ago

On Wed, 30 Apr 2026 Phil Sutter wrote:
> Is this true? Your patch moves the ops allocation, but new_table->ops is
> still assigned after xt_register_table() has returned. AIUI, the race
> window is just reduced, not eliminated.

You are right -- I missed that new_table->ops is assigned after
xt_register_table() returns. The table becomes visible via list_add()
inside xt_register_table(), but the ops pointer is still NULL at that
point. Moving the allocation alone does not close the window.

We cannot assign ops before xt_register_table() because we need the
returned new_table pointer to set ops[i].priv.

Would a V2 that guards the pre_exit path instead be acceptable?
Something like:

  void ipt_unregister_table_pre_exit(struct net *net, const char *name)
  {
  	struct xt_table *table = xt_find_table(net, NFPROTO_IPV4, name);

  	if (table && table->ops)
  		nf_unregister_net_hooks(net, table->ops,
  				        hweight32(table->valid_hooks));
  }

This way cleanup_net simply skips the table if ops has not been assigned
yet. The register path will either complete and call
nf_register_net_hooks() normally, or fail and clean up via
__ipt_unregister_table().

Thanks,
Tristan

Re: [PATCH 0/2] netfilter: fix NULL ops race in iptable lazy init

Posted by Phil Sutter 1 month, 2 weeks ago

Hi,

On Wed, Apr 29, 2026 at 05:56:10PM +0000, Tristan Madani wrote:
> From: Tristan Madani <tristan@talencesecurity.com>
> 
> ipt_register_table() and ip6t_register_table() call xt_register_table()
> which adds the new table to the per-netns list, making it visible to
> other code paths.  Only afterwards do they allocate the per-net copy of
> hook ops via kmemdup_array().  This leaves a window where the table is
> findable via xt_find_table() but has ops=NULL.
> 
> If cleanup_net runs during this window (racing namespace teardown against
> lazy table init), ipt_unregister_table_pre_exit() /
> ip6t_unregister_table_pre_exit() finds the table and passes the NULL ops
> pointer to nf_unregister_net_hooks(), causing a general protection fault.
> 
> Fix both ip_tables.c and ip6_tables.c by moving the ops allocation
> before xt_register_table(), so the table is never in the list with a
> NULL ops pointer.

Is this true? Your patch moves the ops allocation, but new_table->ops is
still assigned after xt_register_table() has returned. AIUI, the race
window is just reduced, not eliminated.

At first I thought you could assign table->ops before the call, since
xt_register_table() calls kmemdup() on it anyway, but 'table' is const.

I guess checking the table->ops value in *_pre_exit() is nonsense as
well, since *_register_table() still runs in parallel. Do we need
serialization between the two routines?

Cheers, Phil

[PATCH v2 0/2] netfilter: fix NULL ops dereference in iptable lazy init

Posted by Tristan Madani 1 month, 2 weeks ago

v1 moved the ops allocation before xt_register_table(), but as Phil
Sutter pointed out, new_table->ops is still assigned after the table
becomes visible via list_add() inside xt_register_table(). The race
window was reduced but not eliminated.

v2 takes a different approach: guard the pre_exit path against a NULL
ops pointer. If cleanup_net races against lazy table init and finds the
table before ops has been assigned, it simply skips the
nf_unregister_net_hooks() call. The register path will either complete
normally or fail and clean up via __ipt_unregister_table().

v1: https://lore.kernel.org/netdev/20260429175613.1459342-1-tristmd@gmail.com/

Tristan Madani (2):
  netfilter: ip_tables: guard ipt_unregister_table_pre_exit against NULL ops
  netfilter: ip6_tables: guard ip6t_unregister_table_pre_exit against NULL ops

 net/ipv4/netfilter/ip_tables.c  | 2 +-
 net/ipv6/netfilter/ip6_tables.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

[PATCH v2 1/2] netfilter: ip_tables: guard ipt_unregister_table_pre_exit against NULL ops

Posted by Tristan Madani 1 month, 2 weeks ago

ipt_register_table() adds the table to the per-netns list via
xt_register_table() before assigning the per-net ops copy to
new_table->ops.  If cleanup_net runs during this window,
ipt_unregister_table_pre_exit() finds the table via xt_find_table()
and passes the NULL ops pointer to nf_unregister_net_hooks(), causing
a general protection fault.

Guard against this by checking table->ops before calling
nf_unregister_net_hooks().  If ops is NULL, the table is still being
set up; the register path will either complete and register the hooks
normally, or fail and clean up via __ipt_unregister_table().

Fixes: ae689334225f ("netfilter: xtables: Bring back xt_register_table()")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
---
 net/ipv4/netfilter/ip_tables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index XXXXXXX..XXXXXXX 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1795,7 +1795,7 @@ void ipt_unregister_table_pre_exit(struct net *net, const char *name)
 {
 	struct xt_table *table = xt_find_table(net, NFPROTO_IPV4, name);

-	if (table)
+	if (table && table->ops)
 		nf_unregister_net_hooks(net, table->ops, hweight32(table->valid_hooks));
 }

Re: [PATCH v2 1/2] netfilter: ip_tables: guard ipt_unregister_table_pre_exit against NULL ops

Posted by Florian Westphal 1 month, 2 weeks ago

Tristan Madani <tristmd@gmail.com> wrote:
> ipt_register_table() adds the table to the per-netns list via
> xt_register_table() before assigning the per-net ops copy to
> new_table->ops.  If cleanup_net runs during this window,
> ipt_unregister_table_pre_exit() finds the table via xt_find_table()
> and passes the NULL ops pointer to nf_unregister_net_hooks(), causing
> a general protection fault.
> 
> Guard against this by checking table->ops before calling
> nf_unregister_net_hooks().  If ops is NULL the table is still being
> set up; the register path will either complete and register the hooks
> normally, or fail and clean up via __ipt_unregister_table().

Is there a reproducer for this bug?

This explanation makes little sense to me.
If netns is being destroyed, then there should be no more requests
to set/getsockopt.

Is this perhaps about aggressive rmmod + parallel set/getsockopt calls?
That would make more sense, but this needs a different fix.

I'm working on a new unreg scheme to avoid rmmod racing with concurrent
calls into iptables set/getsockopts.

Re: [PATCH v2 1/2] netfilter: ip_tables: guard ipt_unregister_table_pre_exit against NULL ops

Posted by Tristan Madani 1 month, 2 weeks ago

Florian Westphal <fw@strlen.de> wrote:
> Is there a reproducer for this bug?

Syzkaller hit it under failslab. The race is between the lazy
init path in ipt_register_table() and cleanup_net(). The table
becomes visible via xt_register_table() before ops is assigned,
so pre_exit can find it with NULL ops.

Cleaned crash log:

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
  CPU: 1 UID: 0 PID: 604 Comm: kworker/u8:19 Tainted: G            E      6.14.11 #1
  Workqueue: netns cleanup_net
  RIP: 0010:nf_unregister_net_hook net/netfilter/core.c:531 [inline]
  RIP: 0010:nf_unregister_net_hooks+0xbc/0x150 net/netfilter/core.c:613
  Call Trace:
   <TASK>
   ipt_unregister_table_pre_exit+0x8a/0xc0 net/ipv4/netfilter/ip_tables.c:1814
   iptable_mangle_net_pre_exit+0x21/0x30 net/ipv4/netfilter/iptable_mangle.c:99
   ops_pre_exit_list net/core/net_namespace.c:162 [inline]
   cleanup_net+0x4b9/0xbe0 net/core/net_namespace.c:632
   process_one_work+0x98f/0x1750 kernel/workqueue.c:3238
   worker_thread+0x679/0xf50 kernel/workqueue.c:3402
   kthread+0x3f0/0x7e0 kernel/kthread.c:464
   ret_from_fork+0x60/0x90 arch/x86/kernel/process.c:153
   </TASK>

> I'm working on a new unreg scheme to avoid rmmod racing with
> concurrent calls into iptables set/getsockopts.

That sounds like a different issue (rmmod vs sockopt). This one
is init vs cleanup_net -- the NULL ops window exists regardless
of the unreg scheme. V2 is a minimal guard for that.

Thanks,
Tristan

Re: [PATCH v2 1/2] netfilter: ip_tables: guard ipt_unregister_table_pre_exit against NULL ops

Posted by Florian Westphal 1 month, 2 weeks ago

Tristan Madani <tristmd@gmail.com> wrote:
> Florian Westphal <fw@strlen.de> wrote:
> > Is there a reproducer for this bug?
> 
> Syzkaller hit it under failslab. The race is between the lazy
> init path in ipt_register_table() and cleanup_net(). The table
> becomes visible via xt_register_table() before ops is assigned,
> so pre_exit can find it with NULL ops.

If we have races between a thread calling ipt_register_table and
the netns cleanup path there is nothing we could ever do to fix it:
we are tearing down a live network namespace.

Something else must be going on.

Re: [PATCH v2 1/2] netfilter: ip_tables: guard ipt_unregister_table_pre_exit against NULL ops

Posted by Tristan Madani 1 month, 2 weeks ago

On Thu, 1 May 2026 Florian Westphal wrote:
> If we have races between a thread calling ipt_register_table
> and the netns cleanup path there is nothing we could ever do to
> fix it: we are tearing down a live network namespace.
> Something else must be going on.

I agree, this one is unusual. I tried multiple PoC approaches
without success -- all I have is the syzkaller crash I shared,
no reliable reproducer. Syzkaller itself could not minimize it
either.

That said, the crash is real -- KASAN shows ops=NULL in
pre_exit during cleanup_net -- so something is reaching that
path. The V2 guard handles it regardless of the root cause:
if ops is NULL in pre_exit, we should not pass it to
nf_unregister_net_hooks.

I will share any PoC/repro if I get one.

Thanks,
Tristan

Re: [PATCH v2 1/2] netfilter: ip_tables: guard ipt_unregister_table_pre_exit against NULL ops

Posted by Florian Westphal 1 month, 2 weeks ago

Tristan Madani <tristmd@gmail.com> wrote:
> That said, the crash is real -- KASAN shows ops=NULL in
> pre_exit during cleanup_net -- so something is reaching that
> path. The V2 guard handles it regardless of the root cause:
> if ops is NULL in pre_exit, we should not pass it to
> nf_unregister_net_hooks.
> 
> I will share any PoC/repro if I get one.

Thanks. I have a patch series that should close all
races, I need to retest it tomorrow and then I'll post it
so sashiko, syzbot etc. can have a go at it.

I found a few other problems in the general area so it should
be a good improvement over the current state of affairs.

[PATCH v2 2/2] netfilter: ip6_tables: guard ip6t_unregister_table_pre_exit against NULL ops

Posted by Tristan Madani 1 month, 2 weeks ago

Same race as the ipv4 counterpart: ip6t_register_table() adds the
table to the per-netns list before assigning new_table->ops.
cleanup_net can find the table with a NULL ops pointer and crash in
nf_unregister_net_hooks().

Guard against this by checking table->ops before the call.

Fixes: ee177a54413a ("netfilter: ip6_tables: Use xt_register_table()")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
---
 net/ipv6/netfilter/ip6_tables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index XXXXXXX..XXXXXXX 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1804,7 +1804,7 @@ void ip6t_unregister_table_pre_exit(struct net *net, const char *name)
 {
 	struct xt_table *table = xt_find_table(net, NFPROTO_IPV6, name);

-	if (table)
+	if (table && table->ops)
 		nf_unregister_net_hooks(net, table->ops, hweight32(table->valid_hooks));
 }