From: Tristan Madani <tristan@talencesecurity.com>

ipt_register_table() and ip6t_register_table() call xt_register_table()
which adds the new table to the per-netns list, making it visible to
other code paths. Only afterwards do they allocate the per-net copy of
hook ops via kmemdup_array(). This leaves a window where the table is
findable via xt_find_table() but has ops=NULL.

If cleanup_net runs during this window (racing namespace teardown against
lazy table init), ipt_unregister_table_pre_exit() /
ip6t_unregister_table_pre_exit() finds the table and passes the NULL ops
pointer to nf_unregister_net_hooks(), causing a general protection fault.

Fix both ip_tables.c and ip6_tables.c by moving the ops allocation
before xt_register_table(), so the table is never in the list with a
NULL ops pointer.

Tristan Madani (2):
  netfilter: ip_tables: allocate hook ops before making table visible
  netfilter: ip6_tables: allocate hook ops before making table visible

 net/ipv4/netfilter/ip_tables.c  | 31 ++++++++++++++++---------------
 net/ipv6/netfilter/ip6_tables.c | 28 ++++++++++++++++------------
 2 files changed, 32 insertions(+), 27 deletions(-)

-- 
2.47.3
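The window described above can be modelled in userspace. The following is only a sketch: struct table, struct hook_ops, publish_table(), assign_ops() and find_table() are hypothetical stand-ins, not the kernel's definitions, and the model is single-threaded, so it shows that a lookup between the two registration steps observes ops == NULL without reproducing the race itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the kernel's struct layouts. */
struct hook_ops { void *priv; };

struct table {
	const char *name;
	struct hook_ops *ops;	/* NULL until the register path assigns it */
	struct table *next;
};

static struct table *table_list;	/* models the per-netns table list */

/* Registration step 1: make the table findable; ops is still NULL. */
static void publish_table(struct table *t)
{
	t->ops = NULL;
	t->next = table_list;
	table_list = t;
}

/* Registration step 2: attach the ops copy, which points back at the table. */
static void assign_ops(struct table *t, struct hook_ops *ops)
{
	assert(t && ops);
	ops->priv = t;
	t->ops = ops;
}

/* Models a lookup by name on the list. */
static struct table *find_table(const char *name)
{
	struct table *t;

	for (t = table_list; t; t = t->next)
		if (strcmp(t->name, name) == 0)
			return t;
	return NULL;
}

/* Returns 1 if a teardown path looking the table up right now would get NULL ops. */
static int teardown_would_see_null_ops(const char *name)
{
	struct table *t = find_table(name);

	return t != NULL && t->ops == NULL;
}
```

Calling teardown_would_see_null_ops() between publish_table() and assign_ops() returns 1; after assign_ops() it returns 0.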
On Wed, 30 Apr 2026 Phil Sutter wrote:
> Is this true? Your patch moves the ops allocation, but new_table->ops is
> still assigned after xt_register_table() has returned. AIUI, the race
> window is just reduced, not eliminated.
You are right -- I missed that new_table->ops is assigned after
xt_register_table() returns. The table becomes visible via list_add()
inside xt_register_table(), but the ops pointer is still NULL at that
point. Moving the allocation alone does not close the window.
We cannot assign ops before xt_register_table() because we need the
returned new_table pointer to set ops[i].priv.
Would a V2 that guards the pre_exit path instead be acceptable?
Something like:
void ipt_unregister_table_pre_exit(struct net *net, const char *name)
{
	struct xt_table *table = xt_find_table(net, NFPROTO_IPV4, name);

	if (table && table->ops)
		nf_unregister_net_hooks(net, table->ops,
					hweight32(table->valid_hooks));
}
This way cleanup_net simply skips the table if ops has not been assigned
yet. The register path will either complete and call
nf_register_net_hooks() normally, or fail and clean up via
__ipt_unregister_table().
Thanks,
Tristan
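For illustration, a userspace sketch of what the proposed guard does, under assumptions: struct table and struct hook_ops are hypothetical stand-ins, count_bits() substitutes for the kernel's hweight32(), and the function returns the entry count that would be handed to nf_unregister_net_hooks() instead of calling it. The hook numbering is meant to mirror enum nf_inet_hooks. The sketch does not model concurrency: the check itself is not synchronized with the register path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror enum nf_inet_hooks (PRE_ROUTING = 0 ... POST_ROUTING = 4). */
enum { PRE_ROUTING, LOCAL_IN, FORWARD, LOCAL_OUT, POST_ROUTING };

/* Hypothetical stand-ins; these are NOT the kernel's struct layouts. */
struct hook_ops { void *priv; };

struct table {
	uint32_t valid_hooks;	/* bitmask of hooks the table attaches to */
	struct hook_ops *ops;	/* NULL while registration is incomplete */
};

/* Userspace substitute for hweight32(): number of set bits in w. */
static unsigned int count_bits(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Returns how many hook entries would be unregistered, or 0 when the
 * guard skips the call (no table, or ops not assigned yet).
 */
static unsigned int pre_exit_guarded(const struct table *table)
{
	if (table && table->ops)
		return count_bits(table->valid_hooks);
	return 0;
}

/* Example: a table on all five hooks, before and after ops is assigned. */
static void example(void)
{
	struct hook_ops ops[5] = { { NULL } };
	struct table t = { .valid_hooks = 0x1f, .ops = NULL };

	assert(pre_exit_guarded(&t) == 0);	/* skipped: ops still NULL */
	t.ops = ops;
	assert(pre_exit_guarded(&t) == 5);	/* five bits set in 0x1f */
}
```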
Hi,

On Wed, Apr 29, 2026 at 05:56:10PM +0000, Tristan Madani wrote:
> From: Tristan Madani <tristan@talencesecurity.com>
>
> ipt_register_table() and ip6t_register_table() call xt_register_table()
> which adds the new table to the per-netns list, making it visible to
> other code paths. Only afterwards do they allocate the per-net copy of
> hook ops via kmemdup_array(). This leaves a window where the table is
> findable via xt_find_table() but has ops=NULL.
>
> If cleanup_net runs during this window (racing namespace teardown against
> lazy table init), ipt_unregister_table_pre_exit() /
> ip6t_unregister_table_pre_exit() finds the table and passes the NULL ops
> pointer to nf_unregister_net_hooks(), causing a general protection fault.
>
> Fix both ip_tables.c and ip6_tables.c by moving the ops allocation
> before xt_register_table(), so the table is never in the list with a
> NULL ops pointer.

Is this true? Your patch moves the ops allocation, but new_table->ops is
still assigned after xt_register_table() has returned. AIUI, the race
window is just reduced, not eliminated.

First I thought you could assign to table->ops since xt_register_table()
calls kmemdup(), but 'table' is const. I guess checking table->ops value
in *_pre_exit() is nonsense as well since *_register_table() still runs
in parallel. Do we need serialization between the two routines?

Cheers, Phil
v1 moved the ops allocation before xt_register_table(), but as Phil
Sutter pointed out, new_table->ops is still assigned after the table
becomes visible via list_add() inside xt_register_table(). The race
window was reduced but not eliminated.

v2 takes a different approach: guard the pre_exit path against a NULL
ops pointer. If cleanup_net races against lazy table init and finds the
table before ops has been assigned, it simply skips the
nf_unregister_net_hooks() call. The register path will either complete
normally or fail and clean up via __ipt_unregister_table().

v1: https://lore.kernel.org/netdev/20260429175613.1459342-1-tristmd@gmail.com/

Tristan Madani (2):
  netfilter: ip_tables: guard ipt_unregister_table_pre_exit against NULL ops
  netfilter: ip6_tables: guard ip6t_unregister_table_pre_exit against NULL ops

 net/ipv4/netfilter/ip_tables.c  | 2 +-
 net/ipv6/netfilter/ip6_tables.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
ipt_register_table() adds the table to the per-netns list via
xt_register_table() before assigning the per-net ops copy to
new_table->ops. If cleanup_net runs during this window,
ipt_unregister_table_pre_exit() finds the table via xt_find_table()
and passes the NULL ops pointer to nf_unregister_net_hooks(), causing
a general protection fault.
Guard against this by checking table->ops before calling
nf_unregister_net_hooks(). If ops is NULL the table is still being
set up; the register path will either complete and register the hooks
normally, or fail and clean up via __ipt_unregister_table().
Fixes: ae689334225f ("netfilter: xtables: Bring back xt_register_table()")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
---
net/ipv4/netfilter/ip_tables.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index XXXXXXX..XXXXXXX 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1795,7 +1795,7 @@ void ipt_unregister_table_pre_exit(struct net *net, const char *name)
 {
 	struct xt_table *table = xt_find_table(net, NFPROTO_IPV4, name);
-	if (table)
+	if (table && table->ops)
 		nf_unregister_net_hooks(net, table->ops, hweight32(table->valid_hooks));
 }
Tristan Madani <tristmd@gmail.com> wrote:
> ipt_register_table() adds the table to the per-netns list via
> xt_register_table() before assigning the per-net ops copy to
> new_table->ops. If cleanup_net runs during this window,
> ipt_unregister_table_pre_exit() finds the table via xt_find_table()
> and passes the NULL ops pointer to nf_unregister_net_hooks(), causing
> a general protection fault.
>
> Guard against this by checking table->ops before calling
> nf_unregister_net_hooks(). If ops is NULL the table is still being
> set up; the register path will either complete and register the hooks
> normally, or fail and clean up via __ipt_unregister_table().

Is there a reproducer for this bug?

This explanation makes little sense to me. If netns is being destroyed,
then there should be no more requests to set/getsockopt.

Is this perhaps about aggressive rmmod + parallel set/getsockopt calls?
That would make more sense, but this needs a different fix.

I'm working on a new unreg scheme to avoid rmmod racing with
concurrent calls into iptables set/getsockopts.
Florian Westphal <fw@strlen.de> wrote:
> Is there a reproducer for this bug?

Syzkaller hit it under failslab. The race is between the lazy
init path in ipt_register_table() and cleanup_net(). The table
becomes visible via xt_register_table() before ops is assigned,
so pre_exit can find it with NULL ops.

Cleaned crash log:

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
CPU: 1 UID: 0 PID: 604 Comm: kworker/u8:19 Tainted: G E 6.14.11 #1
Workqueue: netns cleanup_net
RIP: 0010:nf_unregister_net_hook net/netfilter/core.c:531 [inline]
RIP: 0010:nf_unregister_net_hooks+0xbc/0x150 net/netfilter/core.c:613
Call Trace:
 <TASK>
 ipt_unregister_table_pre_exit+0x8a/0xc0 net/ipv4/netfilter/ip_tables.c:1814
 iptable_mangle_net_pre_exit+0x21/0x30 net/ipv4/netfilter/iptable_mangle.c:99
 ops_pre_exit_list net/core/net_namespace.c:162 [inline]
 cleanup_net+0x4b9/0xbe0 net/core/net_namespace.c:632
 process_one_work+0x98f/0x1750 kernel/workqueue.c:3238
 worker_thread+0x679/0xf50 kernel/workqueue.c:3402
 kthread+0x3f0/0x7e0 kernel/kthread.c:464
 ret_from_fork+0x60/0x90 arch/x86/kernel/process.c:153
 </TASK>

> I'm working on a new unreg scheme to avoid rmmod racing with
> concurrent calls into iptables set/getsockopts.

That sounds like a different issue (rmmod vs sockopt). This one
is init vs cleanup_net -- the NULL ops window exists regardless
of the unreg scheme. V2 is a minimal guard for that.

Thanks,
Tristan
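As a reading aid for the crash log, the two addresses in the Oops and KASAN lines are related by KASAN's shadow mapping. The sketch below assumes the generic-KASAN x86-64 parameters (shadow offset 0xdffffc0000000000, scale shift 3); these constants are an assumption about the reporting kernel's configuration, and the helper names are hypothetical. Under that assumption the faulting shadow address maps back to the range KASAN reports, which is consistent with a read at a small offset from a NULL base pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 generic KASAN parameters; not taken from the report itself. */
#define SHADOW_OFFSET		0xdffffc0000000000ULL
#define SHADOW_SCALE_SHIFT	3

/* First byte of the memory granule covered by a given shadow address. */
static uint64_t shadow_to_mem_first(uint64_t shadow)
{
	return (shadow - SHADOW_OFFSET) << SHADOW_SCALE_SHIFT;
}

/* Last byte of that granule (one shadow byte covers 8 bytes of memory). */
static uint64_t shadow_to_mem_last(uint64_t shadow)
{
	return shadow_to_mem_first(shadow) + (1ULL << SHADOW_SCALE_SHIFT) - 1;
}

/* Check the mapping against the two addresses quoted in the crash log. */
static void example(void)
{
	uint64_t fault = 0xdffffc0000000003ULL;	/* from the Oops line */

	assert(shadow_to_mem_first(fault) == 0x18);	/* start of KASAN range */
	assert(shadow_to_mem_last(fault) == 0x1f);	/* end of KASAN range */
}
```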
Tristan Madani <tristmd@gmail.com> wrote:
> Florian Westphal <fw@strlen.de> wrote:
> > Is there a reproducer for this bug?
>
> Syzkaller hit it under failslab. The race is between the lazy
> init path in ipt_register_table() and cleanup_net(). The table
> becomes visible via xt_register_table() before ops is assigned,
> so pre_exit can find it with NULL ops.

If we have races between a thread calling ipt_register_table
and the netns cleanup path there is nothing we could ever do to
fix it: we are tearing down a live network namespace.

Something else must be going on.
On Thu, 1 May 2026 Florian Westphal wrote:
> If we have races between a thread calling ipt_register_table
> and the netns cleanup path there is nothing we could ever do to
> fix it: we are tearing down a live network namespace.
> Something else must be going on.

I agree, this one is unusual. I tried multiple PoC approaches
without success -- all I have is the syzkaller crash I shared,
no reliable reproducer. Syzkaller itself could not minimize it
either.

That said, the crash is real -- KASAN shows ops=NULL in
pre_exit during cleanup_net -- so something is reaching that
path. The V2 guard handles it regardless of the root cause:
if ops is NULL in pre_exit, we should not pass it to
nf_unregister_net_hooks.

I will share any PoC/repro if I get one.

Thanks,
Tristan
Tristan Madani <tristmd@gmail.com> wrote:
> That said, the crash is real -- KASAN shows ops=NULL in
> pre_exit during cleanup_net -- so something is reaching that
> path. The V2 guard handles it regardless of the root cause:
> if ops is NULL in pre_exit, we should not pass it to
> nf_unregister_net_hooks.
>
> I will share any PoC/repro if I get one.

Thanks. I have a patch series that should close all races,
I need to retest it tomorrow and then I'll post it so sashiko,
syzbot etc. can have a go at it.

I found a few other problems in the general area so it should be
a good improvement over the current state of affairs.
Same race as the ipv4 counterpart: ip6t_register_table() adds the
table to the per-netns list before assigning new_table->ops.
cleanup_net can find the table with a NULL ops pointer and crash in
nf_unregister_net_hooks().
Guard against this by checking table->ops before the call.
Fixes: ee177a54413a ("netfilter: ip6_tables: Use xt_register_table()")
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
---
net/ipv6/netfilter/ip6_tables.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index XXXXXXX..XXXXXXX 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1804,7 +1804,7 @@ void ip6t_unregister_table_pre_exit(struct net *net, const char *name)
 {
 	struct xt_table *table = xt_find_table(net, NFPROTO_IPV6, name);
-	if (table)
+	if (table && table->ops)
 		nf_unregister_net_hooks(net, table->ops, hweight32(table->valid_hooks));
 }