[PATCH] netfilter: conntrack: drop expectations before freeing templates

Posted by Qingjie Xing 1 month, 2 weeks ago
When deleting an xt_CT rule, its per-rule template conntrack is freed via
nf_ct_destroy() -> nf_ct_tmpl_free(). If an expectation was created with
that template as its master, the expectation's timeout/flush later calls
nf_ct_unlink_expect_report() and dereferences exp->master, which now points
to freed memory, resulting in a use-after-free (poison dereference) and crash.

Move nf_ct_remove_expectations(ct) before the template early-return in
nf_ct_destroy() so that any expectations attached to a template are removed
(and their timers cancelled) before the template's extensions are torn down.

Signed-off-by: Qingjie Xing <xqjcool@gmail.com>
---
 net/netfilter/nf_conntrack_core.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 344f88295976..7f6b95404907 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -577,6 +577,13 @@ void nf_ct_destroy(struct nf_conntrack *nfct)
 
 	WARN_ON(refcount_read(&nfct->use) != 0);
 
+	/* Expectations will have been removed in clean_from_lists,
+	 * except TFTP can create an expectation on the first packet,
+	 * before connection is in the list, so we need to clean here,
+	 * too.
+	 */
+	nf_ct_remove_expectations(ct);
+
 	if (unlikely(nf_ct_is_template(ct))) {
 		nf_ct_tmpl_free(ct);
 		return;
@@ -585,13 +592,6 @@ void nf_ct_destroy(struct nf_conntrack *nfct)
 	if (unlikely(nf_ct_protonum(ct) == IPPROTO_GRE))
 		destroy_gre_conntrack(ct);
 
-	/* Expectations will have been removed in clean_from_lists,
-	 * except TFTP can create an expectation on the first packet,
-	 * before connection is in the list, so we need to clean here,
-	 * too.
-	 */
-	nf_ct_remove_expectations(ct);
-
 	if (ct->master)
 		nf_ct_put(ct->master);
 

base-commit: 01792bc3e5bdafa171dd83c7073f00e7de93a653
-- 
2.25.1
Re: [PATCH] netfilter: conntrack: drop expectations before freeing templates
Posted by Florian Westphal 1 month, 2 weeks ago
Qingjie Xing <xqjcool@gmail.com> wrote:
> When deleting an xt_CT rule, its per-rule template conntrack is freed via
> nf_ct_destroy() -> nf_ct_tmpl_free(). If an expectation was created with
> that template as its master, 

Uhm.  How can that happen?  A template isn't a connection, so it should
not be able to create an expectation.
Re: [PATCH] netfilter: conntrack: drop expectations before freeing templates
Posted by Qingjie Xing 1 month, 2 weeks ago
With an iptables-configured TFTP helper in place, a UDP packet 
(10.65.41.36:1069 → 10.65.36.2:69, TFTP RRQ) triggered creation of an expectation.
Later, iptables changes removed the rule’s per-rule template nf_conn. 
When the expectation’s timer expired, nf_ct_unlink_expect_report() 
ran and dereferenced the freed master, causing a crash.

The detailed system logs are as follows:
--------------------------------------------------------------------------------
//create
[ 1978.316487] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff8881391e3800 ext:0

//insert
[ 2131.989389] [nf_ct_expect_insert:417] exp:ffff88823aac8008 master:ffff8881391e3800 ext:ffff888286a3c500 jiffies:4296796140 timeout:300 expires:4297096140
[ 2140.352649] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff88813ae58e00 ext:0
[ 2140.352657] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff88813ae59a00 ext:0
[ 2140.352661] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff88813ae5d600 ext:0
[ 2140.352664] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff88813ae58800 ext:0
[ 2140.352735] nf_conntrack: [nf_ct_tmpl_free:594] nf_conn:ffff8881391e3200 ext:6b6b6b6b6b6b6b6b
[ 2140.352738] CPU: 0 PID: 4691 Comm: netd Kdump: loaded Tainted: G        W  O       6.1 #16
[ 2140.352740] Hardware name: Supermicro SYS-2049P-TN8R-FI005/X11QPL, BIOS 3.3 02/19/2020
[ 2140.352741] Call Trace:
[ 2140.352742]  <TASK>
[ 2140.352743]  nf_ct_tmpl_free+0x4f/0x60
[ 2140.352749]  nf_ct_destroy+0xce/0x290
[ 2140.352752]  xt_ct_tg_destroy+0x78/0xc0
[ 2140.352756]  xt_ct_tg_destroy_v1+0x12/0x20
[ 2140.352758]  cleanup_entry+0x115/0x1b0
[ 2140.352761]  __do_replace+0x3ab/0x530
[ 2140.352763]  ? do_ipt_set_ctl+0x5ef/0x6c0
[ 2140.352765]  do_ipt_set_ctl+0x5ef/0x6c0
[ 2140.352767]  nf_setsockopt+0x1a8/0x2e0
[ 2140.352769]  raw_setsockopt+0x7b/0x120
[ 2140.352771]  sock_common_setsockopt+0x18/0x30
[ 2140.352773]  __sys_setsockopt+0xb9/0x130
[ 2140.352775]  __x64_sys_setsockopt+0x21/0x30
[ 2140.352777]  do_syscall_64+0x49/0xa0
[ 2140.352780]  ? irqentry_exit+0x12/0x40
[ 2140.352782]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[ 2140.352785] RIP: 0033:0x7f621d3f49aa
[ 2140.352787] Code: ff ff ff c3 0f 1f 40 00 48 8b 15 69 b4 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 39 b4 0c 00 f7

//free
[ 2140.352889] nf_conntrack: [nf_ct_tmpl_free:594] nf_conn:ffff8881391e3800 ext:6b6b6b6b6b6b6b6b
[ 2140.352891] CPU: 0 PID: 4691 Comm: netd Kdump: loaded Tainted: G        W  O       6.1 #16
[ 2140.352892] Hardware name: Supermicro SYS-2049P-TN8R-FI005/X11QPL, BIOS 3.3 02/19/2020
[ 2140.352893] Call Trace:
[ 2140.352893]  <TASK>
[ 2140.352894]  nf_ct_tmpl_free+0x4f/0x60
[ 2140.352896]  nf_ct_destroy+0xce/0x290
[ 2140.352898]  xt_ct_tg_destroy+0x78/0xc0
[ 2140.352900]  xt_ct_tg_destroy_v1+0x12/0x20
[ 2140.352902]  cleanup_entry+0x115/0x1b0
[ 2140.352904]  __do_replace+0x3ab/0x530
[ 2140.352906]  ? do_ipt_set_ctl+0x5ef/0x6c0
[ 2140.352907]  do_ipt_set_ctl+0x5ef/0x6c0
[ 2140.352909]  nf_setsockopt+0x1a8/0x2e0
[ 2140.352911]  raw_setsockopt+0x7b/0x120
[ 2140.352912]  sock_common_setsockopt+0x18/0x30
[ 2140.352913]  __sys_setsockopt+0xb9/0x130
[ 2140.352915]  __x64_sys_setsockopt+0x21/0x30
[ 2140.352917]  do_syscall_64+0x49/0xa0
[ 2140.352919]  ? irqentry_exit+0x12/0x40
[ 2140.352920]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[ 2140.352923] RIP: 0033:0x7f621d3f49aa
[ 2140.352924] Code: ff ff ff c3 0f 1f 40 00 48 8b 15 69 b4 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 39 b4 0c 00 f7


//expectation timeout
[ 2433.066066] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP NOPTI
[ 2433.187797] CPU: 10 PID: 66 Comm: ksoftirqd/10 Kdump: loaded Tainted: G        W  O       6.1 #16
[ 2433.293977] Hardware name: Supermicro SYS-2049P-TN8R-FI005/X11QPL, BIOS 3.3 02/19/2020
[ 2433.306651] nf_conntrack: [__nf_conntrack_alloc:1729] nf_conn:ffff8882a9268440 jiffies:4297097457
[ 2433.388722] RIP: 0010:nf_ct_unlink_expect_report+0x2d/0x1f0
[ 2433.388730] Code: 00 00 55 48 89 e5 41 56 53 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 45 e8 48 8b 4f 70 4c 8b 81 e8 00 00 00 4d 85 c0 74 39 <41> 0f b7 00 48 85 c0 74 30 41 83 78 1c 00 75 11 4c 01 c0 48 8b 99
[ 2433.388732] RSP: 0018:ffffc9000ce0fce0 EFLAGS: 00010202
[ 2433.848812] RAX: a79bfdc906a58200 RBX: ffff88823aac8088 RCX: ffff8881391e3800
[ 2433.934200] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88823aac8008
[ 2434.019584] RBP: ffffc9000ce0fd08 R08: 6b6b6b6b6b6b6b6b R09: 0000000000000000
[ 2434.104964] R10: ffff8897e0f1cc00 R11: ffffffff80ee7e00 R12: ffffffff80ee7e00
[ 2434.190349] R13: 0000000000000000 R14: ffff88823aac8008 R15: ffff88823aac8088
[ 2434.275728] FS:  0000000000000000(0000) GS:ffff8897e0f00000(0000) knlGS:0000000000000000
[ 2434.372555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2434.441296] CR2: 00007f980bc97000 CR3: 0000000107734003 CR4: 00000000007706e0
[ 2434.526684] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2434.612066] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2434.697449] PKRU: 55555554
[ 2434.729791] Call Trace:
[ 2434.759017]  <TASK>
[ 2434.784079]  ? __die_body+0x82/0x130
[ 2434.826826]  ? die_addr+0xaa/0xe0
[ 2434.866446]  ? exc_general_protection+0x13a/0x1e0
[ 2434.922711]  ? asm_exc_general_protection+0x27/0x30
[ 2434.981054]  ? nf_ct_expect_dst_hash+0x120/0x120
[ 2435.036276]  ? nf_ct_expect_dst_hash+0x120/0x120
[ 2435.091503]  ? nf_ct_unlink_expect_report+0x2d/0x1f0
[ 2435.150885]  nf_ct_expectation_timed_out+0x2b/0x90
[ 2435.208189]  ? nf_ct_expect_dst_hash+0x120/0x120
[ 2435.263415]  call_timer_fn+0x2f/0x110
[ 2435.307195]  run_timer_softirq+0x616/0x700
[ 2435.356179]  ? newidle_balance+0x299/0x320
[ 2435.405166]  __do_softirq+0xdc/0x2ab
[ 2435.447904]  run_ksoftirqd+0x1c/0x30
[ 2435.490649]  smpboot_thread_fn+0xe8/0x1b0
[ 2435.538595]  kthread+0x269/0x2a0
[ 2435.577179]  ? __smpboot_create_thread+0x220/0x220
[ 2435.634479]  ? kthreadd+0x380/0x380
[ 2435.676187]  ret_from_fork+0x1f/0x30
[ 2435.718930]  </TASK>
-------------------------------------------------------------------------------
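
For anyone cross-checking the log above, two details can be verified with a
little arithmetic. This is only a sketch: the helper names are made up for
illustration, and the HZ value is inferred from the log rather than stated
anywhere in it. The byte 0x6b is what slab poisoning writes over freed
objects (POISON_FREE in include/linux/poison.h), which is why R08 and the
"ext:" value printed by nf_ct_tmpl_free look the way they do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte written over freed slab objects when slab poisoning is enabled
 * (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE_BYTE 0x6bu

/* Hypothetical helper: true if every byte of v is the free-poison byte. */
static bool looks_like_freed_slab(uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		if (((v >> (8 * i)) & 0xffu) != POISON_FREE_BYTE)
			return false;
	}
	return true;
}

/* Hypothetical helper: ticks per second implied by one
 * nf_ct_expect_insert debug line (jiffies, expires, timeout in seconds). */
static uint64_t implied_hz(uint64_t jiffies, uint64_t expires,
			   uint64_t timeout_secs)
{
	assert(timeout_secs != 0 && expires >= jiffies);
	return (expires - jiffies) / timeout_secs;
}
```

With the values from the insert line (jiffies 4296796140, expires 4297096140,
timeout 300) implied_hz() gives 1000, so the timer was due roughly 300 seconds
after the insert at 2131.99, i.e. around 2431.99. That lines up with the
general protection fault at 2433.07. RCX in the oops is also the same
address (ffff8881391e3800) as the master in the insert line and the nf_conn
freed at 2140.352889.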

Re: [PATCH] netfilter: conntrack: drop expectations before freeing templates
Posted by Florian Westphal 1 month, 2 weeks ago
Qingjie Xing <xqjcool@gmail.com> wrote:
> With an iptables-configured TFTP helper in place, a UDP packet 
> (10.65.41.36:1069 → 10.65.36.2:69, TFTP RRQ) triggered creation of an expectation.
> Later, iptables changes removed the rule’s per-rule template nf_conn. 
> When the expectation’s timer expired, nf_ct_unlink_expect_report() 
> ran and dereferenced the freed master, causing a crash.

Sorry, I do not see the problem.
A template should never be listed as exp->master.

Can you make a reproducer/selftest for this bug?

I worry we paper over a different bug.
Re: [PATCH] netfilter: conntrack: drop expectations before freeing templates
Posted by Florian Westphal 1 month, 2 weeks ago
Florian Westphal <fw@strlen.de> wrote:
> Qingjie Xing <xqjcool@gmail.com> wrote:
> > With an iptables-configured TFTP helper in place, a UDP packet 
> > (10.65.41.36:1069 → 10.65.36.2:69, TFTP RRQ) triggered creation of an expectation.
> > Later, iptables changes removed the rule’s per-rule template nf_conn. 
> > When the expectation’s timer expired, nf_ct_unlink_expect_report() 
> > ran and dereferenced the freed master, causing a crash.
> 
> Sorry, I do not see the problem.
> A template should never be listed as exp->master.
> 
> Can you make a reproducer/selftest for this bug?
> 
> I worry we paper over a different bug.

Or maybe this will provide a clue (not even compile tested).

@@ -299,6 +302,9 @@ struct nf_conntrack_expect *nf_ct_expect_alloc(struct nf_conn *me)
 {
        struct nf_conntrack_expect *new;

+       if (WARN_ON_ONCE(nf_ct_is_template(me)))
+               return NULL;
+
        new = kmem_cache_alloc(nf_ct_expect_cachep, GFP_ATOMIC);
        if (!new)

Re: [PATCH] netfilter: conntrack: drop expectations before freeing templates
Posted by Qingjie Xing 1 month, 2 weeks ago
I added a panic() in nf_ct_expect_insert(). After reproducing, the crash dump 
(via crash) shows the nf_conntrack involved is a template (used as the master), 
and the expectation insertion was triggered by a TFTP packet.

The detailed information is as follows:
---------------------------------------------------
crash> sys
      KERNEL: vmlinux  [TAINTED]
    DUMPFILE: coredump-2025-08-15-00_40-8.0.0-B3  [PARTIAL DUMP]
        CPUS: 64
        DATE: Thu Aug 14 17:40:23 PDT 2025
      UPTIME: 00:01:36
LOAD AVERAGE: 4.39, 1.37, 0.48
       TASKS: 1115
    NODENAME: MYNODE
     RELEASE: 6.1
     VERSION: #15 SMP Thu Aug 14 17:02:44 PDT 2025
     MACHINE: x86_64  (2800 Mhz)
      MEMORY: 382.7 GB
       PANIC: "Kernel panic - not syncing: [nf_ct_expect_insert:417] exp:ffff88822ed78008 master:ffff888136fcd000 ext:ffff8881087bf500 jiffies:4294761886 timeout:300 expires:4295061886"
crash> bt
PID: 8605     TASK: ffff888139140040  CPU: 4    COMMAND: "cli"
 #0 [ffffc9001762b7f8] machine_kexec at ffffffff80279d53
 #1 [ffffc9001762b878] __crash_kexec at ffffffff8038b7b7
 #2 [ffffc9001762b948] panic at ffffffff802973e0
 #3 [ffffc9001762b9c8] nf_ct_expect_related_report at ffffffff80ee7b27
 #4 [ffffc9001762ba40] tftp_help at ffffffff80f001ea
 #5 [ffffc9001762ba98] nf_confirm at ffffffff80eeaa77
 #6 [ffffc9001762bac8] ipv4_confirm at ffffffff80eeafa9
 #7 [ffffc9001762baf8] nf_hook_slow at ffffffff80ed24db
 #8 [ffffc9001762bb40] ip_output at ffffffff80fe85a5
 #9 [ffffc9001762bbc8] udp_send_skb at ffffffff81033372
#10 [ffffc9001762bc18] udp_sendmsg at ffffffff81032cb2
#11 [ffffc9001762bd90] inet_sendmsg at ffffffff810488a1
#12 [ffffc9001762bdb8] __sys_sendto at ffffffff80dcdda7
#13 [ffffc9001762bf08] __x64_sys_sendto at ffffffff80dcde46
#14 [ffffc9001762bf18] do_syscall_64 at ffffffff811eab09
#15 [ffffc9001762bf50] entry_SYSCALL_64_after_hwframe at ffffffff812000dc
    RIP: 00007fde06cdb8f3  RSP: 00007ffc80d56358  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 000055eb6f07fc03  RCX: 00007fde06cdb8f3
    RDX: 000000000000001d  RSI: 00007fde047666c0  RDI: 0000000000000007
    RBP: 000055eb6fc941c3   R8: 00007fde047665a0   R9: 0000000000000010
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000000
    R13: 000000000000000b  R14: 00007fde030f4908  R15: 0000000000000006
    ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b
crash> nf_conn.status -x ffff888136fcd000     
  status = 0x808,
----------------------------------------------------
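
To read the status value above, a small decoder may help. This is a sketch
only: decode_ct_status() is a made-up helper, and the bit positions are
copied from my reading of enum ip_conntrack_status in
include/uapi/linux/netfilter/nf_conntrack_common.h, so please verify them
against the tree being debugged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Names for bits 0..15 of nf_conn.status (assumed layout, see above).
 * Bit 12 is shared with NAT_CLASH on newer kernels. */
static const char *const status_bit_names[] = {
	"EXPECTED", "SEEN_REPLY", "ASSURED", "CONFIRMED",
	"SRC_NAT", "DST_NAT", "SEQ_ADJUST", "SRC_NAT_DONE",
	"DST_NAT_DONE", "DYING", "FIXED_TIMEOUT", "TEMPLATE",
	"UNTRACKED", "HELPER", "OFFLOAD", "HW_OFFLOAD",
};

/* Hypothetical helper: write the set bits of status into buf as
 * "A|B|C" and return the number of characters written. */
static size_t decode_ct_status(unsigned long status, char *buf, size_t len)
{
	size_t nbits = sizeof(status_bit_names) / sizeof(status_bit_names[0]);
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t bit = 0; bit < nbits; bit++) {
		if (!(status & (1UL << bit)))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", status_bit_names[bit]);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partially written name */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

For 0x808 this yields "CONFIRMED|TEMPLATE": 0x800 is the template bit and
0x8 is the confirmed bit, which is consistent with the master being a
template rather than a normal conntrack entry.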
Re: [PATCH] netfilter: conntrack: drop expectations before freeing templates
Posted by Florian Westphal 1 month, 1 week ago
Qingjie Xing <xqjcool@gmail.com> wrote:
> I added a panic() in nf_ct_expect_insert(). After reproducing, the crash dump 
> (via crash) shows the nf_conntrack involved is a template (used as the master), 
> and the expectation insertion was triggered by a TFTP packet.

The tftp packet should be associated with a conntrack entry, not a
template.

>  #3 [ffffc9001762b9c8] nf_ct_expect_related_report at ffffffff80ee7b27
>  #4 [ffffc9001762ba40] tftp_help at ffffffff80f001ea
>  #5 [ffffc9001762ba98] nf_confirm at ffffffff80eeaa77
>  #6 [ffffc9001762bac8] ipv4_confirm at ffffffff80eeafa9
>  #7 [ffffc9001762baf8] nf_hook_slow at ffffffff80ed24db
>  #8 [ffffc9001762bb40] ip_output at ffffffff80fe85a5
>  #9 [ffffc9001762bbc8] udp_send_skb at ffffffff81033372
> #10 [ffffc9001762bc18] udp_sendmsg at ffffffff81032cb2
> #11 [ffffc9001762bd90] inet_sendmsg at ffffffff810488a1

How can this happen?

1. -t raw assigns skb->_nfct to the template.
2. at OUTPUT, nf_conntrack_in is called:

unsigned int
nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
{
        enum ip_conntrack_info ctinfo;
        struct nf_conn *ct, *tmpl;
        u_int8_t protonum;
        int dataoff, ret;

        tmpl = nf_ct_get(skb, &ctinfo);
        if (tmpl || ctinfo == IP_CT_UNTRACKED) {
                /* Previously seen (loopback or untracked)?  Ignore. */
                if ((tmpl && !nf_ct_is_template(tmpl)) ||
                     ctinfo == IP_CT_UNTRACKED)
                        return NF_ACCEPT;
                skb->_nfct = 0; // HERE
        }

... and that will *clear* the template again.

3. nf_conntrack_in assigns skb->_nfct to a newly allocated
   conntrack (not a template).

The backtrace you quote should be impossible.

You need to figure out why skb->_nfct was not cleared by
nf_conntrack_in().

You did not mention anything about timing. Does this only
happen at the start, i.e. do we have a race where nf_confirm
was just registered with nf_hook_slow for the first time but
ipv4_confirm wasn't set up yet?

If so, please fix nf_confirm() to return early if the skb
has a template attached.
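
The expected flow, and the kind of early return meant here, can be modelled
in userspace roughly as follows. This is a sketch, not kernel code: every
model_* name is a hypothetical stand-in, the IP_CT_UNTRACKED case is left
out, and the fresh entry is passed in instead of being allocated the way
the real nf_conntrack_in() does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nf_conn. */
struct model_ct {
	bool is_template;
};

/* Hypothetical stand-in for struct sk_buff. */
struct model_skb {
	struct model_ct *ct;	/* plays the role of skb->_nfct */
};

enum model_verdict { MODEL_ACCEPT, MODEL_HELPER_RAN };

/* Models the head of nf_conntrack_in() quoted earlier: a real conntrack
 * is left alone, a template is detached and replaced by a fresh entry. */
static void model_conntrack_in(struct model_skb *skb, struct model_ct *fresh)
{
	if (skb->ct && !skb->ct->is_template)
		return;		/* previously seen: ignore */
	assert(!fresh->is_template);
	skb->ct = fresh;	/* template (if any) is no longer attached */
}

/* Models the suggested defensive check: bail out before any helper
 * runs if the skb still carries a template (or nothing at all). */
static enum model_verdict model_confirm(const struct model_skb *skb)
{
	if (!skb->ct || skb->ct->is_template)
		return MODEL_ACCEPT;
	return MODEL_HELPER_RAN;
}
```

In this model a helper can only ever see a template if model_conntrack_in()
was skipped for that skb, which is the situation the backtrace implies.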
Re: [PATCH] netfilter: conntrack: drop expectations before freeing templates
Posted by Qingjie Xing 1 month, 1 week ago
Thanks for the careful review and the pointers.

I dug deeper and found the root cause on my side: there was leftover/out-of-tree
code in my local tree that could attach the per-rule template to skb->_nfct.
After cleaning up those remnants, upstream behavior matches your description:
nf_conntrack_in() clears any template, tftp_help() sees a real conntrack,
and I can no longer reproduce the crash.

Apologies for the noise and for any time this cost you. I’ll withdraw the patch 
as it was addressing a problem introduced by my local changes. 

Thanks again for the guidance.