When deleting an xt_CT rule, its per-rule template conntrack is freed via
nf_ct_destroy() -> nf_ct_tmpl_free(). If an expectation was created with
that template as its master, the expectation's timeout/flush later calls
nf_ct_unlink_expect_report() and dereferences exp->master, which now points
to freed memory, leading to a NULL/poison deref and crash.

Move nf_ct_remove_expectations(ct) before the template early-return in
nf_ct_destroy() so that any expectations attached to a template are removed
(and their timers cancelled) before the template's extensions are torn down.

Signed-off-by: Qingjie Xing <xqjcool@gmail.com>
---
net/netfilter/nf_conntrack_core.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 344f88295976..7f6b95404907 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -577,6 +577,13 @@ void nf_ct_destroy(struct nf_conntrack *nfct)
 	WARN_ON(refcount_read(&nfct->use) != 0);
+	/* Expectations will have been removed in clean_from_lists,
+	 * except TFTP can create an expectation on the first packet,
+	 * before connection is in the list, so we need to clean here,
+	 * too.
+	 */
+	nf_ct_remove_expectations(ct);
+
 	if (unlikely(nf_ct_is_template(ct))) {
 		nf_ct_tmpl_free(ct);
 		return;
@@ -585,13 +592,6 @@ void nf_ct_destroy(struct nf_conntrack *nfct)
 	if (unlikely(nf_ct_protonum(ct) == IPPROTO_GRE))
 		destroy_gre_conntrack(ct);
-	/* Expectations will have been removed in clean_from_lists,
-	 * except TFTP can create an expectation on the first packet,
-	 * before connection is in the list, so we need to clean here,
-	 * too.
-	 */
-	nf_ct_remove_expectations(ct);
-
 	if (ct->master)
 		nf_ct_put(ct->master);
base-commit: 01792bc3e5bdafa171dd83c7073f00e7de93a653
--
2.25.1

Qingjie Xing <xqjcool@gmail.com> wrote:
> When deleting an xt_CT rule, its per-rule template conntrack is freed via
> nf_ct_destroy() -> nf_ct_tmpl_free(). If an expectation was created with
> that template as its master,

Uhm. How can that happen? A template isn't a connection, so it should
not be able to create an expectation.

With an iptables-configured TFTP helper in place, a UDP packet
(10.65.41.36:1069 → 10.65.36.2:69, TFTP RRQ) triggered creation of an expectation.
Later, iptables changes removed the rule’s per-rule template nf_conn.
When the expectation’s timer expired, nf_ct_unlink_expect_report()
ran and dereferenced the freed master, causing a crash.

The detailed system logs are as follows:
--------------------------------------------------------------------------------
//create
[ 1978.316487] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff8881391e3800 ext:0
//insert
[ 2131.989389] [nf_ct_expect_insert:417] exp:ffff88823aac8008 master:ffff8881391e3800 ext:ffff888286a3c500 jiffies:4296796140 timeout:300 expires:4297096140
[ 2140.352649] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff88813ae58e00 ext:0
[ 2140.352657] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff88813ae59a00 ext:0
[ 2140.352661] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff88813ae5d600 ext:0
[ 2140.352664] nf_conntrack: [nf_ct_tmpl_alloc:580] nf_conn:ffff88813ae58800 ext:0
[ 2140.352735] nf_conntrack: [nf_ct_tmpl_free:594] nf_conn:ffff8881391e3200 ext:6b6b6b6b6b6b6b6b
[ 2140.352738] CPU: 0 PID: 4691 Comm: netd Kdump: loaded Tainted: G W O 6.1 #16
[ 2140.352740] Hardware name: Supermicro SYS-2049P-TN8R-FI005/X11QPL, BIOS 3.3 02/19/2020
[ 2140.352741] Call Trace:
[ 2140.352742] <TASK>
[ 2140.352743] nf_ct_tmpl_free+0x4f/0x60
[ 2140.352749] nf_ct_destroy+0xce/0x290
[ 2140.352752] xt_ct_tg_destroy+0x78/0xc0
[ 2140.352756] xt_ct_tg_destroy_v1+0x12/0x20
[ 2140.352758] cleanup_entry+0x115/0x1b0
[ 2140.352761] __do_replace+0x3ab/0x530
[ 2140.352763] ? do_ipt_set_ctl+0x5ef/0x6c0
[ 2140.352765] do_ipt_set_ctl+0x5ef/0x6c0
[ 2140.352767] nf_setsockopt+0x1a8/0x2e0
[ 2140.352769] raw_setsockopt+0x7b/0x120
[ 2140.352771] sock_common_setsockopt+0x18/0x30
[ 2140.352773] __sys_setsockopt+0xb9/0x130
[ 2140.352775] __x64_sys_setsockopt+0x21/0x30
[ 2140.352777] do_syscall_64+0x49/0xa0
[ 2140.352780] ? irqentry_exit+0x12/0x40
[ 2140.352782] entry_SYSCALL_64_after_hwframe+0x64/0xce
[ 2140.352785] RIP: 0033:0x7f621d3f49aa
[ 2140.352787] Code: ff ff ff c3 0f 1f 40 00 48 8b 15 69 b4 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 39 b4 0c 00 f7
//free
[ 2140.352889] nf_conntrack: [nf_ct_tmpl_free:594] nf_conn:ffff8881391e3800 ext:6b6b6b6b6b6b6b6b
[ 2140.352891] CPU: 0 PID: 4691 Comm: netd Kdump: loaded Tainted: G W O 6.1 #16
[ 2140.352892] Hardware name: Supermicro SYS-2049P-TN8R-FI005/X11QPL, BIOS 3.3 02/19/2020
[ 2140.352893] Call Trace:
[ 2140.352893] <TASK>
[ 2140.352894] nf_ct_tmpl_free+0x4f/0x60
[ 2140.352896] nf_ct_destroy+0xce/0x290
[ 2140.352898] xt_ct_tg_destroy+0x78/0xc0
[ 2140.352900] xt_ct_tg_destroy_v1+0x12/0x20
[ 2140.352902] cleanup_entry+0x115/0x1b0
[ 2140.352904] __do_replace+0x3ab/0x530
[ 2140.352906] ? do_ipt_set_ctl+0x5ef/0x6c0
[ 2140.352907] do_ipt_set_ctl+0x5ef/0x6c0
[ 2140.352909] nf_setsockopt+0x1a8/0x2e0
[ 2140.352911] raw_setsockopt+0x7b/0x120
[ 2140.352912] sock_common_setsockopt+0x18/0x30
[ 2140.352913] __sys_setsockopt+0xb9/0x130
[ 2140.352915] __x64_sys_setsockopt+0x21/0x30
[ 2140.352917] do_syscall_64+0x49/0xa0
[ 2140.352919] ? irqentry_exit+0x12/0x40
[ 2140.352920] entry_SYSCALL_64_after_hwframe+0x64/0xce
[ 2140.352923] RIP: 0033:0x7f621d3f49aa
[ 2140.352924] Code: ff ff ff c3 0f 1f 40 00 48 8b 15 69 b4 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 39 b4 0c 00 f7
//expectation timeout
[ 2433.066066] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP NOPTI
[ 2433.187797] CPU: 10 PID: 66 Comm: ksoftirqd/10 Kdump: loaded Tainted: G W O 6.1 #16
[ 2433.293977] Hardware name: Supermicro SYS-2049P-TN8R-FI005/X11QPL, BIOS 3.3 02/19/2020
[ 2433.306651] nf_conntrack: [__nf_conntrack_alloc:1729] nf_conn:ffff8882a9268440 jiffies:4297097457
[ 2433.388722] RIP: 0010:nf_ct_unlink_expect_report+0x2d/0x1f0
[ 2433.388730] Code: 00 00 55 48 89 e5 41 56 53 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 45 e8 48 8b 4f 70 4c 8b 81 e8 00 00 00 4d 85 c0 74 39 <41> 0f b7 00 48 85 c0 74 30 41 83 78 1c 00 75 11 4c 01 c0 48 8b 99
[ 2433.388732] RSP: 0018:ffffc9000ce0fce0 EFLAGS: 00010202
[ 2433.848812] RAX: a79bfdc906a58200 RBX: ffff88823aac8088 RCX: ffff8881391e3800
[ 2433.934200] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88823aac8008
[ 2434.019584] RBP: ffffc9000ce0fd08 R08: 6b6b6b6b6b6b6b6b R09: 0000000000000000
[ 2434.104964] R10: ffff8897e0f1cc00 R11: ffffffff80ee7e00 R12: ffffffff80ee7e00
[ 2434.190349] R13: 0000000000000000 R14: ffff88823aac8008 R15: ffff88823aac8088
[ 2434.275728] FS: 0000000000000000(0000) GS:ffff8897e0f00000(0000) knlGS:0000000000000000
[ 2434.372555] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2434.441296] CR2: 00007f980bc97000 CR3: 0000000107734003 CR4: 00000000007706e0
[ 2434.526684] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2434.612066] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2434.697449] PKRU: 55555554
[ 2434.729791] Call Trace:
[ 2434.759017] <TASK>
[ 2434.784079] ? __die_body+0x82/0x130
[ 2434.826826] ? die_addr+0xaa/0xe0
[ 2434.866446] ? exc_general_protection+0x13a/0x1e0
[ 2434.922711] ? asm_exc_general_protection+0x27/0x30
[ 2434.981054] ? nf_ct_expect_dst_hash+0x120/0x120
[ 2435.036276] ? nf_ct_expect_dst_hash+0x120/0x120
[ 2435.091503] ? nf_ct_unlink_expect_report+0x2d/0x1f0
[ 2435.150885] nf_ct_expectation_timed_out+0x2b/0x90
[ 2435.208189] ? nf_ct_expect_dst_hash+0x120/0x120
[ 2435.263415] call_timer_fn+0x2f/0x110
[ 2435.307195] run_timer_softirq+0x616/0x700
[ 2435.356179] ? newidle_balance+0x299/0x320
[ 2435.405166] __do_softirq+0xdc/0x2ab
[ 2435.447904] run_ksoftirqd+0x1c/0x30
[ 2435.490649] smpboot_thread_fn+0xe8/0x1b0
[ 2435.538595] kthread+0x269/0x2a0
[ 2435.577179] ? __smpboot_create_thread+0x220/0x220
[ 2435.634479] ? kthreadd+0x380/0x380
[ 2435.676187] ret_from_fork+0x1f/0x30
[ 2435.718930] </TASK>
--------------------------------------------------------------------------------

Qingjie Xing <xqjcool@gmail.com> wrote:
> With an iptables-configured TFTP helper in place, a UDP packet
> (10.65.41.36:1069 → 10.65.36.2:69, TFTP RRQ) triggered creation of an expectation.
> Later, iptables changes removed the rule’s per-rule template nf_conn.
> When the expectation’s timer expired, nf_ct_unlink_expect_report()
> ran and dereferenced the freed master, causing a crash.

Sorry, I do not see the problem.
A template should never be listed as exp->master.

Can you make a reproducer/selftest for this bug?

I worry we paper over a different bug.

Florian Westphal <fw@strlen.de> wrote:
> Qingjie Xing <xqjcool@gmail.com> wrote:
> > With an iptables-configured TFTP helper in place, a UDP packet
> > (10.65.41.36:1069 → 10.65.36.2:69, TFTP RRQ) triggered creation of an expectation.
> > Later, iptables changes removed the rule’s per-rule template nf_conn.
> > When the expectation’s timer expired, nf_ct_unlink_expect_report()
> > ran and dereferenced the freed master, causing a crash.
>
> Sorry, I do not see the problem.
> A template should never be listed as exp->master.
>
> Can you make a reproducer/selftest for this bug?
>
> I worry we paper over a different bug.

Or maybe this will provide a clue (not even compile tested).

@@ -299,6 +302,9 @@ struct nf_conntrack_expect *nf_ct_expect_alloc(struct nf_conn *me)
 {
 	struct nf_conntrack_expect *new;
 
+	if (WARN_ON_ONCE(nf_ct_is_template(me)))
+		return NULL;
+
 	new = kmem_cache_alloc(nf_ct_expect_cachep, GFP_ATOMIC);
 	if (!new)

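Editorial sketch: the debugging guard proposed above can be modelled in plain userspace C to show the intended effect, namely that an allocation request whose master is a template is refused, so no expectation can ever hold a pointer to one. All model_* names are hypothetical stand-ins, not kernel structures or APIs; the NULL check on the master and the use of malloc() in place of the kernel's slab cache are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT the kernel's nf_conn / nf_conntrack_expect. */
struct model_conn {
	bool is_template;
};

struct model_expect {
	struct model_conn *master;
};

/*
 * Models the proposed guard: refuse to create an expectation whose
 * master is a template. The kernel version would also emit a one-time
 * warning; this model only returns NULL.
 */
struct model_expect *model_expect_alloc(struct model_conn *me)
{
	struct model_expect *new;

	/* Assumption: a NULL master is rejected as well. */
	if (me == NULL || me->is_template)
		return NULL;

	new = malloc(sizeof(*new));
	if (!new)
		return NULL;

	new->master = me;
	return new;
}
```

With such a guard in place, a helper that is handed a template fails at allocation time, which points at the real culprit much earlier than a timer firing minutes later on freed memory.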
I added a panic() in nf_ct_expect_insert(). After reproducing, the crash dump
(via crash) shows the nf_conntrack involved is a template (used as the master),
and the expectation insertion was triggered by a TFTP packet.

The detailed information is as follows:
---------------------------------------------------
crash> sys
      KERNEL: vmlinux [TAINTED]
    DUMPFILE: coredump-2025-08-15-00_40-8.0.0-B3 [PARTIAL DUMP]
        CPUS: 64
        DATE: Thu Aug 14 17:40:23 PDT 2025
      UPTIME: 00:01:36
LOAD AVERAGE: 4.39, 1.37, 0.48
       TASKS: 1115
    NODENAME: MYNODE
     RELEASE: 6.1
     VERSION: #15 SMP Thu Aug 14 17:02:44 PDT 2025
     MACHINE: x86_64 (2800 Mhz)
      MEMORY: 382.7 GB
       PANIC: "Kernel panic - not syncing: [nf_ct_expect_insert:417] exp:ffff88822ed78008 master:ffff888136fcd000 ext:ffff8881087bf500 jiffies:4294761886 timeout:300 expires:4295061886"
crash> bt
PID: 8605 TASK: ffff888139140040 CPU: 4 COMMAND: "cli"
 #0 [ffffc9001762b7f8] machine_kexec at ffffffff80279d53
 #1 [ffffc9001762b878] __crash_kexec at ffffffff8038b7b7
 #2 [ffffc9001762b948] panic at ffffffff802973e0
 #3 [ffffc9001762b9c8] nf_ct_expect_related_report at ffffffff80ee7b27
 #4 [ffffc9001762ba40] tftp_help at ffffffff80f001ea
 #5 [ffffc9001762ba98] nf_confirm at ffffffff80eeaa77
 #6 [ffffc9001762bac8] ipv4_confirm at ffffffff80eeafa9
 #7 [ffffc9001762baf8] nf_hook_slow at ffffffff80ed24db
 #8 [ffffc9001762bb40] ip_output at ffffffff80fe85a5
 #9 [ffffc9001762bbc8] udp_send_skb at ffffffff81033372
#10 [ffffc9001762bc18] udp_sendmsg at ffffffff81032cb2
#11 [ffffc9001762bd90] inet_sendmsg at ffffffff810488a1
#12 [ffffc9001762bdb8] __sys_sendto at ffffffff80dcdda7
#13 [ffffc9001762bf08] __x64_sys_sendto at ffffffff80dcde46
#14 [ffffc9001762bf18] do_syscall_64 at ffffffff811eab09
#15 [ffffc9001762bf50] entry_SYSCALL_64_after_hwframe at ffffffff812000dc
    RIP: 00007fde06cdb8f3 RSP: 00007ffc80d56358 RFLAGS: 00000202
    RAX: ffffffffffffffda RBX: 000055eb6f07fc03 RCX: 00007fde06cdb8f3
    RDX: 000000000000001d RSI: 00007fde047666c0 RDI: 0000000000000007
    RBP: 000055eb6fc941c3 R8: 00007fde047665a0 R9: 0000000000000010
    R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
    R13: 000000000000000b R14: 00007fde030f4908 R15: 0000000000000006
    ORIG_RAX: 000000000000002c CS: 0033 SS: 002b
crash> nf_conn.status -x ffff888136fcd000
  status = 0x808,
----------------------------------------------------

Qingjie Xing <xqjcool@gmail.com> wrote:
> I added a panic() in nf_ct_expect_insert(). After reproducing, the crash dump
> (via crash) shows the nf_conntrack involved is a template (used as the master),
> and the expectation insertion was triggered by a TFTP packet.

The tftp packet should be associated with a conntrack entry, not a
template.

> #3 [ffffc9001762b9c8] nf_ct_expect_related_report at ffffffff80ee7b27
> #4 [ffffc9001762ba40] tftp_help at ffffffff80f001ea
> #5 [ffffc9001762ba98] nf_confirm at ffffffff80eeaa77
> #6 [ffffc9001762bac8] ipv4_confirm at ffffffff80eeafa9
> #7 [ffffc9001762baf8] nf_hook_slow at ffffffff80ed24db
> #8 [ffffc9001762bb40] ip_output at ffffffff80fe85a5
> #9 [ffffc9001762bbc8] udp_send_skb at ffffffff81033372
> #10 [ffffc9001762bc18] udp_sendmsg at ffffffff81032cb2
> #11 [ffffc9001762bd90] inet_sendmsg at ffffffff810488a1

How can this happen?

1. -t raw assigns skb->_nfct to the template.
2. at OUTPUT, nf_conntrack_in is called:

unsigned int
nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
{
	enum ip_conntrack_info ctinfo;
	struct nf_conn *ct, *tmpl;
	u_int8_t protonum;
	int dataoff, ret;

	tmpl = nf_ct_get(skb, &ctinfo);
	if (tmpl || ctinfo == IP_CT_UNTRACKED) {
		/* Previously seen (loopback or untracked)? Ignore. */
		if ((tmpl && !nf_ct_is_template(tmpl)) ||
		    ctinfo == IP_CT_UNTRACKED)
			return NF_ACCEPT;
		skb->_nfct = 0; // HERE
	}
	...

and that will *clear* the template again.

3. nf_conntrack_in assigns skb->_nfct to a newly allocated conntrack
   (not a template).

The backtrace you quote should be impossible.

You need to figure out why skb->_nfct was not cleared by nf_conntrack_in().

You did not mention anything about timing, does this only happen at the start,
i.e. do we have a race where nf_confirm was just registered with nf_hook_slow
for the first time but ipv4_confirm wasn't set up yet?

If so, please fix nf_confirm() to return early if the skb has a template attached.

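Editorial sketch: the three steps above can be modelled in userspace C to make the invariant concrete: whatever the raw table attached, a packet leaves the conntrack hook carrying either an untracked marker or a real (non-template) entry. Everything named model_* is a hypothetical stand-in and not a kernel type or function; the model reproduces only the quoted prologue and replaces the real lookup/allocation path with a bare calloc(), ignoring everything else the real function does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, NOT the kernel's nf_conn / sk_buff. */
struct model_conn {
	bool is_template;
};

struct model_skb {
	struct model_conn *nfct; /* plays the role of skb->_nfct */
	bool untracked;          /* plays the role of ctinfo == IP_CT_UNTRACKED */
};

enum model_verdict { MODEL_ACCEPT, MODEL_DROP };

/* Mirrors only the quoted prologue of nf_conntrack_in(). */
enum model_verdict model_conntrack_in(struct model_skb *skb)
{
	struct model_conn *tmpl = skb->nfct;
	struct model_conn *ct;

	if (tmpl || skb->untracked) {
		/* Real conntrack already attached, or untracked: leave it alone. */
		if ((tmpl && !tmpl->is_template) || skb->untracked)
			return MODEL_ACCEPT;
		skb->nfct = NULL; /* the "HERE" line: detach the template */
	}

	/* Assumption: stands in for the lookup/allocation of a real entry. */
	ct = calloc(1, sizeof(*ct));
	if (!ct)
		return MODEL_DROP;

	skb->nfct = ct; /* zeroed by calloc, so is_template is false */
	return MODEL_ACCEPT;
}

/* Returns 0 if a packet that entered with a template leaves with a real entry. */
int model_selfcheck(void)
{
	struct model_conn tmpl = { .is_template = true };
	struct model_skb skb = { .nfct = &tmpl, .untracked = false };

	assert(model_conntrack_in(&skb) == MODEL_ACCEPT);
	assert(skb.nfct != NULL && skb.nfct != &tmpl);
	assert(!skb.nfct->is_template);
	free(skb.nfct);
	return 0;
}
```

In this model a helper running after model_conntrack_in() can never observe a template in skb->nfct, which is why the quoted backtrace points at something outside the upstream code path.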
Thanks for the careful review and the pointers.

I dug deeper and found the root cause on my side: there was leftover/out-of-tree
code in my local tree that could attach the per-rule template to skb->_nfct.
After cleaning up those remnants, upstream behavior matches your description:
nf_conntrack_in() clears any template, tftp_help() sees a real conntrack, and
I can no longer reproduce the crash.

Apologies for the noise and for any time this cost you. I’ll withdraw the patch
as it was addressing a problem introduced by my local changes.

Thanks again for the guidance.