From nobody Tue Dec 2 01:07:44 2025 Received: from mail-pl1-f170.google.com (mail-pl1-f170.google.com [209.85.214.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F0B1E1FF7BC for ; Sat, 22 Nov 2025 00:37:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763771852; cv=none; b=WLUKxUKlIgXTs3/8YbwWxTEHUKF6MF9f5ezxoiXUg05pUSUucOk7yvnkkccg29XHSzIaibluK85YYavKpP/7HpWWAqx5sWUP7cCHJUFgjFXW51ylKJmWvQom47LB9e44FLYyWgCGaEde/HGVb+7v6DPl0fwm4BqILYPnyqFHO8o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763771852; c=relaxed/simple; bh=nD/OreCZyNYJ6uxc6JTZKZu9jQPGRrx/6pbDTQ06Hp8=; h=From:To:Cc:Subject:Date:Message-Id:MIME-Version; b=CuG4UjSgDRc9g7EbBOvZ15qbFa+t1ltJbtKOxohg1mqK1VhnaCtY2QoBVm+OXjf162iCZ6/l4lGkBH09ZOcMYMgutt2rN+A7Lf3PsI0l2TIR3dK3wGLEUKVxFDSlK0soXJutxT1jsyTISCKnci+gRLZUGC8NppgkAa+8N5kAKu8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=JOfUMtJ4; arc=none smtp.client-ip=209.85.214.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="JOfUMtJ4" Received: by mail-pl1-f170.google.com with SMTP id d9443c01a7336-2984dfae043so23535565ad.0 for ; Fri, 21 Nov 2025 16:37:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1763771850; x=1764376650; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=GdE87jv+ksb90tyIoSPueieOcJlnzUjYmwmmT0MapeQ=; b=JOfUMtJ4lTaCBRHcXYfrSCK6smjvQcI3RgmrxgjH9NlrztYFBSdzjEtNiGXRIs5E/w j6EQlZKfYBwnCVFR1kTIwhjXbWHueaoIsAQvKPjLHabrnBZsR+6/xLuvxWtBtIV34WQ0 6UXqMdpqhjKy+1m8Rsyt0Zr0Ubd0iTIihGs6TjbMMqX6fvXXC6thUKdxgwZCKqA8ZSXO micBDCOwaDPRRMH5buMpiFdRvZYAD0/o+M0IWFxanHsX+YgU8NwWFXlqROnVD6pdOvDk 4vtkCcq44+QEg3AW8ZXtTOgEEI2bhGxHigfamnCcvU3d5eZddhmVoNUXbM6M+++eRsp7 2W2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763771850; x=1764376650; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=GdE87jv+ksb90tyIoSPueieOcJlnzUjYmwmmT0MapeQ=; b=ikUAW0mh5izDH00wZBZORmWuhh+Y/FnEqyb9iOmw4Qw1PZsRVXSUygwAs4mfQpVi2o zlZzLHvkTCp7X61r93OvFp0dwUry1TXBMdn3WruIzDvAPt/5jUhZXuptjrZvzHUVz5C3 YGz0HJmUgr2O3HrTo7Bfunslw1hpR4nsna13eF3wj085CTGmHiDLK76CCucncg5DhATi sHy3SJe09Qs7u3fPV7/pdgekOy+BctIe00PshfUNUp42TSkeUrlFW+BPQ2tNIGNb1tu2 +g0PkGndMpq6CpSYI5su87zXNcQDYvBg4Lj1LYNhxlxSMdU080QxYmWDEyLiJLAuSuH3 E9lA== X-Forwarded-Encrypted: i=1; AJvYcCVKWLo9F+yJnzE/jQy1Cl6AIyhJDug12imZXLerYPBp+aD4pRKFabLSlUT8RAWrpeykDFdtl4prWXhWYY0=@vger.kernel.org X-Gm-Message-State: AOJu0YzZQCkBBs9iGdQLivrIo3CsyR4cROzyOlSh+xZLRZIzrUhqNIZm HqHoImdKmDRn3gLL9nOntrW+hPN6xKfBN73G7JROpw+IkH/nB9+xAs6J X-Gm-Gg: ASbGncuUIUm3ztCR++PeqFzhteA8wgZA7vLAebBPnvYK4JShrH+iWBEkT6cQUkfYB6Q AE7/z4UdOfYMiPyRs7RQmwP9wLBgzYJD4ly/MMhZC2d0uxsJEv7rgahaIuu/2sGQ3kZgWfvTC/a 88H1sIuepRMss7HgqFk7I0DGbwWTYfvoWZ0+jxAN8NTHedYJ4C9+TL4N8kf3k6gApGdDD30at73 XiBhI+l/W+XIuoW0Necx9Dg/9q8cLDYetiBhTUY3u2scBt3BEJOVCA3H9/Ujn5g35DjyKqHasyy GsLG9CFKO1sLP2ktqCLBDK3fFAy80Rt3F5tD1N8Iw6VJM5RNk8CiZS9feSdPvRERvF8BxCcAfo3 nu043yO7eKUTcoRO1pOztgUaX7chEXk59Bx6bnsyl1P2gGM8ArzRQTjgvow6ldPa7pTqYrPmM8h q0trkAtMqm1c2qvvfq3gzDnA== X-Google-Smtp-Source: AGHT+IGNR/G92uhQP4HSNTyf73IpurlHjWO6Un88yj3xf3bOPKhTUsWmvZOLAqdkY6cuuIpwzmf20w== X-Received: by 2002:a05:7022:250a:b0:11b:baa5:f4d1 with SMTP id a92af1059eb24-11c9d711d43mr1437404c88.6.1763771850044; Fri, 21 Nov 2025 16:37:30 -0800 (PST) Received: from mac.com ([136.24.82.250]) by smtp.gmail.com with ESMTPSA id a92af1059eb24-11c93e6dbc8sm31120801c88.10.2025.11.21.16.37.29 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 21 Nov 2025 16:37:29 -0800 (PST) From: Scott Mitchell X-Google-Original-From: Scott Mitchell To: pablo@netfilter.org Cc: kadlec@netfilter.org, fw@strlen.de, phil@nwl.cc, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, horms@kernel.org, netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Scott Mitchell , syzbot@syzkaller.appspotmail.com Subject: [PATCH v5] netfilter: nfnetlink_queue: optimize verdict lookup with hash table Date: Fri, 21 Nov 2025 16:37:20 -0800 Message-Id: <20251122003720.16724-1-scott_mitchell@apple.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Scott Mitchell The current implementation uses a linear list to find queued packets by ID when processing verdicts from userspace. With large queue depths and out-of-order verdicting, this O(n) lookup becomes a significant bottleneck, causing userspace verdict processing to dominate CPU time. Replace the linear search with a hash table for O(1) average-case packet lookup by ID. The hash table size is configurable via the new NFQA_CFG_HASH_SIZE netlink attribute (default 1024 buckets, matching NFQNL_QMAX_DEFAULT; max 131072). The size is normalized to a power of two to enable efficient bitwise masking instead of modulo operations. Unpatched kernels silently ignore the new attribute, maintaining backward compatibility. The existing list data structure is retained for operations requiring linear iteration (e.g. flush, device down events). Hot fields (queue_hash_mask, queue_hash pointer) are placed in the same cache line as the spinlock and packet counters for optimal memory access patterns. Signed-off-by: Scott Mitchell Tested-by: syzbot@syzkaller.appspotmail.com --- Changes in v5: - Use GFP_ATOMIC with kvmalloc_array instead of GFP_KERNEL_ACCOUNT due to rcu_read_lock held in nfqnl_recv_config. Add comment explaining that GFP_KERNEL_ACCOUNT would require lock refactoring (Florian Westphal) Changes in v4: - Fix sleeping while atomic bug: allocate hash table before taking spinlock in instance_create() (syzbot) Changes in v3: - Simplify hash function to use direct masking (id & mask) instead of hash_32() for better cache locality with sequential IDs (Eric Dumazet) Changes in v2: - Use kvcalloc/kvfree with GFP_KERNEL_ACCOUNT to support larger hash tables with vmalloc fallback (Florian Westphal) - Remove incorrect comment about concurrent resizes - nfnetlink subsystem mutex already serializes config operations (Florian Westphal) - Fix style: remove unnecessary braces around single-line if (Florian Westp= hal) include/net/netfilter/nf_queue.h | 1 + .../uapi/linux/netfilter/nfnetlink_queue.h | 1 + net/netfilter/nfnetlink_queue.c | 133 ++++++++++++++++-- 3 files changed, 127 insertions(+), 8 deletions(-) diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_qu= eue.h index 4aeffddb7586..3d0def310523 100644 --- a/include/net/netfilter/nf_queue.h +++ b/include/net/netfilter/nf_queue.h @@ -11,6 +11,7 @@ /* Each queued (to userspace) skbuff has one of these. */ struct nf_queue_entry { struct list_head list; + struct hlist_node hash_node; struct sk_buff *skb; unsigned int id; unsigned int hook_index; /* index in hook_entries->hook[] */ diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/= linux/netfilter/nfnetlink_queue.h index efcb7c044a74..bc296a17e5aa 100644 --- a/include/uapi/linux/netfilter/nfnetlink_queue.h +++ b/include/uapi/linux/netfilter/nfnetlink_queue.h @@ -107,6 +107,7 @@ enum nfqnl_attr_config { NFQA_CFG_QUEUE_MAXLEN, /* __u32 */ NFQA_CFG_MASK, /* identify which flags to change */ NFQA_CFG_FLAGS, /* value of these flags (__u32) */ + NFQA_CFG_HASH_SIZE, /* __u32 hash table size (rounded to power of 2) */ __NFQA_CFG_MAX }; #define NFQA_CFG_MAX (__NFQA_CFG_MAX-1) diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queu= e.c index 8b7b39d8a109..b142fac70ed9 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -46,7 +46,10 @@ #include #endif =20 -#define NFQNL_QMAX_DEFAULT 1024 +#define NFQNL_QMAX_DEFAULT 1024 +#define NFQNL_MIN_HASH_SIZE 16 +#define NFQNL_DEFAULT_HASH_SIZE 1024 +#define NFQNL_MAX_HASH_SIZE 131072 =20 /* We're using struct nlattr which has 16bit nla_len. Note that nla_len * includes the header length. Thus, the maximum packet length that we @@ -65,6 +68,7 @@ struct nfqnl_instance { unsigned int copy_range; unsigned int queue_dropped; unsigned int queue_user_dropped; + unsigned int queue_hash_size; =20 =20 u_int16_t queue_num; /* number of this queue */ @@ -77,6 +81,8 @@ struct nfqnl_instance { spinlock_t lock ____cacheline_aligned_in_smp; unsigned int queue_total; unsigned int id_sequence; /* 'sequence' of pkt ids */ + unsigned int queue_hash_mask; + struct hlist_head *queue_hash; struct list_head queue_list; /* packets in queue */ }; =20 @@ -95,6 +101,39 @@ static struct nfnl_queue_net *nfnl_queue_pernet(struct = net *net) return net_generic(net, nfnl_queue_net_id); } =20 +static inline unsigned int +nfqnl_packet_hash(u32 id, unsigned int mask) +{ + return id & mask; +} + +static inline u32 +nfqnl_normalize_hash_size(u32 hash_size) +{ + /* Must be power of two for queue_hash_mask to work correctly. + * Avoid overflow of is_power_of_2 by bounding NFQNL_MAX_HASH_SIZE. + */ + BUILD_BUG_ON(!is_power_of_2(NFQNL_MIN_HASH_SIZE) || + !is_power_of_2(NFQNL_DEFAULT_HASH_SIZE) || + !is_power_of_2(NFQNL_MAX_HASH_SIZE) || + NFQNL_MAX_HASH_SIZE > 1U << 31); + + if (!hash_size) + return NFQNL_DEFAULT_HASH_SIZE; + + /* Clamp to valid range before power of two to avoid overflow */ + if (hash_size <=3D NFQNL_MIN_HASH_SIZE) + return NFQNL_MIN_HASH_SIZE; + + if (hash_size >=3D NFQNL_MAX_HASH_SIZE) + return NFQNL_MAX_HASH_SIZE; + + if (!is_power_of_2(hash_size)) + hash_size =3D roundup_pow_of_two(hash_size); + + return hash_size; +} + static inline u_int8_t instance_hashfn(u_int16_t queue_num) { return ((queue_num >> 8) ^ queue_num) % INSTANCE_BUCKETS; @@ -114,13 +153,70 @@ instance_lookup(struct nfnl_queue_net *q, u_int16_t q= ueue_num) return NULL; } =20 +static int +nfqnl_hash_resize(struct nfqnl_instance *inst, u32 hash_size) +{ + struct hlist_head *new_hash, *old_hash; + struct nf_queue_entry *entry; + unsigned int h, hash_mask; + + hash_size =3D nfqnl_normalize_hash_size(hash_size); + if (hash_size =3D=3D inst->queue_hash_size) + return 0; + + /* GFP_ATOMIC required: called under rcu_read_lock in nfqnl_recv_config. + * Using GFP_KERNEL_ACCOUNT would require refactoring lock placement. + */ + new_hash =3D kvmalloc_array(hash_size, sizeof(*new_hash), GFP_ATOMIC); + if (!new_hash) + return -ENOMEM; + + hash_mask =3D hash_size - 1; + + for (h =3D 0; h < hash_size; h++) + INIT_HLIST_HEAD(&new_hash[h]); + + spin_lock_bh(&inst->lock); + + list_for_each_entry(entry, &inst->queue_list, list) { + /* No hlist_del() since old_hash will be freed and we hold lock */ + h =3D nfqnl_packet_hash(entry->id, hash_mask); + hlist_add_head(&entry->hash_node, &new_hash[h]); + } + + old_hash =3D inst->queue_hash; + inst->queue_hash_size =3D hash_size; + inst->queue_hash_mask =3D hash_mask; + inst->queue_hash =3D new_hash; + + spin_unlock_bh(&inst->lock); + + kvfree(old_hash); + + return 0; +} + static struct nfqnl_instance * -instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid) +instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid, + u32 hash_size) { struct nfqnl_instance *inst; + struct hlist_head *queue_hash; unsigned int h; int err; =20 + hash_size =3D nfqnl_normalize_hash_size(hash_size); + + /* GFP_ATOMIC required: called under rcu_read_lock in nfqnl_recv_config. + * Using GFP_KERNEL_ACCOUNT would require refactoring lock placement. + */ + queue_hash =3D kvmalloc_array(hash_size, sizeof(*queue_hash), GFP_ATOMIC); + if (!queue_hash) + return ERR_PTR(-ENOMEM); + + for (h =3D 0; h < hash_size; h++) + INIT_HLIST_HEAD(&queue_hash[h]); + spin_lock(&q->instances_lock); if (instance_lookup(q, queue_num)) { err =3D -EEXIST; @@ -133,11 +229,14 @@ instance_create(struct nfnl_queue_net *q, u_int16_t q= ueue_num, u32 portid) goto out_unlock; } =20 + inst->queue_hash =3D queue_hash; inst->queue_num =3D queue_num; inst->peer_portid =3D portid; inst->queue_maxlen =3D NFQNL_QMAX_DEFAULT; inst->copy_range =3D NFQNL_MAX_COPY_RANGE; inst->copy_mode =3D NFQNL_COPY_NONE; + inst->queue_hash_size =3D hash_size; + inst->queue_hash_mask =3D hash_size - 1; spin_lock_init(&inst->lock); INIT_LIST_HEAD(&inst->queue_list); =20 @@ -157,6 +256,7 @@ instance_create(struct nfnl_queue_net *q, u_int16_t que= ue_num, u32 portid) kfree(inst); out_unlock: spin_unlock(&q->instances_lock); + kvfree(queue_hash); return ERR_PTR(err); } =20 @@ -172,6 +272,7 @@ instance_destroy_rcu(struct rcu_head *head) rcu_read_lock(); nfqnl_flush(inst, NULL, 0); rcu_read_unlock(); + kvfree(inst->queue_hash); kfree(inst); module_put(THIS_MODULE); } @@ -194,13 +295,17 @@ instance_destroy(struct nfnl_queue_net *q, struct nfq= nl_instance *inst) static inline void __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry) { - list_add_tail(&entry->list, &queue->queue_list); - queue->queue_total++; + unsigned int hash =3D nfqnl_packet_hash(entry->id, queue->queue_hash_mask= ); + + hlist_add_head(&entry->hash_node, &queue->queue_hash[hash]); + list_add_tail(&entry->list, &queue->queue_list); + queue->queue_total++; } =20 static void __dequeue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry) { + hlist_del(&entry->hash_node); list_del(&entry->list); queue->queue_total--; } @@ -209,10 +314,11 @@ static struct nf_queue_entry * find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id) { struct nf_queue_entry *entry =3D NULL, *i; + unsigned int hash =3D nfqnl_packet_hash(id, queue->queue_hash_mask); =20 spin_lock_bh(&queue->lock); =20 - list_for_each_entry(i, &queue->queue_list, list) { + hlist_for_each_entry(i, &queue->queue_hash[hash], hash_node) { if (i->id =3D=3D id) { entry =3D i; break; @@ -407,8 +513,7 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn c= mpfn, unsigned long data) spin_lock_bh(&queue->lock); list_for_each_entry_safe(entry, next, &queue->queue_list, list) { if (!cmpfn || cmpfn(entry, data)) { - list_del(&entry->list); - queue->queue_total--; + __dequeue_entry(queue, entry); nfqnl_reinject(entry, NF_DROP); } } @@ -1483,6 +1588,7 @@ static const struct nla_policy nfqa_cfg_policy[NFQA_C= FG_MAX+1] =3D { [NFQA_CFG_QUEUE_MAXLEN] =3D { .type =3D NLA_U32 }, [NFQA_CFG_MASK] =3D { .type =3D NLA_U32 }, [NFQA_CFG_FLAGS] =3D { .type =3D NLA_U32 }, + [NFQA_CFG_HASH_SIZE] =3D { .type =3D NLA_U32 }, }; =20 static const struct nf_queue_handler nfqh =3D { @@ -1495,11 +1601,15 @@ static int nfqnl_recv_config(struct sk_buff *skb, c= onst struct nfnl_info *info, { struct nfnl_queue_net *q =3D nfnl_queue_pernet(info->net); u_int16_t queue_num =3D ntohs(info->nfmsg->res_id); + u32 hash_size =3D 0; struct nfqnl_msg_config_cmd *cmd =3D NULL; struct nfqnl_instance *queue; __u32 flags =3D 0, mask =3D 0; int ret =3D 0; =20 + if (nfqa[NFQA_CFG_HASH_SIZE]) + hash_size =3D ntohl(nla_get_be32(nfqa[NFQA_CFG_HASH_SIZE])); + if (nfqa[NFQA_CFG_CMD]) { cmd =3D nla_data(nfqa[NFQA_CFG_CMD]); =20 @@ -1559,11 +1669,12 @@ static int nfqnl_recv_config(struct sk_buff *skb, c= onst struct nfnl_info *info, goto err_out_unlock; } queue =3D instance_create(q, queue_num, - NETLINK_CB(skb).portid); + NETLINK_CB(skb).portid, hash_size); if (IS_ERR(queue)) { ret =3D PTR_ERR(queue); goto err_out_unlock; } + hash_size =3D 0; /* avoid resize later in this function */ break; case NFQNL_CFG_CMD_UNBIND: if (!queue) { @@ -1586,6 +1697,12 @@ static int nfqnl_recv_config(struct sk_buff *skb, co= nst struct nfnl_info *info, goto err_out_unlock; } =20 + if (hash_size > 0) { + ret =3D nfqnl_hash_resize(queue, hash_size); + if (ret) + goto err_out_unlock; + } + if (nfqa[NFQA_CFG_PARAMS]) { struct nfqnl_msg_config_params *params =3D nla_data(nfqa[NFQA_CFG_PARAMS]); --=20 2.39.5 (Apple Git-154)