From: Scott Mitchell
To: pablo@netfilter.org
Cc: kadlec@netfilter.org, fw@strlen.de, phil@nwl.cc, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	horms@kernel.org, netfilter-devel@vger.kernel.org,
	coreteam@netfilter.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Scott Mitchell
Subject: [PATCH] netfilter: nfnetlink_queue: optimize verdict lookup with hash table
Date: Wed, 12 Nov 2025 08:03:33 -0800
Message-Id: <20251112160333.30883-1-scott_mitchell@apple.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)

The current implementation uses a linear list to find queued packets by
ID when processing verdicts from userspace. With large queue depths and
out-of-order verdicting, this O(n) lookup becomes a significant
bottleneck, and userspace verdict processing comes to dominate CPU time.

Replace the linear search with a hash table for O(1) average-case packet
lookup by ID. The hash table size is configurable via the new
NFQA_CFG_HASH_SIZE netlink attribute (default 1024 buckets, matching
NFQNL_QMAX_DEFAULT; maximum 131072). The size is normalized to a power
of two so that bucket selection uses a bitwise mask instead of a modulo
operation. Unpatched kernels silently ignore the new attribute, so
userspace that sets it stays backward compatible.

The existing list is retained for operations that require linear
iteration (e.g. flush, device down events). Hot fields (queue_hash_mask
and the queue_hash pointer) are placed in the same cache line as the
spinlock and packet counters to keep the lookup path's memory accesses
local.

Signed-off-by: Scott Mitchell
---
 include/net/netfilter/nf_queue.h               |   1 +
 .../uapi/linux/netfilter/nfnetlink_queue.h     |   1 +
 net/netfilter/nfnetlink_queue.c                | 141 ++++++++++++++++++++--
 3 files changed, 135 insertions(+), 8 deletions(-)
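Reviewer note: below is a minimal userspace sketch of how the new
attribute could be driven with libmnl. It is illustrative only and not
part of the patch; set_queue_hash_size() is a made-up helper name, the
NLM_F_ACK response is not read back, and a kernel without this change
simply ignores the unknown attribute.

#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/netfilter/nfnetlink.h>
#include <linux/netfilter/nfnetlink_queue.h>
#include <libmnl/libmnl.h>

/* Ask the kernel to resize the verdict lookup table of an nfqueue
 * instance. queue_num must already be bound by this socket's owner.
 */
static int set_queue_hash_size(struct mnl_socket *nl, uint16_t queue_num,
			       uint32_t hash_size)
{
	char buf[MNL_SOCKET_BUFFER_SIZE];
	struct nlmsghdr *nlh;
	struct nfgenmsg *nfg;

	nlh = mnl_nlmsg_put_header(buf);
	nlh->nlmsg_type = (NFNL_SUBSYS_QUEUE << 8) | NFQNL_MSG_CONFIG;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;

	nfg = mnl_nlmsg_put_extra_header(nlh, sizeof(*nfg));
	nfg->nfgen_family = AF_UNSPEC;
	nfg->version = NFNETLINK_V0;
	nfg->res_id = htons(queue_num);

	/* Big-endian on the wire, like the other u32 config attributes;
	 * the kernel clamps and rounds the value to a power of two.
	 */
	mnl_attr_put_u32(nlh, NFQA_CFG_HASH_SIZE, htonl(hash_size));

	return mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0 ? -1 : 0;
}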
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 4aeffddb7586..3d0def310523 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -11,6 +11,7 @@
 /* Each queued (to userspace) skbuff has one of these. */
 struct nf_queue_entry {
 	struct list_head	list;
+	struct hlist_node	hash_node;
 	struct sk_buff		*skb;
 	unsigned int		id;
 	unsigned int		hook_index;	/* index in hook_entries->hook[] */
diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h
index efcb7c044a74..bc296a17e5aa 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -107,6 +107,7 @@ enum nfqnl_attr_config {
 	NFQA_CFG_QUEUE_MAXLEN,		/* __u32 */
 	NFQA_CFG_MASK,			/* identify which flags to change */
 	NFQA_CFG_FLAGS,			/* value of these flags (__u32) */
+	NFQA_CFG_HASH_SIZE,		/* __u32 hash table size (rounded to power of 2) */
 	__NFQA_CFG_MAX
 };
 #define NFQA_CFG_MAX (__NFQA_CFG_MAX-1)
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 8b7b39d8a109..a344c987c33b 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -46,7 +46,10 @@
 #include "../bridge/br_private.h"
 #endif
 
-#define NFQNL_QMAX_DEFAULT 1024
+#define NFQNL_QMAX_DEFAULT	1024
+#define NFQNL_MIN_HASH_SIZE	16
+#define NFQNL_DEFAULT_HASH_SIZE	1024
+#define NFQNL_MAX_HASH_SIZE	131072
 
 /* We're using struct nlattr which has 16bit nla_len. Note that nla_len
  * includes the header length. Thus, the maximum packet length that we
@@ -65,6 +68,7 @@ struct nfqnl_instance {
 	unsigned int copy_range;
 	unsigned int queue_dropped;
 	unsigned int queue_user_dropped;
+	unsigned int queue_hash_size;
 
 
 	u_int16_t queue_num;			/* number of this queue */
@@ -77,6 +81,8 @@ struct nfqnl_instance {
 	spinlock_t	lock	____cacheline_aligned_in_smp;
 	unsigned int	queue_total;
 	unsigned int	id_sequence;		/* 'sequence' of pkt ids */
+	unsigned int	queue_hash_mask;
+	struct hlist_head *queue_hash;
 	struct list_head queue_list;		/* packets in queue */
 };
 
@@ -95,6 +101,39 @@ static struct nfnl_queue_net *nfnl_queue_pernet(struct net *net)
 	return net_generic(net, nfnl_queue_net_id);
 }
 
+static inline unsigned int
+nfqnl_packet_hash(u32 id, unsigned int mask)
+{
+	return hash_32(id, 32) & mask;
+}
+
+static inline u32
+nfqnl_normalize_hash_size(u32 hash_size)
+{
+	/* Must be a power of two for queue_hash_mask to work correctly.
+	 * Avoid overflow of is_power_of_2 by bounding NFQNL_MAX_HASH_SIZE.
+	 */
+	BUILD_BUG_ON(!is_power_of_2(NFQNL_MIN_HASH_SIZE) ||
+		     !is_power_of_2(NFQNL_DEFAULT_HASH_SIZE) ||
+		     !is_power_of_2(NFQNL_MAX_HASH_SIZE) ||
+		     NFQNL_MAX_HASH_SIZE > 1U << 31);
+
+	if (!hash_size)
+		return NFQNL_DEFAULT_HASH_SIZE;
+
+	/* Clamp to the valid range before rounding up to avoid overflow */
+	if (hash_size <= NFQNL_MIN_HASH_SIZE)
+		return NFQNL_MIN_HASH_SIZE;
+
+	if (hash_size >= NFQNL_MAX_HASH_SIZE)
+		return NFQNL_MAX_HASH_SIZE;
+
+	if (!is_power_of_2(hash_size))
+		hash_size = roundup_pow_of_two(hash_size);
+
+	return hash_size;
+}
+
 static inline u_int8_t instance_hashfn(u_int16_t queue_num)
 {
 	return ((queue_num >> 8) ^ queue_num) % INSTANCE_BUCKETS;
@@ -114,13 +153,67 @@ instance_lookup(struct nfnl_queue_net *q, u_int16_t queue_num)
 	return NULL;
 }
 
+static int
+nfqnl_hash_resize(struct nfqnl_instance *inst, u32 hash_size)
+{
+	struct hlist_head *new_hash, *old_hash;
+	struct nf_queue_entry *entry;
+	unsigned int h, hash_mask;
+
+	/* The lock scope includes the kcalloc/kfree to bound memory use under
+	 * concurrent resizes.  It could be narrowed to exclude them, at the
+	 * cost of more complex code (re-check of hash_size) and weaker memory
+	 * bounds (concurrent resizes may each allocate).  Since resizes are
+	 * expected to be rare, the broader lock scope is simpler and preferred.
+	 */
+	spin_lock_bh(&inst->lock);
+
+	hash_size = nfqnl_normalize_hash_size(hash_size);
+	if (hash_size == inst->queue_hash_size) {
+		spin_unlock_bh(&inst->lock);
+		return 0;
+	}
+
+	new_hash = kcalloc(hash_size, sizeof(*new_hash), GFP_ATOMIC);
+	if (!new_hash) {
+		spin_unlock_bh(&inst->lock);
+		return -ENOMEM;
+	}
+
+	hash_mask = hash_size - 1;
+
+	for (h = 0; h < hash_size; h++)
+		INIT_HLIST_HEAD(&new_hash[h]);
+
+	list_for_each_entry(entry, &inst->queue_list, list) {
+		/* No hlist_del() since old_hash will be freed and we hold the lock */
+		h = nfqnl_packet_hash(entry->id, hash_mask);
+		hlist_add_head(&entry->hash_node, &new_hash[h]);
+	}
+
+	old_hash = inst->queue_hash;
+	inst->queue_hash_size = hash_size;
+	inst->queue_hash_mask = hash_mask;
+	inst->queue_hash = new_hash;
+
+	/* Free before unlock to make the memory available to concurrent resizes. */
+	kfree(old_hash);
+
+	spin_unlock_bh(&inst->lock);
+
+	return 0;
+}
+
 static struct nfqnl_instance *
-instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
+instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid,
+		u32 hash_size)
 {
 	struct nfqnl_instance *inst;
 	unsigned int h;
 	int err;
 
+	hash_size = nfqnl_normalize_hash_size(hash_size);
+
 	spin_lock(&q->instances_lock);
 	if (instance_lookup(q, queue_num)) {
 		err = -EEXIST;
@@ -133,11 +226,24 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
 		goto out_unlock;
 	}
 
+	inst->queue_hash = kcalloc(hash_size, sizeof(*inst->queue_hash),
+				   GFP_ATOMIC);
+	if (!inst->queue_hash) {
+		kfree(inst);
+		err = -ENOMEM;
+		goto out_unlock;
+	}
+
+	for (h = 0; h < hash_size; h++)
+		INIT_HLIST_HEAD(&inst->queue_hash[h]);
+
 	inst->queue_num = queue_num;
 	inst->peer_portid = portid;
 	inst->queue_maxlen = NFQNL_QMAX_DEFAULT;
 	inst->copy_range = NFQNL_MAX_COPY_RANGE;
 	inst->copy_mode = NFQNL_COPY_NONE;
+	inst->queue_hash_size = hash_size;
+	inst->queue_hash_mask = hash_size - 1;
 	spin_lock_init(&inst->lock);
 	INIT_LIST_HEAD(&inst->queue_list);
 
@@ -154,6 +260,7 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
 	return inst;
 
 out_free:
+	kfree(inst->queue_hash);
 	kfree(inst);
 out_unlock:
 	spin_unlock(&q->instances_lock);
@@ -172,6 +279,7 @@ instance_destroy_rcu(struct rcu_head *head)
 	rcu_read_lock();
 	nfqnl_flush(inst, NULL, 0);
 	rcu_read_unlock();
+	kfree(inst->queue_hash);
 	kfree(inst);
 	module_put(THIS_MODULE);
 }
@@ -194,13 +302,17 @@ instance_destroy(struct nfnl_queue_net *q, struct nfqnl_instance *inst)
 static inline void
 __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
 {
-	list_add_tail(&entry->list, &queue->queue_list);
-	queue->queue_total++;
+	unsigned int hash = nfqnl_packet_hash(entry->id, queue->queue_hash_mask);
+
+	hlist_add_head(&entry->hash_node, &queue->queue_hash[hash]);
+	list_add_tail(&entry->list, &queue->queue_list);
+	queue->queue_total++;
 }
 
 static void
 __dequeue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
 {
+	hlist_del(&entry->hash_node);
 	list_del(&entry->list);
 	queue->queue_total--;
 }
@@ -209,10 +321,11 @@ static struct nf_queue_entry *
 find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
 {
 	struct nf_queue_entry *entry = NULL, *i;
+	unsigned int hash = nfqnl_packet_hash(id, queue->queue_hash_mask);
 
 	spin_lock_bh(&queue->lock);
 
-	list_for_each_entry(i, &queue->queue_list, list) {
+	hlist_for_each_entry(i, &queue->queue_hash[hash], hash_node) {
 		if (i->id == id) {
 			entry = i;
 			break;
@@ -407,8 +520,7 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
 	spin_lock_bh(&queue->lock);
 	list_for_each_entry_safe(entry, next, &queue->queue_list, list) {
 		if (!cmpfn || cmpfn(entry, data)) {
-			list_del(&entry->list);
-			queue->queue_total--;
+			__dequeue_entry(queue, entry);
 			nfqnl_reinject(entry, NF_DROP);
 		}
 	}
@@ -1483,6 +1595,7 @@ static const struct nla_policy nfqa_cfg_policy[NFQA_CFG_MAX+1] = {
 	[NFQA_CFG_QUEUE_MAXLEN]	= { .type = NLA_U32 },
 	[NFQA_CFG_MASK]		= { .type = NLA_U32 },
 	[NFQA_CFG_FLAGS]	= { .type = NLA_U32 },
+	[NFQA_CFG_HASH_SIZE]	= { .type = NLA_U32 },
 };
 
 static const struct nf_queue_handler nfqh = {
@@ -1495,11 +1608,16 @@ static int nfqnl_recv_config(struct sk_buff *skb, const struct nfnl_info *info,
 {
 	struct nfnl_queue_net *q = nfnl_queue_pernet(info->net);
 	u_int16_t queue_num = ntohs(info->nfmsg->res_id);
+	u32 hash_size = 0;
 	struct nfqnl_msg_config_cmd *cmd = NULL;
 	struct nfqnl_instance *queue;
 	__u32 flags = 0, mask = 0;
 	int ret = 0;
 
+	if (nfqa[NFQA_CFG_HASH_SIZE]) {
+		hash_size = ntohl(nla_get_be32(nfqa[NFQA_CFG_HASH_SIZE]));
+	}
+
 	if (nfqa[NFQA_CFG_CMD]) {
 		cmd = nla_data(nfqa[NFQA_CFG_CMD]);
 
@@ -1559,11 +1677,12 @@ static int nfqnl_recv_config(struct sk_buff *skb, const struct nfnl_info *info,
 			goto err_out_unlock;
 		}
 		queue = instance_create(q, queue_num,
-					NETLINK_CB(skb).portid);
+					NETLINK_CB(skb).portid, hash_size);
 		if (IS_ERR(queue)) {
 			ret = PTR_ERR(queue);
 			goto err_out_unlock;
 		}
+		hash_size = 0;	/* avoid a redundant resize later in this function */
 		break;
 	case NFQNL_CFG_CMD_UNBIND:
 		if (!queue) {
@@ -1586,6 +1705,12 @@ static int nfqnl_recv_config(struct sk_buff *skb, const struct nfnl_info *info,
 		goto err_out_unlock;
 	}
 
+	if (hash_size > 0) {
+		ret = nfqnl_hash_resize(queue, hash_size);
+		if (ret)
+			goto err_out_unlock;
+	}
+
 	if (nfqa[NFQA_CFG_PARAMS]) {
 		struct nfqnl_msg_config_params *params =
 			nla_data(nfqa[NFQA_CFG_PARAMS]);
-- 
2.39.5 (Apple Git-154)
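For reference, the sizing rules applied to NFQA_CFG_HASH_SIZE can be
mirrored in plain C as below. This is a standalone userspace sketch of
the same normalization, not shared kernel code; the constants are the
ones introduced by the patch.

#include <stdint.h>

#define NFQNL_MIN_HASH_SIZE	16
#define NFQNL_DEFAULT_HASH_SIZE	1024
#define NFQNL_MAX_HASH_SIZE	131072

/* 0 selects the default; other values are clamped to [16, 131072] and
 * rounded up to the next power of two, so the bucket mask is size - 1.
 */
static uint32_t normalize_hash_size(uint32_t hash_size)
{
	uint32_t pow2 = NFQNL_MIN_HASH_SIZE;

	if (!hash_size)
		return NFQNL_DEFAULT_HASH_SIZE;
	if (hash_size <= NFQNL_MIN_HASH_SIZE)
		return NFQNL_MIN_HASH_SIZE;
	if (hash_size >= NFQNL_MAX_HASH_SIZE)
		return NFQNL_MAX_HASH_SIZE;

	while (pow2 < hash_size)	/* terminates: hash_size < 131072 here */
		pow2 <<= 1;

	return pow2;			/* e.g. 20000 -> 32768 */
}

So a request of 0 keeps the 1024-bucket default, 20000 is rounded up to
32768 buckets, and anything at or above 131072 is capped at 131072.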