From nobody Wed Nov 27 07:31:20 2024
From: Philo Lu
To: netdev@vger.kernel.org
Cc: willemdebruijn.kernel@gmail.com, davem@davemloft.net, edumazet@google.com,
 kuba@kernel.org, pabeni@redhat.com, dsahern@kernel.org,
 antony.antony@secunet.com, steffen.klassert@secunet.com,
 linux-kernel@vger.kernel.org, dust.li@linux.alibaba.com, jakub@cloudflare.com,
 fred.cc@alibaba-inc.com, yubing.qiuyubing@alibaba-inc.com
Subject: [PATCH v4 net-next 1/3] net/udp: Add a new struct for hash2 slot
Date: Sat, 12 Oct 2024 09:29:16 +0800
Message-Id: <20241012012918.70888-2-lulie@linux.alibaba.com>
X-Mailer: git-send-email 2.32.0.3.g01195cf9f
In-Reply-To: <20241012012918.70888-1-lulie@linux.alibaba.com>
References: <20241012012918.70888-1-lulie@linux.alibaba.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Prepare for the UDP 4-tuple hash (uhash4 for short).

To implement uhash4 without extra cache line misses during lookup, hslot2
records the number of sockets hashed in hslot4. Thus add a new struct
udp_hslot_main with the field hash4_cnt, which is used by hash2. The new
struct is used to avoid doubling the size of udp_hslot.

Before a uhash4 lookup, hash4_cnt is checked first to see whether any sks
are hashed in hslot4. Because hslot2 is always used in lookup, there is no
cache line miss.

Related helpers are updated, and the helpers are used where possible.

uhash4 is implemented in the following patches.

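The layout described above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: struct hslot, struct
hslot_main, HSLOT_MAIN(), demo_hash2, demo_hashslot2() and demo_has_hash4()
are hypothetical stand-ins for the kernel's udp_hslot, udp_hslot_main,
UDP_HSLOT_MAIN(), udp_table.hash2, udp_hashslot2() and the hash4_cnt check.
The list head and spinlock are replaced by plain fields, and the alignment
attribute is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct udp_hslot (assumption: plain fields
 * replace the kernel's hlist_head and spinlock_t). */
struct hslot {
	void *head;
	int count;
	int lock;
};

/* Stand-in for struct udp_hslot_main: the basic slot plus a counter. */
struct hslot_main {
	struct hslot hslot;	/* must be the first member */
	uint32_t hash4_cnt;
};

/* The cast below is only valid because hslot sits at offset 0. */
static_assert(offsetof(struct hslot_main, hslot) == 0,
	      "hslot must be the first member of hslot_main");

#define HSLOT_MAIN(p) ((struct hslot_main *)(p))

#define DEMO_SLOTS 8u	/* hypothetical table size, must be a power of two */

static struct hslot_main demo_hash2[DEMO_SLOTS];

/* Mirrors udp_hashslot2(): callers keep working with the embedded slot. */
static struct hslot *demo_hashslot2(unsigned int hash)
{
	return &demo_hash2[hash & (DEMO_SLOTS - 1)].hslot;
}

/* Mirrors the check made before a hash4 lookup: the counter is reached
 * through the slot pointer the lookup path already holds. */
static int demo_has_hash4(struct hslot *slot2)
{
	return HSLOT_MAIN(slot2)->hash4_cnt != 0;
}
```

Callers that only need the list keep receiving a plain slot pointer, while
the lookup path reads the counter from the same object instead of touching
a separate table.
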
Signed-off-by: Philo Lu
---
 include/net/udp.h | 27 +++++++++++++++++++++++----
 net/ipv4/udp.c    | 44 +++++++++++++++++++++++---------------------
 net/ipv6/udp.c    | 15 ++++++---------
 3 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 61222545ab1c..595364729138 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -50,7 +50,7 @@ struct udp_skb_cb {
 #define UDP_SKB_CB(__skb)	((struct udp_skb_cb *)((__skb)->cb))
 
 /**
- *	struct udp_hslot - UDP hash slot
+ *	struct udp_hslot - UDP hash slot used by udp_table.hash
  *
  *	@head:	head of list of sockets
  *	@count:	number of sockets in 'head' list
@@ -60,7 +60,19 @@ struct udp_hslot {
 	struct hlist_head	head;
 	int			count;
 	spinlock_t		lock;
-} __attribute__((aligned(2 * sizeof(long))));
+} __aligned(2 * sizeof(long));
+
+/**
+ *	struct udp_hslot_main - UDP hash slot used by udp_table.hash2
+ *
+ *	@hslot:	basic hash slot
+ *	@hash4_cnt: number of sockets in hslot4 of the same (local port, local address)
+ */
+struct udp_hslot_main {
+	struct udp_hslot	hslot; /* must be the first member */
+	u32			hash4_cnt;
+} __aligned(2 * sizeof(long));
+#define UDP_HSLOT_MAIN(__hslot) ((struct udp_hslot_main *)(__hslot))
 
 /**
  *	struct udp_table - UDP table
@@ -72,7 +84,7 @@ struct udp_hslot {
  */
 struct udp_table {
 	struct udp_hslot	*hash;
-	struct udp_hslot	*hash2;
+	struct udp_hslot_main	*hash2;
 	unsigned int		mask;
 	unsigned int		log;
 };
@@ -84,6 +96,13 @@ static inline struct udp_hslot *udp_hashslot(struct udp_table *table,
 {
 	return &table->hash[udp_hashfn(net, num, table->mask)];
 }
+
+static inline struct udp_hslot_main *udp_hashslot2_main(struct udp_table *table,
+							unsigned int hash)
+{
+	return &table->hash2[hash & table->mask];
+}
+
 /*
  * For secondary hash, net_hash_mix() is performed before calling
  * udp_hashslot2(), this explains difference with udp_hashslot()
@@ -91,7 +110,7 @@ static inline struct udp_hslot *udp_hashslot(struct udp_table *table,
 static inline struct udp_hslot *udp_hashslot2(struct udp_table *table,
 					      unsigned int hash)
 {
-	return &table->hash2[hash & table->mask];
+	return &table->hash2[hash & table->mask].hslot;
 }
 
 extern struct proto udp_prot;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8accbf4cb295..3a31e7d6d0dd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -486,13 +486,12 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 		int sdif, struct udp_table *udptable, struct sk_buff *skb)
 {
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
 	struct sock *result, *sk;
+	unsigned int hash2;
 
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 
 	/* Lookup connected or non-wildcard socket */
 	result = udp4_lib_lookup2(net, saddr, sport,
@@ -519,8 +518,7 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 
 	/* Lookup wildcard sockets */
 	hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  htonl(INADDR_ANY), hnum, dif, sdif,
@@ -2266,7 +2264,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			    udptable->mask;
 		hash2 = ipv4_portaddr_hash(net, daddr, hnum) & udptable->mask;
 start_lookup:
-		hslot = &udptable->hash2[hash2];
+		hslot = &udptable->hash2[hash2].hslot;
 		offset = offsetof(typeof(*sk), __sk_common.skc_portaddr_node);
 	}
 
@@ -2537,14 +2535,13 @@ static struct sock *__udp4_lib_demux_lookup(struct net *net,
 	struct udp_table *udptable = net->ipv4.udp_table;
 	INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr);
 	unsigned short hnum = ntohs(loc_port);
-	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
+	unsigned int hash2;
 	__portpair ports;
 	struct sock *sk;
 
 	hash2 = ipv4_portaddr_hash(net, loc_addr, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 	ports = INET_COMBINED_PORTS(rmt_port, hnum);
 
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
@@ -3185,7 +3182,7 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
 	batch_sks = 0;
 
 	for (; state->bucket <= udptable->mask; state->bucket++) {
-		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket];
+		struct udp_hslot *hslot2 = &udptable->hash2[state->bucket].hslot;
 
 		if (hlist_empty(&hslot2->head))
 			continue;
@@ -3426,10 +3423,11 @@ __setup("uhash_entries=", set_uhash_entries);
 
 void __init udp_table_init(struct udp_table *table, const char *name)
 {
-	unsigned int i;
+	unsigned int i, slot_size;
 
+	slot_size = sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
 	table->hash = alloc_large_system_hash(name,
-					      2 * sizeof(struct udp_hslot),
+					      slot_size,
 					      uhash_entries,
 					      21, /* one slot per 2 MB */
 					      0,
@@ -3438,16 +3436,17 @@ void __init udp_table_init(struct udp_table *table, const char *name)
 					      UDP_HTABLE_SIZE_MIN,
 					      UDP_HTABLE_SIZE_MAX);
 
-	table->hash2 = table->hash + (table->mask + 1);
+	table->hash2 = (void *)(table->hash + (table->mask + 1));
 	for (i = 0; i <= table->mask; i++) {
 		INIT_HLIST_HEAD(&table->hash[i].head);
 		table->hash[i].count = 0;
 		spin_lock_init(&table->hash[i].lock);
 	}
 	for (i = 0; i <= table->mask; i++) {
-		INIT_HLIST_HEAD(&table->hash2[i].head);
-		table->hash2[i].count = 0;
-		spin_lock_init(&table->hash2[i].lock);
+		INIT_HLIST_HEAD(&table->hash2[i].hslot.head);
+		table->hash2[i].hslot.count = 0;
+		spin_lock_init(&table->hash2[i].hslot.lock);
+		table->hash2[i].hash4_cnt = 0;
 	}
 }
 
@@ -3474,18 +3473,20 @@ static void __net_init udp_sysctl_init(struct net *net)
 static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_entries)
 {
 	struct udp_table *udptable;
+	unsigned int slot_size;
 	int i;
 
 	udptable = kmalloc(sizeof(*udptable), GFP_KERNEL);
 	if (!udptable)
 		goto out;
 
-	udptable->hash = vmalloc_huge(hash_entries * 2 * sizeof(struct udp_hslot),
+	slot_size = sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
+	udptable->hash = vmalloc_huge(hash_entries * slot_size,
 				      GFP_KERNEL_ACCOUNT);
 	if (!udptable->hash)
 		goto free_table;
 
-	udptable->hash2 = udptable->hash + hash_entries;
+	udptable->hash2 = (void *)(udptable->hash + hash_entries);
 	udptable->mask = hash_entries - 1;
 	udptable->log = ilog2(hash_entries);
 
@@ -3494,9 +3495,10 @@ static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_ent
 		udptable->hash[i].count = 0;
 		spin_lock_init(&udptable->hash[i].lock);
 
-		INIT_HLIST_HEAD(&udptable->hash2[i].head);
-		udptable->hash2[i].count = 0;
-		spin_lock_init(&udptable->hash2[i].lock);
+		INIT_HLIST_HEAD(&udptable->hash2[i].hslot.head);
+		udptable->hash2[i].hslot.count = 0;
+		spin_lock_init(&udptable->hash2[i].hslot.lock);
+		udptable->hash2[i].hash4_cnt = 0;
 	}
 
 	return udptable;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 52dfbb2ff1a8..bbf3352213c4 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -224,13 +224,12 @@ struct sock *__udp6_lib_lookup(const struct net *net,
 			       struct sk_buff *skb)
 {
 	unsigned short hnum = ntohs(dport);
-	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
 	struct sock *result, *sk;
+	unsigned int hash2;
 
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 
 	/* Lookup connected or non-wildcard sockets */
 	result = udp6_lib_lookup2(net, saddr, sport,
@@ -257,8 +256,7 @@ struct sock *__udp6_lib_lookup(const struct net *net,
 
 	/* Lookup wildcard sockets */
 	hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 
 	result = udp6_lib_lookup2(net, saddr, sport,
 				  &in6addr_any, hnum, dif, sdif,
@@ -859,7 +857,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			    udptable->mask;
 		hash2 = ipv6_portaddr_hash(net, daddr, hnum) & udptable->mask;
 start_lookup:
-		hslot = &udptable->hash2[hash2];
+		hslot = &udptable->hash2[hash2].hslot;
 		offset = offsetof(typeof(*sk), __sk_common.skc_portaddr_node);
 	}
 
@@ -1065,14 +1063,13 @@ static struct sock *__udp6_lib_demux_lookup(struct net *net,
 {
 	struct udp_table *udptable = net->ipv4.udp_table;
 	unsigned short hnum = ntohs(loc_port);
-	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
+	unsigned int hash2;
 	__portpair ports;
 	struct sock *sk;
 
 	hash2 = ipv6_portaddr_hash(net, loc_addr, hnum);
-	slot2 = hash2 & udptable->mask;
-	hslot2 = &udptable->hash2[slot2];
+	hslot2 = udp_hashslot2(udptable, hash2);
 	ports = INET_COMBINED_PORTS(rmt_port, hnum);
 
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
-- 
2.32.0.3.g01195cf9f

From nobody Wed Nov 27 07:31:20 2024
From: Philo Lu
To: netdev@vger.kernel.org
Cc: willemdebruijn.kernel@gmail.com, davem@davemloft.net, edumazet@google.com,
 kuba@kernel.org, pabeni@redhat.com, dsahern@kernel.org,
 antony.antony@secunet.com, steffen.klassert@secunet.com,
 linux-kernel@vger.kernel.org, dust.li@linux.alibaba.com, jakub@cloudflare.com,
 fred.cc@alibaba-inc.com, yubing.qiuyubing@alibaba-inc.com
Subject: [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis
Date: Sat, 12 Oct 2024 09:29:17 +0800
Message-Id: <20241012012918.70888-3-lulie@linux.alibaba.com>
X-Mailer: git-send-email 2.32.0.3.g01195cf9f
In-Reply-To: <20241012012918.70888-1-lulie@linux.alibaba.com>
References: <20241012012918.70888-1-lulie@linux.alibaba.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Add a new hash list, hash4, to the udp table. It will be used to implement
the 4-tuple hash for connected udp sockets. This patch adds the hlist to the
table and implements the helpers and the initialization. The 4-tuple hash
itself is implemented in the following patch.

Signed-off-by: Philo Lu
Signed-off-by: Cambda Zhu
Signed-off-by: Fred Chen
Signed-off-by: Yubing Qiu
---
 include/linux/udp.h |  7 +++++++
 include/net/udp.h   | 16 +++++++++++++++-
 net/ipv4/udp.c      | 15 +++++++++++++--
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/linux/udp.h b/include/linux/udp.h
index 3eb3f2b9a2a0..c04808360a05 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -56,6 +56,10 @@ struct udp_sock {
 	int		pending;	/* Any pending frames ? */
 	__u8		encap_type;	/* Is this an Encapsulation socket? */
 
+	/* For UDP 4-tuple hash */
+	__u16 udp_lrpa_hash;
+	struct hlist_node udp_lrpa_node;
+
 	/*
 	 * Following member retains the information to create a UDP header
 	 * when the socket is uncorked.
@@ -206,6 +210,9 @@ static inline void udp_allow_gso(struct sock *sk)
 #define udp_portaddr_for_each_entry_rcu(__sk, list) \
 	hlist_for_each_entry_rcu(__sk, list, __sk_common.skc_portaddr_node)
 
+#define udp_lrpa_for_each_entry_rcu(__up, list) \
+	hlist_for_each_entry_rcu(__up, list, udp_lrpa_node)
+
 #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE)
 
 #endif	/* _LINUX_UDP_H */
diff --git a/include/net/udp.h b/include/net/udp.h
index 595364729138..80f9622d0db3 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -50,7 +50,7 @@ struct udp_skb_cb {
 #define UDP_SKB_CB(__skb)	((struct udp_skb_cb *)((__skb)->cb))
 
 /**
- *	struct udp_hslot - UDP hash slot used by udp_table.hash
+ *	struct udp_hslot - UDP hash slot used by udp_table.hash/hash4
  *
  *	@head:	head of list of sockets
  *	@count:	number of sockets in 'head' list
@@ -79,12 +79,15 @@ struct udp_hslot_main {
  *
  *	@hash:	hash table, sockets are hashed on (local port)
  *	@hash2:	hash table, sockets are hashed on (local port, local address)
+ *	@hash4:	hash table, connected sockets are hashed on
+ *		(local port, local address, remote port, remote address)
  *	@mask:	number of slots in hash tables, minus 1
  *	@log:	log2(number of slots in hash table)
  */
 struct udp_table {
 	struct udp_hslot	*hash;
 	struct udp_hslot_main	*hash2;
+	struct udp_hslot	*hash4;
 	unsigned int		mask;
 	unsigned int		log;
 };
@@ -113,6 +116,17 @@ static inline struct udp_hslot *udp_hashslot2(struct udp_table *table,
 	return &table->hash2[hash & table->mask].hslot;
 }
 
+static inline struct udp_hslot *udp_hashslot4(struct udp_table *table,
+					      unsigned int hash)
+{
+	return &table->hash4[hash & table->mask];
+}
+
+static inline bool udp_hashed4(const struct sock *sk)
+{
+	return !hlist_unhashed(&udp_sk(sk)->udp_lrpa_node);
+}
+
 extern struct proto udp_prot;
 
 extern atomic_long_t udp_memory_allocated;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 3a31e7d6d0dd..1498ccb79c58 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3425,7 +3425,7 @@ void __init udp_table_init(struct udp_table *table, const char *name)
 {
 	unsigned int i, slot_size;
 
-	slot_size = sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
+	slot_size = 2 * sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
 	table->hash = alloc_large_system_hash(name,
 					      slot_size,
 					      uhash_entries,
@@ -3437,6 +3437,7 @@ void __init udp_table_init(struct udp_table *table, const char *name)
 					      UDP_HTABLE_SIZE_MAX);
 
 	table->hash2 = (void *)(table->hash + (table->mask + 1));
+	table->hash4 = (void *)(table->hash2 + (table->mask + 1));
 	for (i = 0; i <= table->mask; i++) {
 		INIT_HLIST_HEAD(&table->hash[i].head);
 		table->hash[i].count = 0;
@@ -3448,6 +3449,11 @@ void __init udp_table_init(struct udp_table *table, const char *name)
 		spin_lock_init(&table->hash2[i].hslot.lock);
 		table->hash2[i].hash4_cnt = 0;
 	}
+	for (i = 0; i <= table->mask; i++) {
+		INIT_HLIST_HEAD(&table->hash4[i].head);
+		table->hash4[i].count = 0;
+		spin_lock_init(&table->hash4[i].lock);
+	}
 }
 
 u32 udp_flow_hashrnd(void)
@@ -3480,13 +3486,14 @@ static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_ent
 	if (!udptable)
 		goto out;
 
-	slot_size = sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
+	slot_size = 2 * sizeof(struct udp_hslot) + sizeof(struct udp_hslot_main);
 	udptable->hash = vmalloc_huge(hash_entries * slot_size,
 				      GFP_KERNEL_ACCOUNT);
 	if (!udptable->hash)
 		goto free_table;
 
 	udptable->hash2 = (void *)(udptable->hash + hash_entries);
+	udptable->hash4 = (void *)(udptable->hash2 + hash_entries);
 	udptable->mask = hash_entries - 1;
 	udptable->log = ilog2(hash_entries);
 
@@ -3499,6 +3506,10 @@ static struct udp_table __net_init *udp_pernet_table_alloc(unsigned int hash_ent
 		udptable->hash2[i].hslot.count = 0;
 		spin_lock_init(&udptable->hash2[i].hslot.lock);
 		udptable->hash2[i].hash4_cnt = 0;
+
+		INIT_HLIST_HEAD(&udptable->hash4[i].head);
+		udptable->hash4[i].count = 0;
+		spin_lock_init(&udptable->hash4[i].lock);
 	}
 
 	return udptable;
-- 
2.32.0.3.g01195cf9f

From nobody Wed Nov 27 07:31:20 2024
From: Philo Lu
To: netdev@vger.kernel.org
Cc: willemdebruijn.kernel@gmail.com, davem@davemloft.net, edumazet@google.com,
 kuba@kernel.org, pabeni@redhat.com, dsahern@kernel.org,
 antony.antony@secunet.com, steffen.klassert@secunet.com,
 linux-kernel@vger.kernel.org, dust.li@linux.alibaba.com, jakub@cloudflare.com,
 fred.cc@alibaba-inc.com, yubing.qiuyubing@alibaba-inc.com
Subject: [PATCH v4 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket
Date: Sat, 12 Oct 2024 09:29:18 +0800
Message-Id: <20241012012918.70888-4-lulie@linux.alibaba.com>
X-Mailer: git-send-email 2.32.0.3.g01195cf9f
In-Reply-To: <20241012012918.70888-1-lulie@linux.alibaba.com>
References: <20241012012918.70888-1-lulie@linux.alibaba.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Currently, the udp_table has two hash tables, the port hash and the portaddr
hash. Usually for UDP servers, all sockets have the same local port and
addr, so they are all on the same hash slot within a reuseport group.

In some applications, UDP servers use connect() to manage clients. In
particular, when first receiving from an unseen 4-tuple, a new socket is
created and connect()ed to the remote addr:port, and then the fd is used
exclusively by the client.

Once there are connected sks in a reuseport group, udp has to score all sks
in the same hash2 slot to find the best match.
This could be inefficient with a large number of connections, resulting in
high softirq overhead.

To solve the problem, this patch implements a 4-tuple hash for connected udp
sockets. During connect(), the hash4 slot is updated, as well as a
corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(), hslot4
is searched first if the counter is non-zero. Otherwise, hslot2 is used as
before. Note that only connected sockets enter this hash4 path, while
unconnected ones are not affected.

Signed-off-by: Philo Lu
Signed-off-by: Cambda Zhu
Signed-off-by: Fred Chen
Signed-off-by: Yubing Qiu
---
 include/net/udp.h |   3 +-
 net/ipv4/udp.c    | 142 ++++++++++++++++++++++++++++++++++++++++++++--
 net/ipv6/udp.c    |   2 +-
 3 files changed, 141 insertions(+), 6 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 80f9622d0db3..5633b51cf8d4 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -226,7 +226,7 @@ static inline int udp_lib_hash(struct sock *sk)
 }
 
 void udp_lib_unhash(struct sock *sk);
-void udp_lib_rehash(struct sock *sk, u16 new_hash);
+void udp_lib_rehash(struct sock *sk, u16 new_hash, u16 new_hash4);
 
 static inline void udp_lib_close(struct sock *sk, long timeout)
 {
@@ -319,6 +319,7 @@ int udp_rcv(struct sk_buff *skb);
 int udp_ioctl(struct sock *sk, int cmd, int *karg);
 int udp_init_sock(struct sock *sk);
 int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
+int udp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 int __udp_disconnect(struct sock *sk, int flags);
 int udp_disconnect(struct sock *sk, int flags);
 __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1498ccb79c58..d7e3866617e0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -478,6 +478,27 @@ static struct sock *udp4_lib_lookup2(const struct net *net,
 	return result;
 }
 
+static struct sock *udp4_lib_lookup4(const struct net *net,
+				     __be32 saddr, __be16 sport,
+				     __be32 daddr, unsigned int hnum,
+				     int dif, int sdif,
+				     struct udp_table *udptable)
+{
+	unsigned int hash4 = udp_ehashfn(net, daddr, hnum, saddr, sport);
+	const __portpair ports = INET_COMBINED_PORTS(sport, hnum);
+	struct udp_hslot *hslot4 = udp_hashslot4(udptable, hash4);
+	struct udp_sock *up;
+	struct sock *sk;
+
+	INET_ADDR_COOKIE(acookie, saddr, daddr);
+	udp_lrpa_for_each_entry_rcu(up, &hslot4->head) {
+		sk = (struct sock *)up;
+		if (inet_match(net, sk, acookie, ports, dif, sdif))
+			return sk;
+	}
+	return NULL;
+}
+
 /* UDP is nearly always wildcards out the wazoo, it makes no sense to try
  * harder than this. -DaveM
  */
@@ -493,6 +514,12 @@ struct sock *__udp4_lib_lookup(const struct net *net, __be32 saddr,
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	hslot2 = udp_hashslot2(udptable, hash2);
 
+	if (UDP_HSLOT_MAIN(hslot2)->hash4_cnt) {
+		result = udp4_lib_lookup4(net, saddr, sport, daddr, hnum, dif, sdif, udptable);
+		if (result) /* udp4_lib_lookup4 return sk or NULL */
+			return result;
+	}
+
 	/* Lookup connected or non-wildcard socket */
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
@@ -1931,6 +1958,85 @@ int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 }
 EXPORT_SYMBOL(udp_pre_connect);
 
+/* In hash4, rehash can also happen in connect(), where hash4_cnt keeps unchanged. */
+static void udp4_rehash4(struct udp_table *udptable, struct sock *sk, u16 newhash4)
+{
+	struct udp_hslot *hslot4, *nhslot4;
+
+	hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
+	nhslot4 = udp_hashslot4(udptable, newhash4);
+	udp_sk(sk)->udp_lrpa_hash = newhash4;
+
+	if (hslot4 != nhslot4) {
+		spin_lock_bh(&hslot4->lock);
+		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
+		hslot4->count--;
+		spin_unlock_bh(&hslot4->lock);
+
+		synchronize_rcu();
+
+		spin_lock_bh(&nhslot4->lock);
+		hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &nhslot4->head);
+		nhslot4->count++;
+		spin_unlock_bh(&nhslot4->lock);
+	}
+}
+
+/* call with sock lock */
+static void udp4_hash4(struct sock *sk)
+{
+	struct udp_hslot *hslot, *hslot4;
+	struct udp_hslot_main *hslotm2;
+	struct net *net = sock_net(sk);
+	struct udp_table *udptable;
+	unsigned int hash;
+
+	if (sk_unhashed(sk) || inet_sk(sk)->inet_rcv_saddr == htonl(INADDR_ANY))
+		return;
+
+	hash = udp_ehashfn(net, inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
+			   inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
+
+	udptable = net->ipv4.udp_table;
+	if (udp_hashed4(sk)) {
+		udp4_rehash4(udptable, sk, hash);
+		return;
+	}
+
+	hslot = udp_hashslot(udptable, net, udp_sk(sk)->udp_port_hash);
+	hslotm2 = udp_hashslot2_main(udptable, udp_sk(sk)->udp_portaddr_hash);
+	hslot4 = udp_hashslot4(udptable, hash);
+	udp_sk(sk)->udp_lrpa_hash = hash;
+
+	spin_lock_bh(&hslot->lock);
+	if (rcu_access_pointer(sk->sk_reuseport_cb))
+		reuseport_detach_sock(sk);
+
+	spin_lock(&hslot4->lock);
+	hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &hslot4->head);
+	hslot4->count++;
+	spin_unlock(&hslot4->lock);
+
+	spin_lock(&hslotm2->hslot.lock);
+	hslotm2->hash4_cnt++;
+	spin_unlock(&hslotm2->hslot.lock);
+
+	spin_unlock_bh(&hslot->lock);
+}
+
+int udp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
+{
+	int res;
+
+	lock_sock(sk);
+	res = __ip4_datagram_connect(sk, uaddr, addr_len);
+	if (!res)
+		udp4_hash4(sk);
+	release_sock(sk);
+	return res;
+}
+EXPORT_SYMBOL(udp_connect);
+
 int __udp_disconnect(struct sock *sk, int flags)
 {
 	struct inet_sock *inet = inet_sk(sk);
@@ -1972,7 +2078,7 @@ void udp_lib_unhash(struct sock *sk)
 {
 	if (sk_hashed(sk)) {
 		struct udp_table *udptable = udp_get_table_prot(sk);
-		struct udp_hslot *hslot, *hslot2;
+		struct udp_hslot *hslot, *hslot2, *hslot4;
 
 		hslot = udp_hashslot(udptable, sock_net(sk),
 				     udp_sk(sk)->udp_port_hash);
@@ -1990,6 +2096,18 @@ void udp_lib_unhash(struct sock *sk)
 			hlist_del_init_rcu(&udp_sk(sk)->udp_portaddr_node);
 			hslot2->count--;
 			spin_unlock(&hslot2->lock);
+
+			if (udp_hashed4(sk)) {
+				hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
+				spin_lock(&hslot4->lock);
+				hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
+				hslot4->count--;
+				spin_unlock(&hslot4->lock);
+
+				spin_lock(&hslot2->lock);
+				UDP_HSLOT_MAIN(hslot2)->hash4_cnt--;
+				spin_unlock(&hslot2->lock);
+			}
 		}
 		spin_unlock_bh(&hslot->lock);
 	}
@@ -1999,7 +2117,7 @@ EXPORT_SYMBOL(udp_lib_unhash);
 /*
  * inet_rcv_saddr was changed, we must rehash secondary hash
  */
-void udp_lib_rehash(struct sock *sk, u16 newhash)
+void udp_lib_rehash(struct sock *sk, u16 newhash, u16 newhash4)
 {
 	if (sk_hashed(sk)) {
 		struct udp_table *udptable = udp_get_table_prot(sk);
@@ -2031,6 +2149,19 @@ void udp_lib_rehash(struct sock *sk, u16 newhash)
 				spin_unlock(&nhslot2->lock);
 			}
 
+			if (udp_hashed4(sk)) {
+				udp4_rehash4(udptable, sk, newhash4);
+
+				if (hslot2 != nhslot2) {
+					spin_lock(&hslot2->lock);
+					UDP_HSLOT_MAIN(hslot2)->hash4_cnt--;
+					spin_unlock(&hslot2->lock);
+
+					spin_lock(&nhslot2->lock);
+					UDP_HSLOT_MAIN(nhslot2)->hash4_cnt++;
+					spin_unlock(&nhslot2->lock);
+				}
+			}
 			spin_unlock_bh(&hslot->lock);
 		}
 	}
@@ -2042,7 +2173,10 @@ void udp_v4_rehash(struct sock *sk)
 	u16 new_hash = ipv4_portaddr_hash(sock_net(sk),
 					  inet_sk(sk)->inet_rcv_saddr,
 					  inet_sk(sk)->inet_num);
-	udp_lib_rehash(sk, new_hash);
+	u16 new_hash4 = udp_ehashfn(sock_net(sk),
+				    inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
+				    inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
+	udp_lib_rehash(sk, new_hash, new_hash4);
 }
 
 static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
@@ -2935,7 +3069,7 @@ struct proto udp_prot = {
 	.owner			= THIS_MODULE,
 	.close			= udp_lib_close,
 	.pre_connect		= udp_pre_connect,
-	.connect		= ip4_datagram_connect,
+	.connect		= udp_connect,
 	.disconnect		= udp_disconnect,
 	.ioctl			= udp_ioctl,
 	.init			= udp_init_sock,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index bbf3352213c4..4d3dfcb48a39 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -111,7 +111,7 @@ void udp_v6_rehash(struct sock *sk)
 					  &sk->sk_v6_rcv_saddr,
 					  inet_sk(sk)->inet_num);
 
-	udp_lib_rehash(sk, new_hash);
+	udp_lib_rehash(sk, new_hash, 0); /* 4-tuple hash not implemented */
 }
 
 static int compute_score(struct sock *sk, const struct net *net,
-- 
2.32.0.3.g01195cf9f
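
The lookup order described in patch 3/3 (consult hslot4 only when hash4_cnt
in hslot2 is non-zero, otherwise score the hash2 slot as before) can be
modelled in userspace as follows. This is a sketch under stated assumptions,
not the kernel code: every demo_* name is hypothetical, the hash functions
are toy placeholders for ipv4_portaddr_hash() and udp_ehashfn(), the chains
are plain singly linked lists, and locking, RCU, reuseport handling and
rehashing on a second connect() are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 16u	/* hypothetical table size, must be a power of two */

struct demo_sock {
	uint32_t laddr, raddr;		/* local and remote address */
	uint16_t lport, rport;		/* local and remote port, rport 0 means unconnected */
	struct demo_sock *next2;	/* chain in the (local port, local address) table */
	struct demo_sock *next4;	/* chain in the 4-tuple table */
};

struct demo_slot2 {
	struct demo_sock *head;
	uint32_t hash4_cnt;		/* sockets of this slot that are also in hash4 */
};

static struct demo_slot2 demo_hash2[DEMO_SLOTS];
static struct demo_sock *demo_hash4[DEMO_SLOTS];

/* Toy hash functions, placeholders for ipv4_portaddr_hash() and udp_ehashfn(). */
static unsigned int demo_h2(uint32_t laddr, uint16_t lport)
{
	return (laddr * 31u + lport) & (DEMO_SLOTS - 1);
}

static unsigned int demo_h4(uint32_t laddr, uint16_t lport,
			    uint32_t raddr, uint16_t rport)
{
	return ((laddr * 31u + lport) * 31u + raddr * 17u + rport) & (DEMO_SLOTS - 1);
}

/* bind(): every socket goes into the 2-tuple table. */
static void demo_bind(struct demo_sock *sk)
{
	struct demo_slot2 *s2 = &demo_hash2[demo_h2(sk->laddr, sk->lport)];

	sk->next2 = s2->head;
	s2->head = sk;
}

/* connect(): additionally hash the socket on its 4-tuple and bump the
 * counter kept in its hash2 slot. Assumes one connect() per socket. */
static void demo_connect(struct demo_sock *sk, uint32_t raddr, uint16_t rport)
{
	unsigned int i4;

	sk->raddr = raddr;
	sk->rport = rport;
	i4 = demo_h4(sk->laddr, sk->lport, raddr, rport);
	sk->next4 = demo_hash4[i4];
	demo_hash4[i4] = sk;
	demo_hash2[demo_h2(sk->laddr, sk->lport)].hash4_cnt++;
}

static struct demo_sock *demo_lookup(uint32_t laddr, uint16_t lport,
				     uint32_t raddr, uint16_t rport)
{
	struct demo_slot2 *s2 = &demo_hash2[demo_h2(laddr, lport)];
	struct demo_sock *sk, *best = NULL;
	int score, best_score = 0;

	/* Fast path: only consult hash4 when the slot says it is populated. */
	if (s2->hash4_cnt) {
		for (sk = demo_hash4[demo_h4(laddr, lport, raddr, rport)]; sk; sk = sk->next4)
			if (sk->laddr == laddr && sk->lport == lport &&
			    sk->raddr == raddr && sk->rport == rport)
				return sk;
	}

	/* Fallback: score every socket in the hash2 slot, as before. */
	for (sk = s2->head; sk; sk = sk->next2) {
		if (sk->laddr != laddr || sk->lport != lport)
			continue;
		if (sk->rport && (sk->raddr != raddr || sk->rport != rport))
			continue;
		score = sk->rport ? 2 : 1;
		if (score > best_score) {
			best = sk;
			best_score = score;
		}
	}
	return best;
}
```

In this model a packet from a known peer is resolved by an exact 4-tuple
match without walking the whole hash2 chain, while packets from unknown
peers, and tables without any connected socket, still take the scoring path.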