From: Jiayuan Chen
To: bpf@vger.kernel.org
Cc: martin.lau@linux.dev, ast@kernel.org, edumazet@google.com,
 jakub@cloudflare.com, davem@davemloft.net, dsahern@kernel.org,
 kuba@kernel.org, pabeni@redhat.com, linux-kernel@vger.kernel.org,
 song@kernel.org, john.fastabend@gmail.com, andrii@kernel.org,
 mhal@rbox.co, yonghong.song@linux.dev, daniel@iogearbox.net,
 xiyou.wangcong@gmail.com, horms@kernel.org, Jiayuan Chen
Subject: [PATCH bpf v3 1/2] bpf: fix wrong copied_seq calculation
Date: Wed, 18 Dec 2024 13:34:07 +0800
Message-ID: <20241218053408.437295-2-mrpre@163.com>
In-Reply-To: <20241218053408.437295-1-mrpre@163.com>
References: <20241218053408.437295-1-mrpre@163.com>
Content-Type: text/plain; charset="utf-8"

'sk->copied_seq' was updated in tcp_eat_skb() when the action of the BPF
program was SK_REDIRECT. For other actions, such as SK_PASS, the update
of 'sk->copied_seq' was moved to tcp_bpf_recvmsg_parser() to keep the
'fionread' feature accurate.

This works for the stream_verdict-only scenario, because the same change
also modified 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
so that it no longer updates 'sk->copied_seq'.

However, when both stream_parser and stream_verdict are attached (the
strparser case), tcp_read_sock() is used instead of tcp_read_skb(), via
'sk_data_ready->strp_data_ready->tcp_read_sock'. tcp_read_sock() still
updates 'sk->copied_seq', so the sequence number is advanced twice.

In summary, for strparser + SK_PASS, copied_seq is updated in both
tcp_read_sock() and tcp_bpf_recvmsg_parser().

The resulting copied_seq is wrong, which prevents user space from reading
data correctly through recv().

Modifying tcp_read_sock() or the strparser implementation directly is not
reasonable, as both are widely used by other modules.

Instead, introduce tcp_bpf_read_sock() and use it to replace
'sk->sk_socket->ops->read_sock' (as tls_build_proto() does in
tls_main.c). The same kind of replacement is already used for
tcp_bpf_prots in tcp_bpf.c, so the approach is not unusual.
(Note that checkpatch.pl may complain about a missing 'const' qualifier
on the BPF-specific 'proto_ops', but it cannot be const because it has
to be written when it is built.)

Also remove the strparser check in tcp_eat_skb(), since the custom
tcp_bpf_read_sock() does not update copied_seq.

Since strparser currently supports only TCP, it is sufficient for 'ops'
to inherit from inet_stream_ops.

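The double update described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the
struct and every model_* function are hypothetical stand-ins invented for
illustration, not kernel code, and the "unread bytes" figure is a
simplified rcv_nxt minus copied_seq.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two counters that matter here. */
struct model_sock {
	uint32_t copied_seq;	/* next byte user space is expected to read */
	uint32_t rcv_nxt;	/* next byte expected from the peer */
};

/* Stand-in for tcp_read_sock(): consuming data also advances copied_seq. */
static void model_tcp_read_sock(struct model_sock *sk, uint32_t len)
{
	sk->copied_seq += len;
}

/* Stand-in for tcp_bpf_read_sock(): consumes data, leaves copied_seq alone. */
static void model_tcp_bpf_read_sock(struct model_sock *sk, uint32_t len)
{
	(void)sk;
	(void)len;
}

/* Stand-in for tcp_bpf_recvmsg_parser(): advances by bytes given to the user. */
static void model_recvmsg_parser(struct model_sock *sk, uint32_t copied)
{
	sk->copied_seq += copied;
}

/* Simplified "unread bytes" figure; negative means copied_seq overshot. */
static int64_t model_unread(const struct model_sock *sk)
{
	return (int64_t)sk->rcv_nxt - (int64_t)sk->copied_seq;
}

/* One strparser + SK_PASS round trip of len bytes through either read path. */
static int64_t model_unread_after_pass(int use_bpf_read_sock, uint32_t len)
{
	struct model_sock sk = { .copied_seq = 1000, .rcv_nxt = 1000 };

	sk.rcv_nxt += len;			/* a segment arrives */
	if (use_bpf_read_sock)
		model_tcp_bpf_read_sock(&sk, len);
	else
		model_tcp_read_sock(&sk, len);	/* first update */
	model_recvmsg_parser(&sk, len);		/* second update */

	return model_unread(&sk);
}
```

In this model, model_unread_after_pass(0, 10) yields -10 (copied_seq has
run 10 bytes past rcv_nxt), while model_unread_after_pass(1, 10) yields 0.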
Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
Signed-off-by: Jiayuan Chen
---
 include/linux/skmsg.h |   2 +
 include/net/tcp.h     |   1 +
 net/core/skmsg.c      |   3 ++
 net/ipv4/tcp.c        |   2 +-
 net/ipv4/tcp_bpf.c    | 108 ++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index d9b03e0746e7..7f91bc67e50f 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -85,6 +85,7 @@ struct sk_psock {
 	struct sock *sk_redir;
 	u32 apply_bytes;
 	u32 cork_bytes;
+	u32 strp_offset;
 	u32 eval;
 	bool redir_ingress; /* undefined if sk_redir is null */
 	struct sk_msg *cork;
@@ -112,6 +113,7 @@ struct sk_psock {
 	int (*psock_update_sk_prot)(struct sock *sk, struct sk_psock *psock,
 				    bool restore);
 	struct proto *sk_proto;
+	const struct proto_ops *sk_proto_ops;
 	struct mutex work_mutex;
 	struct sk_psock_work_state work_state;
 	struct delayed_work work;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e9b37b76e894..fb3215936ece 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -353,6 +353,7 @@ ssize_t tcp_splice_read(struct socket *sk, loff_t *ppos,
 			unsigned int flags);
 struct sk_buff *tcp_stream_alloc_skb(struct sock *sk, gfp_t gfp,
 				     bool force_schedule);
+void tcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb);
 
 static inline void tcp_dec_quickack_mode(struct sock *sk)
 {
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index e90fbab703b2..99dd75c9e689 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -702,6 +702,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
 {
 	struct sk_psock *psock;
 	struct proto *prot;
+	const struct proto_ops *proto_ops;
 
 	write_lock_bh(&sk->sk_callback_lock);
 
@@ -722,9 +723,11 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
 	}
 
 	prot = READ_ONCE(sk->sk_prot);
+	proto_ops = likely(sk->sk_socket) ? sk->sk_socket->ops : NULL;
 	psock->sk = sk;
 	psock->eval = __SK_NONE;
 	psock->sk_proto = prot;
+	psock->sk_proto_ops = proto_ops;
 	psock->saved_unhash = prot->unhash;
 	psock->saved_destroy = prot->destroy;
 	psock->saved_close = prot->close;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d704bda6c41..6a07d98017f7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1517,7 +1517,7 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
 	__tcp_cleanup_rbuf(sk, copied);
 }
 
-static void tcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
+void tcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	__skb_unlink(skb, &sk->sk_receive_queue);
 	if (likely(skb->destructor == sock_rfree)) {
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 99cef92e6290..4a089afc09b7 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -19,9 +19,6 @@ void tcp_eat_skb(struct sock *sk, struct sk_buff *skb)
 	if (!skb || !skb->len || !sk_is_tcp(sk))
 		return;
 
-	if (skb_bpf_strparser(skb))
-		return;
-
 	tcp = tcp_sk(sk);
 	copied = tcp->copied_seq + skb->len;
 	WRITE_ONCE(tcp->copied_seq, copied);
@@ -578,6 +575,81 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	return copied > 0 ? copied : err;
 }
 
+static void sock_replace_proto_ops(struct sock *sk,
+				   const struct proto_ops *proto_ops)
+{
+	if (sk->sk_socket)
+		WRITE_ONCE(sk->sk_socket->ops, proto_ops);
+}
+
+/* tcp_bpf_read_sock() is an alternative implementation of
+ * tcp_read_sock(), except that it does not update copied_seq.
+ */
+static int tcp_bpf_read_sock(struct sock *sk, read_descriptor_t *desc,
+			     sk_read_actor_t recv_actor)
+{
+	struct sk_psock *psock;
+	struct sk_buff *skb;
+	int offset;
+	int copied = 0;
+
+	if (sk->sk_state == TCP_LISTEN)
+		return -ENOTCONN;
+
+	/* We are called from sk_psock_strp_data_ready(); psock has
+	 * already been checked there and can't be NULL.
+	 */
+	psock = sk_psock_get(sk);
+	/* The offset keeps track of how much data was processed during
+	 * the last call.
+	 */
+	offset = psock->strp_offset;
+	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
+		u8 tcp_flags;
+		int used;
+		size_t len;
+
+		len = skb->len - offset;
+		tcp_flags = TCP_SKB_CB(skb)->tcp_flags;
+		WARN_ON_ONCE(!skb_set_owner_sk_safe(skb, sk));
+		used = recv_actor(desc, skb, offset, len);
+		if (used <= 0) {
+			/* None of the data in skb has been consumed.
+			 * -ENOMEM or another error may have happened.
+			 */
+			if (!copied)
+				copied = used;
+			break;
+		}
+
+		if (WARN_ON_ONCE(used > len))
+			used = len;
+		copied += used;
+		if (used < len) {
+			/* Strparser clones and consumes the whole input skb
+			 * unless -ENOMEM happened, in which case its
+			 * framework replays the skb later. So we need to
+			 * keep the offset and the skb for the next retry.
+			 */
+			offset += used;
+			break;
+		}
+
+		/* The entire skb was consumed, so we no longer need it
+		 * and can reset the offset.
+		 */
+		offset = 0;
+		tcp_eat_recv_skb(sk, skb);
+		if (!desc->count)
+			break;
+		if (tcp_flags & TCPHDR_FIN)
+			break;
+	}
+
+	WRITE_ONCE(psock->strp_offset, offset);
+	return copied;
+}
+
 enum {
 	TCP_BPF_IPV4,
 	TCP_BPF_IPV6,
@@ -595,6 +667,10 @@ enum {
 static struct proto *tcpv6_prot_saved __read_mostly;
 static DEFINE_SPINLOCK(tcpv6_prot_lock);
 static struct proto tcp_bpf_prots[TCP_BPF_NUM_PROTS][TCP_BPF_NUM_CFGS];
+/* We do not use 'const' here because the array is written later.
+ * Scripts may warn about the missing const; that warning can be ignored.
+ */
+static struct proto_ops tcp_bpf_proto_ops[TCP_BPF_NUM_PROTS];
 
 static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 				   struct proto *base)
@@ -615,6 +691,13 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 	prot[TCP_BPF_TXRX].recvmsg = tcp_bpf_recvmsg_parser;
 }
 
+static void tcp_bpf_rebuild_proto_ops(struct proto_ops *ops,
+				      const struct proto_ops *base)
+{
+	*ops = *base;
+	ops->read_sock = tcp_bpf_read_sock;
+}
+
 static void tcp_bpf_check_v6_needs_rebuild(struct proto *ops)
 {
 	if (unlikely(ops != smp_load_acquire(&tcpv6_prot_saved))) {
@@ -627,6 +710,19 @@ static void tcp_bpf_check_v6_needs_rebuild(struct proto *ops)
 	}
 }
 
+static int __init tcp_bpf_build_proto_ops(void)
+{
+	/* We update the ops separately for future extensibility,
+	 * although v4 and v6 use the same ops.
+	 */
+	tcp_bpf_rebuild_proto_ops(&tcp_bpf_proto_ops[TCP_BPF_IPV4],
+				  &inet_stream_ops);
+	tcp_bpf_rebuild_proto_ops(&tcp_bpf_proto_ops[TCP_BPF_IPV6],
+				  &inet_stream_ops);
+	return 0;
+}
+late_initcall(tcp_bpf_build_proto_ops);
+
 static int __init tcp_bpf_v4_build_proto(void)
 {
 	tcp_bpf_rebuild_protos(tcp_bpf_prots[TCP_BPF_IPV4], &tcp_prot);
@@ -648,6 +744,7 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
 {
 	int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4;
 	int config = psock->progs.msg_parser ? TCP_BPF_TX : TCP_BPF_BASE;
+	bool strp = psock->progs.stream_verdict && psock->progs.stream_parser;
 
 	if (psock->progs.stream_verdict || psock->progs.skb_verdict) {
 		config = (config == TCP_BPF_TX) ? TCP_BPF_TXRX : TCP_BPF_RX;
@@ -666,6 +763,7 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
 			sk->sk_write_space = psock->saved_write_space;
 			/* Pairs with lockless read in sk_clone_lock() */
 			sock_replace_proto(sk, psock->sk_proto);
+			sock_replace_proto_ops(sk, psock->sk_proto_ops);
 		}
 		return 0;
 	}
@@ -679,6 +777,10 @@ int tcp_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore)
 
 	/* Pairs with lockless read in sk_clone_lock() */
 	sock_replace_proto(sk, &tcp_bpf_prots[family][config]);
+
+	if (strp)
+		sock_replace_proto_ops(sk, &tcp_bpf_proto_ops[family]);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(tcp_bpf_update_proto);
-- 
2.43.5

From: Jiayuan Chen
To: bpf@vger.kernel.org
Cc: martin.lau@linux.dev, ast@kernel.org, edumazet@google.com,
 jakub@cloudflare.com, davem@davemloft.net, dsahern@kernel.org,
 kuba@kernel.org, pabeni@redhat.com, linux-kernel@vger.kernel.org,
 song@kernel.org, john.fastabend@gmail.com, andrii@kernel.org,
 mhal@rbox.co, yonghong.song@linux.dev, daniel@iogearbox.net,
 xiyou.wangcong@gmail.com, horms@kernel.org, Jiayuan Chen
Subject: [PATCH bpf v3 2/2] selftests/bpf: add strparser test for bpf
Date: Wed, 18 Dec 2024 13:34:08 +0800
Message-ID: <20241218053408.437295-3-mrpre@163.com>
In-Reply-To: <20241218053408.437295-1-mrpre@163.com>
References: <20241218053408.437295-1-mrpre@163.com>
Content-Type: text/plain; charset="utf-8"

Add test cases for bpf + strparser and separate them from
sockmap_basic, since more strparser test cases will be added in the
future.

Signed-off-by: Jiayuan Chen
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  |  53 ---
 .../selftests/bpf/prog_tests/sockmap_strp.c   | 344 ++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_strp.c   |  51 +++
 3 files changed, 395 insertions(+), 53 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_strp.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index fdff0652d7ef..4c0eebc433d8 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -530,57 +530,6 @@ static void test_sockmap_skb_verdict_shutdown(void)
 	test_sockmap_pass_prog__destroy(skel);
 }
 
-static void test_sockmap_stream_pass(void)
-{
-	int zero = 0, sent, recvd;
-	int verdict, parser;
-	int err, map;
-	int c = -1, p = -1;
-	struct test_sockmap_pass_prog *pass = NULL;
-	char snd[256] = "0123456789";
-	char rcv[256] = "0";
-
-	pass = test_sockmap_pass_prog__open_and_load();
-	verdict = bpf_program__fd(pass->progs.prog_skb_verdict);
-	parser = bpf_program__fd(pass->progs.prog_skb_parser);
-	map = bpf_map__fd(pass->maps.sock_map_rx);
-
-	err = bpf_prog_attach(parser, map, BPF_SK_SKB_STREAM_PARSER, 0);
-	if (!ASSERT_OK(err, "bpf_prog_attach stream parser"))
-		goto out;
-
-	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
-	if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
-		goto out;
-
-	err = create_pair(AF_INET, SOCK_STREAM, &c, &p);
-	if (err)
-		goto out;
-
-	/* sk_data_ready of 'p' will be replaced by strparser handler */
-	err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST);
-	if (!ASSERT_OK(err, "bpf_map_update_elem(p)"))
-		goto out_close;
-
-	/*
-	 * as 'prog_skb_parser' return the original skb len and
-	 * 'prog_skb_verdict' return SK_PASS, the kernel will just
-	 * pass it through to original socket 'p'
-	 */
-	sent = xsend(c, snd, sizeof(snd), 0);
-	ASSERT_EQ(sent, sizeof(snd), "xsend(c)");
-
-	recvd = recv_timeout(p, rcv, sizeof(rcv), SOCK_NONBLOCK,
-			     IO_TIMEOUT_SEC);
-	ASSERT_EQ(recvd, sizeof(rcv), "recv_timeout(p)");
-
-out_close:
-	close(c);
-	close(p);
-
-out:
-	test_sockmap_pass_prog__destroy(pass);
-}
 
 static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 {
@@ -1050,8 +999,6 @@ void test_sockmap_basic(void)
 		test_sockmap_progs_query(BPF_SK_SKB_VERDICT);
 	if (test__start_subtest("sockmap skb_verdict shutdown"))
 		test_sockmap_skb_verdict_shutdown();
-	if (test__start_subtest("sockmap stream parser and verdict pass"))
-		test_sockmap_stream_pass();
 	if (test__start_subtest("sockmap skb_verdict fionread"))
 		test_sockmap_skb_verdict_fionread(true);
 	if (test__start_subtest("sockmap skb_verdict fionread on drop"))
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c b/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
new file mode 100644
index 000000000000..0398658d4787
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
@@ -0,0 +1,344 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+
+#include
+#include "sockmap_helpers.h"
+#include "test_skmsg_load_helpers.skel.h"
+#include "test_sockmap_strp.skel.h"
+#define STRP_PACKET_HEAD_LEN 4
+#define STRP_PACKET_BODY_LEN 6
+#define STRP_PACKET_FULL_LEN (STRP_PACKET_HEAD_LEN + STRP_PACKET_BODY_LEN)
+static const char packet[STRP_PACKET_FULL_LEN] = "head+body\0";
+static const int test_packet_num = 100;
+
+static struct test_sockmap_strp *sockmap_strp_init(int *map)
+{
+	struct test_sockmap_strp *strp = NULL;
+	int verdict, parser;
+	int err;
+
+	strp = test_sockmap_strp__open_and_load();
+	verdict = bpf_program__fd(strp->progs.prog_skb_verdict_pass);
+	parser = bpf_program__fd(strp->progs.prog_skb_parser_partial);
+	*map = bpf_map__fd(strp->maps.sock_map);
+
+	err = bpf_prog_attach(parser, *map, BPF_SK_SKB_STREAM_PARSER, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream parser"))
+		goto err;
+
+	err = bpf_prog_attach(verdict, *map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
+		goto err;
+
+	return strp;
+err:
+	test_sockmap_strp__destroy(strp);
+	return NULL;
+}
+
+/* we have multiple packets in one skb
+ * ------------ ------------ ------------
+ * |  packet1  |   packet2  |  ...
+ * ------------ ------------ ------------
+ */
+static void test_sockmap_strp_multi_packet(int family, int sotype)
+{
+	int i, zero = 0;
+	int sent, recvd, total;
+	int err, map;
+	int c = -1, p = -1;
+	struct test_sockmap_strp *strp = NULL;
+	char *snd = NULL, *rcv = NULL;
+
+	strp = sockmap_strp_init(&map);
+	if (!ASSERT_TRUE(strp != NULL, "sockmap_strp_init"))
+		return;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (err)
+		goto out;
+
+	err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(zero, p)"))
+		goto out_close;
+
+	/* construct multiple packets in one buffer */
+	total = test_packet_num * STRP_PACKET_FULL_LEN;
+	snd = malloc(total);
+	rcv = malloc(total + 1);
+	if (!ASSERT_TRUE(snd != NULL, "malloc(multi block)")
+	    || !ASSERT_TRUE(rcv != NULL, "malloc(multi block)"))
+		goto out_close;
+
+	for (i = 0; i < test_packet_num; i++) {
+		memcpy(snd + i * STRP_PACKET_FULL_LEN,
+		       packet, STRP_PACKET_FULL_LEN);
+	}
+
+	sent = xsend(c, snd, total, 0);
+	if (!ASSERT_EQ(sent, total, "xsend(c)"))
+		goto out_close;
+
+	/* try to recv one more byte to avoid truncation check */
+	recvd = recv_timeout(p, rcv, total + 1, MSG_DONTWAIT, IO_TIMEOUT_SEC);
+	if (!ASSERT_EQ(recvd, total, "recv(rcv)"))
+		goto out_close;
+
+	/* we sent a TCP segment carrying multiple packets;
+	 * check whether the packets are handled correctly
+	 */
+	if (!ASSERT_OK(memcmp(snd, rcv, total), "memcmp(snd, rcv)"))
+		goto out_close;
+
+out_close:
+	close(c);
+	close(p);
+	if (snd)
+		free(snd);
+	if (rcv)
+		free(rcv);
+out:
+	test_sockmap_strp__destroy(strp);
+}
+
+static void test_sockmap_strp_partial_read(int family, int sotype)
+{
+	int zero = 0, recvd, off;
+	int verdict, parser;
+	int err, map;
+	int c = -1, p = -1;
+	struct test_sockmap_strp *strp = NULL;
+	char rcv[STRP_PACKET_FULL_LEN + 1] = "0";
+
+	strp = test_sockmap_strp__open_and_load();
+	verdict = bpf_program__fd(strp->progs.prog_skb_verdict_pass);
+	parser = bpf_program__fd(strp->progs.prog_skb_parser_partial);
+	map = bpf_map__fd(strp->maps.sock_map);
+
+	err = bpf_prog_attach(parser, map, BPF_SK_SKB_STREAM_PARSER, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream parser"))
+		goto out;
+
+	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
+		goto out;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (err)
+		goto out;
+
+	/* sk_data_ready of 'p' will be replaced by strparser handler */
+	err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(zero, p)"))
+		goto out_close;
+
+	/* 1.1 send partial head, 1 byte header left */
+	off = STRP_PACKET_HEAD_LEN - 1;
+	xsend(c, packet, off, 0);
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 5);
+	if (!ASSERT_EQ(-1, recvd, "insufficient head, should no data recvd"))
+		goto out_close;
+
+	/* 1.2 send remaining head and body */
+	xsend(c, packet + off, STRP_PACKET_FULL_LEN - off, 0);
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, IO_TIMEOUT_SEC);
+	if (!ASSERT_EQ(recvd, STRP_PACKET_FULL_LEN, "should full data recvd"))
+		goto out_close;
+
+	/* 2.1 send partial head, 1 byte header left */
+	off = STRP_PACKET_HEAD_LEN - 1;
+	xsend(c, packet, off, 0);
+
+	/* 2.2 send remaining head and partial body, 1 byte body left */
+	xsend(c, packet + off, STRP_PACKET_FULL_LEN - off - 1, 0);
+	off = STRP_PACKET_FULL_LEN - 1;
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, 1);
+	if (!ASSERT_EQ(-1, recvd, "insufficient body, should no data read"))
+		goto out_close;
+
+	/* 2.3 send remaining body */
+	xsend(c, packet + off, STRP_PACKET_FULL_LEN - off, 0);
+	recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT, IO_TIMEOUT_SEC);
+	if (!ASSERT_EQ(recvd, STRP_PACKET_FULL_LEN, "should full data recvd"))
+		goto out_close;
+
+out_close:
+	close(c);
+	close(p);
+
+out:
+	test_sockmap_strp__destroy(strp);
+}
+
+static void test_sockmap_strp_pass(int family, int sotype, bool fionread)
+{
+	int zero = 0, pkt_size, sent, recvd, avail;
+	int verdict, parser;
+	int err, map;
+	int c = -1, p = -1;
+	int read_cnt = 10, i;
+	struct test_sockmap_strp *strp = NULL;
+	char rcv[STRP_PACKET_FULL_LEN + 1] = "0";
+
+	strp = test_sockmap_strp__open_and_load();
+	verdict = bpf_program__fd(strp->progs.prog_skb_verdict_pass);
+	parser = bpf_program__fd(strp->progs.prog_skb_parser);
+	map = bpf_map__fd(strp->maps.sock_map);
+
+	err = bpf_prog_attach(parser, map, BPF_SK_SKB_STREAM_PARSER, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream parser"))
+		goto out;
+
+	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
+		goto out;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (err)
+		goto out;
+
+	/* sk_data_ready of 'p' will be replaced by strparser handler */
+	err = bpf_map_update_elem(map, &zero, &p, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(p)"))
+		goto out_close;
+
+	/* Previously, we encountered issues such as deadlocks and
+	 * sequence errors that resulted in the inability to read
+	 * continuously. Therefore, we perform multiple iterations
+	 * of testing here.
+	 */
+	pkt_size = STRP_PACKET_FULL_LEN;
+	for (i = 0; i < read_cnt; i++) {
+		sent = xsend(c, packet, pkt_size, 0);
+		if (!ASSERT_EQ(sent, pkt_size, "xsend(c)"))
+			goto out_close;
+
+		recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT,
+				     IO_TIMEOUT_SEC);
+		if (!ASSERT_EQ(recvd, pkt_size, "recv_timeout(p)")
+		    || !ASSERT_OK(memcmp(packet, rcv, pkt_size),
+				  "recv_timeout(p)"))
+			goto out_close;
+	}
+
+	if (fionread) {
+		sent = xsend(c, packet, pkt_size, 0);
+		if (!ASSERT_EQ(sent, pkt_size, "second xsend(c)"))
+			goto out_close;
+
+		err = ioctl(p, FIONREAD, &avail);
+		if (!ASSERT_OK(err, "ioctl(FIONREAD) error")
+		    || !ASSERT_EQ(avail, pkt_size, "ioctl(FIONREAD)"))
+			goto out_close;
+
+		recvd = recv_timeout(p, rcv, sizeof(rcv), MSG_DONTWAIT,
+				     IO_TIMEOUT_SEC);
+		if (!ASSERT_EQ(recvd, pkt_size, "second recv_timeout(p)")
+		    || !ASSERT_OK(memcmp(packet, rcv, pkt_size),
+				  "second recv_timeout(p)"))
+			goto out_close;
+	}
+
+out_close:
+	close(c);
+	close(p);
+
+out:
+	test_sockmap_strp__destroy(strp);
+}
+
+static void test_sockmap_strp_verdict(int family, int sotype)
+{
+	int zero = 0, one = 1, sent, recvd, off;
+	int verdict, parser;
+	int err, map;
+	int c0 = -1, p0 = -1, c1 = -1, p1 = -1;
+	struct test_sockmap_strp *strp = NULL;
+	char rcv[STRP_PACKET_FULL_LEN + 1] = "0";
+
+	strp = test_sockmap_strp__open_and_load();
+	verdict = bpf_program__fd(strp->progs.prog_skb_verdict);
+	parser = bpf_program__fd(strp->progs.prog_skb_parser);
+	map = bpf_map__fd(strp->maps.sock_map);
+
+	err = bpf_prog_attach(parser, map, BPF_SK_SKB_STREAM_PARSER, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream parser"))
+		goto out;
+
+	err = bpf_prog_attach(verdict, map, BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
+		goto out;
+
+	/* We simulate a reverse proxy server.
+	 * When p0 receives data from c0, we forward it to p1.
+	 * From p1's perspective, it will consider this data
+	 * as being sent by c1.
+	 */
+	err = create_socket_pairs(family, sotype, &c0, &c1, &p0, &p1);
+	if (!ASSERT_OK(err, "create_socket_pairs()"))
+		goto out;
+
+	err = bpf_map_update_elem(map, &zero, &p0, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(p0)"))
+		goto out_close;
+
+	err = bpf_map_update_elem(map, &one, &c1, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(c1)"))
+		goto out_close;
+
+	sent = xsend(c0, packet, STRP_PACKET_FULL_LEN, 0);
+	if (!ASSERT_EQ(sent, STRP_PACKET_FULL_LEN, "xsend(c0)"))
+		goto out_close;
+
+	recvd = recv_timeout(p1, rcv, sizeof(rcv), MSG_DONTWAIT,
+			     IO_TIMEOUT_SEC);
+	if (!ASSERT_EQ(recvd, STRP_PACKET_FULL_LEN, "recv_timeout(p1)")
+	    || !ASSERT_OK(memcmp(packet, rcv, STRP_PACKET_FULL_LEN),
+			  "received data does not match the sent data"))
+		goto out_close;
+
+	/* send again to ensure the stream is functioning correctly. */
+	sent = xsend(c0, packet, STRP_PACKET_FULL_LEN, 0);
+	if (!ASSERT_EQ(sent, STRP_PACKET_FULL_LEN, "second xsend(c0)"))
+		goto out_close;
+
+	/* partial read */
+	off = STRP_PACKET_FULL_LEN/2;
+	recvd = recv_timeout(p1, rcv, off, MSG_DONTWAIT,
+			     IO_TIMEOUT_SEC);
+	recvd += recv_timeout(p1, rcv + off, sizeof(rcv) - off, MSG_DONTWAIT,
+			      IO_TIMEOUT_SEC);
+
+	if (!ASSERT_EQ(recvd, STRP_PACKET_FULL_LEN, "partial recv_timeout(p1)")
+	    || !ASSERT_OK(memcmp(packet, rcv, STRP_PACKET_FULL_LEN),
+			  "partial received data does not match the sent data"))
+		goto out_close;
+
+out_close:
+	close(c0);
+	close(c1);
+	close(p0);
+	close(p1);
+out:
+	test_sockmap_strp__destroy(strp);
+}
+
+void test_sockmap_strp(void)
+{
+	if (test__start_subtest("sockmap strp tcp pass"))
+		test_sockmap_strp_pass(AF_INET, SOCK_STREAM, false);
+	if (test__start_subtest("sockmap strp tcp v6 pass"))
+		test_sockmap_strp_pass(AF_INET6, SOCK_STREAM, false);
+	if (test__start_subtest("sockmap strp tcp pass fionread"))
+		test_sockmap_strp_pass(AF_INET, SOCK_STREAM, true);
+	if (test__start_subtest("sockmap strp tcp v6 pass fionread"))
+		test_sockmap_strp_pass(AF_INET6, SOCK_STREAM, true);
+	if (test__start_subtest("sockmap strp tcp verdict"))
+		test_sockmap_strp_verdict(AF_INET, SOCK_STREAM);
+	if (test__start_subtest("sockmap strp tcp v6 verdict"))
+		test_sockmap_strp_verdict(AF_INET6, SOCK_STREAM);
+	if (test__start_subtest("sockmap strp tcp partial read"))
+		test_sockmap_strp_partial_read(AF_INET, SOCK_STREAM);
+	if (test__start_subtest("sockmap strp tcp multiple packets"))
+		test_sockmap_strp_multi_packet(AF_INET, SOCK_STREAM);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_strp.c b/tools/testing/selftests/bpf/progs/test_sockmap_strp.c
new file mode 100644
index 000000000000..db2f3b6c87ba
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_strp.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 20);
+	__type(key, int);
+	__type(value, int);
+} sock_map SEC(".maps");
+
+
+SEC("sk_skb/stream_verdict")
+int prog_skb_verdict_pass(struct __sk_buff *skb)
+{
+	return SK_PASS;
+}
+
+
+SEC("sk_skb/stream_verdict")
+int prog_skb_verdict(struct __sk_buff *skb)
+{
+	__u32 one = 1;
+
+	return bpf_sk_redirect_map(skb, &sock_map, one, 0);
+}
+
+SEC("sk_skb/stream_parser")
+int prog_skb_parser(struct __sk_buff *skb)
+{
+	return skb->len;
+}
+
+SEC("sk_skb/stream_parser")
+int prog_skb_parser_partial(struct __sk_buff *skb)
+{
+	/* agreement with the test program on a 4-byte size header
+	 * and 6-byte body.
+	 */
+	if (skb->len < 4) {
+		/* need more header to determine full length */
+		return 0;
+	}
+	/* return the full length decoded from the header.
+	 * the return value may be larger than skb->len, which
+	 * means the framework must wait for the body to arrive.
+	 */
+	return 10;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.43.5
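
For readers unfamiliar with the framing contract that
prog_skb_parser_partial and the partial-read test rely on, here is a
rough user-space sketch. It assumes the same fixed 4-byte head and 6-byte
body; struct framer, parse_len() and framer_feed() are hypothetical names
made up for illustration and only approximate how a stream parser holds
data back until a full message is buffered.

```c
#include <assert.h>
#include <stddef.h>

#define HEAD_LEN 4
#define BODY_LEN 6
#define FULL_LEN (HEAD_LEN + BODY_LEN)

/* Mirrors the return convention of prog_skb_parser_partial:
 * 0 means "need more header bytes", otherwise the full message length.
 */
static int parse_len(size_t buffered)
{
	if (buffered < HEAD_LEN)
		return 0;
	return FULL_LEN;
}

/* Hypothetical accumulator standing in for the parser's buffering. */
struct framer {
	size_t buffered;	/* bytes received but not yet delivered */
};

/* Feed n newly arrived bytes; return how many complete messages
 * become deliverable as a result.
 */
static int framer_feed(struct framer *f, size_t n)
{
	int delivered = 0;

	f->buffered += n;
	for (;;) {
		int want = parse_len(f->buffered);

		/* Stop if the header or the body is still incomplete. */
		if (want == 0 || f->buffered < (size_t)want)
			break;
		f->buffered -= (size_t)want;
		delivered++;
	}
	return delivered;
}
```

Feeding 3 bytes and then 7 delivers nothing and then one message, which
matches steps 1.1 and 1.2 of the partial-read test; feeding
100 * FULL_LEN bytes at once delivers 100 messages, as in the
multi-packet test.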