From: Jiayuan Chen
To: bpf@vger.kernel.org
Cc: Jiayuan Chen, Jakub Sitnicki, John Fastabend, "David S. Miller",
    Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
    Neal Cardwell, Kuniyuki Iwashima, David Ahern, Andrii Nakryiko,
    Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
    Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
    Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan, Michal Luczaj,
    Stefano Garzarella, Cong Wang, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: [PATCH bpf-next v6 1/3] bpf, sockmap: Fix incorrect copied_seq calculation
Date: Thu, 8 Jan 2026 23:00:30 +0800
Message-ID: <20260108150102.12563-2-jiayuan.chen@linux.dev>
In-Reply-To: <20260108150102.12563-1-jiayuan.chen@linux.dev>
References: <20260108150102.12563-1-jiayuan.chen@linux.dev>

A socket using sockmap has its own independent receive queue: ingress_msg.
This queue may contain data from its own protocol stack or from other
sockets.

The issue is that when reading from ingress_msg, we update tp->copied_seq
by default. However, if the data is not from its own protocol stack,
tp->rcv_nxt is not increased.
Later, if we convert this socket to a native socket, reading from this
socket may fail because copied_seq might be significantly larger than
rcv_nxt.

This fix also addresses the syzkaller-reported bug referenced in the
Closes tag.

This patch marks the skmsg objects in ingress_msg. When reading, we update
copied_seq only if the data is from its own protocol stack.

                     FD1:read()
                     --  FD1->copied_seq++
                         |  [read data]
                         |
                [enqueue data]  v
                  [sockmap]     -> ingress to self -> ingress_msg queue
FD1 native stack  ------>                                 ^
-- FD1->rcv_nxt++               -> redirect to other      | [enqueue data]
                                       |                  |
                                       |             ingress to FD1
                                       v                  ^
                                      ...                 |  [sockmap]
                                                     FD2 native stack

Closes: https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Reviewed-by: Jakub Sitnicki
Signed-off-by: Jiayuan Chen
---
 include/linux/skmsg.h |  2 ++
 net/core/skmsg.c      | 25 ++++++++++++++++++++++---
 net/ipv4/tcp_bpf.c    |  5 +++--
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 49847888c287..dfdc158ab88c 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -141,6 +141,8 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 			     struct sk_msg *msg, u32 bytes);
 int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 		   int len, int flags);
+int __sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
+		     int len, int flags, int *copied_from_self);
 bool sk_msg_is_readable(struct sock *sk);
 
 static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 2ac7731e1e0a..3d147837b82c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -409,14 +409,14 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 }
 EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter);
 
-/* Receive sk_msg from psock->ingress_msg to @msg. */
-int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
-		   int len, int flags)
+int __sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
+		     int len, int flags, int *copied_from_self)
 {
 	struct iov_iter *iter = &msg->msg_iter;
 	int peek = flags & MSG_PEEK;
 	struct sk_msg *msg_rx;
 	int i, copied = 0;
+	bool from_self;
 
 	msg_rx = sk_psock_peek_msg(psock);
 	while (copied != len) {
@@ -425,6 +425,7 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 		if (unlikely(!msg_rx))
 			break;
 
+		from_self = msg_rx->sk == sk;
 		i = msg_rx->sg.start;
 		do {
 			struct page *page;
@@ -443,6 +444,9 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 			}
 
 			copied += copy;
+			if (from_self && copied_from_self)
+				*copied_from_self += copy;
+
 			if (likely(!peek)) {
 				sge->offset += copy;
 				sge->length -= copy;
@@ -487,6 +491,14 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 out:
 	return copied;
 }
+EXPORT_SYMBOL_GPL(__sk_msg_recvmsg);
+
+/* Receive sk_msg from psock->ingress_msg to @msg. */
+int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
+		   int len, int flags)
+{
+	return __sk_msg_recvmsg(sk, psock, msg, len, flags, NULL);
+}
 EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
 
 bool sk_msg_is_readable(struct sock *sk)
@@ -616,6 +628,12 @@ static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb
 	if (unlikely(!msg))
 		return -EAGAIN;
 	skb_set_owner_r(skb, sk);
+
+	/* This is used in tcp_bpf_recvmsg_parser() to determine whether the
+	 * data originates from the socket's own protocol stack. No need to
+	 * refcount sk because msg's lifetime is bound to sk via the ingress_msg.
+	 */
+	msg->sk = sk;
 	err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, take_ref);
 	if (err < 0)
 		kfree(msg);
@@ -909,6 +927,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 	sk_msg_compute_data_pointers(msg);
 	msg->sk = sk;
 	ret = bpf_prog_run_pin_on_cpu(prog, msg);
+	msg->sk = NULL;
 	ret = sk_psock_map_verd(ret, msg->sk_redir);
 	psock->apply_bytes = msg->apply_bytes;
 	if (ret == __SK_REDIRECT) {
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index a268e1595b22..5c698fd7fbf8 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -226,6 +226,7 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
 	int peek = flags & MSG_PEEK;
 	struct sk_psock *psock;
 	struct tcp_sock *tcp;
+	int copied_from_self = 0;
 	int copied = 0;
 	u32 seq;
 
@@ -262,7 +263,7 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
 	}
 
 msg_bytes_ready:
-	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
+	copied = __sk_msg_recvmsg(sk, psock, msg, len, flags, &copied_from_self);
 	/* The typical case for EFAULT is the socket was gracefully
 	 * shutdown with a FIN pkt. So check here the other case is
 	 * some error on copy_page_to_iter which would be unexpected.
@@ -277,7 +278,7 @@ static int tcp_bpf_recvmsg_parser(struct sock *sk,
 			goto out;
 		}
 	}
-	seq += copied;
+	seq += copied_from_self;
 	if (!copied) {
 		long timeo;
 		int data;
-- 
2.43.0