Syzkaller reported a bug [1] where sk->sk_forward_alloc can overflow.
When we send data, if an skb exists at the tail of the write queue, the
kernel will attempt to append the new data to that skb. However, the code
that checks for available space in the skb is flawed:
'''
copy = size_goal - skb->len
'''
The types of the variables involved are:
'''
copy: ssize_t (s64 on 64-bit systems)
size_goal: int
skb->len: unsigned int
'''
Due to C's type promotion rules, the signed size_goal is converted to an
unsigned int to match skb->len before the subtraction. The result is an
unsigned int.
When this unsigned int result is then assigned to the s64 copy variable,
it is zero-extended, preserving its non-negative value. Consequently, copy
is always >= 0.
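The effect can be reproduced in isolation. A minimal userspace sketch with
hypothetical values (skb_len stands in for skb->len; this is not kernel code):
'''
#include <stdio.h>
#include <sys/types.h>	/* ssize_t */

int main(void)
{
	int size_goal = 1000;			/* hypothetical: smaller than skb->len */
	unsigned int skb_len = 0x80000000;	/* hypothetical ~2GB skb->len */
	ssize_t copy;

	/*
	 * size_goal is converted to unsigned int before the subtraction,
	 * so the result wraps to a huge unsigned value instead of going
	 * negative; assigning that unsigned int to ssize_t preserves the
	 * value, so copy never becomes negative.
	 */
	copy = size_goal - skb_len;
	printf("copy = %zd\n", copy);		/* prints 2147484648, not a negative number */

	return 0;
}
'''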
Assume we are sending 2GB of data and size_goal has been adjusted to a
value smaller than skb->len. The subtraction will result in copy holding a
very large positive integer. In the subsequent logic, this large value is
used to update sk->sk_forward_alloc, which can easily cause it to overflow.
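Since sk_forward_alloc is a signed int, charging it with values of this
magnitude wraps it quickly. A sketch of the arithmetic only (compile with
-fwrapv so the signed wrap is well defined; the kernel itself builds with
-fno-strict-overflow):
'''
#include <limits.h>
#include <stdio.h>

int main(void)
{
	/* hypothetical stand-in for the signed sk->sk_forward_alloc counter */
	int forward_alloc = INT_MAX - 2048;	/* already inflated by the bogus copy */

	/* the next page-sized charge wraps the counter negative */
	forward_alloc += 4096;
	printf("forward_alloc = %d\n", forward_alloc);	/* negative after the wrap */

	return 0;
}
'''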
The syzkaller reproducer uses TCP_REPAIR to reliably create this
condition. However, it can also occur in real-world scenarios:
tcp_bound_to_half_wnd() can likewise reduce size_goal to a small value.
The subsequent tcp_wmem_schedule() would then set sk->sk_forward_alloc
to a value close to INT_MAX, and further memory allocation requests
would cause sk_forward_alloc to wrap around and become negative.
[1]: https://syzkaller.appspot.com/bug?extid=de6565462ab540f50e47
Reported-by: syzbot+de6565462ab540f50e47@syzkaller.appspotmail.com
Fixes: 270a1c3de47e ("tcp: Support MSG_SPLICE_PAGES")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
v2 -> v4: Use correct syzkaller link and fix typo
v1 -> v2: Added more commit message
https://lore.kernel.org/netdev/20250702110039.15038-1-jiayuan.chen@linux.dev/
---
net/ipv4/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a3c99246d2e..803a419f4ea0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1176,7 +1176,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		goto do_error;
 
 	while (msg_data_left(msg)) {
-		ssize_t copy = 0;
+		int copy = 0;
 
 		skb = tcp_write_queue_tail(sk);
 		if (skb)
--
2.47.1
Jiayuan Chen <jiayuan.chen@linux.dev> wrote:

> Syzkaller reported a bug [1] where sk->sk_forward_alloc can overflow.
> ...
> Reported-by: syzbot+de6565462ab540f50e47@syzkaller.appspotmail.com
> Fixes: 270a1c3de47e ("tcp: Support MSG_SPLICE_PAGES")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>

Reviewed-by: David Howells <dhowells@redhat.com>
On Mon, 7 Jul 2025 13:41:11 +0800 Jiayuan Chen wrote:
> Reported-by: syzbot+de6565462ab540f50e47@syzkaller.appspotmail.com
> Fixes: 270a1c3de47e ("tcp: Support MSG_SPLICE_PAGES")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 8a3c99246d2e..803a419f4ea0 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1176,7 +1176,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>  		goto do_error;
>  
>  	while (msg_data_left(msg)) {
> -		ssize_t copy = 0;
> +		int copy = 0;

nice! FWIW I think we started hitting something like this in prod, not sure
what the actual repro is, but recently some workloads here moved to 6.9 kernels
and I see spikes in warnings in copy_from_iter that the size is > INT_MAX:

WARNING: CPU: 26 PID: 3504902 at include/linux/thread_info.h:249 tcp_sendmsg+0xefd/0x1450
RIP: 0010:tcp_sendmsg (./include/linux/thread_info.h:249 ./include/linux/uio.h:203 ./include/linux/uio.h:221 ./include/net/sock.h:2176 ./include/net/sock.h:2202 net/ipv4/tcp.c:1219 net/ipv4/tcp.c:1355)

I'll take this fix via net.
Jiayuan Chen <jiayuan.chen@linux.dev> wrote:

> The types of the variables involved are:
> '''
> copy: ssize_t (s64 on 64-bit systems)
> size_goal: int
> skb->len: unsigned int
> '''
>
> Due to C's type promotion rules, the signed size_goal is converted to an
> unsigned int to match skb->len before the subtraction. The result is an
> unsigned int.
>
> When this unsigned int result is then assigned to the s64 copy variable,
> it is zero-extended, preserving its non-negative value. Consequently, copy
> is always >= 0.

Ewww.

Would it be better to explicitly force the subtraction to be signed, e.g.:

	skb = tcp_write_queue_tail(sk);
	if (skb)
		copy = size_goal - (ssize_t)skb->len;

rather than relying on getting it right with an implicit conversion to a
signed int of the same size?

If not, is it worth sticking in a comment to note the potential issue?

David
On Tue, Jul 8, 2025 at 6:11 AM David Howells <dhowells@redhat.com> wrote:
>
> Would it be better to explicitly force the subtraction to be signed, e.g.:
>
> 	skb = tcp_write_queue_tail(sk);
> 	if (skb)
> 		copy = size_goal - (ssize_t)skb->len;
>
> rather than relying on getting it right with an implicit conversion to a
> signed int of the same size?
>
> If not, is it worth sticking in a comment to note the potential issue?

I prefer the old construct, without a cast.

TCP has a lot of 32bit operations, sticking casts or comments is not helping.

Note how throwing a 'bigger type just in case' was broken...
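For context, narrowing the result back to int does recover the expected
negative value. A userspace sketch of why, using the same hypothetical values
as above (the out-of-range signed conversion is implementation-defined in
standard C, but wraps modulo 2^32 on the targets the kernel supports):
'''
#include <stdio.h>

int main(void)
{
	int size_goal = 1000;			/* hypothetical values, as above */
	unsigned int skb_len = 0x80000000;

	/*
	 * The subtraction is still performed in unsigned int, but converting
	 * the result back to int yields the mathematically correct negative
	 * difference whenever that difference fits in int.
	 */
	int copy = size_goal - skb_len;
	printf("copy = %d\n", copy);		/* prints -2147482648 */

	return 0;
}
'''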
On Sun, Jul 6, 2025 at 10:41 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> Syzkaller reported a bug [1] where sk->sk_forward_alloc can overflow.
> ...
> Reported-by: syzbot+de6565462ab540f50e47@syzkaller.appspotmail.com
> Fixes: 270a1c3de47e ("tcp: Support MSG_SPLICE_PAGES")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>

Reviewed-by: Eric Dumazet <edumazet@google.com>