net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
The calculation for the remaining space, 'copy = size_goal - skb->len',
was prone to an integer promotion bug that prevented copy from ever being
negative.
The variable types involved are:
copy: ssize_t (long)
size_goal: int
skb->len: unsigned int
Due to C's type promotion rules, the signed size_goal is converted to an
unsigned int to match skb->len before the subtraction. The result is an
unsigned int.
When this unsigned int result is then assigned to the s64 copy variable,
it is zero-extended, preserving its non-negative value. Consequently,
copy is always >= 0.
The intended logic is that a negative copy value indicates that the tail
skb lacks sufficient space for appending new data, which should trigger
the allocation of a new skb. Because of this bug, the condition copy <= 0
was never met, causing the code to always append to the tail skb.
Fixes: 270a1c3de47e ("tcp: Support MSG_SPLICE_PAGES")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
net/ipv4/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a3c99246d2e..ed942cd17351 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1180,7 +1180,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
skb = tcp_write_queue_tail(sk);
if (skb)
- copy = size_goal - skb->len;
+ copy = size_goal - (ssize_t)skb->len;
trace_tcp_sendmsg_locked(sk, msg, skb, size_goal);
--
2.47.1
July 2, 2025 at 19:00, "Jiayuan Chen" <jiayuan.chen@linux.dev> wrote:
>
> The calculation for the remaining space, 'copy = size_goal - skb->len',
>
> was prone to an integer promotion bug that prevented copy from ever being
>
> negative.
>
> The variable types involved are:
>
> copy: ssize_t (long)
>
> size_goal: int
>
> skb->len: unsigned int
>
> Due to C's type promotion rules, the signed size_goal is converted to an
>
> unsigned int to match skb->len before the subtraction. The result is an
>
> unsigned int.
>
> When this unsigned int result is then assigned to the s64 copy variable,
>
> it is zero-extended, preserving its non-negative value. Consequently,
>
> copy is always >= 0.
>
To better explain this problem, consider the following example:
'''
#include <sys/types.h>
#include <stdio.h>
int size_goal = 536;
unsigned int skblen = 1131;
void main() {
ssize_t copy = 0;
copy = size_goal - skblen;
printf("wrong: %zd\n", copy);
copy = size_goal - (ssize_t)skblen;
printf("correct: %zd\n", copy);
return;
}
'''
Output:
'''
wrong: 4294966701
correct: -595
'''
On Wed, Jul 2, 2025 at 6:42 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> July 2, 2025 at 19:00, "Jiayuan Chen" <jiayuan.chen@linux.dev> wrote:
>
>
> >
> > The calculation for the remaining space, 'copy = size_goal - skb->len',
> >
> > was prone to an integer promotion bug that prevented copy from ever being
> >
> > negative.
> >
> > The variable types involved are:
> >
> > copy: ssize_t (long)
> >
> > size_goal: int
> >
> > skb->len: unsigned int
> >
> > Due to C's type promotion rules, the signed size_goal is converted to an
> >
> > unsigned int to match skb->len before the subtraction. The result is an
> >
> > unsigned int.
> >
> > When this unsigned int result is then assigned to the s64 copy variable,
> >
> > it is zero-extended, preserving its non-negative value. Consequently,
> >
> > copy is always >= 0.
> >
>
> To better explain this problem, consider the following example:
> '''
> #include <sys/types.h>
> #include <stdio.h>
> int size_goal = 536;
> unsigned int skblen = 1131;
>
> void main() {
> ssize_t copy = 0;
> copy = size_goal - skblen;
> printf("wrong: %zd\n", copy);
>
> copy = size_goal - (ssize_t)skblen;
> printf("correct: %zd\n", copy);
> return;
> }
> '''
> Output:
> '''
> wrong: 4294966701
> correct: -595
> '''
Can you explain how one skb could have more bytes (skb->len) than size_goal ?
If we are under this condition, we already have a prior bug ?
Please describe how you caught this issue.
On Wed, Jul 2, 2025 at 6:59 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Jul 2, 2025 at 6:42 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> >
> > July 2, 2025 at 19:00, "Jiayuan Chen" <jiayuan.chen@linux.dev> wrote:
> >
> >
> > >
> > > The calculation for the remaining space, 'copy = size_goal - skb->len',
> > >
> > > was prone to an integer promotion bug that prevented copy from ever being
> > >
> > > negative.
> > >
> > > The variable types involved are:
> > >
> > > copy: ssize_t (long)
> > >
> > > size_goal: int
> > >
> > > skb->len: unsigned int
> > >
> > > Due to C's type promotion rules, the signed size_goal is converted to an
> > >
> > > unsigned int to match skb->len before the subtraction. The result is an
> > >
> > > unsigned int.
> > >
> > > When this unsigned int result is then assigned to the s64 copy variable,
> > >
> > > it is zero-extended, preserving its non-negative value. Consequently,
> > >
> > > copy is always >= 0.
> > >
> >
> > To better explain this problem, consider the following example:
> > '''
> > #include <sys/types.h>
> > #include <stdio.h>
> > int size_goal = 536;
> > unsigned int skblen = 1131;
> >
> > void main() {
> > ssize_t copy = 0;
> > copy = size_goal - skblen;
> > printf("wrong: %zd\n", copy);
> >
> > copy = size_goal - (ssize_t)skblen;
> > printf("correct: %zd\n", copy);
> > return;
> > }
> > '''
> > Output:
> > '''
> > wrong: 4294966701
> > correct: -595
> > '''
>
> Can you explain how one skb could have more bytes (skb->len) than size_goal ?
>
> If we are under this condition, we already have a prior bug ?
>
> Please describe how you caught this issue.
Also, not sure why copy variable had to be changed from "int" to "ssize_t"
A nicer patch (without a cast) would be to make it an "int" again/
July 2, 2025 at 22:02, "Eric Dumazet" <edumazet@google.com> wrote:
>
> On Wed, Jul 2, 2025 at 6:59 AM Eric Dumazet <edumazet@google.com> wrote:
>
> >
> > On Wed, Jul 2, 2025 at 6:42 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> >
> > July 2, 2025 at 19:00, "Jiayuan Chen" <jiayuan.chen@linux.dev> wrote:
> >
> > >
> >
> > > The calculation for the remaining space, 'copy = size_goal - skb->len',
> >
> > >
> >
> > > was prone to an integer promotion bug that prevented copy from ever being
> >
> > >
> >
> > > negative.
> >
> > >
> >
> > > The variable types involved are:
> >
> > >
> >
> > > copy: ssize_t (long)
> >
> > >
> >
> > > size_goal: int
> >
> > >
> >
> > > skb->len: unsigned int
> >
> > >
> >
> > > Due to C's type promotion rules, the signed size_goal is converted to an
> >
> > >
> >
> > > unsigned int to match skb->len before the subtraction. The result is an
> >
> > >
> >
> > > unsigned int.
> >
> > >
> >
> > > When this unsigned int result is then assigned to the s64 copy variable,
> >
> > >
> >
> > > it is zero-extended, preserving its non-negative value. Consequently,
> >
> > >
> >
> > > copy is always >= 0.
> >
> > >
> >
> > To better explain this problem, consider the following example:
> >
> > '''
> >
> > #include <sys/types.h>
> >
> > #include <stdio.h>
> >
> > int size_goal = 536;
> >
> > unsigned int skblen = 1131;
> >
> > void main() {
> >
> > ssize_t copy = 0;
> >
> > copy = size_goal - skblen;
> >
> > printf("wrong: %zd\n", copy);
> >
> > copy = size_goal - (ssize_t)skblen;
> >
> > printf("correct: %zd\n", copy);
> >
> > return;
> >
> > }
> >
> > '''
> >
> > Output:
> >
> > '''
> >
> > wrong: 4294966701
> >
> > correct: -595
> >
> > '''
> >
> > Can you explain how one skb could have more bytes (skb->len) than size_goal ?
> >
> > If we are under this condition, we already have a prior bug ?
> >
> > Please describe how you caught this issue.
> >
>
> Also, not sure why copy variable had to be changed from "int" to "ssize_t"
>
> A nicer patch (without a cast) would be to make it an "int" again/
>
I encountered this issue because I had tcp_repair enabled, which uses
tcp_init_tso_segs to reset the MSS.
However, it seems that tcp_bound_to_half_wnd also dynamically adjusts
the value to be smaller than the current size_goal.
Looking at the commit history, it's indeed unnecessary to define the
copy variable as type ssize_t.
On Wed, Jul 2, 2025 at 8:28 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> July 2, 2025 at 22:02, "Eric Dumazet" <edumazet@google.com> wrote:
>
>
>
> >
> > On Wed, Jul 2, 2025 at 6:59 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > >
> > > On Wed, Jul 2, 2025 at 6:42 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> > >
> > > July 2, 2025 at 19:00, "Jiayuan Chen" <jiayuan.chen@linux.dev> wrote:
> > >
> > > >
> > >
> > > > The calculation for the remaining space, 'copy = size_goal - skb->len',
> > >
> > > >
> > >
> > > > was prone to an integer promotion bug that prevented copy from ever being
> > >
> > > >
> > >
> > > > negative.
> > >
> > > >
> > >
> > > > The variable types involved are:
> > >
> > > >
> > >
> > > > copy: ssize_t (long)
> > >
> > > >
> > >
> > > > size_goal: int
> > >
> > > >
> > >
> > > > skb->len: unsigned int
> > >
> > > >
> > >
> > > > Due to C's type promotion rules, the signed size_goal is converted to an
> > >
> > > >
> > >
> > > > unsigned int to match skb->len before the subtraction. The result is an
> > >
> > > >
> > >
> > > > unsigned int.
> > >
> > > >
> > >
> > > > When this unsigned int result is then assigned to the s64 copy variable,
> > >
> > > >
> > >
> > > > it is zero-extended, preserving its non-negative value. Consequently,
> > >
> > > >
> > >
> > > > copy is always >= 0.
> > >
> > > >
> > >
> > > To better explain this problem, consider the following example:
> > >
> > > '''
> > >
> > > #include <sys/types.h>
> > >
> > > #include <stdio.h>
> > >
> > > int size_goal = 536;
> > >
> > > unsigned int skblen = 1131;
> > >
> > > void main() {
> > >
> > > ssize_t copy = 0;
> > >
> > > copy = size_goal - skblen;
> > >
> > > printf("wrong: %zd\n", copy);
> > >
> > > copy = size_goal - (ssize_t)skblen;
> > >
> > > printf("correct: %zd\n", copy);
> > >
> > > return;
> > >
> > > }
> > >
> > > '''
> > >
> > > Output:
> > >
> > > '''
> > >
> > > wrong: 4294966701
> > >
> > > correct: -595
> > >
> > > '''
> > >
> > > Can you explain how one skb could have more bytes (skb->len) than size_goal ?
> > >
> > > If we are under this condition, we already have a prior bug ?
> > >
> > > Please describe how you caught this issue.
> > >
> >
> > Also, not sure why copy variable had to be changed from "int" to "ssize_t"
> >
> > A nicer patch (without a cast) would be to make it an "int" again/
> >
>
> I encountered this issue because I had tcp_repair enabled, which uses
> tcp_init_tso_segs to reset the MSS.
> However, it seems that tcp_bound_to_half_wnd also dynamically adjusts
> the value to be smaller than the current size_goal.
>
Okay, and what was the end result ?
An skb has a limited amount of bytes that can be put into it
(MAX_SKB_FRAGS * 32K) , and I can't see what are the effects of having
an
"not optimally sized skb in socket write queue".
BTW if you have a tcp_repair test, I would love having it in the
tools/testing/selftests/net :)
Thanks.
> Looking at the commit history, it's indeed unnecessary to define the
> copy variable as type ssize_t.
2025/7/2 23:34, "Eric Dumazet" <edumazet@google.com> 写到:
>
> On Wed, Jul 2, 2025 at 8:28 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> >
> > July 2, 2025 at 22:02, "Eric Dumazet" <edumazet@google.com> wrote:
> >
> > On Wed, Jul 2, 2025 at 6:59 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > >
> >
> > > On Wed, Jul 2, 2025 at 6:42 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> >
> > >
> >
> > > July 2, 2025 at 19:00, "Jiayuan Chen" <jiayuan.chen@linux.dev> wrote:
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > The calculation for the remaining space, 'copy = size_goal - skb->len',
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > was prone to an integer promotion bug that prevented copy from ever being
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > negative.
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > The variable types involved are:
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > copy: ssize_t (long)
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > size_goal: int
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > skb->len: unsigned int
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > Due to C's type promotion rules, the signed size_goal is converted to an
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > unsigned int to match skb->len before the subtraction. The result is an
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > unsigned int.
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > When this unsigned int result is then assigned to the s64 copy variable,
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > it is zero-extended, preserving its non-negative value. Consequently,
> >
> > >
> >
> > > >
> >
> > >
> >
> > > > copy is always >= 0.
> >
> > >
> >
> > > >
> >
> > >
> >
> > > To better explain this problem, consider the following example:
> >
> > >
> >
> > > '''
> >
> > >
> >
> > > #include <sys/types.h>
> >
> > >
> >
> > > #include <stdio.h>
> >
> > >
> >
> > > int size_goal = 536;
> >
> > >
> >
> > > unsigned int skblen = 1131;
> >
> > >
> >
> > > void main() {
> >
> > >
> >
> > > ssize_t copy = 0;
> >
> > >
> >
> > > copy = size_goal - skblen;
> >
> > >
> >
> > > printf("wrong: %zd\n", copy);
> >
> > >
> >
> > > copy = size_goal - (ssize_t)skblen;
> >
> > >
> >
> > > printf("correct: %zd\n", copy);
> >
> > >
> >
> > > return;
> >
> > >
> >
> > > }
> >
> > >
> >
> > > '''
> >
> > >
> >
> > > Output:
> >
> > >
> >
> > > '''
> >
> > >
> >
> > > wrong: 4294966701
> >
> > >
> >
> > > correct: -595
> >
> > >
> >
> > > '''
> >
> > >
> >
> > > Can you explain how one skb could have more bytes (skb->len) than size_goal ?
> >
> > >
> >
> > > If we are under this condition, we already have a prior bug ?
> >
> > >
> >
> > > Please describe how you caught this issue.
> >
> > >
> >
> > Also, not sure why copy variable had to be changed from "int" to "ssize_t"
> >
> > A nicer patch (without a cast) would be to make it an "int" again/
> >
> > I encountered this issue because I had tcp_repair enabled, which uses
> >
> > tcp_init_tso_segs to reset the MSS.
> >
> > However, it seems that tcp_bound_to_half_wnd also dynamically adjusts
> >
> > the value to be smaller than the current size_goal.
> >
>
> Okay, and what was the end result ?
>
> An skb has a limited amount of bytes that can be put into it
>
> (MAX_SKB_FRAGS * 32K) , and I can't see what are the effects of having
>
Hi Eric,
I'm working with a reproducer generated by syzkaller [1], and its core
logic is roughly as follows:
'''
setsockopt(fd, TCP_REPAIR, 1)
connect(fd);
setsockopt(fd, TCP_REPAIR, -1)
send(fd, small);
sendmmsg(fd, buffer_2G);
'''
First, because TCP_REPAIR is enabled, the send() operation leaves the skb
at the tail of the write_queue. Subsequently, sendmmsg is called to send
2GB of data.
Due to TCP_REPAIR, the size_goal is reduced, which can cause the copy
variable to become negative. However, because of integer promotion bug
mentioned in the previous email, this negative value is misinterpreted as
a large positive number. Ultimately, copy becomes a huge value, approaching
the int32 limit. This, in turn, causes sk->sk_forward_alloc to overflow,
which is the exact issue reported by syzkaller.
On a related note, even without using TCP_REPAIR, the tcp_bound_to_half_wnd()
function can also reduce size_goal on its own. Therefore, my understanding is
that under extreme conditions, we might still encounter an overflow in
sk->sk_forward_alloc.
So, I think we have good reason to change copy to an int.
On Thu, Jul 3, 2025 at 5:03 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote: > Hi Eric, > > I'm working with a reproducer generated by syzkaller [1], and its core > logic is roughly as follows: > > ''' > setsockopt(fd, TCP_REPAIR, 1) > connect(fd); > setsockopt(fd, TCP_REPAIR, -1) > > send(fd, small); > sendmmsg(fd, buffer_2G); > ''' > > First, because TCP_REPAIR is enabled, the send() operation leaves the skb > at the tail of the write_queue. Subsequently, sendmmsg is called to send > 2GB of data. > > Due to TCP_REPAIR, the size_goal is reduced, which can cause the copy > variable to become negative. However, because of integer promotion bug > mentioned in the previous email, this negative value is misinterpreted as > a large positive number. Ultimately, copy becomes a huge value, approaching > the int32 limit. This, in turn, causes sk->sk_forward_alloc to overflow, > which is the exact issue reported by syzkaller. > > On a related note, even without using TCP_REPAIR, the tcp_bound_to_half_wnd() > function can also reduce size_goal on its own. Therefore, my understanding is > that under extreme conditions, we might still encounter an overflow in > sk->sk_forward_alloc. > > So, I think we have good reason to change copy to an int. Ok, I wish you had stated you were working on a syzbot report from the very beginning. Why hiding ? Please send a V2 of the patch.
2025/7/3 20:03, "Jiayuan Chen" <jiayuan.chen@linux.dev> 写到: > > Hi Eric, > > I'm working with a reproducer generated by syzkaller [1], and its core https://syzkaller.appspot.com/bug?extid=de6565462ab540f50e47 (sorry losting the link...)
© 2016 - 2026 Red Hat, Inc.