After the blamed commit below, the TCP sockets (and the MPTCP subflows)
can build egress packets larger than 64K. That exceeds the maximum DSS
data size: the length is misrepresented on the wire and the stream is
corrupted, as later observed on the receiver:
WARNING: CPU: 0 PID: 9696 at net/mptcp/protocol.c:705 __mptcp_move_skbs_from_subflow+0x2604/0x26e0
CPU: 0 PID: 9696 Comm: syz-executor.7 Not tainted 6.6.0-rc5-gcd8bdf563d46 #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
netlink: 8 bytes leftover after parsing attributes in process `syz-executor.4'.
RIP: 0010:__mptcp_move_skbs_from_subflow+0x2604/0x26e0 net/mptcp/protocol.c:705
RSP: 0018:ffffc90000006e80 EFLAGS: 00010246
RAX: ffffffff83e9f674 RBX: ffff88802f45d870 RCX: ffff888102ad0000
netlink: 8 bytes leftover after parsing attributes in process `syz-executor.4'.
RDX: 0000000080000303 RSI: 0000000000013908 RDI: 0000000000003908
RBP: ffffc90000007110 R08: ffffffff83e9e078 R09: 1ffff1100e548c8a
R10: dffffc0000000000 R11: ffffed100e548c8b R12: 0000000000013908
R13: dffffc0000000000 R14: 0000000000003908 R15: 000000000031cf29
FS: 00007f239c47e700(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f239c45cd78 CR3: 000000006a66c006 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
PKRU: 55555554
Call Trace:
<IRQ>
mptcp_data_ready+0x263/0xac0 net/mptcp/protocol.c:819
subflow_data_ready+0x268/0x6d0 net/mptcp/subflow.c:1409
tcp_data_queue+0x21a1/0x7a60 net/ipv4/tcp_input.c:5151
tcp_rcv_established+0x950/0x1d90 net/ipv4/tcp_input.c:6098
tcp_v6_do_rcv+0x554/0x12f0 net/ipv6/tcp_ipv6.c:1483
tcp_v6_rcv+0x2e26/0x3810 net/ipv6/tcp_ipv6.c:1749
ip6_protocol_deliver_rcu+0xd6b/0x1ae0 net/ipv6/ip6_input.c:438
ip6_input+0x1c5/0x470 net/ipv6/ip6_input.c:483
ipv6_rcv+0xef/0x2c0 include/linux/netfilter.h:304
__netif_receive_skb+0x1ea/0x6a0 net/core/dev.c:5532
process_backlog+0x353/0x660 net/core/dev.c:5974
__napi_poll+0xc6/0x5a0 net/core/dev.c:6536
net_rx_action+0x6a0/0xfd0 net/core/dev.c:6603
__do_softirq+0x184/0x524 kernel/softirq.c:553
do_softirq+0xdd/0x130 kernel/softirq.c:454
Address the issue by explicitly bounding the maximum GSO size to what
MPTCP actually allows.
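To illustrate the failure mode, here is a minimal userspace sketch, not kernel code. It assumes the DSS mapping carries its data-level length in a 16-bit field, which is what the 64K limit above implies. `dss_wire_len()` is a hypothetical helper introduced only for this illustration. The sample value 0x13908 is the one visible in RSI/R12 in the trace; that its low 16 bits equal the 0x3908 seen in RDI/R14 is an observation, not a claim about what those registers hold.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel function): models storing a mapping
 * length in a 16-bit on-wire field. Anything above 0xffff is truncated.
 */
static uint16_t dss_wire_len(uint32_t len)
{
	return (uint16_t)len;	/* upper bits are silently dropped */
}

/* Self-check using the values that appear in the trace. */
static void dss_wire_len_demo(void)
{
	assert(dss_wire_len(0xffff) == 0xffff);	/* largest length that fits */
	assert(dss_wire_len(0x13908) == 0x3908);	/* wraps: 0x13908 & 0xffff */
}
```

Calling `dss_wire_len_demo()` from a `main()` runs both checks; the second one shows how a sender building an 80136-byte packet would advertise only 14600 bytes in the mapping.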
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/450
Fixes: 7c4e983c4f3c ("net: allow gso_max_size to exceed 65536")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
This is a rough, quick attempt to address the issue. A cleaner one
would catch sk_gso_max_size in the control path and bound the
value there. Done this way to:
- allow earlier testing
- create a much smaller patch.
---
net/mptcp/protocol.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b41f8c3e0c00..d522fabc863d 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1234,6 +1234,8 @@ static void mptcp_update_infinite_map(struct mptcp_sock *msk,
 	mptcp_do_fallback(ssk);
 }
 
+#define MPTCP_MAX_GSO_SIZE (GSO_LEGACY_MAX_SIZE - (MAX_TCP_HEADER + 1))
+
 static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 			      struct mptcp_data_frag *dfrag,
 			      struct mptcp_sendmsg_info *info)
@@ -1260,6 +1262,8 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 		return -EAGAIN;
 
 	/* compute send limit */
+	if (unlikely(ssk->sk_gso_max_size > MPTCP_MAX_GSO_SIZE))
+		ssk->sk_gso_max_size = MPTCP_MAX_GSO_SIZE;
 	info->mss_now = tcp_send_mss(ssk, &info->size_goal, info->flags);
 	copy = info->size_goal;
 
--
2.41.0
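
As a rough illustration of what the two added lines do, the sketch below models the clamp in userspace. It is not the kernel code: `struct mock_sock` and `mptcp_bound_gso_size()` are hypothetical names, and the `MAX_TCP_HEADER` value of 320 is only an assumption, since the real value depends on the kernel configuration. `GSO_LEGACY_MAX_SIZE` is 65536, as in the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define GSO_LEGACY_MAX_SIZE	65536u
#define MAX_TCP_HEADER		320u	/* assumption: config dependent in the kernel */
#define MPTCP_MAX_GSO_SIZE	(GSO_LEGACY_MAX_SIZE - (MAX_TCP_HEADER + 1))

/* Hypothetical stand-in for the one struct sock field the patch touches. */
struct mock_sock {
	uint32_t sk_gso_max_size;
};

/* Mirrors the added conditional: cap the subflow GSO size in place. */
static void mptcp_bound_gso_size(struct mock_sock *ssk)
{
	if (ssk->sk_gso_max_size > MPTCP_MAX_GSO_SIZE)
		ssk->sk_gso_max_size = MPTCP_MAX_GSO_SIZE;
}

/* Self-check covering both branches of the conditional. */
static void mptcp_bound_gso_size_demo(void)
{
	struct mock_sock big = { .sk_gso_max_size = 0x13908 };
	struct mock_sock small = { .sk_gso_max_size = 16000 };

	mptcp_bound_gso_size(&big);
	mptcp_bound_gso_size(&small);

	assert(big.sk_gso_max_size == MPTCP_MAX_GSO_SIZE);	/* clamped */
	assert(big.sk_gso_max_size <= UINT16_MAX);		/* fits in 16 bits */
	assert(small.sk_gso_max_size == 16000);			/* untouched */
}
```

With the assumed header size the bound works out to 65215 bytes, so any size goal derived from it stays representable in a 16-bit length.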

Hi Paolo, Mat,

On 27/10/2023 16:21, Paolo Abeni wrote:
> After the blamed commit below, the TCP sockets (and the MPTCP subflows)
> can build egress packets larger than 64K. That exceeds the maximum DSS
> data size: the length is misrepresented on the wire and the stream is
> corrupted, as later observed on the receiver:

Thank you for the patch and the review!

Now in our tree (fix for -net) with Mat's RvB tag:

New patches for t/upstream-net and t/upstream:
- 6ee098165aec: mptcp: deal with large GSO size
- Results: 9fa919e1e582..f1fb06db6c5f (export-net)
- Results: fd17d7e00023..58f8fd527fbf (export)

Tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20231107T210949
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20231107T210949

Cheers,
Matt

On Fri, 27 Oct 2023, Paolo Abeni wrote:

> [...]
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index b41f8c3e0c00..d522fabc863d 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1234,6 +1234,8 @@ static void mptcp_update_infinite_map(struct mptcp_sock *msk,
>  	mptcp_do_fallback(ssk);
>  }
>
> +#define MPTCP_MAX_GSO_SIZE (GSO_LEGACY_MAX_SIZE - (MAX_TCP_HEADER + 1))
> +
>  static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
>  			      struct mptcp_data_frag *dfrag,
>  			      struct mptcp_sendmsg_info *info)
> @@ -1260,6 +1262,8 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
>  		return -EAGAIN;
>
>  	/* compute send limit */
> +	if (unlikely(ssk->sk_gso_max_size > MPTCP_MAX_GSO_SIZE))
> +		ssk->sk_gso_max_size = MPTCP_MAX_GSO_SIZE;
>  	info->mss_now = tcp_send_mss(ssk, &info->size_goal, info->flags);
>  	copy = info->size_goal;
>

As we discussed in the meeting, since the limit is related to the
MPTCP-specific DSS limitations (which could change) rather than anything
directly related to GSO, it's helpful to have this check in the MPTCP code.

Reviewed-by: Mat Martineau <martineau@kernel.org>

On Fri, 2023-10-27 at 16:21 +0200, Paolo Abeni wrote:
> [...]
> ---
> This is a rough, quick attempt to address the issue. A cleaner one
> would catch sk_gso_max_size in the control path and bound the
> value there. Done this way to:
> - allow earlier testing
> - create a much smaller patch.

To share more details, the other option would need to check and bound
sk_gso_max_size at every place where that field is updated:

1) after icsk_af_ops->syn_recv_sock
2) after icsk_af_ops->rebuild_header
3) after icsk_af_ops->mtu_reduced
4) after icsk_af_ops->queue_xmit
5) after connect()

1) would be straightforward, 2) would require separate modifications for
ipv4 and ipv6, 3) and 4) the same, plus an additional new subflow-specific
implementation, and finally 5) is problematic in the fastopen case,
because we would need a new hook inside the tcp code.

All in all I think we are better off adding just an additional
conditional in the fastpath.

Cheers,

Paolo

On Fri, 27 Oct 2023, Paolo Abeni wrote:

> On Fri, 2023-10-27 at 16:21 +0200, Paolo Abeni wrote:
>> [...]
>> ---
>> This is a rough, quick attempt to address the issue. A cleaner one
>> would catch sk_gso_max_size in the control path and bound the
>> value there. Done this way to:
>> - allow earlier testing
>> - create a much smaller patch.
>
> To share more details, the other option would need to check and bound
> sk_gso_max_size at every place where that field is updated:
>
> 1) after icsk_af_ops->syn_recv_sock
> 2) after icsk_af_ops->rebuild_header
> 3) after icsk_af_ops->mtu_reduced
> 4) after icsk_af_ops->queue_xmit
> 5) after connect()
>
> 1) would be straightforward, 2) would require separate modifications for
> ipv4 and ipv6, 3) and 4) the same, plus an additional new subflow-specific
> implementation, and finally 5) is problematic in the fastopen case,
> because we would need a new hook inside the tcp code.
>
> All in all I think we are better off adding just an additional
> conditional in the fastpath.

Hi Paolo -

I only see one place where sk_gso_max_size is updated, in sk_setup_caps():

	sk->sk_gso_max_size = sk_dst_gso_max_size(sk, dst);

Looks like your list above is the call sites for sk_setup_caps()?

It seems like it would be an earlier check and a smaller patch to instead
update the logic in sk_dst_gso_max_size():

	if (max_size > GSO_LEGACY_MAX_SIZE &&
	    (!sk_is_tcp(sk) || sk_is_mptcp(sk)))
		max_size = GSO_LEGACY_MAX_SIZE;

Fortunately, by the time sk_is_mptcp() is evaluated it has already been
checked that sk is a struct tcp_sock *. Makes me wonder if sk_is_mptcp()
should accept any sock pointer; I'll add a github issue for that.

- Mat
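
The condition suggested above can be sketched in userspace as follows. This is only a model of the quoted conditional, not the whole of sk_dst_gso_max_size(): `struct mock_sock`, its two flags, and `bound_gso_max_size()` are hypothetical names standing in for the real socket and the `sk_is_tcp()` / `sk_is_mptcp()` helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GSO_LEGACY_MAX_SIZE	65536u

/* Hypothetical stand-in for the socket state the condition inspects. */
struct mock_sock {
	bool is_tcp;	/* models sk_is_tcp(sk) */
	bool is_mptcp;	/* models sk_is_mptcp(sk); only read when is_tcp is true */
};

/* Sizes above the legacy limit survive only for plain TCP sockets.
 * Thanks to short-circuit evaluation, is_mptcp is consulted only
 * after is_tcp has been found true.
 */
static uint32_t bound_gso_max_size(const struct mock_sock *sk, uint32_t max_size)
{
	if (max_size > GSO_LEGACY_MAX_SIZE &&
	    (!sk->is_tcp || sk->is_mptcp))
		max_size = GSO_LEGACY_MAX_SIZE;
	return max_size;
}

/* Self-check over the three socket kinds the condition distinguishes. */
static void bound_gso_max_size_demo(void)
{
	const struct mock_sock tcp = { .is_tcp = true, .is_mptcp = false };
	const struct mock_sock subflow = { .is_tcp = true, .is_mptcp = true };
	const struct mock_sock other = { .is_tcp = false, .is_mptcp = false };

	assert(bound_gso_max_size(&tcp, 0x13908) == 0x13908);	/* big sizes allowed */
	assert(bound_gso_max_size(&subflow, 0x13908) == GSO_LEGACY_MAX_SIZE);
	assert(bound_gso_max_size(&other, 0x13908) == GSO_LEGACY_MAX_SIZE);
	assert(bound_gso_max_size(&subflow, 16000) == 16000);	/* small sizes untouched */
}
```

The design difference from the posted patch is where the bound lives: here it is applied once, when the size is computed, instead of on every call in the MPTCP transmit path.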