From: Feng Yang <yangfeng@kylinos.cn>
The "MSG_MORE" flag is added to improve the transmission performance of large packets.
The improvement is more significant for TCP, while there is a slight enhancement for UDP.
When using sockmap for forwarding, the average TCP latency for different packet sizes,
measured over 10,000 packets, is as follows:
size    old(us)    new(us)
512     56         55
1472    58         58
1600    106        81
3000    145        105
5000    182        125
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
---
Changes in v3:
- Use MSG_MORE flag. Thanks: Eric Dumazet, David Laight.
- Link to v2: https://lore.kernel.org/all/20250627094406.100919-1-yangfeng59949@163.com/
Changes in v2:
- Delete dynamic memory allocation. Thanks: Paolo Abeni, Stanislav Fomichev.
- Link to v1: https://lore.kernel.org/all/20250623084212.122284-1-yangfeng59949@163.com/
---
net/core/skbuff.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 85fc82f72d26..cd1ed96607a5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3252,6 +3252,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 		kv.iov_len = slen;
 		memset(&msg, 0, sizeof(msg));
 		msg.msg_flags = MSG_DONTWAIT | flags;
+		if (slen < len)
+			msg.msg_flags |= MSG_MORE;
 		iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen);
 		ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
@@ -3292,6 +3294,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 					     flags,
 			};
+			if (slen < len)
+				msg.msg_flags |= MSG_MORE;
 			bvec_set_page(&bvec, skb_frag_page(frag), slen,
 				      skb_frag_off(frag) + offset);
 			iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,
--
2.43.0
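
Editorial sketch: the effect of the `slen < len` test in the patch above can be modelled in user space. This is a minimal sketch, not kernel code. The names `struct send_step` and `plan_sends` are hypothetical, and the model assumes a zero starting offset, no frag list, and that every simulated send accepts the whole chunk (no short writes). It only records, for each chunk, whether the hint would be set.

```c
#include <stddef.h>

/* One simulated sendmsg() call: bytes handed over and whether the hint is set. */
struct send_step {
	int bytes;
	int more;	/* 1 if MSG_MORE would be set for this call */
};

/*
 * Hypothetical helper (not kernel code): split `len` bytes across a linear
 * head of `headlen` bytes followed by `nfrags` page fragments. Assumes the
 * offset starts at zero and that no send is ever short. Returns the number
 * of steps written to `out`.
 */
static size_t plan_sends(int headlen, const int *frag_sizes, size_t nfrags,
			 int len, struct send_step *out, size_t max_steps)
{
	size_t n = 0;

	/* Head data: a single chunk under the assumptions above. */
	if (headlen > 0 && len > 0 && n < max_steps) {
		int slen = len < headlen ? len : headlen;

		out[n].bytes = slen;
		out[n].more = slen < len;	/* same comparison as the patch */
		n++;
		len -= slen;
	}

	/* Page fragments, in order, until the requested length is consumed. */
	for (size_t i = 0; i < nfrags && len > 0 && n < max_steps; i++) {
		int slen = len < frag_sizes[i] ? len : frag_sizes[i];

		if (slen <= 0)
			continue;	/* skip empty fragments */
		out[n].bytes = slen;
		out[n].more = slen < len;	/* more data follows this chunk */
		n++;
		len -= slen;
	}
	return n;
}
```

For a 1000-byte head followed by two 1500-byte fragments and a request of 3000 bytes, the model yields three sends of 1000, 1500, and 500 bytes, with the hint set on the first two and cleared on the last.
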
On 6/30/25 9:10 AM, Feng Yang wrote:
> From: Feng Yang <yangfeng@kylinos.cn>
>
> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> The improvement is more significant for TCP, while there is a slight enhancement for UDP.

I'm sorry for the conflicting input, but I fear we can't do this for
UDP: unconditionally changing the wire packet layout may break the
application, and/or at the very least incur unexpected fragmentation issues.

/P

On Thu, 3 Jul 2025 10:48:40 +0200
Paolo Abeni <pabeni@redhat.com> wrote:

> On 6/30/25 9:10 AM, Feng Yang wrote:
> > From: Feng Yang <yangfeng@kylinos.cn>
> >
> > The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> > The improvement is more significant for TCP, while there is a slight enhancement for UDP.
>
> I'm sorry for the conflicting input, but I fear we can't do this for
> UDP: unconditionally changing the wire packet layout may break the
> application, and/or at the very least incur unexpected fragmentation issues.

Does the code currently work for UDP?

I'd have thought the skb being sent was an entire datagram.
But each sendmsg() is going to send a separate datagram.
IIRC for UDP MSG_MORE indicates that the next send() will be
part of the same datagram - so the actual send can't be done
until the final fragment (without MSG_MORE) is sent.

None of the versions is right for SCTP.
The skb being sent needs to be processed as a single entity.
Here MSG_MORE tells the stack that more messages follow and can be put
into a single ethernet frame - but they are separate protocol messages.

OTOH I've not looked at where this code is called from.
In particular, when it would be called with a non-linear skb.

	David

Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:

> On Thu, 3 Jul 2025 10:48:40 +0200
> Paolo Abeni <pabeni@redhat.com> wrote:
>
> > On 6/30/25 9:10 AM, Feng Yang wrote:
> > > From: Feng Yang <yangfeng@kylinos.cn>
> > >
> > > The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> > > The improvement is more significant for TCP, while there is a slight enhancement for UDP.
> >
> > I'm sorry for the conflicting input, but I fear we can't do this for
> > UDP: unconditionally changing the wire packet layout may break the
> > application, and/or at the very least incur unexpected fragmentation issues.
>
> Does the code currently work for UDP?
>
> I'd have thought the skb being sent was an entire datagram.
> But each sendmsg() is going to send a separate datagram.
> IIRC for UDP MSG_MORE indicates that the next send() will be
> part of the same datagram - so the actual send can't be done
> until the final fragment (without MSG_MORE) is sent.

If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?
If that's not feasible, would the v2 version of the code work for UDP?
Thanks.

> None of the versions is right for SCTP.

__skb_send_sock
......
INDIRECT_CALL_2(sendmsg, sendmsg_locked, sendmsg_unlocked, sk, &msg);
......
This sending code doesn't seem to call sctp_sendmsg.

> The skb being sent needs to be processed as a single entity.
> Here MSG_MORE tells the stack that more messages follow and can be put
> into a single ethernet frame - but they are separate protocol messages.
>
> OTOH I've not looked at where this code is called from.
> In particular, when it would be called with non-linear skb.
>
> David

On 7/4/25 11:26 AM, Feng Yang wrote:
> Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
>
>> On Thu, 3 Jul 2025 10:48:40 +0200
>> Paolo Abeni <pabeni@redhat.com> wrote:
>>
>>> On 6/30/25 9:10 AM, Feng Yang wrote:
>>>> From: Feng Yang <yangfeng@kylinos.cn>
>>>>
>>>> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
>>>> The improvement is more significant for TCP, while there is a slight enhancement for UDP.
>>>
>>> I'm sorry for the conflicting input, but I fear we can't do this for
>>> UDP: unconditionally changing the wire packet layout may break the
>>> application, and/or at the very least incur unexpected fragmentation issues.
>>
>> Does the code currently work for UDP?
>>
>> I'd have thought the skb being sent was an entire datagram.
>> But each sendmsg() is going to send a separate datagram.
>> IIRC for UDP MSG_MORE indicates that the next send() will be
>> part of the same datagram - so the actual send can't be done
>> until the final fragment (without MSG_MORE) is sent.
>
> If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?

Without MSG_MORE, N sendmsg() calls will emit N (small) packets on the wire.

With MSG_MORE on the first N-1 calls, the stack will emit a single
packet with a larger size.

A UDP application may rely on packet size for protocol semantics, i.e. the
application-level message size could be expected to be equal to the
(wire) packet size itself.

Unexpectedly aggregating the packets may break the application. Also, it
can lead to IP fragmentation, which in turn could kill performance.

> If that's not feasible, would the v2 version of the code work for UDP?

My ask is to explicitly avoid MSG_MORE when the transport is UDP.

/P

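
Editorial sketch: the aggregation described above can be observed from user space on Linux. The following is a minimal sketch, assuming a working IPv4 loopback interface. The helper name `first_datagram_size` is hypothetical. It sends two small payloads over a connected UDP socket, optionally with MSG_MORE on the first send(), and reports the size of the first datagram the receiver sees.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: send `a` then `b` over loopback UDP, setting
 * MSG_MORE on the first send() when `use_more` is non-zero. Returns the
 * size of the first datagram received, or -1 if the experiment could not
 * be run (no loopback, timeout, socket error).
 */
static int first_datagram_size(const char *a, const char *b, int use_more)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	char buf[256];
	int rx, tx, ret = -1;

	rx = socket(AF_INET, SOCK_DGRAM, 0);
	tx = socket(AF_INET, SOCK_DGRAM, 0);
	if (rx < 0 || tx < 0)
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	/* Bind the receiver, learn its port, bound the wait, connect the sender. */
	if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
	    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
	    connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	/* With MSG_MORE the first payload is held back until the second send. */
	if (send(tx, a, strlen(a), use_more ? MSG_MORE : 0) < 0 ||
	    send(tx, b, strlen(b), 0) < 0)
		goto out;

	ret = (int)recv(rx, buf, sizeof(buf), 0);
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return ret;
}
```

Called as `first_datagram_size("abcd", "efgh", 1)`, the receiver would be expected to see one 8-byte datagram on Linux, while passing 0 as the last argument should yield a 4-byte datagram followed by a second one. A receiver that treats each datagram as one application message would therefore parse the two cases differently, which is the concern raised in the reply above.
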
On Sat, 5 Jul 2025 08:16:40 +0100 David Laight <david.laight.linux@gmail.com> wrote:

> On Fri, 4 Jul 2025 17:50:42 +0200
> Paolo Abeni <pabeni@redhat.com> wrote:
>
> > On 7/4/25 11:26 AM, Feng Yang wrote:
> > > Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
> > >
> > >> On Thu, 3 Jul 2025 10:48:40 +0200
> > >> Paolo Abeni <pabeni@redhat.com> wrote:
> > >>
> > >>> On 6/30/25 9:10 AM, Feng Yang wrote:
> > >>>> From: Feng Yang <yangfeng@kylinos.cn>
> > >>>>
> > >>>> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> > >>>> The improvement is more significant for TCP, while there is a slight enhancement for UDP.
> > >>>
> > >>> I'm sorry for the conflicting input, but I fear we can't do this for
> > >>> UDP: unconditionally changing the wire packet layout may break the
> > >>> application, and/or at the very least incur unexpected fragmentation issues.
> > >>
> > >> Does the code currently work for UDP?
> > >>
> > >> I'd have thought the skb being sent was an entire datagram.
> > >> But each sendmsg() is going to send a separate datagram.
> > >> IIRC for UDP MSG_MORE indicates that the next send() will be
> > >> part of the same datagram - so the actual send can't be done
> > >> until the final fragment (without MSG_MORE) is sent.
> > >
> > > If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?
> >
> > Without MSG_MORE N sendmsg() calls will emit on the wire N (small) packets.
> >
> > With MSG_MORE on the first N-1 calls, the stack will emit a single
> > packet with larger size.
> >
> > UDP application may rely on packet size for protocol semantics. i.e. the
> > application level message size could be expected to be equal to the
> > (wire) packet size itself.
>
> Correct, but the function is __skb_send_sock() - so you'd expect it to
> send the 'message' held in the skb to the socket.
> I don't think that the fact that the skb has fragments should make any
> difference to what is sent.
> In other words it ought to be valid for any code to 'linearize' the skb.
>
> David

Okay, thank you for your explanations.

> > Unexpectedly aggregating the packets may break the application. Also it
> > can lead to IP fragmentation, which in turn could kill performance.
> >
> > > If that's not feasible, would the v2 version of the code work for UDP?
> >
> > My ask is to explicitly avoid MSG_MORE when the transport is UDP.
> >
> > /P

So do I need to resend the v2 version again (https://lore.kernel.org/all/20250627094406.100919-1-yangfeng59949@163.com/),
or is this version also inapplicable in some cases?

On Sun, Jul 6, 2025 at 11:17 PM Feng Yang <yangfeng59949@163.com> wrote:
>
> So do I need to resend the v2 version again (https://lore.kernel.org/all/20250627094406.100919-1-yangfeng59949@163.com/),
> or is this version also inapplicable in some cases?

Or a V3 perhaps, limiting MSG_MORE hint to TCP sockets where it is
definitely safe.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d6420b74ea9c6a9c53a7c16634cce82a1cd1bbd3..dc440252a68e5e7bb0588ab230fbc5b7a656e220 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3235,6 +3235,7 @@ typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg);
 static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 			   int len, sendmsg_func sendmsg, int flags)
 {
+	int more_hint = sk_is_tcp(sk) ? MSG_MORE : 0;
 	unsigned int orig_len = len;
 	struct sk_buff *head = skb;
 	unsigned short fragidx;
@@ -3252,7 +3253,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 		kv.iov_len = slen;
 		memset(&msg, 0, sizeof(msg));
 		msg.msg_flags = MSG_DONTWAIT | flags;
-
+		if (slen < len)
+			msg.msg_flags |= more_hint;
 		iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen);
 		ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
 				      sendmsg_unlocked, sk, &msg);
@@ -3292,6 +3294,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 					     flags,
 			};
+			if (slen < len)
+				msg.msg_flags |= more_hint;
 			bvec_set_page(&bvec, skb_frag_page(frag), slen,
 				      skb_frag_off(frag) + offset);
 			iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,

On Fri, 4 Jul 2025 17:50:42 +0200
Paolo Abeni <pabeni@redhat.com> wrote:

> On 7/4/25 11:26 AM, Feng Yang wrote:
> > Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
> >
> >> On Thu, 3 Jul 2025 10:48:40 +0200
> >> Paolo Abeni <pabeni@redhat.com> wrote:
> >>
> >>> On 6/30/25 9:10 AM, Feng Yang wrote:
> >>>> From: Feng Yang <yangfeng@kylinos.cn>
> >>>>
> >>>> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> >>>> The improvement is more significant for TCP, while there is a slight enhancement for UDP.
> >>>
> >>> I'm sorry for the conflicting input, but I fear we can't do this for
> >>> UDP: unconditionally changing the wire packet layout may break the
> >>> application, and/or at the very least incur unexpected fragmentation issues.
> >>
> >> Does the code currently work for UDP?
> >>
> >> I'd have thought the skb being sent was an entire datagram.
> >> But each sendmsg() is going to send a separate datagram.
> >> IIRC for UDP MSG_MORE indicates that the next send() will be
> >> part of the same datagram - so the actual send can't be done
> >> until the final fragment (without MSG_MORE) is sent.
> >
> > If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?
>
> Without MSG_MORE N sendmsg() calls will emit on the wire N (small) packets.
>
> With MSG_MORE on the first N-1 calls, the stack will emit a single
> packet with larger size.
>
> UDP application may rely on packet size for protocol semantics. i.e. the
> application level message size could be expected to be equal to the
> (wire) packet size itself.

Correct, but the function is __skb_send_sock() - so you'd expect it to
send the 'message' held in the skb to the socket.
I don't think that the fact that the skb has fragments should make any
difference to what is sent.
In other words, it ought to be valid for any code to 'linearize' the skb.

	David

> Unexpectedly aggregating the packets may break the application. Also it
> can lead to IP fragmentation, which in turn could kill performance.
>
> > If that's not feasible, would the v2 version of the code work for UDP?
>
> My ask is to explicitly avoid MSG_MORE when the transport is UDP.
>
> /P
>