[PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission

Feng Yang posted 1 patch 3 months, 1 week ago
Posted by Feng Yang 3 months, 1 week ago
From: Feng Yang <yangfeng@kylinos.cn>

The "MSG_MORE" flag is added to improve the transmission performance of large packets.
The improvement is more significant for TCP, while there is a slight enhancement for UDP.

When using sockmap for forwarding, the average latency for different packet sizes
after sending 10,000 packets (TCP) is as follows:
size    old(us)         new(us)
512     56              55
1472    58              58
1600    106             81
3000    145             105
5000    182             125

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
---
Changes in v3:
- Use MSG_MORE flag. Thanks: Eric Dumazet, David Laight.
- Link to v2: https://lore.kernel.org/all/20250627094406.100919-1-yangfeng59949@163.com/

Changes in v2:
- Delete dynamic memory allocation, thanks: Paolo Abeni, Stanislav Fomichev.
- Link to v1: https://lore.kernel.org/all/20250623084212.122284-1-yangfeng59949@163.com/
---
 net/core/skbuff.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 85fc82f72d26..cd1ed96607a5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3252,6 +3252,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 		kv.iov_len = slen;
 		memset(&msg, 0, sizeof(msg));
 		msg.msg_flags = MSG_DONTWAIT | flags;
+		if (slen < len)
+			msg.msg_flags |= MSG_MORE;
 
 		iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen);
 		ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
@@ -3292,6 +3294,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
 					     flags,
 			};
 
+			if (slen < len)
+				msg.msg_flags |= MSG_MORE;
 			bvec_set_page(&bvec, skb_frag_page(frag), slen,
 				      skb_frag_off(frag) + offset);
 			iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,
-- 
2.43.0
Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
Posted by Paolo Abeni 3 months ago
On 6/30/25 9:10 AM, Feng Yang wrote:
> From: Feng Yang <yangfeng@kylinos.cn>
> 
> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> The improvement is more significant for TCP, while there is a slight enhancement for UDP.

I'm sorry for the conflicting input, but I fear we can't do this for
UDP: unconditionally changing the wire packet layout may break the
application, and/or at the very least cause unexpected fragmentation issues.

/P
Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
Posted by David Laight 3 months ago
On Thu, 3 Jul 2025 10:48:40 +0200
Paolo Abeni <pabeni@redhat.com> wrote:

> On 6/30/25 9:10 AM, Feng Yang wrote:
> > From: Feng Yang <yangfeng@kylinos.cn>
> > 
> > The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> > The improvement is more significant for TCP, while there is a slight enhancement for UDP.  
> 
> I'm sorry for the conflicting input, but I fear we can't do this for
> UDP: unconditionally changing the wire packet layout may break the
> application, and/or at the very least cause unexpected fragmentation issues.

Does the code currently work for UDP?

I'd have thought the skb being sent was an entire datagram.
But each sendmsg() is going to send a separate datagram.
IIRC for UDP MSG_MORE indicates that the next send() will be
part of the same datagram - so the actual send can't be done
until the final fragment (without MSG_MORE) is sent.

None of the versions is right for SCTP.
The skb being sent needs to be processed as a single entity.
Here MSG_MORE tells the stack that more messages follow and can be put
into a single ethernet frame - but they are separate protocol messages.

OTOH I've not looked at where this code is called from.
In particular, when it would be called with non-linear skb.

	David
Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
Posted by Feng Yang 3 months ago
Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:

> On Thu, 3 Jul 2025 10:48:40 +0200
> Paolo Abeni <pabeni@redhat.com> wrote:
> 
> > On 6/30/25 9:10 AM, Feng Yang wrote:
> > > From: Feng Yang <yangfeng@kylinos.cn>
> > > 
> > > The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> > > The improvement is more significant for TCP, while there is a slight enhancement for UDP.  
> > 
> > I'm sorry for the conflicting input, but I fear we can't do this for
> > UDP: unconditionally changing the wire packet layout may break the
> > application, and/or at the very least cause unexpected fragmentation issues.
> 
> Does the code currently work for UDP?
> 
> I'd have thought the skb being sent was an entire datagram.
> But each sendmsg() is going to send a separate datagram.
> IIRC for UDP MSG_MORE indicates that the next send() will be
> part of the same datagram - so the actual send can't be done
> until the final fragment (without MSG_MORE) is sent.

If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?
If that's not feasible, would the v2 version of the code work for UDP?
Thanks.

> None of the versions is right for SCTP.
__skb_send_sock
	......
	INDIRECT_CALL_2(sendmsg, sendmsg_locked, sendmsg_unlocked, sk, &msg);
	......
This sending code doesn't seem to call sctp_sendmsg.

> The skb being sent needs to be processed as a single entity.
> Here MSG_MORE tells the stack that more messages follow and can be put
> into a single ethernet frame - but they are separate protocol messages.
> 
> OTOH I've not looked at where this code is called from.
> In particular, when it would be called with non-linear skb.
> 
> 	David
Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
Posted by Paolo Abeni 3 months ago
On 7/4/25 11:26 AM, Feng Yang wrote:
> Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
> 
>> On Thu, 3 Jul 2025 10:48:40 +0200
>> Paolo Abeni <pabeni@redhat.com> wrote:
>>
>>> On 6/30/25 9:10 AM, Feng Yang wrote:
>>>> From: Feng Yang <yangfeng@kylinos.cn>
>>>>
>>>> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
>>>> The improvement is more significant for TCP, while there is a slight enhancement for UDP.  
>>>
>>> I'm sorry for the conflicting input, but I fear we can't do this for
>>> UDP: unconditionally changing the wire packet layout may break the
>>> application, and/or at the very least cause unexpected fragmentation issues.
>>
>> Does the code currently work for UDP?
>>
>> I'd have thought the skb being sent was an entire datagram.
>> But each sendmsg() is going to send a separate datagram.
>> IIRC for UDP MSG_MORE indicates that the next send() will be
>> part of the same datagram - so the actual send can't be done
>> until the final fragment (without MSG_MORE) is sent.
> 
> If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?

Without MSG_MORE N sendmsg() calls will emit on the wire N (small) packets.

With MSG_MORE on the first N-1 calls, the stack will emit a single
packet with larger size.

UDP applications may rely on packet size for protocol semantics, i.e. the
application-level message size could be expected to be equal to the
(wire) packet size itself.

Unexpectedly aggregating the packets may break the application. Also, it
can lead to IP fragmentation, which in turn could kill performance.

> If that's not feasible, would the v2 version of the code work for UDP?

My ask is to explicitly avoid MSG_MORE when the transport is UDP.

/P
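
The difference described above can be observed from userspace on Linux. What follows is a minimal sketch and not part of the patch: udp_first_datagram_size() is a hypothetical helper that sends two 4-byte pieces over a connected UDP socket on loopback and reports the size of the first datagram the receiver gets. It assumes a Linux host with a working loopback interface; MSG_MORE is Linux-specific.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not kernel code).
 * Sends "abcd" with first_flags, then "efgh" with no flags, over a
 * connected UDP socket on loopback. Returns the size of the first
 * datagram received, or -1 on any error or timeout.
 */
static ssize_t udp_first_datagram_size(int first_flags)
{
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	char buf[64];
	ssize_t n = -1;
	int rx, tx;

	/* This sketch only compares "no flag" against MSG_MORE. */
	assert(first_flags == 0 || first_flags == MSG_MORE);

	rx = socket(AF_INET, SOCK_DGRAM, 0);
	tx = socket(AF_INET, SOCK_DGRAM, 0);
	if (rx < 0 || tx < 0)
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
	    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
	    connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	/* With MSG_MORE the first piece is held back (corked) by UDP. */
	if (send(tx, "abcd", 4, first_flags) != 4 ||
	    send(tx, "efgh", 4, 0) != 4)
		goto out;

	n = recv(rx, buf, sizeof(buf), 0);
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return n;
}
```

Called with 0, the helper should report a 4-byte datagram (two separate datagrams are sent); called with MSG_MORE, it should report a single 8-byte datagram, which is the change in wire layout that a UDP receiver could notice.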
Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
Posted by Feng Yang 3 months ago
On Sat, 5 Jul 2025 08:16:40 +0100 David Laight <david.laight.linux@gmail.com> wrote:

> On Fri, 4 Jul 2025 17:50:42 +0200
> Paolo Abeni <pabeni@redhat.com> wrote:
> 
> > On 7/4/25 11:26 AM, Feng Yang wrote:
> > > Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
> > >   
> > >> On Thu, 3 Jul 2025 10:48:40 +0200
> > >> Paolo Abeni <pabeni@redhat.com> wrote:
> > >>  
> > >>> On 6/30/25 9:10 AM, Feng Yang wrote:  
> > >>>> From: Feng Yang <yangfeng@kylinos.cn>
> > >>>>
> > >>>> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> > >>>> The improvement is more significant for TCP, while there is a slight enhancement for UDP.    
> > >>>
> > >>> I'm sorry for the conflicting input, but I fear we can't do this for
> > >>> UDP: unconditionally changing the wire packet layout may break the
> > >>> application, and/or at the very least cause unexpected fragmentation issues.
> > >>
> > >> Does the code currently work for UDP?
> > >>
> > >> I'd have thought the skb being sent was an entire datagram.
> > >> But each sendmsg() is going to send a separate datagram.
> > >> IIRC for UDP MSG_MORE indicates that the next send() will be
> > >> part of the same datagram - so the actual send can't be done
> > >> until the final fragment (without MSG_MORE) is sent.  
> > > 
> > > If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?  
> > 
> > Without MSG_MORE N sendmsg() calls will emit on the wire N (small) packets.
> > 
> > With MSG_MORE on the first N-1 calls, the stack will emit a single
> > packet with larger size.
> > 
> > UDP applications may rely on packet size for protocol semantics, i.e. the
> > application-level message size could be expected to be equal to the
> > (wire) packet size itself.
> 
> Correct, but the function is __skb_send_sock() - so you'd expect it to
> send the 'message' held in the skb to the socket.
> I don't think that the fact that the skb has fragments should make any
> difference to what is sent.
> In other words it ought to be valid for any code to 'linearize' the skb.
> 
> 	David

Okay, thank you for your explanations.

> > 
> > Unexpectedly aggregating the packets may break the application. Also, it
> > can lead to IP fragmentation, which in turn could kill performance.
> > 
> > > If that's not feasible, would the v2 version of the code work for UDP?  
> > 
> > My ask is to explicitly avoid MSG_MORE when the transport is UDP.
> > 
> > /P
> > 

So do I need to resend the v2 version again (https://lore.kernel.org/all/20250627094406.100919-1-yangfeng59949@163.com/), 
or is this version also inapplicable in some cases?
Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
Posted by Eric Dumazet 3 months ago
On Sun, Jul 6, 2025 at 11:17 PM Feng Yang <yangfeng59949@163.com> wrote:

>
> So do I need to resend the v2 version again (https://lore.kernel.org/all/20250627094406.100919-1-yangfeng59949@163.com/),
> or is this version also inapplicable in some cases?

Or a V3 perhaps, limiting MSG_MORE hint to TCP sockets where it is
definitely safe.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d6420b74ea9c6a9c53a7c16634cce82a1cd1bbd3..dc440252a68e5e7bb0588ab230fbc5b7a656e220 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3235,6 +3235,7 @@ typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg);
 static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
                           int len, sendmsg_func sendmsg, int flags)
 {
+       int more_hint = sk_is_tcp(sk) ? MSG_MORE : 0;
        unsigned int orig_len = len;
        struct sk_buff *head = skb;
        unsigned short fragidx;
@@ -3252,7 +3253,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
                kv.iov_len = slen;
                memset(&msg, 0, sizeof(msg));
                msg.msg_flags = MSG_DONTWAIT | flags;
-
+               if (slen < len)
+                       msg.msg_flags |= more_hint;
                iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen);
                ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
                                      sendmsg_unlocked, sk, &msg);
@@ -3292,6 +3294,8 @@ static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
                                             flags,
                        };

+                       if (slen < len)
+                               msg.msg_flags |= more_hint;
                        bvec_set_page(&bvec, skb_frag_page(frag), slen,
                                      skb_frag_off(frag) + offset);
                        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,
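
The flag selection in the diff above can be modelled in userspace to see which sendmsg() calls would carry the hint. This is only a sketch under stated assumptions: chunk_flags() and count_more_calls() are hypothetical helpers, the is_tcp parameter stands in for the kernel's sk_is_tcp(sk), and fixed-size chunks stand in for the linear head and page fragments that __skb_send_sock() actually walks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Userspace model only. Mirrors the proposed logic: MSG_MORE is added
 * when this chunk (slen) does not cover all remaining bytes (len), and
 * only when the socket is TCP.
 */
static int chunk_flags(bool is_tcp, int flags, size_t slen, size_t len)
{
	int more_hint = is_tcp ? MSG_MORE : 0;
	int msg_flags = MSG_DONTWAIT | flags;

	assert(slen <= len);
	if (slen < len)		/* more data follows this chunk */
		msg_flags |= more_hint;
	return msg_flags;
}

/*
 * Count how many sendmsg() calls would carry MSG_MORE when len bytes
 * are sent in pieces of at most chunk bytes.
 */
static unsigned int count_more_calls(bool is_tcp, size_t len, size_t chunk)
{
	unsigned int more = 0;

	assert(chunk > 0);
	while (len > 0) {
		size_t slen = len < chunk ? len : chunk;

		if (chunk_flags(is_tcp, 0, slen, len) & MSG_MORE)
			more++;
		len -= slen;
	}
	return more;
}
```

In this model, a 5000-byte TCP payload split into 2048-byte chunks gets the hint on the first two calls and not on the last, while the same payload on a non-TCP socket never gets it, so datagram boundaries stay as they are today.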
Re: [PATCH v3] skbuff: Add MSG_MORE flag to optimize large packet transmission
Posted by David Laight 3 months ago
On Fri, 4 Jul 2025 17:50:42 +0200
Paolo Abeni <pabeni@redhat.com> wrote:

> On 7/4/25 11:26 AM, Feng Yang wrote:
> > Thu, 3 Jul 2025 12:44:53 +0100 david.laight.linux@gmail.com wrote:
> >   
> >> On Thu, 3 Jul 2025 10:48:40 +0200
> >> Paolo Abeni <pabeni@redhat.com> wrote:
> >>  
> >>> On 6/30/25 9:10 AM, Feng Yang wrote:  
> >>>> From: Feng Yang <yangfeng@kylinos.cn>
> >>>>
> >>>> The "MSG_MORE" flag is added to improve the transmission performance of large packets.
> >>>> The improvement is more significant for TCP, while there is a slight enhancement for UDP.    
> >>>
> >>> I'm sorry for the conflicting input, but I fear we can't do this for
> >>> UDP: unconditionally changing the wire packet layout may break the
> >>> application, and/or at the very least cause unexpected fragmentation issues.
> >>
> >> Does the code currently work for UDP?
> >>
> >> I'd have thought the skb being sent was an entire datagram.
> >> But each sendmsg() is going to send a separate datagram.
> >> IIRC for UDP MSG_MORE indicates that the next send() will be
> >> part of the same datagram - so the actual send can't be done
> >> until the final fragment (without MSG_MORE) is sent.  
> > 
> > If we add MSG_MORE, won't the entire skb be sent out all at once? Why doesn't this work for UDP?  
> 
> Without MSG_MORE N sendmsg() calls will emit on the wire N (small) packets.
> 
> With MSG_MORE on the first N-1 calls, the stack will emit a single
> packet with larger size.
> 
> UDP applications may rely on packet size for protocol semantics, i.e. the
> application-level message size could be expected to be equal to the
> (wire) packet size itself.

Correct, but the function is __skb_send_sock() - so you'd expect it to
send the 'message' held in the skb to the socket.
I don't think that the fact that the skb has fragments should make any
difference to what is sent.
In other words it ought to be valid for any code to 'linearize' the skb.

	David

> 
> Unexpectedly aggregating the packets may break the application. Also, it
> can lead to IP fragmentation, which in turn could kill performance.
> 
> > If that's not feasible, would the v2 version of the code work for UDP?  
> 
> My ask is to explicitly avoid MSG_MORE when the transport is UDP.
> 
> /P
>