[PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path

Gang Yan posted 1 patch 1 week, 6 days ago
Patches applied successfully
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20260309025431.125943-1-gang.yan@linux.dev
net/mptcp/protocol.c | 24 ++++++++++++++++++++++--
net/mptcp/protocol.h |  4 +++-
2 files changed, 25 insertions(+), 3 deletions(-)
Posted by Gang Yan 1 week, 6 days ago
From: Gang Yan <yangang@kylinos.cn>

Extend MPTCP's sendmsg handling to recognize and honor the MSG_EOR flag,
which marks the end of a record for application-level message boundaries.

Data fragments tagged with MSG_EOR are explicitly marked in the
mptcp_data_frag structure and skb context to prevent unintended
coalescing with subsequent data chunks. This ensures the intent of
applications using MSG_EOR is preserved across MPTCP subflows,
maintaining consistent message segmentation behavior.

Signed-off-by: Gang Yan <yangang@kylinos.cn>
---

Notes:
      - This patch incorporates feedback and suggestions from Paolo Abeni
        and Geliang Tang: the mptcp_data_frag layout is kept compact by
        shrinking 'overhead' to u8 and storing 'eor' in a bitfield, so the
        struct does not grow, and a BUILD_BUG_ON checks at compile time
        that 'overhead' still fits in a u8.
      - Packetdrill test cases validating this feature are available at:
        https://github.com/multipath-tcp/packetdrill/pull/189/changes/d6ce92a4786704fe749bbd848ced0c047632282e
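
The layout claim in the first note can be checked outside the kernel tree
with a small userspace model. This is only a sketch: every model_* name is
hypothetical, the leading list-head member is an assumption (it is not
visible in the protocol.h hunk), and the result depends on the compiler and
ABI (GCC or Clang on common Linux targets are assumed).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Layout before the patch: 'overhead' is a u16. */
struct model_dfrag_old {
	struct model_list_head list;	/* assumed leading member */
	uint64_t data_seq;
	uint16_t data_len;
	uint16_t offset;
	uint16_t overhead;
	uint16_t already_sent;
	void *page;
};

/* Layout after the patch: u8 'overhead' plus a one-byte bitfield. */
struct model_dfrag_new {
	struct model_list_head list;	/* assumed leading member */
	uint64_t data_seq;
	uint16_t data_len;
	uint16_t offset;
	uint8_t overhead;
	uint8_t eor:1,
		unused:7;
	uint16_t already_sent;
	void *page;
};

/* Userspace copy of the kernel's ALIGN() for power-of-two alignments. */
#define MODEL_ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Same expression as the left-hand side of the patch's BUILD_BUG_ON. */
size_t model_max_overhead(void)
{
	return MODEL_ALIGN(1, sizeof(long)) + sizeof(struct model_dfrag_new);
}

/* The bitfield reuses the byte freed by shrinking 'overhead'. */
_Static_assert(sizeof(struct model_dfrag_new) ==
	       sizeof(struct model_dfrag_old),
	       "adding eor must not grow the struct");

/* Userspace equivalent of the BUILD_BUG_ON added by the patch. */
_Static_assert(MODEL_ALIGN(1, sizeof(long)) +
	       sizeof(struct model_dfrag_new) <= UINT8_MAX,
	       "overhead must fit in a u8");
```

If either assertion fails to compile on a given target, the model (or the
assumption about the leading member) does not match that ABI.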

 net/mptcp/protocol.c | 24 ++++++++++++++++++++++--
 net/mptcp/protocol.h |  4 +++-
 2 files changed, 25 insertions(+), 3 deletions(-)
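
The last-chunk condition this patch adds to mptcp_sendmsg_frag() can be
modeled in userspace as follows. This is a sketch, not kernel code: the
model_* names are hypothetical, and the struct carries only the two dfrag
fields the condition reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the dfrag fields the check reads. */
struct model_dfrag {
	uint16_t data_len;	/* bytes stored in this fragment */
	bool eor;		/* set when the sendmsg() call carried MSG_EOR */
};

/*
 * Mirrors the condition in the patch: the skb is marked only when the
 * chunk being sent reaches the end of an EOR-tagged fragment.
 * 'sent' models info->sent, 'copy' the bytes sent by this call.
 */
bool model_chunk_ends_record(const struct model_dfrag *dfrag,
			     unsigned int sent, unsigned int copy)
{
	return dfrag->eor && sent + copy >= dfrag->data_len;
}
```

For a 1000-byte EOR-tagged fragment sent as 400 bytes followed by 600
bytes, only the second chunk satisfies the condition; with eor clear,
neither chunk does.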

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 17e43aff4459..3e574c87301b 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1174,6 +1174,7 @@ mptcp_carve_data_frag(const struct mptcp_sock *msk, struct page_frag *pfrag,
 	dfrag->offset = offset + sizeof(struct mptcp_data_frag);
 	dfrag->already_sent = 0;
 	dfrag->page = pfrag->page;
+	dfrag->eor = 0;
 
 	return dfrag;
 }
@@ -1435,6 +1436,13 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 		mptcp_update_infinite_map(msk, ssk, mpext);
 	trace_mptcp_sendmsg_frag(mpext);
 	mptcp_subflow_ctx(ssk)->rel_write_seq += copy;
+
+	/* If this is the last chunk of a dfrag with MSG_EOR set,
+	 * mark the skb to prevent coalescing with subsequent data.
+	 */
+	if (dfrag->eor && info->sent + copy >= dfrag->data_len)
+		TCP_SKB_CB(skb)->eor = 1;
+
 	return copy;
 }
 
@@ -1895,7 +1903,8 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	long timeo;
 
 	/* silently ignore everything else */
-	msg->msg_flags &= MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_FASTOPEN;
+	msg->msg_flags &= MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+			  MSG_FASTOPEN | MSG_EOR;
 
 	lock_sock(sk);
 
@@ -2002,8 +2011,16 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			goto do_error;
 	}
 
-	if (copied)
+	if (copied) {
+		/* Mark the last dfrag with EOR if MSG_EOR was set */
+		if (msg->msg_flags & MSG_EOR) {
+			struct mptcp_data_frag *dfrag = mptcp_pending_tail(sk);
+
+			if (dfrag)
+				dfrag->eor = 1;
+		}
 		__mptcp_push_pending(sk, msg->msg_flags);
+	}
 
 out:
 	release_sock(sk);
@@ -4621,6 +4638,9 @@ void __init mptcp_proto_init(void)
 	inet_register_protosw(&mptcp_protosw);
 
 	BUILD_BUG_ON(sizeof(struct mptcp_skb_cb) > sizeof_field(struct sk_buff, cb));
+	/* Compile-time check: ensure 'overhead' (alignment + struct size) fits in u8 */
+	BUILD_BUG_ON(ALIGN(1, sizeof(long)) + sizeof(struct mptcp_data_frag) > U8_MAX);
+
 }
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f5d4d7d030f2..db96f2945cbd 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -264,7 +264,9 @@ struct mptcp_data_frag {
 	u64 data_seq;
 	u16 data_len;
 	u16 offset;
-	u16 overhead;
+	u8 overhead;
+	u8 eor:1,
+	   __unused:7;
 	u16 already_sent;
 	struct page *page;
 };
-- 
2.43.0
Re: [PATCH mptcp-next] mptcp: preserve MSG_EOR semantics in sendmsg path
Posted by MPTCP CI 1 week, 6 days ago
Hi Gang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_dss 🔴
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22836823300

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/070dbf41676b
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1063383


If there are issues, you can reproduce them in the same environment the CI
uses, which is available as a Docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to keep the test suite
stable when executed on a public CI like this one, some reported issues may
not be due to your modifications. Still, do not hesitate to help us improve
that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)