[PATCH mptcp-next v11 0/4] mptcp: MSG_ERRQUEUE support on the parent socket

David Carlier posted 4 patches 1 week ago
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20260531145955.322337-1-devnexen@gmail.com
net/mptcp/protocol.c                          |  63 ++++++--
net/mptcp/sockopt.c                           | 153 +++++++++++++++---
.../selftests/net/mptcp/mptcp_sockopt.c       |  55 +++++++
3 files changed, 237 insertions(+), 34 deletions(-)
Posted by David Carlier 1 week ago
This series lets MPTCP applications use poll(EPOLLERR) and
recvmsg(MSG_ERRQUEUE) on the parent socket to drain TX timestamps,
MSG_ZEROCOPY completion notifications and SO_EE_ORIGIN_LOCAL events
through the standard inet ABI, the same way they would on a plain TCP
socket. ICMP-derived errors stay on the subflow queue: the legacy
RECVERR ABI cannot convey their per-subflow peer identity, and they
are intended for a future MPTCP_RECERR channel.

Patch 1 factors the hard-coded list of inet_flags propagated to subflows
into a mask, so subsequent patches can extend it without churn.
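The idea of replacing a per-flag list with a mask can be sketched in plain
C as follows. This is a toy model, not the code from patch 1: the TOY_*
flag names, their values and toy_sync_flags() are all hypothetical and do
not correspond to the kernel's inet_flags definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the kernel's inet_flags differ. */
enum {
	TOY_FLAG_FREEBIND	= 1u << 0,
	TOY_FLAG_TRANSPARENT	= 1u << 1,
	TOY_FLAG_RECVERR	= 1u << 2,
	TOY_FLAG_PRIVATE	= 1u << 3,	/* never propagated */
};

/* Single place listing what the parent pushes down to its subflows.
 * Propagating one more flag means OR-ing one more bit in here.
 */
#define TOY_PROPAGATE_MASK (TOY_FLAG_FREEBIND | TOY_FLAG_TRANSPARENT)

static_assert((TOY_PROPAGATE_MASK & TOY_FLAG_PRIVATE) == 0,
	      "private flags must stay out of the propagation mask");

/* Return the subflow flags after syncing the masked bits from the parent:
 * bits inside the mask are overwritten, bits outside it are preserved.
 */
static uint32_t toy_sync_flags(uint32_t parent, uint32_t subflow)
{
	return (subflow & ~TOY_PROPAGATE_MASK) | (parent & TOY_PROPAGATE_MASK);
}
```

With this shape, a later change only has to add a bit such as
TOY_FLAG_RECVERR to the mask; the sync helper itself stays untouched.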

Patch 2 makes IP_RECVERR / IPV6_RECVERR (and the RFC4884 variants)
propagate to the subflows. The parent stores the bit so MPTCP-aware
helpers can branch on it.
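From userspace, the visible part of this is an ordinary setsockopt() /
getsockopt() round trip on the parent socket. The sketch below is
illustrative only: recverr_roundtrip() is a hypothetical helper, it is not
the selftest from patch 4, and it checks only the value stored on the
socket it is given, not the propagation to subflows.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Enable IP_RECVERR on fd and read the option back.
 * Returns the value reported by getsockopt(), or -1 on failure.
 */
static int recverr_roundtrip(int fd)
{
	int on = 1, val = 0;
	socklen_t len = sizeof(val);

	if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on)) < 0)
		return -1;
	if (getsockopt(fd, IPPROTO_IP, IP_RECVERR, &val, &len) < 0)
		return -1;
	assert(len == sizeof(val));
	return val;
}
```

To exercise the MPTCP path, fd would come from
socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); whether the call succeeds
there depends on the running kernel carrying this series.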

Patch 3 splices subflow err-skbs onto the parent's sk_error_queue at
error-report time. All forwarded events go through sock_queue_err_skb(),
which re-homes skb->sk onto the parent and charges the parent's
sk_rmem_alloc, so the parent's error queue stays bounded by sk_rcvbuf and
events are dropped under rmem pressure, matching TCP's tx-timestamp path
and ip_icmp_error() /
ipv6_icmp_error(). MPTCP never originates MSG_ZEROCOPY or OPT_ID
tx-timestamp completions -- its data path copies into msk-owned pages and
bypasses tcp_sendmsg_locked() -- so no subflow-relative ee_data sequence
is ever forwarded. mptcp_recvmsg(MSG_ERRQUEUE) forwards directly to
inet_recv_error(), and mptcp_poll() advertises EPOLLERR purely on the
parent's sk_err / sk_error_queue, matching tcp_poll().
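The bounding behaviour described above can be illustrated with a small
userspace model. This is a deliberately simplified sketch, not kernel
code: struct toy_sock, toy_queue_err() and toy_dequeue_err() are
hypothetical, and the limit check is only assumed to resemble the one
sock_queue_err_skb() applies against sk_rcvbuf.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for the parent socket's receive-memory accounting. */
struct toy_sock {
	unsigned int rmem_alloc;	/* bytes currently charged */
	unsigned int rcvbuf;		/* upper bound on charged bytes */
	unsigned int queued;		/* entries on the error queue */
	unsigned int dropped;		/* entries refused under pressure */
};

/* Charge truesize bytes to sk and queue one entry, or refuse with
 * -ENOMEM when the charge would reach the receive buffer limit.
 */
static int toy_queue_err(struct toy_sock *sk, unsigned int truesize)
{
	assert(sk);
	if (sk->rmem_alloc + truesize >= sk->rcvbuf) {
		sk->dropped++;
		return -ENOMEM;
	}
	sk->rmem_alloc += truesize;
	sk->queued++;
	return 0;
}

/* Remove one entry and refund its charge, as a reader draining the
 * queue with recvmsg(MSG_ERRQUEUE) would cause.
 */
static void toy_dequeue_err(struct toy_sock *sk, unsigned int truesize)
{
	assert(sk && sk->queued > 0 && sk->rmem_alloc >= truesize);
	sk->rmem_alloc -= truesize;
	sk->queued--;
}
```

Because every queued entry is charged, a reader that never drains the
queue causes new events to be dropped rather than memory to grow without
bound.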

Patch 4 is a selftest covering the propagation path.

Changes in v11 (addresses sashiko v10 review,
https://sashiko.dev/#/patchset/20260529174524.260199-1-devnexen@gmail.com):
 - patch 3/4: route MSG_ZEROCOPY completions through sock_queue_err_skb()
   like every other forwarded event, rather than orphaning them and
   queueing to the parent by hand. The hand-rolled path ran the subflow
   destructor (refunding its memory charge) but never charged the parent,
   so completions could pile up unbounded on the parent err queue and
   exhaust memory (OOM). The "never drop or we leak pinned pages" premise
   was also wrong: __msg_zerocopy_callback() calls
   mm_unaccount_pinned_pages() before queueing, so a dropped notification
   loses only the notification, not the pages. (sashiko v10, High)
 - no functional change for the ee_data concern: MPTCP originates neither
   MSG_ZEROCOPY nor OPT_ID tx-timestamp completions, so no subflow-relative
   sequence is ever spliced to the parent. (sashiko v10, High)
 - patch 2/4: initialise val in mptcp_setsockopt_recverr() to silence a
   latent -Wmaybe-uninitialized on the switch without a default case.

v10: https://lore.kernel.org/mptcp/20260529174524.260199-1-devnexen@gmail.com/
v9: https://lore.kernel.org/mptcp/20260528055459.55133-1-devnexen@gmail.com/

David Carlier (4):
  mptcp: sockopt: factor inet_flags propagation into a mask
  mptcp: propagate RECVERR sockopts to subflows
  mptcp: support MSG_ERRQUEUE on the parent socket
  selftests: mptcp: cover IP_RECVERR sockopt propagation

 net/mptcp/protocol.c                          |  63 ++++++--
 net/mptcp/sockopt.c                           | 153 +++++++++++++++---
 .../selftests/net/mptcp/mptcp_sockopt.c       |  55 +++++++
 3 files changed, 237 insertions(+), 34 deletions(-)


base-commit: e05cbdb611ff815528cdf90e29a96663b9af48c6
-- 
2.53.0
Re: [PATCH mptcp-next v11 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by MPTCP CI 1 week ago
Hi David,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_add_addr ⚠️ 
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_connect_checksum ⚠️ 
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26716494041

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/5dd33dfffc0d
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1103609


If there are issues, you can reproduce them in the same environment as the
one used by the CI, using a Docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that, despite all the efforts already made to keep the test
suite stable when run on a public CI like this one, some reported issues
may not be caused by your modifications. Still, do not hesitate to help us
improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)