This series lets MPTCP applications use poll(EPOLLERR) and
recvmsg(MSG_ERRQUEUE) on the parent socket to drain TX timestamps,
MSG_ZEROCOPY completion notifications and SO_EE_ORIGIN_LOCAL events
through the standard inet ABI, the same way they would on a plain TCP
socket. ICMP-derived errors stay on the subflow queue: the legacy
RECVERR ABI cannot convey their per-subflow peer identity, and they
are intended for a future MPTCP_RECERR channel.
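
As a rough userspace illustration of the ABI described above, the sketch
below drains a socket's error queue with recvmsg(MSG_ERRQUEUE) and collects
the sock_extended_err records it finds. The helper name drain_errqueue() is
hypothetical and not part of this series; it assumes the caller has already
enabled IP_RECVERR / IPV6_RECVERR and typically calls it after poll()
reports POLLERR. It takes any socket fd, so the same code would be expected
to work on a plain TCP socket or on an MPTCP parent socket.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <linux/errqueue.h>

/*
 * Hypothetical helper: drain fd's error queue without blocking and copy
 * up to max_out sock_extended_err records into out[].
 * Returns the number of records stored, or -1 on an unexpected failure.
 */
int drain_errqueue(int fd, struct sock_extended_err *out, int max_out)
{
	int seen = 0;

	while (seen < max_out) {
		union {
			char buf[512];
			struct cmsghdr align;	/* forces cmsg alignment */
		} control;
		char data[256];
		struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
		struct msghdr msg;
		struct cmsghdr *cm;

		memset(&msg, 0, sizeof(msg));
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;
		msg.msg_control = control.buf;
		msg.msg_controllen = sizeof(control.buf);

		if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				break;		/* error queue is empty */
			return -1;
		}

		for (cm = CMSG_FIRSTHDR(&msg); cm && seen < max_out;
		     cm = CMSG_NXTHDR(&msg, cm)) {
			int v4 = cm->cmsg_level == SOL_IP &&
				 cm->cmsg_type == IP_RECVERR;
			int v6 = cm->cmsg_level == SOL_IPV6 &&
				 cm->cmsg_type == IPV6_RECVERR;

			if (!v4 && !v6)
				continue;	/* e.g. SCM_TIMESTAMPING payload */

			/* ee_origin tells timestamping, zerocopy and local events apart */
			memcpy(&out[seen++], CMSG_DATA(cm), sizeof(out[0]));
		}
	}
	return seen;
}
```

A caller would inspect out[i].ee_origin (for example
SO_EE_ORIGIN_TIMESTAMPING, SO_EE_ORIGIN_ZEROCOPY or SO_EE_ORIGIN_LOCAL) to
decide how to interpret each record.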

Patch 1 factors the hard-coded list of inet_flags propagated to subflows
into a mask, so subsequent patches can extend it without churn.
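
The idea of replacing a per-flag list with a mask can be sketched in
standalone C as below. This is only an illustration of the pattern: the
DEMO_* flag names and demo_sync_flags() are hypothetical and do not
correspond to the identifiers used in the patch.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define DEMO_FLAG_A	(1u << 0)
#define DEMO_FLAG_B	(1u << 1)
#define DEMO_FLAG_C	(1u << 2)	/* deliberately not propagated */

/* Single place listing every flag copied from the parent to a subflow. */
#define DEMO_PROPAGATE_MASK	(DEMO_FLAG_A | DEMO_FLAG_B)

/*
 * Return the subflow flags after syncing: bits inside the mask are taken
 * from the parent, bits outside the mask keep the subflow's own value.
 */
static inline uint32_t demo_sync_flags(uint32_t parent, uint32_t subflow)
{
	return (subflow & ~DEMO_PROPAGATE_MASK) |
	       (parent & DEMO_PROPAGATE_MASK);
}
```

With this shape, propagating an additional flag only requires adding it to
the mask rather than touching every call site.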

Patch 2 makes IP_RECVERR / IPV6_RECVERR (and the RFC4884 variants)
propagate to the subflows. The parent stores the bit so MPTCP-aware
helpers can branch on it.

Patch 3 splices subflow err-skbs onto the parent's sk_error_queue at
error-report time. All forwarded events go through sock_queue_err_skb(),
which re-homes skb->sk onto the parent and charges sk_rmem_alloc, so the
parent's error queue stays bounded by sk_rcvbuf and events are dropped
under rmem pressure, matching TCP's tx-timestamp path and ip_icmp_error() /
ipv6_icmp_error(). MPTCP never originates MSG_ZEROCOPY or OPT_ID
tx-timestamp completions -- its data path copies into msk-owned pages and
bypasses tcp_sendmsg_locked() -- so no subflow-relative ee_data sequence
is ever forwarded. mptcp_recvmsg(MSG_ERRQUEUE) forwards directly to
inet_recv_error(), and mptcp_poll() advertises EPOLLERR purely on the
parent's sk_err / sk_error_queue, matching tcp_poll().
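
The bounding behaviour described above can be pictured with a toy userspace
model. This is not the kernel code: struct demo_sock and the demo_*()
helpers are hypothetical, and the exact comparison is an assumption loosely
modelled on the rmem check in sock_queue_err_skb(). It only shows why
charging the receiving socket keeps the queue bounded, while an uncharged
queue would grow without limit.

```c
#include <errno.h>

/* Toy stand-in for the parent socket's receive-memory accounting. */
struct demo_sock {
	unsigned int rmem_alloc;	/* bytes currently charged */
	unsigned int rcvbuf;		/* limit, like sk_rcvbuf */
	unsigned int queued;		/* events on the error queue */
};

/*
 * Try to queue an event of the given size. The event is charged to the
 * socket on success; under memory pressure it is rejected and the caller
 * is expected to drop it.
 */
int demo_queue_err(struct demo_sock *sk, unsigned int truesize)
{
	if (sk->rmem_alloc + truesize >= sk->rcvbuf)
		return -ENOMEM;

	sk->rmem_alloc += truesize;
	sk->queued++;
	return 0;
}

/* Reading an event from the queue refunds its charge. */
void demo_dequeue_err(struct demo_sock *sk, unsigned int truesize)
{
	sk->rmem_alloc -= truesize;
	sk->queued--;
}
```

In this model a reader that never drains the queue simply stops receiving
new events once the limit is reached, instead of consuming more memory.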

Patch 4 is a selftest covering the propagation path.

Changes in v11 (addresses sashiko v10 review,
https://sashiko.dev/#/patchset/20260529174524.260199-1-devnexen@gmail.com):
 - patch 3/4: route MSG_ZEROCOPY completions through sock_queue_err_skb()
   like every other forwarded event, rather than orphaning them and
   queueing to the parent by hand. The hand-rolled path ran the subflow
   destructor (refunding its memory charge) but never charged the parent,
   so completions could pile up unbounded on the parent err queue and
   exhaust memory (OOM). The "never drop or we leak pinned pages" premise
   was also wrong: __msg_zerocopy_callback() calls
   mm_unaccount_pinned_pages() before queueing, so a dropped notification
   loses only the notification, not the pages. (sashiko v10, High)
 - no functional change for the ee_data concern: MPTCP originates neither
   MSG_ZEROCOPY nor OPT_ID tx-timestamp completions, so no subflow-relative
   sequence is ever spliced to the parent. (sashiko v10, High)
 - patch 2/4: initialise val in mptcp_setsockopt_recverr() to silence a
   latent -Wmaybe-uninitialized on the switch without a default case.

v10: https://lore.kernel.org/mptcp/20260529174524.260199-1-devnexen@gmail.com/
v9: https://lore.kernel.org/mptcp/20260528055459.55133-1-devnexen@gmail.com/

David Carlier (4):
  mptcp: sockopt: factor inet_flags propagation into a mask
  mptcp: propagate RECVERR sockopts to subflows
  mptcp: support MSG_ERRQUEUE on the parent socket
  selftests: mptcp: cover IP_RECVERR sockopt propagation

 net/mptcp/protocol.c                          |  63 ++++++--
 net/mptcp/sockopt.c                           | 153 +++++++++++++++---
 .../selftests/net/mptcp/mptcp_sockopt.c       |  55 +++++++
 3 files changed, 237 insertions(+), 34 deletions(-)

base-commit: e05cbdb611ff815528cdf90e29a96663b9af48c6
--
2.53.0