This series lets MPTCP applications use poll(EPOLLERR) and
recvmsg(MSG_ERRQUEUE) on the parent socket to drain TX timestamps,
MSG_ZEROCOPY completion notifications and SO_EE_ORIGIN_LOCAL events
that are produced by the subflows, the same way they would on a plain
TCP socket. ICMP-derived errors stay on the subflow queue: the legacy
RECVERR ABI cannot convey their per-subflow peer identity, and they
are intended for a future MPTCP_RECERR channel.

Patch 1 factors the hard-coded list of inet_flags that are propagated
to the subflows into a mask, so subsequent patches can extend it
without churn.
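
The idea behind the mask can be modelled in plain userspace C as below.
This is only a sketch of the technique: the DEMO_* names and
demo_propagate() are hypothetical and do not correspond to the kernel's
actual inet_flags bits or to the helper added by the patch.

```c
#include <stdint.h>

/* Hypothetical flag bits; the kernel's real inet_flags are not modelled. */
enum {
	DEMO_FLAG_TRANSPARENT = 1 << 0,
	DEMO_FLAG_FREEBIND    = 1 << 1,
	DEMO_FLAG_RECVERR     = 1 << 2,
	DEMO_FLAG_LOCAL_ONLY  = 1 << 3,	/* deliberately not in the mask */
};

/* One place to list what is propagated; extending it is a one-line change. */
#define DEMO_PROPAGATE_MASK \
	(DEMO_FLAG_TRANSPARENT | DEMO_FLAG_FREEBIND | DEMO_FLAG_RECVERR)

/* Copy the masked bits from parent to child, leaving the child's other
 * bits untouched.
 */
static uint32_t demo_propagate(uint32_t parent, uint32_t child)
{
	return (child & ~(uint32_t)DEMO_PROPAGATE_MASK) |
	       (parent & (uint32_t)DEMO_PROPAGATE_MASK);
}
```

With a mask, adding a new propagated flag means touching only the mask
definition rather than every site that used to enumerate the flags.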

Patch 2 makes IP_RECVERR / IPV6_RECVERR (and the RFC4884 variants)
propagate to the subflows. The parent stores the bit so MPTCP-aware
helpers can branch on it.

Patch 3 splices subflow err-skbs onto the parent's sk_error_queue at
error-report time. mptcp_poll() and __mptcp_subflow_error_report()
already handle the parent path, so user-visible behaviour matches
plain TCP.

Patch 4 is a selftest covering the propagation path.

Changes in v6 (addresses sashiko v5 review,
https://sashiko.dev/#/patchset/cover.1777756707.git.devnexen@gmail.com):

- patch 2/4: take lock_sock() before the parent ip_setsockopt() and
  re-read the freshly stored RECVERR bit via inet_test_bit() inside
  the critical section, then propagate that to subflows. Two racing
  setsockopt() callers can no longer leave parent and subflows
  desynchronised. (sashiko v5 #1, High)
- patch 2/4: drop the local 4-byte snapshot and pass the user buffer
  straight through to ip_setsockopt() / ipv6_setsockopt(), so 1-byte
  boolean writes (char on=1; setsockopt(.., IP_RECVERR, &on, 1))
  keep the same ABI as plain TCP. (sashiko v5 #2, High)
- patch 3/4: drain the parent err-queue first in mptcp_recv_error(),
  then splice from the subflows. A previous splice that failed under
  rmem pressure is retried once recvmsg(MSG_ERRQUEUE) frees parent
  space, and the successful sock_queue_err_skb() re-asserts EPOLLERR
  so userspace knows to drain again. No permanent event loss.
  (sashiko v5 #3, High)
- patch 3/4: use skb_queue_empty_lockless() in mptcp_recv_error()'s
  subflow loop, matching what mptcp_poll() already does. The plain
  skb_queue_empty() pointer compare tripped KCSAN against softirq
  writers. (sashiko v5 #4, Medium)
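
The 1-byte boolean write mentioned in the second item can be spelled out
as below. This is a sketch with hypothetical helper names
(set_recverr_one_byte(), get_recverr()); it relies only on the existing
plain-TCP behaviour that IP_RECVERR accepts a single-byte option value,
which the series aims to preserve for MPTCP sockets.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Enable IP_RECVERR with a 1-byte value, as some applications do. */
static int set_recverr_one_byte(int fd)
{
	char on = 1;

	return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}

/* Read IP_RECVERR back as an int; returns -1 if getsockopt() fails. */
static int get_recverr(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_IP, IP_RECVERR, &val, &len) < 0)
		return -1;

	return val;
}
```

On a plain TCP socket, set_recverr_one_byte() succeeds and get_recverr()
then reports the option as enabled.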

v5: https://lore.kernel.org/mptcp/cover.1777756707.git.devnexen@gmail.com/

David Carlier (4):
  mptcp: sockopt: factor inet_flags propagation into a mask
  mptcp: propagate RECVERR sockopts to subflows
  mptcp: support MSG_ERRQUEUE on the parent socket
  selftests: mptcp: cover IP_RECVERR sockopt propagation

 net/mptcp/protocol.c                          |  74 +++++++++-
 net/mptcp/sockopt.c                           | 136 ++++++++++++++----
 .../selftests/net/mptcp/mptcp_sockopt.c       |  55 +++++++
 3 files changed, 235 insertions(+), 30 deletions(-)

base-commit: aa15c271d79edde595fb6f4eedb52fbc16325a83
--
2.53.0