[PATCH mptcp-next v7 0/4] mptcp: MSG_ERRQUEUE support on the parent socket

David Carlier posted 4 patches 1 month ago
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20260509211651.104934-1-devnexen@gmail.com
There is a newer version of this series
[PATCH mptcp-next v7 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by David Carlier 1 month ago
This series adds MSG_ERRQUEUE support on the MPTCP parent socket, so
poll() and recvmsg(MSG_ERRQUEUE) observe TX timestamps and MSG_ZEROCOPY
completion notifications through the standard inet ABI. IP_RECVERR /
IPV6_RECVERR (and their RFC4884 variants) are propagated to existing
and future subflows.

Patch 1 factors per-flag inet_assign_bit() calls in
sync_socket_options() into a mask-driven loop so future propagated
flags only need to extend MPTCP_INET_FLAGS_MASK.
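
A minimal userspace sketch of the mask-driven idea in patch 1, not the
kernel code. The flag names, bit positions, DEMO_PROPAGATED_MASK and
demo_sync_flags() are hypothetical stand-ins; only the name
MPTCP_INET_FLAGS_MASK comes from this series, and its real contents are
not reproduced here.

```c
#include <stdint.h>

/* Hypothetical flag bits standing in for inet_flags bits. */
enum {
	DEMO_FLAG_RECVERR         = 1 << 0,
	DEMO_FLAG_RECVERR_RFC4884 = 1 << 1,
	DEMO_FLAG_FREEBIND        = 1 << 2,
	DEMO_FLAG_UNRELATED       = 1 << 3,	/* deliberately not propagated */
};

/* Hypothetical equivalent of a "propagated flags" mask. */
#define DEMO_PROPAGATED_MASK \
	(DEMO_FLAG_RECVERR | DEMO_FLAG_RECVERR_RFC4884 | DEMO_FLAG_FREEBIND)

/*
 * For every bit set in @mask, make the subflow bit equal to the parent
 * bit. Bits outside @mask keep their subflow value. Returns the new
 * subflow flags.
 */
static uint32_t demo_sync_flags(uint32_t parent, uint32_t subflow,
				uint32_t mask)
{
	for (unsigned int bit = 0; bit < 32; bit++) {
		uint32_t b = UINT32_C(1) << bit;

		if (!(mask & b))
			continue;
		if (parent & b)
			subflow |= b;
		else
			subflow &= ~b;
	}
	return subflow;
}
```

With this shape, propagating one more flag means adding one bit to the
mask; the loop itself does not change.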

Patch 2 wires up RECVERR setsockopt/getsockopt: snapshot the value,
apply it on the parent, and forward to every subflow under lock_sock()
so concurrent setsockopt callers cannot leave parent and subflows
desynchronized. Newly-joining subflows pick up the four RECVERR bits
through sync_socket_options().
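
The ordering described for patch 2 can be modelled in userspace roughly
as below. This is only a sketch: struct demo_conn and both helpers are
hypothetical, a single boolean stands in for the RECVERR bits, and the
locking is reduced to a comment where the series holds lock_sock().

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_SUBFLOWS 8

/* Hypothetical model of one MPTCP connection: a parent plus subflows. */
struct demo_conn {
	bool parent_recverr;
	bool subflow_recverr[DEMO_MAX_SUBFLOWS];
	size_t nr_subflows;
};

/*
 * Model of the setsockopt path: snapshot the value once, apply it on
 * the parent, then forward it to every existing subflow. In the series
 * this whole sequence runs under lock_sock(), so two callers cannot
 * interleave and leave parent and subflows with different values.
 */
static void demo_set_recverr(struct demo_conn *c, int optval)
{
	bool val = optval != 0;	/* snapshot */

	c->parent_recverr = val;
	for (size_t i = 0; i < c->nr_subflows; i++)
		c->subflow_recverr[i] = val;
}

/*
 * Model of a subflow joining later: it inherits the parent's current
 * value. Returns the new subflow index, or -1 if the table is full.
 */
static int demo_add_subflow(struct demo_conn *c)
{
	if (c->nr_subflows >= DEMO_MAX_SUBFLOWS)
		return -1;
	c->subflow_recverr[c->nr_subflows] = c->parent_recverr;
	return (int)c->nr_subflows++;
}
```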

Patch 3 splices forwardable err skbs (TIMESTAMPING / ZEROCOPY / LOCAL)
from each subflow's error queue onto the parent's, so pollers see
EPOLLERR and recvmsg(MSG_ERRQUEUE) on the parent drains them. Subflow
ICMP errors are dropped — they will be carried by a future
MPTCP_RECERR channel.

Patch 4 adds selftest coverage for IP_RECVERR / IPV6_RECVERR
propagation and for the empty-errqueue EAGAIN contract on
MSG_ERRQUEUE | MSG_DONTWAIT.
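
For reference, the empty-errqueue contract can be probed from userspace
with a helper along these lines. This is a sketch, not the selftest
code: demo_errqueue_is_empty() is a hypothetical name and works on any
inet socket fd. On an MPTCP socket the EAGAIN result assumes a kernel
with this series applied.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Non-blocking read from the socket error queue.
 * Returns 1 if the queue is empty (EAGAIN), 0 if an entry was read,
 * or a negative errno value on any other failure.
 */
static int demo_errqueue_is_empty(int fd)
{
	char data[64];
	char control[256];
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = control,
		.msg_controllen = sizeof(control),
	};
	ssize_t n = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);

	if (n >= 0)
		return 0;
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return 1;
	return -errno;
}
```

Calling it on a freshly created socket that has not sent anything is
expected to report an empty queue.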

v6 -> v7:
 - patch 2: gate SOL_IPV6 setsockopt/getsockopt dispatch on
   sk_family == AF_INET6, returning -ENOPROTOOPT otherwise, mirroring
   plain TCP. Addresses the sashiko Medium finding on v6 where
   IPV6_RECVERR silently succeeded on AF_INET MPTCP sockets.
 - patch 3: track moved skbs in mptcp_recv_error() and retry
   inet_recv_error() when ret == -EAGAIN && moved, so a successful
   subflow splice is not masked by the initial drain returning EAGAIN
   (sashiko High #2 on v6).
 - patch 3: add mptcp_subflow_errqueue_pending() and OR it into the
   EPOLLERR check in mptcp_poll(), so events stranded on a subflow
   when the parent is under rmem pressure still wake userspace
   (sashiko High #1 on v6).
 - rebased on current export.
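
The SOL_IPV6 gating from the first item above amounts to a check of
roughly this shape. The helper name is hypothetical and the sketch only
models the level/family test, not the actual sockopt dispatch.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Return 0 if an option at @level may be dispatched on a socket of
 * family @sk_family, or -ENOPROTOOPT when an IPv6-level option is used
 * on a socket that is not AF_INET6.
 */
static int demo_check_level(int sk_family, int level)
{
	if (level == SOL_IPV6 && sk_family != AF_INET6)
		return -ENOPROTOOPT;
	return 0;
}
```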

Tested with KVM-validation auto-normal: 25/25 pass.

David Carlier (4):
  mptcp: sockopt: factor inet_flags propagation into a mask
  mptcp: propagate RECVERR sockopts to subflows
  mptcp: support MSG_ERRQUEUE on the parent socket
  selftests: mptcp: cover IP_RECVERR sockopt propagation

 net/mptcp/protocol.c                          |  92 ++++++++++-
 net/mptcp/sockopt.c                           | 146 ++++++++++++++----
 .../selftests/net/mptcp/mptcp_sockopt.c       |  55 +++++++
 3 files changed, 261 insertions(+), 32 deletions(-)


base-commit: 63b133728231ebba5167bd1e53dda9bcf0bee7c7
-- 
2.53.0

Re: [PATCH mptcp-next v7 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by Matthieu Baerts 1 week, 6 days ago
Hi David,

On 10/05/2026 07:16, David Carlier wrote:
> This series adds MSG_ERRQUEUE support on the MPTCP parent socket, so
> poll() and recvmsg(MSG_ERRQUEUE) observe TX timestamps and MSG_ZEROCOPY
> completion notifications through the standard inet ABI. IP_RECVERR /
> IPV6_RECVERR (and their RFC4884 variants) are propagated to existing
> and future subflows.
> 
> Patch 1 factors per-flag inet_assign_bit() calls in
> sync_socket_options() into a mask-driven loop so future propagated
> flags only need to extend MPTCP_INET_FLAGS_MASK.
> 
> Patch 2 wires up RECVERR setsockopt/getsockopt: snapshot the value,
> apply it on the parent, and forward to every subflow under lock_sock()
> so concurrent setsockopt callers cannot leave parent and subflows
> desynchronized. Newly-joining subflows pick up the four RECVERR bits
> through sync_socket_options().
> 
> Patch 3 splices forwardable err skbs (TIMESTAMPING / ZEROCOPY / LOCAL)
> from each subflow's error queue onto the parent's, so pollers see
> EPOLLERR and recvmsg(MSG_ERRQUEUE) on the parent drains them. Subflow
> ICMP errors are dropped — they will be carried by a future
> MPTCP_RECERR channel.

Sorry for the delay: I saw Sashiko had some comments [1], and because I
noticed you checked it before, I thought you were going to send a reply
or a new version, and I forgot to ask here. So here it is: is the review
correct?

[1]
https://sashiko.dev/#/patchset/20260509211651.104934-1-devnexen@gmail.com

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next v7 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by David CARLIER 1 week, 6 days ago
Hi,

On Wed, 27 May 2026 at 06:08, Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi David,
>
> Sorry for the delay: I saw Sashiko had some comments [1], and because I
> noticed you checked it before, I thought you were going to send a reply
> or a new version, and I forgot to ask here. So here it is: is the review
> correct?
>
> [1]
> https://sashiko.dev/#/patchset/20260509211651.104934-1-devnexen@gmail.com
>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.
>

Yes, both findings are real.

For v8 I'll drop the skb on splice failure (this matches
sock_queue_err_skb()'s own behaviour under rmem pressure: -ENOMEM +
sk_drops++, and the skb is freed by the caller). With nothing retained
on subflow err queues, mptcp_subflow_errqueue_pending() can be removed
from mptcp_poll(), which also fixes the lockless conn_list walk, and the
recvmsg retry in mptcp_recv_error() goes away with it.
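
A rough userspace model of the planned drop-on-failure behaviour, for
illustration only. struct demo_parent, its fields and demo_splice() are
hypothetical, entry sizes stand in for skb truesize, and the rmem check
is simplified.

```c
#include <stddef.h>

/* Hypothetical model of the parent's error-queue accounting. */
struct demo_parent {
	size_t rmem_used;
	size_t rmem_limit;
	unsigned int queued;
	unsigned int drops;
};

/*
 * Move every pending subflow entry towards the parent. An entry that
 * does not fit in the parent's budget is dropped and counted instead
 * of being left behind, so the subflow queue is always empty on
 * return. Returns the number of entries queued on the parent.
 */
static unsigned int demo_splice(struct demo_parent *p, const size_t *sizes,
				size_t *nr_pending)
{
	unsigned int moved = 0;

	for (size_t i = 0; i < *nr_pending; i++) {
		if (p->rmem_used + sizes[i] > p->rmem_limit) {
			p->drops++;	/* dropped, not retained */
			continue;
		}
		p->rmem_used += sizes[i];
		p->queued++;
		moved++;
	}
	*nr_pending = 0;
	return moved;
}
```

Because nothing stays on the subflow side, a poller in this model only
ever has to look at the parent's queue.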

Cheers
Re: [PATCH mptcp-next v7 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by MPTCP CI 1 month ago
Hi David,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/25612442092

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/0f646cd55809
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1092123


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)