[PATCH mptcp-next v3 0/3] mptcp: MSG_ERRQUEUE support on the parent socket

David Carlier posted 3 patches 1 week, 6 days ago
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20260421223338.52743-1-devnexen@gmail.com
There is a newer version of this series
net/mptcp/protocol.c                          | 123 ++++++++++++++---
net/mptcp/sockopt.c                           | 129 ++++++++++++++++++
.../selftests/net/mptcp/mptcp_sockopt.c       |  55 ++++++++
3 files changed, 287 insertions(+), 20 deletions(-)
[PATCH mptcp-next v3 0/3] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by David Carlier 1 week, 6 days ago
MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
parent socket does not currently provide usable MSG_ERRQUEUE handling.

This series wires the MPTCP socket up to the IPv4/IPv6 error queue
paths. It propagates RECVERR-related sockopts to existing and future
subflows, makes poll() report pending errqueue activity through the
parent socket, and allows recvmsg(MSG_ERRQUEUE) on the MPTCP socket to
consume queued errors with the parent socket ABI.

The series also handles mixed-family subflows by applying the matching
sockopt according to each subflow family, and avoids silently losing an
error skb if requeueing to the parent socket fails under rmem pressure.

v2 -> v3:
  - Only consume ssk->sk_err in the fallback / MPC-connect branch of
    __mptcp_subflow_error_report(). Steady-state MPTCP now leaves
    TCP's one-shot sk_err to TCP's own consumer instead of silently
    draining it via sock_error().
  - In mptcp_recv_error(), also route to inet_recv_error() when
    sk->sk_err is set, so a fallback-propagated error reaches userspace
    even when the parent errqueue is empty.
  - Scope the new selftest to IP_RECVERR sockopt propagation only.
    End-to-end errqueue delivery (TX timestamps, ICMP, zerocopy)
    depends on subflow-side producers that are out of scope for this
    series and will be covered by follow-up work. Fixes the
    mptcp_sockopt selftest timeout reported by the MPTCP CI on v2.

v1 -> v2:
  - Retargeted to mptcp-next per Matthieu Baerts' feedback (net-next
    closed during the merge window; iterate on the MPTCP tree).
  - Guard mptcp_setsockopt_v6_recverr() and its dispatch cases in
    mptcp_setsockopt_v6() with #if IS_ENABLED(CONFIG_IPV6) to fix
    the MPTCP CI link break on without_ipv6/with_mptcp configs
    (undefined reference to ipv6_setsockopt).

v1: https://lore.kernel.org/mptcp/20260421152216.38127-1-devnexen@gmail.com/
v2: https://lore.kernel.org/mptcp/20260421191337.58341-1-devnexen@gmail.com/

David Carlier (3):
  mptcp: propagate RECVERR sockopts to subflows
  mptcp: support MSG_ERRQUEUE on the parent socket
  selftests: mptcp: cover IP_RECVERR sockopt propagation

 net/mptcp/protocol.c                          | 123 ++++++++++++++---
 net/mptcp/sockopt.c                           | 129 ++++++++++++++++++
 .../selftests/net/mptcp/mptcp_sockopt.c       |  55 ++++++++
 3 files changed, 287 insertions(+), 20 deletions(-)


base-commit: 4464afe97dc56e817a23b730979cbc6fc48f1912
-- 
2.53.0
[PATCH mptcp-next v4 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by David Carlier 1 week ago
MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
parent socket does not currently provide usable MSG_ERRQUEUE handling.

This series wires the MPTCP socket up to the IPv4/IPv6 error queue
paths. It propagates RECVERR-related sockopts to existing and future
subflows, makes poll() report pending errqueue activity through the
parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
consume queued errors with the parent socket ABI.

A new prerequisite patch factors the per-flag inet_flags propagation
in sync_socket_options() into a single masked word copy, so further
inet_flags propagated by MPTCP can be added by extending the mask
rather than touching the call site.

Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
four RECVERR bits, dropping the family-specific helpers from v3.

Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>

v3 -> v4:
  - New patch 1/4: factor inet_flags propagation in
    sync_socket_options() through MPTCP_INET_FLAGS_MASK, per Paolo's
    review.
  - Patch 2/4 (was 1/3): drop the mptcp_recverr_enabled() and
    mptcp_subflow_set_recverr() helpers; route the setsockopt path
    through mptcp_setsockopt_all_sf(). Inherit the four RECVERR bits
    via MPTCP_INET_FLAGS_MASK in sync_socket_options() instead of
    explicit inet[6]_assign_bit() calls.
  - Patch 3/4 (was 2/3): rework the MSG_ERRQUEUE plumbing per Paolo's
    review. Subflow err skbs are now spliced onto the parent msk's
    sk_error_queue from __mptcp_subflow_error_report() via the new
    __mptcp_subflow_splice_errqueue() helper. recvmsg(MSG_ERRQUEUE)
    on the parent reverts to plain inet_recv_error(), and mptcp_poll()
    only inspects the parent's sk_error_queue -- no more on-demand
    subflow walks, no extra lock_sock() / data_lock() in the poll or
    recv paths. Keep the original early-return structure of
    __mptcp_subflow_error_report() and fix the reverse christmas-tree
    variable order Paolo flagged.

v2 -> v3:
  - Only consume ssk->sk_err in the fallback / MPC-connect branch of
    __mptcp_subflow_error_report(). Steady-state MPTCP now leaves
    TCP's one-shot sk_err to TCP's own consumer instead of silently
    draining it via sock_error().
  - In mptcp_recv_error(), also route to inet_recv_error() when
    sk->sk_err is set, so a fallback-propagated error reaches userspace
    even when the parent errqueue is empty.
  - Scope the new selftest to IP_RECVERR sockopt propagation only.
    End-to-end errqueue delivery (TX timestamps, ICMP, zerocopy)
    depends on subflow-side producers that are out of scope for this
    series and will be covered by follow-up work. Fixes the
    mptcp_sockopt selftest timeout reported by the MPTCP CI on v2.

v1 -> v2:
  - Retargeted to mptcp-next per Matthieu Baerts' feedback (net-next
    closed during the merge window; iterate on the MPTCP tree).
  - Guard mptcp_setsockopt_v6_recverr() and its dispatch cases in
    mptcp_setsockopt_v6() with #if IS_ENABLED(CONFIG_IPV6) to fix
    the MPTCP CI link break on without_ipv6/with_mptcp configs
    (undefined reference to ipv6_setsockopt).

v1: https://lore.kernel.org/mptcp/20260421152216.38127-1-devnexen@gmail.com/
v2: https://lore.kernel.org/mptcp/20260421191337.58341-1-devnexen@gmail.com/
v3: https://lore.kernel.org/mptcp/20260421223338.52743-1-devnexen@gmail.com/

David Carlier (4):
  mptcp: sockopt: factor inet_flags propagation into a mask
  mptcp: propagate RECVERR sockopts to subflows
  mptcp: support MSG_ERRQUEUE on the parent socket
  selftests: mptcp: cover IP_RECVERR sockopt propagation

 net/mptcp/protocol.c                          |  33 +++++-
 net/mptcp/sockopt.c                           | 107 ++++++++++++++----
 .../selftests/net/mptcp/mptcp_sockopt.c       |  55 +++++++++
 3 files changed, 170 insertions(+), 25 deletions(-)

--
2.53.0
Re: [PATCH mptcp-next v4 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by MPTCP CI 6 days, 14 hours ago
Hi David,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_join ⚠️ 
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 2 failed test(s): packetdrill_fastopen packetdrill_sockopts ⚠️ 
- KVM Validation: debug (only selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_join ⚠️ 
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/25071789731

Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/7688d292b14a
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1086438


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to keep the test
suite stable when executed on a public CI like this one, some reported
issues might not be due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Re: [PATCH mptcp-next v4 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by Matthieu Baerts 6 days, 15 hours ago
Hi David,

Thank you for the new version.

On 27/04/2026 23:10, David Carlier wrote:
> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
> parent socket does not currently provide usable MSG_ERRQUEUE handling.
> 
> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
> paths. It propagates RECVERR-related sockopts to existing and future
> subflows, makes poll() report pending errqueue activity through the
> parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
> consume queued errors with the parent socket ABI.
> 
> A new prerequisite patch factors the per-flag inet_flags propagation
> in sync_socket_options() into a single masked word copy, so further
> inet_flags propagated by MPTCP can be added by extending the mask
> rather than touching the call site.
> 
> Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
> for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
> four RECVERR bits, dropping the family-specific helpers from v3.
> 
> Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>

I didn't review it, but I notice that the CI cannot apply your series,
because it looks like it is not based on the one you mentioned here.

Can you either remove this line, or rebase your series on top of this
other patch?

Also, please don't send your series as a reply to a previous posting,
please use a new thread. That's what is usually done, clearer, plus some
tools don't support replies.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next v4 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by Matthieu Baerts 6 days, 15 hours ago
On 28/04/2026 20:48, Matthieu Baerts wrote:
> Hi David,
> 
> Thank you for the new version.
> 
> On 27/04/2026 23:10, David Carlier wrote:
>> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
>> parent socket does not currently provide usable MSG_ERRQUEUE handling.
>>
>> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
>> paths. It propagates RECVERR-related sockopts to existing and future
>> subflows, makes poll() report pending errqueue activity through the
>> parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
>> consume queued errors with the parent socket ABI.
>>
>> A new prerequisite patch factors the per-flag inet_flags propagation
>> in sync_socket_options() into a single masked word copy, so further
>> inet_flags propagated by MPTCP can be added by extending the mask
>> rather than touching the call site.
>>
>> Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
>> for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
>> four RECVERR bits, dropping the family-specific helpers from v3.
>>
>> Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>
> 
> I didn't review it, but I notice that the CI cannot apply your series,
> because it looks like it is not based on the one you mentioned here.
> 
> Can you either remove this line, or rebase your series on top of this
> other patch?
> 
> Also, please don't send your series as a reply to a previous posting,
> please use a new thread. That's what is usually done, clearer, plus some
> tools don't support replies.

Note: I just manually resolved the conflicts and sent the series to the
CI, not to have to resend a series just to retrigger the CI.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next v4 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by Matthieu Baerts 3 days, 19 hours ago
Hi David,

On 28/04/2026 20:56, Matthieu Baerts wrote:
> On 28/04/2026 20:48, Matthieu Baerts wrote:
>> Hi David,
>>
>> Thank you for the new version.
>>
>> On 27/04/2026 23:10, David Carlier wrote:
>>> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
>>> parent socket does not currently provide usable MSG_ERRQUEUE handling.
>>>
>>> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
>>> paths. It propagates RECVERR-related sockopts to existing and future
>>> subflows, makes poll() report pending errqueue activity through the
>>> parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
>>> consume queued errors with the parent socket ABI.
>>>
>>> A new prerequisite patch factors the per-flag inet_flags propagation
>>> in sync_socket_options() into a single masked word copy, so further
>>> inet_flags propagated by MPTCP can be added by extending the mask
>>> rather than touching the call site.
>>>
>>> Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
>>> for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
>>> four RECVERR bits, dropping the family-specific helpers from v3.
>>>
>>> Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>
>>
>> I didn't review it, but I notice that the CI cannot apply your series,
>> because it looks like it is not based on the one you mentioned here.
>>
>> Can you either remove this line, or rebase your series on top of this
>> other patch?
>>
>> Also, please don't send your series as a reply to a previous posting,
>> please use a new thread. That's what is usually done, clearer, plus some
>> tools don't support replies.
> 
> Note: I just manually resolved the conflicts and sent the series to the
> CI, not to have to resend a series just to retrigger the CI.

It looks like the CI (and sashiko) found some issues with this series.

But globally, I'm a bit puzzled: with MPTCP, there might be multiple
paths being used, and reporting errors about all of them when the
"legacy" RECVERR socket options are used will confuse the userspace that
doesn't (have to) know multiple subflows are being used. In this case,
either messages should be filtered (might be hard to handle all
use-cases and maintain that?), or this should be limited to cases where
only one subflow is being used. Which leads me to this question: what's
your use-case exactly? What are you trying to solve?

It might be easier to have a dedicated MPTCP_RECERR, and eventually
propagate more MPTCP-specific messages. Something that could be linked to:

  https://github.com/multipath-tcp/mptcp_net-next/issues/78

WDYT?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next v4 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by David CARLIER 3 days, 18 hours ago
Hi Matthieu,

  On 01/05/2026 16:49, Matthieu Baerts wrote:
  > It looks like the CI (and sashiko) found some issues with this series.

  For v5:

  - 1/4: per-bit inet_assign_bit() loop instead of WRITE_ONCE(), keeps
    atomicity.
  - 2/4: add missing sockopt_seq_inc(msk).
  - 2/4: skip family-mismatched subflows in the v4/v6 helpers.
  - 2/4: snapshot optval to a local int, pass KERNEL_SOCKPTR(&val) into
    the loop.
  - 3/4: pull-on-drain from mptcp_recv_error() so a parent-ENOMEM does
    not strand subflow skbs.
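
To illustrate the first item (a userspace sketch, not the planned v5 code): assigning each masked bit with its own atomic operation avoids overwriting concurrent updates to unrelated bits in the same word, which a plain load/modify/store of the whole word could lose. C11 atomics stand in for the kernel's atomic bitops here, and assign_bit_atomic() and sync_masked_bits() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set or clear a single bit; other bits are never rewritten. */
void assign_bit_atomic(atomic_ulong *word, unsigned int nr, bool val)
{
	assert(nr < WORD_BITS);

	if (val)
		atomic_fetch_or(word, 1UL << nr);
	else
		atomic_fetch_and(word, ~(1UL << nr));
}

/* Copy every bit selected by mask from src into dst, one bit at a time. */
void sync_masked_bits(atomic_ulong *dst, unsigned long src,
		      unsigned long mask)
{
	unsigned int nr;

	for (nr = 0; nr < WORD_BITS; nr++) {
		if (mask & (1UL << nr))
			assign_bit_atomic(dst, nr, (src >> nr) & 1UL);
	}
}
```

The cost is one atomic operation per propagated bit instead of a single store, which should be negligible for a handful of flags.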

  Will also re-run the docker repro to check the selftest_mptcp_join /
  packetdrill rows are pre-existing.

  > But globally, I'm a bit puzzled: with MPTCP, there might be multiple
  > paths being used, and reporting errors about all of them when the
  > "legacy" RECVERR socket options are used will confuse the userspace
  > that doesn't (have to) know multiple subflows are being used.

  Fair, and Paolo raised it on v3. The use-case is tx timestamping and
  MSG_ZEROCOPY completions - both are tied to user data, not the
  subflow that carried it, so no subflow identity leaks into the cmsg.
  ICMP/ICMPv6 is the part that does. v5 will filter the splice by
  SO_EE_ORIGIN: forward TIMESTAMPING / ZEROCOPY / LOCAL, drop ICMP.

  > It might be easier to have a dedicated MPTCP_RECERR, and eventually
  > propagate more MPTCP-specific messages. Something that could be
  > linked to:
  >   https://github.com/multipath-tcp/mptcp_net-next/issues/78

  Agreed - subflow ICMP and #78's lifecycle events belong there. As a
  follow-up once v5 lands.

  Cheers
Re: [PATCH mptcp-next v4 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by Matthieu Baerts 3 days, 18 hours ago
On 01/05/2026 17:28, David CARLIER wrote:
> Hi Matthieu,
> 
>   On 01/05/2026 16:49, Matthieu Baerts wrote:
>   > It looks like the CI (and sashiko) found some issues with this series.

(Please do fix your email client to avoid this formatting: some of your
emails are OK, but not all of them.)

>   For v5:
> 
>   - 1/4: per-bit inet_assign_bit() loop instead of WRITE_ONCE(), keeps
>     atomicity.
>   - 2/4: add missing sockopt_seq_inc(msk).
>   - 2/4: skip family-mismatched subflows in the v4/v6 helpers.
>   - 2/4: snapshot optval to a local int, pass KERNEL_SOCKPTR(&val) into
>     the loop.

(While at it, your new helpers mptcp_setsockopt_v[46]_recverr could have
a generic name)

>   - 3/4: pull-on-drain from mptcp_recv_error() so a parent-ENOMEM does
>     not strand subflow skbs.
> 
>   Will also re-run the docker repro to check the selftest_mptcp_join /
>   packetdrill rows are pre-existing.

The packetdrill errors might be pre-existing, someone should look at
improving the situation there:

  https://ci-results.mptcp.dev/flakes.html

>   > But globally, I'm a bit puzzled: with MPTCP, there might be multiple
>   > paths being used, and reporting errors about all of them when the
>   > "legacy" RECVERR socket options are used will confuse the userspace
>   > that doesn't (have to) know multiple subflows are being used.
> 
>   Fair, and Paolo raised it on v3. The use-case is tx timestamping and
>   MSG_ZEROCOPY completions - both are tied to user data, not the
>   subflow that carried it, so no subflow identity leaks into the cmsg.
>   ICMP/ICMPv6 is the part that does. v5 will filter the splice by
>   SO_EE_ORIGIN: forward TIMESTAMPING / ZEROCOPY / LOCAL, drop ICMP.

Maybe OK with this filter indeed..

>   > It might be easier to have a dedicated MPTCP_RECERR, and eventually
>   > propagate more MPTCP-specific messages. Something that could be
>   > linked to:
>   >   https://github.com/multipath-tcp/mptcp_net-next/issues/78
> 
>   Agreed - subflow ICMP and #78's lifecycle events belong there. As a
>   follow-up once v5 lands.

Indeed, better to split them.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next v4 0/4] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by David CARLIER 6 days, 14 hours ago
Hi Matthieu,

On Tue, 28 Apr 2026 at 19:56, Matthieu Baerts <matttbe@kernel.org> wrote:
>
> On 28/04/2026 20:48, Matthieu Baerts wrote:
> > Hi David,
> >
> > Thank you for the new version.
> >
> > On 27/04/2026 23:10, David Carlier wrote:
> >> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
> >> parent socket does not currently provide usable MSG_ERRQUEUE handling.
> >>
> >> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
> >> paths. It propagates RECVERR-related sockopts to existing and future
> >> subflows, makes poll() report pending errqueue activity through the
> >> parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
> >> consume queued errors with the parent socket ABI.
> >>
> >> A new prerequisite patch factors the per-flag inet_flags propagation
> >> in sync_socket_options() into a single masked word copy, so further
> >> inet_flags propagated by MPTCP can be added by extending the mask
> >> rather than touching the call site.
> >>
> >> Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
> >> for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
> >> four RECVERR bits, dropping the family-specific helpers from v3.
> >>
> >> Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>
> >
> > I didn't review it, but I notice that the CI cannot apply your series,
> > because it looks like it is not based on the one you mentioned here.
> >
> > Can you either remove this line, or rebase your series on top of this
> > other patch?
> >
> > Also, please don't send your series as a reply to a previous posting,
> > please use a new thread. That's what is usually done, clearer, plus some
> > tools don't support replies.
>
> Note: I just manually resolved the conflicts and sent the series to the
> CI, not to have to resend a series just to retrigger the CI.

Appreciated. Cheers.
>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.
>
[PATCH mptcp-next v4 1/4] mptcp: sockopt: factor inet_flags propagation into a mask
Posted by David Carlier 1 week ago
Replace the per-flag inet_assign_bit() calls in sync_socket_options()
with a masked word-level copy of inet_sk()->inet_flags. Introduce
MPTCP_INET_FLAGS_MASK so further flags propagated by MPTCP can be
added by extending the mask rather than touching the call site.

No functional change.
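
As a standalone illustration of the masked copy (userspace C, not the kernel code): bits inside the mask are taken from the parent's word, bits outside it keep the subflow's value. The DEMO_* bit positions and demo_sync_flags() are hypothetical and do not match the real INET_FLAGS_* layout. Note that this is a plain read-modify-write of the whole word, which is the atomicity point raised later in the thread.

```c
#include <assert.h>
#include <stddef.h>

#define BIT(nr) (1UL << (nr))

/* Hypothetical bit positions, chosen only for the demonstration. */
enum {
	DEMO_TRANSPARENT = 3,
	DEMO_FREEBIND = 5,
	DEMO_BIND_ADDRESS_NO_PORT = 7,
};

#define DEMO_FLAGS_MASK \
	(BIT(DEMO_TRANSPARENT) | \
	 BIT(DEMO_FREEBIND) | \
	 BIT(DEMO_BIND_ADDRESS_NO_PORT))

/* Copy the masked bits from *src (parent) into *dst (subflow). */
void demo_sync_flags(unsigned long *dst, const unsigned long *src)
{
	unsigned long flags;

	assert(dst != NULL && src != NULL);

	flags = *dst;
	flags &= ~DEMO_FLAGS_MASK;		/* clear the propagated bits */
	flags |= *src & DEMO_FLAGS_MASK;	/* take them from the parent */
	*dst = flags;
}
```

Propagating one more flag then only means adding one BIT() term to DEMO_FLAGS_MASK.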

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 net/mptcp/sockopt.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 0efe40be2fde..41c9dc9cf95e 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -16,6 +16,10 @@
 
 #define MIN_INFO_OPTLEN_SIZE		16
 #define MIN_FULL_INFO_OPTLEN_SIZE	40
+#define MPTCP_INET_FLAGS_MASK \
+	(BIT(INET_FLAGS_TRANSPARENT) | \
+	 BIT(INET_FLAGS_FREEBIND) | \
+	 BIT(INET_FLAGS_BIND_ADDRESS_NO_PORT))
 
 static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
 {
@@ -1536,6 +1540,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 {
 	static const unsigned int tx_rx_locks = SOCK_RCVBUF_LOCK | SOCK_SNDBUF_LOCK;
 	struct sock *sk = (struct sock *)msk;
+	unsigned long flags;
 	bool keep_open;
 
 	keep_open = sock_flag(sk, SOCK_KEEPOPEN);
@@ -1582,9 +1587,10 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 	tcp_sock_set_keepcnt(ssk, msk->keepalive_cnt);
 	tcp_sock_set_maxseg(ssk, msk->maxseg);
 
-	inet_assign_bit(TRANSPARENT, ssk, inet_test_bit(TRANSPARENT, sk));
-	inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
-	inet_assign_bit(BIND_ADDRESS_NO_PORT, ssk, inet_test_bit(BIND_ADDRESS_NO_PORT, sk));
+	flags = inet_sk(ssk)->inet_flags;
+	flags &= ~MPTCP_INET_FLAGS_MASK;
+	flags |= inet_sk(sk)->inet_flags & MPTCP_INET_FLAGS_MASK;
+	WRITE_ONCE(inet_sk(ssk)->inet_flags, flags);
 	WRITE_ONCE(inet_sk(ssk)->local_port_range, READ_ONCE(inet_sk(sk)->local_port_range));
 }
 
-- 
2.53.0
[PATCH mptcp-next v4 2/4] mptcp: propagate RECVERR sockopts to subflows
Posted by David Carlier 1 week ago
Propagate IP_RECVERR/IP_RECVERR_RFC4884 and
IPV6_RECVERR/IPV6_RECVERR_RFC4884 from the MPTCP socket to existing
and future subflows. The setsockopt path forwards each option to
every subflow via mptcp_setsockopt_all_sf(); newly-joining subflows
inherit the four RECVERR bits through sync_socket_options() now that
MPTCP_INET_FLAGS_MASK covers them.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Assisted-by: Codex:gpt-5
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 net/mptcp/sockopt.c | 97 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 79 insertions(+), 18 deletions(-)

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 41c9dc9cf95e..171e83e66a97 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -8,6 +8,8 @@
 
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
 #include <net/sock.h>
 #include <net/protocol.h>
 #include <net/tcp.h>
@@ -19,7 +21,11 @@
 #define MPTCP_INET_FLAGS_MASK \
 	(BIT(INET_FLAGS_TRANSPARENT) | \
 	 BIT(INET_FLAGS_FREEBIND) | \
-	 BIT(INET_FLAGS_BIND_ADDRESS_NO_PORT))
+	 BIT(INET_FLAGS_BIND_ADDRESS_NO_PORT) | \
+	 BIT(INET_FLAGS_RECVERR) | \
+	 BIT(INET_FLAGS_RECVERR_RFC4884) | \
+	 BIT(INET_FLAGS_RECVERR6) | \
+	 BIT(INET_FLAGS_RECVERR6_RFC4884))
 
 static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
 {
@@ -388,6 +394,41 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
 	return -EOPNOTSUPP;
 }
 
+static int mptcp_setsockopt_all_sf(struct mptcp_sock *msk, int level,
+				   int optname, sockptr_t optval,
+				   unsigned int optlen)
+{
+	struct mptcp_subflow_context *subflow;
+	int ret = 0;
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+		ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
+		if (ret)
+			break;
+	}
+	return ret;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int mptcp_setsockopt_v6_recverr(struct mptcp_sock *msk, int optname,
+				       sockptr_t optval, unsigned int optlen)
+{
+	struct sock *sk = (struct sock *)msk;
+	int ret;
+
+	ret = ipv6_setsockopt(sk, SOL_IPV6, optname, optval, optlen);
+	if (ret)
+		return ret;
+
+	lock_sock(sk);
+	ret = mptcp_setsockopt_all_sf(msk, SOL_IPV6, optname, optval, optlen);
+	release_sock(sk);
+	return ret;
+}
+#endif
+
 static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 			       sockptr_t optval, unsigned int optlen)
 {
@@ -430,6 +471,12 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 
 		release_sock(sk);
 		break;
+#if IS_ENABLED(CONFIG_IPV6)
+	case IPV6_RECVERR:
+	case IPV6_RECVERR_RFC4884:
+		ret = mptcp_setsockopt_v6_recverr(msk, optname, optval, optlen);
+		break;
+#endif
 	}
 
 	return ret;
@@ -764,6 +811,22 @@ static int mptcp_setsockopt_v4_set_tos(struct mptcp_sock *msk, int optname,
 	return 0;
 }
 
+static int mptcp_setsockopt_v4_recverr(struct mptcp_sock *msk, int optname,
+				       sockptr_t optval, unsigned int optlen)
+{
+	struct sock *sk = (struct sock *)msk;
+	int ret;
+
+	ret = ip_setsockopt(sk, SOL_IP, optname, optval, optlen);
+	if (ret)
+		return ret;
+
+	lock_sock(sk);
+	ret = mptcp_setsockopt_all_sf(msk, SOL_IP, optname, optval, optlen);
+	release_sock(sk);
+	return ret;
+}
+
 static int mptcp_setsockopt_v4(struct mptcp_sock *msk, int optname,
 			       sockptr_t optval, unsigned int optlen)
 {
@@ -775,6 +838,9 @@ static int mptcp_setsockopt_v4(struct mptcp_sock *msk, int optname,
 		return mptcp_setsockopt_sol_ip_set(msk, optname, optval, optlen);
 	case IP_TOS:
 		return mptcp_setsockopt_v4_set_tos(msk, optname, optval, optlen);
+	case IP_RECVERR:
+	case IP_RECVERR_RFC4884:
+		return mptcp_setsockopt_v4_recverr(msk, optname, optval, optlen);
 	}
 
 	return -EOPNOTSUPP;
@@ -802,23 +868,6 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
 	return ret;
 }
 
-static int mptcp_setsockopt_all_sf(struct mptcp_sock *msk, int level,
-				   int optname, sockptr_t optval,
-				   unsigned int optlen)
-{
-	struct mptcp_subflow_context *subflow;
-	int ret = 0;
-
-	mptcp_for_each_subflow(msk, subflow) {
-		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-
-		ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
-		if (ret)
-			break;
-	}
-	return ret;
-}
-
 static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
 				    sockptr_t optval, unsigned int optlen)
 {
@@ -1463,6 +1512,12 @@ static int mptcp_getsockopt_v4(struct mptcp_sock *msk, int optname,
 	case IP_LOCAL_PORT_RANGE:
 		return mptcp_put_int_option(msk, optval, optlen,
 				READ_ONCE(inet_sk(sk)->local_port_range));
+	case IP_RECVERR:
+		return mptcp_put_int_option(msk, optval, optlen,
+				inet_test_bit(RECVERR, sk));
+	case IP_RECVERR_RFC4884:
+		return mptcp_put_int_option(msk, optval, optlen,
+				inet_test_bit(RECVERR_RFC4884, sk));
 	}
 
 	return -EOPNOTSUPP;
@@ -1483,6 +1538,12 @@ static int mptcp_getsockopt_v6(struct mptcp_sock *msk, int optname,
 	case IPV6_FREEBIND:
 		return mptcp_put_int_option(msk, optval, optlen,
 					    inet_test_bit(FREEBIND, sk));
+	case IPV6_RECVERR:
+		return mptcp_put_int_option(msk, optval, optlen,
+					    inet6_test_bit(RECVERR6, sk));
+	case IPV6_RECVERR_RFC4884:
+		return mptcp_put_int_option(msk, optval, optlen,
+					    inet6_test_bit(RECVERR6_RFC4884, sk));
 	}
 
 	return -EOPNOTSUPP;
-- 
2.53.0
Re: [PATCH mptcp-next v4 2/4] mptcp: propagate RECVERR sockopts to subflows
Posted by Matthieu Baerts 3 days, 18 hours ago
On 27/04/2026 23:10, David Carlier wrote:
> Propagate IP_RECVERR/IP_RECVERR_RFC4884 and
> IPV6_RECVERR/IPV6_RECVERR_RFC4884 from the MPTCP socket to existing
> and future subflows. The setsockopt path forwards each option to
> every subflow via mptcp_setsockopt_all_sf(); newly-joining subflows
> inherit the four RECVERR bits through sync_socket_options() now that
> MPTCP_INET_FLAGS_MASK covers them.
> 
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Assisted-by: Codex:gpt-5
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
>  net/mptcp/sockopt.c | 97 ++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 79 insertions(+), 18 deletions(-)
> 
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index 41c9dc9cf95e..171e83e66a97 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c

(...)

> @@ -388,6 +394,41 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
>  	return -EOPNOTSUPP;
>  }
>  
> +static int mptcp_setsockopt_all_sf(struct mptcp_sock *msk, int level,
> +				   int optname, sockptr_t optval,
> +				   unsigned int optlen)
> +{
> +	struct mptcp_subflow_context *subflow;
> +	int ret = 0;
> +
> +	mptcp_for_each_subflow(msk, subflow) {
> +		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> +
> +		ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
> +		if (ret)
> +			break;
> +	}
> +	return ret;
> +}
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +static int mptcp_setsockopt_v6_recverr(struct mptcp_sock *msk, int optname,
> +				       sockptr_t optval, unsigned int optlen)
> +{
> +	struct sock *sk = (struct sock *)msk;
> +	int ret;
> +
> +	ret = ipv6_setsockopt(sk, SOL_IPV6, optname, optval, optlen);
> +	if (ret)
> +		return ret;
> +
> +	lock_sock(sk);
> +	ret = mptcp_setsockopt_all_sf(msk, SOL_IPV6, optname, optval, optlen);
> +	release_sock(sk);
> +	return ret;
> +}
> +#endif

Maybe you could have one generic helper to call xxx_setsockopt() on the
MPTCP socket, and then call mptcp_setsockopt_all_sf(). You can pass the
level, and call the right function.

(...)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
[PATCH mptcp-next v4 3/4] mptcp: support MSG_ERRQUEUE on the parent socket
Posted by David Carlier 1 week ago
Splice pending err skbs from each subflow's error queue onto the
parent msk's error queue at error-report time, so poll() and
recvmsg(MSG_ERRQUEUE) on the parent socket observe ICMP, tx
timestamp, and zerocopy completion notifications through the
standard inet ABI.

If sock_queue_err_skb() on the parent fails (rmem-limited), the
skb is left on the subflow queue and retried on the next error
report, avoiding silent loss.
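
The splice-and-requeue behaviour can be modelled in plain userspace C, as a sketch of the idea only. The struct queue type, its limit field (standing in for the rmem check) and all function names below are hypothetical. Entries move from the subflow queue to the parent in order, and the first entry the parent refuses goes back to the head of the subflow queue so ordering is preserved for the next attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	int id;
	struct node *next;
};

/* Hypothetical FIFO; limit models the receive-memory budget. */
struct queue {
	struct node *head, *tail;
	size_t len, limit;
};

struct node *queue_pop_head(struct queue *q)
{
	struct node *n = q->head;

	if (!n)
		return NULL;
	q->head = n->next;
	if (!q->head)
		q->tail = NULL;
	n->next = NULL;
	q->len--;
	return n;
}

/* Put an entry back at the front; never fails. */
void queue_push_head(struct queue *q, struct node *n)
{
	assert(n != NULL);
	n->next = q->head;
	q->head = n;
	if (!q->tail)
		q->tail = n;
	q->len++;
}

/* Append an entry; fails when the queue is at its limit. */
bool queue_push_tail(struct queue *q, struct node *n)
{
	assert(n != NULL);
	if (q->len >= q->limit)
		return false;
	n->next = NULL;
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	q->len++;
	return true;
}

/* Move entries from subflow to parent; returns true if any moved. */
bool splice_errqueue(struct queue *parent, struct queue *subflow)
{
	struct node *n;
	bool moved = false;

	while ((n = queue_pop_head(subflow))) {
		if (!queue_push_tail(parent, n)) {
			/* Parent is full: requeue at the head and stop. */
			queue_push_head(subflow, n);
			break;
		}
		moved = true;
	}
	return moved;
}
```

In this model, nothing is lost when the parent is full, but the leftover entries only move on a later call to splice_errqueue().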

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 net/mptcp/protocol.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0db50e3715c3..131fb6ddfcd9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -815,21 +815,39 @@ static bool __mptcp_ofo_queue(struct mptcp_sock *msk)
 	return moved;
 }
 
+static bool __mptcp_subflow_splice_errqueue(struct sock *sk, struct sock *ssk)
+{
+	struct sk_buff *skb;
+	bool moved = false;
+
+	while ((skb = skb_dequeue(&ssk->sk_error_queue))) {
+		if (sock_queue_err_skb(sk, skb)) {
+			skb_queue_head(&ssk->sk_error_queue, skb);
+			break;
+		}
+		moved = true;
+	}
+
+	return moved;
+}
+
 static bool __mptcp_subflow_error_report(struct sock *sk, struct sock *ssk)
 {
 	int ssk_state;
+	bool report;
 	int err;
 
+	report = __mptcp_subflow_splice_errqueue(sk, ssk);
+
 	/* only propagate errors on fallen-back sockets or
 	 * on MPC connect
 	 */
 	if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(mptcp_sk(sk)))
-		return false;
+		goto out;
 
 	err = sock_error(ssk);
 	if (!err)
-		return false;
-
+		goto out;
 	/* We need to propagate only transition to CLOSE state.
 	 * Orphaned socket will see such state change via
 	 * subflow_sched_work_if_closed() and that path will properly
@@ -839,6 +857,11 @@ static bool __mptcp_subflow_error_report(struct sock *sk, struct sock *ssk)
 	if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD))
 		mptcp_set_state(sk, ssk_state);
 	WRITE_ONCE(sk->sk_err, -err);
+	report = true;
+
+out:
+	if (!report)
+		return false;
 
 	/* This barrier is coupled with smp_rmb() in mptcp_poll() */
 	smp_wmb();
@@ -2295,7 +2318,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	int target;
 	long timeo;
 
-	/* MSG_ERRQUEUE is really a no-op till we support IP_RECVERR */
 	if (unlikely(flags & MSG_ERRQUEUE))
 		return inet_recv_error(sk, msg, len);
 
@@ -4340,7 +4362,8 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 
 	/* This barrier is coupled with smp_wmb() in __mptcp_error_report() */
 	smp_rmb();
-	if (READ_ONCE(sk->sk_err))
+	if (READ_ONCE(sk->sk_err) ||
+	    !skb_queue_empty_lockless(&sk->sk_error_queue))
 		mask |= EPOLLERR;
 
 	return mask;
-- 
2.53.0
[PATCH mptcp-next v4 4/4] selftests: mptcp: cover IP_RECVERR sockopt propagation
Posted by David Carlier 1 week ago
Exercise setsockopt/getsockopt of IP_RECVERR and IPV6_RECVERR on the
MPTCP parent socket, including the empty-errqueue EAGAIN contract on
MSG_ERRQUEUE|MSG_DONTWAIT.

End-to-end errqueue delivery (ICMP, TX timestamps, zerocopy) depends on
subflow-side producers that are out of scope for this series and will be
covered by follow-up work.

Assisted-by: Codex:gpt-5
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 .../selftests/net/mptcp/mptcp_sockopt.c       | 55 +++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
index b6e58d936ebe..95bb2cc8e2ff 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
@@ -769,6 +769,60 @@ static void test_ip_tos_sockopt(int fd)
 		xerror("expect socklen_t == -1");
 }
 
+static void test_ip_recverr_sockopt(int fd)
+{
+	struct iovec iov = {
+		.iov_base = &(char){ 0 },
+		.iov_len = 1,
+	};
+	struct msghdr msg = {
+		.msg_iov = &iov,
+		.msg_iovlen = 1,
+	};
+	int one = 1, zero = 0, val = -1;
+	socklen_t s = sizeof(val);
+	int level, optname, r;
+
+	switch (pf) {
+	case AF_INET:
+		level = SOL_IP;
+		optname = IP_RECVERR;
+		break;
+	case AF_INET6:
+		level = SOL_IPV6;
+		optname = IPV6_RECVERR;
+		break;
+	default:
+		xerror("Unknown pf %d\n", pf);
+	}
+
+	r = setsockopt(fd, level, optname, &one, sizeof(one));
+	if (r)
+		die_perror("setsockopt recverr on");
+
+	r = getsockopt(fd, level, optname, &val, &s);
+	if (r)
+		die_perror("getsockopt recverr on");
+	if (s != sizeof(val) || val != one)
+		xerror("recverr on mismatch val=%d len=%u", val, s);
+
+	r = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
+	if (r != -1 || errno != EAGAIN)
+		xerror("expected empty errqueue to return EAGAIN, ret=%d errno=%d", r, errno);
+
+	r = setsockopt(fd, level, optname, &zero, sizeof(zero));
+	if (r)
+		die_perror("setsockopt recverr off");
+
+	val = -1;
+	s = sizeof(val);
+	r = getsockopt(fd, level, optname, &val, &s);
+	if (r)
+		die_perror("getsockopt recverr off");
+	if (s != sizeof(val) || val != zero)
+		xerror("recverr off mismatch val=%d len=%u", val, s);
+}
+
 static int client(int pipefd)
 {
 	int fd = -1;
@@ -787,6 +841,7 @@ static int client(int pipefd)
 	}
 
 	test_ip_tos_sockopt(fd);
+	test_ip_recverr_sockopt(fd);
 
 	connect_one_server(fd, pipefd);
 
-- 
2.53.0
Re: [PATCH mptcp-next v3 0/3] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by Matthieu Baerts 1 week, 6 days ago
Hi David,

On 22/04/2026 00:33, David Carlier wrote:
> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
> parent socket does not currently provide usable MSG_ERRQUEUE handling.
> 
> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
> paths. It propagates RECVERR-related sockopts to existing and future
> subflows, makes poll() report pending errqueue activity through the
> parent socket, and allows recvmsg(MSG_ERRQUEUE) on the MPTCP socket to
> consume queued errors with the parent socket ABI.
> 
> The series also handles mixed-family subflows by applying the matching
> sockopt according to each subflow family, and avoids silently losing an
> error skb if requeueing to the parent socket fails under rmem pressure.
> 
> v2 -> v3:

Thank you for the v3.

Do you mind sending max 1 series per day, please? Each version generates
a lot of emails that are sent and need to be triaged, it is then harder
for us to follow, plus a lot of shared resources are used.

If you need CI support, either execute the tests locally with the docker
image (preferred), or send your patches on a public fork on GitHub,
after having enabled "Actions" support there ;)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next v3 0/3] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by David CARLIER 1 week, 6 days ago
Hi,

On Wed, 22 Apr 2026 at 09:22, Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi David,
>
> On 22/04/2026 00:33, David Carlier wrote:
> > MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
> > parent socket does not currently provide usable MSG_ERRQUEUE handling.
> >
> > This series wires the MPTCP socket up to the IPv4/IPv6 error queue
> > paths. It propagates RECVERR-related sockopts to existing and future
> > subflows, makes poll() report pending errqueue activity through the
> > parent socket, and allows recvmsg(MSG_ERRQUEUE) on the MPTCP socket to
> > consume queued errors with the parent socket ABI.
> >
> > The series also handles mixed-family subflows by applying the matching
> > sockopt according to each subflow family, and avoids silently losing an
> > error skb if requeueing to the parent socket fails under rmem pressure.
> >
> > v2 -> v3:
>
> Thank you for the v3.
>
> Do you mind sending max 1 series per day, please? Each version generates
> a lot of emails that are sent and need to be triaged, it is then harder
> for us to follow, plus a lot of shared resources are used.
>

Duly noted.


> If you need CI support, either execute the tests locally with the docker
> image (preferred), or send your patches on a public fork on GitHub,
> after having enabled "Actions" support there ;)


Yes, I realised that only when I did the v3 :) OK, I'll go through all
the remarks later. Cheers.
>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.
>
Re: [PATCH mptcp-next v3 0/3] mptcp: MSG_ERRQUEUE support on the parent socket
Posted by MPTCP CI 1 week, 6 days ago
Hi David,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_join ⚠️ 
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_fastopen ⚠️ 
- KVM Validation: debug (only selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_join ⚠️ 
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/24750414123

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/3d39e1ac876f
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1084059


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like this one, some reported
issues may not be due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)