MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
parent socket does not currently provide usable MSG_ERRQUEUE handling.
This series wires the MPTCP socket up to the IPv4/IPv6 error queue
paths. It propagates RECVERR-related sockopts to existing and future
subflows, makes poll() report pending errqueue activity through the
parent socket, and allows recvmsg(MSG_ERRQUEUE) on the MPTCP socket to
consume queued errors with the parent socket ABI.
The series also handles mixed-family subflows by applying the matching
sockopt according to each subflow family, and avoids silently losing an
error skb if requeueing to the parent socket fails under rmem pressure.
v2 -> v3:
- Only consume ssk->sk_err in the fallback / MPC-connect branch of
__mptcp_subflow_error_report(). Steady-state MPTCP now leaves
TCP's one-shot sk_err to TCP's own consumer instead of silently
draining it via sock_error().
- In mptcp_recv_error(), also route to inet_recv_error() when
sk->sk_err is set, so a fallback-propagated error reaches userspace
even when the parent errqueue is empty.
- Scope the new selftest to IP_RECVERR sockopt propagation only.
End-to-end errqueue delivery (TX timestamps, ICMP, zerocopy)
depends on subflow-side producers that are out of scope for this
series and will be covered by follow-up work. Fixes the
mptcp_sockopt selftest timeout reported by the MPTCP CI on v2.
v1 -> v2:
- Retargeted to mptcp-next per Matthieu Baerts' feedback (net-next
closed during the merge window; iterate on the MPTCP tree).
- Guard mptcp_setsockopt_v6_recverr() and its dispatch cases in
mptcp_setsockopt_v6() with #if IS_ENABLED(CONFIG_IPV6) to fix
the MPTCP CI link break on without_ipv6/with_mptcp configs
(undefined reference to ipv6_setsockopt).
v1: https://lore.kernel.org/mptcp/20260421152216.38127-1-devnexen@gmail.com/
v2: https://lore.kernel.org/mptcp/20260421191337.58341-1-devnexen@gmail.com/
David Carlier (3):
mptcp: propagate RECVERR sockopts to subflows
mptcp: support MSG_ERRQUEUE on the parent socket
selftests: mptcp: cover IP_RECVERR sockopt propagation
net/mptcp/protocol.c | 123 ++++++++++++++---
net/mptcp/sockopt.c | 129 ++++++++++++++++++
.../selftests/net/mptcp/mptcp_sockopt.c | 55 ++++++++
3 files changed, 287 insertions(+), 20 deletions(-)
base-commit: 4464afe97dc56e817a23b730979cbc6fc48f1912
--
2.53.0
MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
parent socket does not currently provide usable MSG_ERRQUEUE handling.
This series wires the MPTCP socket up to the IPv4/IPv6 error queue
paths. It propagates RECVERR-related sockopts to existing and future
subflows, makes poll() report pending errqueue activity through the
parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
consume queued errors with the parent socket ABI.
A new prerequisite patch factors the per-flag inet_flags propagation
in sync_socket_options() into a single masked word copy, so further
inet_flags propagated by MPTCP can be added by extending the mask
rather than touching the call site.
Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
four RECVERR bits, dropping the family-specific helpers from v3.
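The masked copy described above can be modelled in plain C as below. This is only an illustrative userspace sketch: the FLAG_* bit positions, PROPAGATED_MASK and copy_masked_flags() are hypothetical stand-ins, not the kernel's INET_FLAGS_* values or helpers. Note that a plain read-modify-write of the whole word is not atomic against concurrent per-bit updates, which is the concern raised later in this thread.

```c
#include <assert.h>

#define BIT(n)	(1UL << (n))

/* Hypothetical bit positions, for illustration only. */
enum {
	FLAG_TRANSPARENT,
	FLAG_FREEBIND,
	FLAG_BIND_ADDRESS_NO_PORT,
	FLAG_RECVERR,
	FLAG_HDRINCL,
};

/* Bits the parent is allowed to push down to a subflow. */
#define PROPAGATED_MASK \
	(BIT(FLAG_TRANSPARENT) | \
	 BIT(FLAG_FREEBIND) | \
	 BIT(FLAG_BIND_ADDRESS_NO_PORT))

/* Take the masked bits from the parent and keep every other bit of the
 * child untouched.
 */
static unsigned long copy_masked_flags(unsigned long child,
				       unsigned long parent,
				       unsigned long mask)
{
	return (child & ~mask) | (parent & mask);
}
```

Adding a propagated flag then only means OR-ing one more BIT() into the mask, which is what patch 2 does for the four RECVERR bits.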
Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>
v3 -> v4:
- New patch 1/4: factor inet_flags propagation in
sync_socket_options() through MPTCP_INET_FLAGS_MASK, per Paolo's
review.
- Patch 2/4 (was 1/3): drop the mptcp_recverr_enabled() and
mptcp_subflow_set_recverr() helpers; route the setsockopt path
through mptcp_setsockopt_all_sf(). Inherit the four RECVERR bits
via MPTCP_INET_FLAGS_MASK in sync_socket_options() instead of
explicit inet[6]_assign_bit() calls.
- Patch 3/4 (was 2/3): rework the MSG_ERRQUEUE plumbing per Paolo's
review. Subflow err skbs are now spliced onto the parent msk's
sk_error_queue from __mptcp_subflow_error_report() via the new
__mptcp_subflow_splice_errqueue() helper. recvmsg(MSG_ERRQUEUE)
on the parent reverts to plain inet_recv_error(), and mptcp_poll()
only inspects the parent's sk_error_queue -- no more on-demand
subflow walks, no extra lock_sock() / data_lock() in the poll or
recv paths. Keep the original early-return structure of
__mptcp_subflow_error_report() and fix the reverse christmas-tree
variable order Paolo flagged.
v2 -> v3:
- Only consume ssk->sk_err in the fallback / MPC-connect branch of
__mptcp_subflow_error_report(). Steady-state MPTCP now leaves
TCP's one-shot sk_err to TCP's own consumer instead of silently
draining it via sock_error().
- In mptcp_recv_error(), also route to inet_recv_error() when
sk->sk_err is set, so a fallback-propagated error reaches userspace
even when the parent errqueue is empty.
- Scope the new selftest to IP_RECVERR sockopt propagation only.
End-to-end errqueue delivery (TX timestamps, ICMP, zerocopy)
depends on subflow-side producers that are out of scope for this
series and will be covered by follow-up work. Fixes the
mptcp_sockopt selftest timeout reported by the MPTCP CI on v2.
v1 -> v2:
- Retargeted to mptcp-next per Matthieu Baerts' feedback (net-next
closed during the merge window; iterate on the MPTCP tree).
- Guard mptcp_setsockopt_v6_recverr() and its dispatch cases in
mptcp_setsockopt_v6() with #if IS_ENABLED(CONFIG_IPV6) to fix
the MPTCP CI link break on without_ipv6/with_mptcp configs
(undefined reference to ipv6_setsockopt).
v1: https://lore.kernel.org/mptcp/20260421152216.38127-1-devnexen@gmail.com/
v2: https://lore.kernel.org/mptcp/20260421191337.58341-1-devnexen@gmail.com/
v3: https://lore.kernel.org/mptcp/20260421223338.52743-1-devnexen@gmail.com/
David Carlier (4):
mptcp: sockopt: factor inet_flags propagation into a mask
mptcp: propagate RECVERR sockopts to subflows
mptcp: support MSG_ERRQUEUE on the parent socket
selftests: mptcp: cover IP_RECVERR sockopt propagation
net/mptcp/protocol.c | 33 +++++-
net/mptcp/sockopt.c | 107 ++++++++++++++----
.../selftests/net/mptcp/mptcp_sockopt.c | 55 +++++++++
3 files changed, 170 insertions(+), 25 deletions(-)
--
2.53.0
Hi David,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_join ⚠️
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 2 failed test(s): packetdrill_fastopen packetdrill_sockopts ⚠️
- KVM Validation: debug (only selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_join ⚠️
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/25071789731
Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/7688d292b14a
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1086438
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have already been made to have a
stable test suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Hi David,
Thank you for the new version.
On 27/04/2026 23:10, David Carlier wrote:
> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
> parent socket does not currently provide usable MSG_ERRQUEUE handling.
>
> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
> paths. It propagates RECVERR-related sockopts to existing and future
> subflows, makes poll() report pending errqueue activity through the
> parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
> consume queued errors with the parent socket ABI.
>
> A new prerequisite patch factors the per-flag inet_flags propagation
> in sync_socket_options() into a single masked word copy, so further
> inet_flags propagated by MPTCP can be added by extending the mask
> rather than touching the call site.
>
> Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
> for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
> four RECVERR bits, dropping the family-specific helpers from v3.
>
> Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>
I didn't review it, but I notice that the CI cannot apply your series,
because it looks like it is not based on the one you mentioned here.
Can you either remove this line, or rebase your series on top of this
other patch?
Also, please don't send your series as a reply to a previous posting,
please use a new thread. That's what is usually done, clearer, plus some
tools don't support replies.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
On 28/04/2026 20:48, Matthieu Baerts wrote:
> Hi David,
>
> Thank you for the new version.
>
> On 27/04/2026 23:10, David Carlier wrote:
>> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
>> parent socket does not currently provide usable MSG_ERRQUEUE handling.
>>
>> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
>> paths. It propagates RECVERR-related sockopts to existing and future
>> subflows, makes poll() report pending errqueue activity through the
>> parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
>> consume queued errors with the parent socket ABI.
>>
>> A new prerequisite patch factors the per-flag inet_flags propagation
>> in sync_socket_options() into a single masked word copy, so further
>> inet_flags propagated by MPTCP can be added by extending the mask
>> rather than touching the call site.
>>
>> Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
>> for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
>> four RECVERR bits, dropping the family-specific helpers from v3.
>>
>> Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>
>
> I didn't review it, but I notice that the CI cannot apply your series,
> because it looks like it is not based on the one you mentioned here.
>
> Can you either remove this line, or rebase your series on top of this
> other patch?
>
> Also, please don't send your series as a reply to a previous posting,
> please use a new thread. That's what is usually done, clearer, plus some
> tools don't support replies.
Note: I just manually resolved the conflicts and sent the series to the
CI, not to have to resend a series just to retrigger the CI.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi David,
On 28/04/2026 20:56, Matthieu Baerts wrote:
> On 28/04/2026 20:48, Matthieu Baerts wrote:
>> Hi David,
>>
>> Thank you for the new version.
>>
>> On 27/04/2026 23:10, David Carlier wrote:
>>> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
>>> parent socket does not currently provide usable MSG_ERRQUEUE handling.
>>>
>>> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
>>> paths. It propagates RECVERR-related sockopts to existing and future
>>> subflows, makes poll() report pending errqueue activity through the
>>> parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
>>> consume queued errors with the parent socket ABI.
>>>
>>> A new prerequisite patch factors the per-flag inet_flags propagation
>>> in sync_socket_options() into a single masked word copy, so further
>>> inet_flags propagated by MPTCP can be added by extending the mask
>>> rather than touching the call site.
>>>
>>> Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
>>> for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
>>> four RECVERR bits, dropping the family-specific helpers from v3.
>>>
>>> Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>
>>
>> I didn't review it, but I notice that the CI cannot apply your series,
>> because it looks like it is not based on the one you mentioned here.
>>
>> Can you either remove this line, or rebase your series on top of this
>> other patch?
>>
>> Also, please don't send your series as a reply to a previous posting,
>> please use a new thread. That's what is usually done, clearer, plus some
>> tools don't support replies.
>
> Note: I just manually resolved the conflicts and sent the series to the
> CI, not to have to resend a series just to retrigger the CI.
It looks like the CI (and sashiko) found some issues with this series.
But globally, I'm a bit puzzled: with MPTCP, there might be multiple
paths being used, and reporting errors about all of them when the
"legacy" RECVERR socket options are used will confuse the userspace
that doesn't (have to) know multiple subflows are being used.
In this case, either messages should be filtered (might be hard to
handle all use-cases and maintain that?), or this should be limited to
cases where only one subflow is being used.
Which leads me to this question: what's your use-case exactly? What are
you trying to solve?
It might be easier to have a dedicated MPTCP_RECERR, and eventually
propagate more MPTCP-specific messages. Something that could be
linked to:
https://github.com/multipath-tcp/mptcp_net-next/issues/78
WDYT?
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Matthieu,
On 01/05/2026 16:49, Matthieu Baerts wrote:
> It looks like the CI (and sashiko) found some issues with this series.
For v5:
- 1/4: per-bit inet_assign_bit() loop instead of WRITE_ONCE(), keeps
atomicity.
- 2/4: add missing sockopt_seq_inc(msk).
- 2/4: skip family-mismatched subflows in the v4/v6 helpers.
- 2/4: snapshot optval to a local int, pass KERNEL_SOCKPTR(&val) into
the loop.
- 3/4: pull-on-drain from mptcp_recv_error() so a parent-ENOMEM does
not strand subflow skbs.
Will also re-run the docker repro to check the selftest_mptcp_join /
packetdrill rows are pre-existing.
> But globally, I'm a bit puzzled: with MPTCP, there might be multiple
> paths being used, and reporting errors about all of them when the
> "legacy" RECVERR socket options are used will confuse the userspace
> that doesn't (have to) know multiple subflows are being used.
Fair, and Paolo raised it on v3. The use-case is tx timestamping and
MSG_ZEROCOPY completions - both are tied to user data, not the
subflow that carried it, so no subflow identity leaks into the cmsg.
ICMP/ICMPv6 is the part that does. v5 will filter the splice by
SO_EE_ORIGIN: forward TIMESTAMPING / ZEROCOPY / LOCAL, drop ICMP.
> It might be easier to have a dedicated MPTCP_RECERR, and eventually
> propagate more MPTCP-specific messages. Something that could be
> linked to:
> https://github.com/multipath-tcp/mptcp_net-next/issues/78
Agreed - subflow ICMP and #78's lifecycle events belong there. As a
follow-up once v5 lands.
Cheers
On 01/05/2026 17:28, David CARLIER wrote:
> Hi Matthieu,
>
> On 01/05/2026 16:49, Matthieu Baerts wrote:
> > It looks like the CI (and sashiko) found some issues with this series.
(Please do fix your email client to avoid this formatting: some of your
emails are OK, but not all of them.)
> For v5:
>
> - 1/4: per-bit inet_assign_bit() loop instead of WRITE_ONCE(), keeps
> atomicity.
> - 2/4: add missing sockopt_seq_inc(msk).
> - 2/4: skip family-mismatched subflows in the v4/v6 helpers.
> - 2/4: snapshot optval to a local int, pass KERNEL_SOCKPTR(&val) into
> the loop.
(While at it, your new helpers mptcp_setsockopt_v[46]_recverr could have
a generic name)
> - 3/4: pull-on-drain from mptcp_recv_error() so a parent-ENOMEM does
> not strand subflow skbs.
>
> Will also re-run the docker repro to check the selftest_mptcp_join /
> packetdrill rows are pre-existing.
The packetdrill errors might be pre-existing, someone should look at
improving the situation there:
https://ci-results.mptcp.dev/flakes.html
> > But globally, I'm a bit puzzled: with MPTCP, there might be multiple
> > paths being used, and reporting errors about all of them when the
> > "legacy" RECVERR socket options are used will confuse the userspace
> > that doesn't (have to) know multiple subflows are being used.
>
> Fair, and Paolo raised it on v3. The use-case is tx timestamping and
> MSG_ZEROCOPY completions - both are tied to user data, not the
> subflow that carried it, so no subflow identity leaks into the cmsg.
> ICMP/ICMPv6 is the part that does. v5 will filter the splice by
> SO_EE_ORIGIN: forward TIMESTAMPING / ZEROCOPY / LOCAL, drop ICMP.
Maybe OK with this filter indeed..
> > It might be easier to have a dedicated MPTCP_RECERR, and eventually
> > propagate more MPTCP-specific messages. Something that could be
> > linked to:
> > https://github.com/multipath-tcp/mptcp_net-next/issues/78
>
> Agreed - subflow ICMP and #78's lifecycle events belong there. As a
> follow-up once v5 lands.
Indeed, better to split them.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Matthieu,
On Tue, 28 Apr 2026 at 19:56, Matthieu Baerts <matttbe@kernel.org> wrote:
>
> On 28/04/2026 20:48, Matthieu Baerts wrote:
> > Hi David,
> >
> > Thank you for the new version.
> >
> > On 27/04/2026 23:10, David Carlier wrote:
> >> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
> >> parent socket does not currently provide usable MSG_ERRQUEUE handling.
> >>
> >> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
> >> paths. It propagates RECVERR-related sockopts to existing and future
> >> subflows, makes poll() report pending errqueue activity through the
> >> parent socket, and lets recvmsg(MSG_ERRQUEUE) on the MPTCP socket
> >> consume queued errors with the parent socket ABI.
> >>
> >> A new prerequisite patch factors the per-flag inet_flags propagation
> >> in sync_socket_options() into a single masked word copy, so further
> >> inet_flags propagated by MPTCP can be added by extending the mask
> >> rather than touching the call site.
> >>
> >> Patch 2 then leverages the existing mptcp_setsockopt_all_sf() helper
> >> for the setsockopt path and extends MPTCP_INET_FLAGS_MASK with the
> >> four RECVERR bits, dropping the family-specific helpers from v3.
> >>
> >> Based-on: <20260424-mptcp-pm-sockopt-set-all-sf-v1-1-38e7023822f8@kernel.org>
> >
> > I didn't review it, but I notice that the CI cannot apply your series,
> > because it looks like it is not based on the one you mentioned here.
> >
> > Can you either remove this line, or rebase your series on top of this
> > other patch?
> >
> > Also, please don't send your series as a reply to a previous posting,
> > please use a new thread. That's what is usually done, clearer, plus some
> > tools don't support replies.
>
> Note: I just manually resolved the conflicts and sent the series to the
> CI, not to have to resend a series just to retrigger the CI.
Appreciated.
Cheers.
>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.
>
Replace the per-flag inet_assign_bit() calls in sync_socket_options()
with a masked word-level copy of inet_sk()->inet_flags. Introduce
MPTCP_INET_FLAGS_MASK so further flags propagated by MPTCP can be
added by extending the mask rather than touching the call site.
No functional change.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
net/mptcp/sockopt.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 0efe40be2fde..41c9dc9cf95e 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -16,6 +16,10 @@
#define MIN_INFO_OPTLEN_SIZE 16
#define MIN_FULL_INFO_OPTLEN_SIZE 40
+#define MPTCP_INET_FLAGS_MASK \
+ (BIT(INET_FLAGS_TRANSPARENT) | \
+ BIT(INET_FLAGS_FREEBIND) | \
+ BIT(INET_FLAGS_BIND_ADDRESS_NO_PORT))
static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
{
@@ -1536,6 +1540,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
{
static const unsigned int tx_rx_locks = SOCK_RCVBUF_LOCK | SOCK_SNDBUF_LOCK;
struct sock *sk = (struct sock *)msk;
+ unsigned long flags;
bool keep_open;
keep_open = sock_flag(sk, SOCK_KEEPOPEN);
@@ -1582,9 +1587,10 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
tcp_sock_set_keepcnt(ssk, msk->keepalive_cnt);
tcp_sock_set_maxseg(ssk, msk->maxseg);
- inet_assign_bit(TRANSPARENT, ssk, inet_test_bit(TRANSPARENT, sk));
- inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
- inet_assign_bit(BIND_ADDRESS_NO_PORT, ssk, inet_test_bit(BIND_ADDRESS_NO_PORT, sk));
+ flags = inet_sk(ssk)->inet_flags;
+ flags &= ~MPTCP_INET_FLAGS_MASK;
+ flags |= inet_sk(sk)->inet_flags & MPTCP_INET_FLAGS_MASK;
+ WRITE_ONCE(inet_sk(ssk)->inet_flags, flags);
WRITE_ONCE(inet_sk(ssk)->local_port_range, READ_ONCE(inet_sk(sk)->local_port_range));
}
--
2.53.0
Propagate IP_RECVERR/IP_RECVERR_RFC4884 and
IPV6_RECVERR/IPV6_RECVERR_RFC4884 from the MPTCP socket to existing
and future subflows. The setsockopt path forwards each option to
every subflow via mptcp_setsockopt_all_sf(); newly-joining subflows
inherit the four RECVERR bits through sync_socket_options() now that
MPTCP_INET_FLAGS_MASK covers them.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Assisted-by: Codex:gpt-5
Signed-off-by: David Carlier <devnexen@gmail.com>
---
net/mptcp/sockopt.c | 97 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 79 insertions(+), 18 deletions(-)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 41c9dc9cf95e..171e83e66a97 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -8,6 +8,8 @@
#include <linux/kernel.h>
#include <linux/module.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
#include <net/sock.h>
#include <net/protocol.h>
#include <net/tcp.h>
@@ -19,7 +21,11 @@
#define MPTCP_INET_FLAGS_MASK \
(BIT(INET_FLAGS_TRANSPARENT) | \
BIT(INET_FLAGS_FREEBIND) | \
- BIT(INET_FLAGS_BIND_ADDRESS_NO_PORT))
+ BIT(INET_FLAGS_BIND_ADDRESS_NO_PORT) | \
+ BIT(INET_FLAGS_RECVERR) | \
+ BIT(INET_FLAGS_RECVERR_RFC4884) | \
+ BIT(INET_FLAGS_RECVERR6) | \
+ BIT(INET_FLAGS_RECVERR6_RFC4884))
static struct sock *__mptcp_tcp_fallback(struct mptcp_sock *msk)
{
@@ -388,6 +394,41 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
return -EOPNOTSUPP;
}
+static int mptcp_setsockopt_all_sf(struct mptcp_sock *msk, int level,
+ int optname, sockptr_t optval,
+ unsigned int optlen)
+{
+ struct mptcp_subflow_context *subflow;
+ int ret = 0;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+ ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int mptcp_setsockopt_v6_recverr(struct mptcp_sock *msk, int optname,
+ sockptr_t optval, unsigned int optlen)
+{
+ struct sock *sk = (struct sock *)msk;
+ int ret;
+
+ ret = ipv6_setsockopt(sk, SOL_IPV6, optname, optval, optlen);
+ if (ret)
+ return ret;
+
+ lock_sock(sk);
+ ret = mptcp_setsockopt_all_sf(msk, SOL_IPV6, optname, optval, optlen);
+ release_sock(sk);
+ return ret;
+}
+#endif
+
static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
sockptr_t optval, unsigned int optlen)
{
@@ -430,6 +471,12 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
release_sock(sk);
break;
+#if IS_ENABLED(CONFIG_IPV6)
+ case IPV6_RECVERR:
+ case IPV6_RECVERR_RFC4884:
+ ret = mptcp_setsockopt_v6_recverr(msk, optname, optval, optlen);
+ break;
+#endif
}
return ret;
@@ -764,6 +811,22 @@ static int mptcp_setsockopt_v4_set_tos(struct mptcp_sock *msk, int optname,
return 0;
}
+static int mptcp_setsockopt_v4_recverr(struct mptcp_sock *msk, int optname,
+ sockptr_t optval, unsigned int optlen)
+{
+ struct sock *sk = (struct sock *)msk;
+ int ret;
+
+ ret = ip_setsockopt(sk, SOL_IP, optname, optval, optlen);
+ if (ret)
+ return ret;
+
+ lock_sock(sk);
+ ret = mptcp_setsockopt_all_sf(msk, SOL_IP, optname, optval, optlen);
+ release_sock(sk);
+ return ret;
+}
+
static int mptcp_setsockopt_v4(struct mptcp_sock *msk, int optname,
sockptr_t optval, unsigned int optlen)
{
@@ -775,6 +838,9 @@ static int mptcp_setsockopt_v4(struct mptcp_sock *msk, int optname,
return mptcp_setsockopt_sol_ip_set(msk, optname, optval, optlen);
case IP_TOS:
return mptcp_setsockopt_v4_set_tos(msk, optname, optval, optlen);
+ case IP_RECVERR:
+ case IP_RECVERR_RFC4884:
+ return mptcp_setsockopt_v4_recverr(msk, optname, optval, optlen);
}
return -EOPNOTSUPP;
@@ -802,23 +868,6 @@ static int mptcp_setsockopt_first_sf_only(struct mptcp_sock *msk, int level, int
return ret;
}
-static int mptcp_setsockopt_all_sf(struct mptcp_sock *msk, int level,
- int optname, sockptr_t optval,
- unsigned int optlen)
-{
- struct mptcp_subflow_context *subflow;
- int ret = 0;
-
- mptcp_for_each_subflow(msk, subflow) {
- struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-
- ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
- if (ret)
- break;
- }
- return ret;
-}
-
static int mptcp_setsockopt_sol_tcp(struct mptcp_sock *msk, int optname,
sockptr_t optval, unsigned int optlen)
{
@@ -1463,6 +1512,12 @@ static int mptcp_getsockopt_v4(struct mptcp_sock *msk, int optname,
case IP_LOCAL_PORT_RANGE:
return mptcp_put_int_option(msk, optval, optlen,
READ_ONCE(inet_sk(sk)->local_port_range));
+ case IP_RECVERR:
+ return mptcp_put_int_option(msk, optval, optlen,
+ inet_test_bit(RECVERR, sk));
+ case IP_RECVERR_RFC4884:
+ return mptcp_put_int_option(msk, optval, optlen,
+ inet_test_bit(RECVERR_RFC4884, sk));
}
return -EOPNOTSUPP;
@@ -1483,6 +1538,12 @@ static int mptcp_getsockopt_v6(struct mptcp_sock *msk, int optname,
case IPV6_FREEBIND:
return mptcp_put_int_option(msk, optval, optlen,
inet_test_bit(FREEBIND, sk));
+ case IPV6_RECVERR:
+ return mptcp_put_int_option(msk, optval, optlen,
+ inet6_test_bit(RECVERR6, sk));
+ case IPV6_RECVERR_RFC4884:
+ return mptcp_put_int_option(msk, optval, optlen,
+ inet6_test_bit(RECVERR6_RFC4884, sk));
}
return -EOPNOTSUPP;
--
2.53.0
On 27/04/2026 23:10, David Carlier wrote:
> Propagate IP_RECVERR/IP_RECVERR_RFC4884 and
> IPV6_RECVERR/IPV6_RECVERR_RFC4884 from the MPTCP socket to existing
> and future subflows. The setsockopt path forwards each option to
> every subflow via mptcp_setsockopt_all_sf(); newly-joining subflows
> inherit the four RECVERR bits through sync_socket_options() now that
> MPTCP_INET_FLAGS_MASK covers them.
>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Assisted-by: Codex:gpt-5
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
> net/mptcp/sockopt.c | 97 ++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 79 insertions(+), 18 deletions(-)
>
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index 41c9dc9cf95e..171e83e66a97 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
(...)
> @@ -388,6 +394,41 @@ static int mptcp_setsockopt_sol_socket(struct mptcp_sock *msk, int optname,
> return -EOPNOTSUPP;
> }
>
> +static int mptcp_setsockopt_all_sf(struct mptcp_sock *msk, int level,
> + int optname, sockptr_t optval,
> + unsigned int optlen)
> +{
> + struct mptcp_subflow_context *subflow;
> + int ret = 0;
> +
> + mptcp_for_each_subflow(msk, subflow) {
> + struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> +
> + ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
> + if (ret)
> + break;
> + }
> + return ret;
> +}
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +static int mptcp_setsockopt_v6_recverr(struct mptcp_sock *msk, int optname,
> + sockptr_t optval, unsigned int optlen)
> +{
> + struct sock *sk = (struct sock *)msk;
> + int ret;
> +
> + ret = ipv6_setsockopt(sk, SOL_IPV6, optname, optval, optlen);
> + if (ret)
> + return ret;
> +
> + lock_sock(sk);
> + ret = mptcp_setsockopt_all_sf(msk, SOL_IPV6, optname, optval, optlen);
> + release_sock(sk);
> + return ret;
> +}
> +#endif
Maybe you could have one generic helper to call xxx_setsockopt() on the
MPTCP socket, and then call mptcp_setsockopt_all_sf(). You can pass the
level, and call the right function.
(...)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Splice pending err skbs from each subflow's error queue onto the
parent msk's error queue at error-report time, so poll() and
recvmsg(MSG_ERRQUEUE) on the parent socket observe ICMP, tx
timestamp, and zerocopy completion notifications through the
standard inet ABI.
If sock_queue_err_skb() on the parent fails (rmem-limited), the
skb is left on the subflow queue and retried on the next error
report, avoiding silent loss.
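The splice-and-requeue behaviour described above can be modelled in userspace as follows. This is a simplified sketch under stated assumptions: struct node, struct queue and the q_*() helpers are hypothetical stand-ins for sk_buff queues, and the limit field only imitates the rmem check that can make sock_queue_err_skb() fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
	int id;
};

struct queue {
	struct node *head, *tail;
	size_t len, limit;	/* limit stands in for the rmem budget */
};

static struct node *q_pop(struct queue *q)
{
	struct node *n = q->head;

	if (!n)
		return NULL;
	q->head = n->next;
	if (!q->head)
		q->tail = NULL;
	n->next = NULL;
	q->len--;
	return n;
}

/* Requeue at the head; never refused, so ordering is preserved. */
static void q_push_head(struct queue *q, struct node *n)
{
	n->next = q->head;
	q->head = n;
	if (!q->tail)
		q->tail = n;
	q->len++;
}

/* Append at the tail; refused when the queue is at its limit. */
static bool q_push_tail(struct queue *q, struct node *n)
{
	if (q->len >= q->limit)
		return false;
	n->next = NULL;
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	q->len++;
	return true;
}

/* Move entries from child to parent in order. On the first refusal, put
 * the entry back at the head of the child queue and stop, so nothing is
 * lost and a later call can retry. Returns true if anything moved.
 */
static bool splice_errqueue(struct queue *parent, struct queue *child)
{
	struct node *n;
	bool moved = false;

	while ((n = q_pop(child))) {
		if (!q_push_tail(parent, n)) {
			q_push_head(child, n);
			break;
		}
		moved = true;
	}
	return moved;
}
```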
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
net/mptcp/protocol.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0db50e3715c3..131fb6ddfcd9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -815,21 +815,39 @@ static bool __mptcp_ofo_queue(struct mptcp_sock *msk)
return moved;
}
+static bool __mptcp_subflow_splice_errqueue(struct sock *sk, struct sock *ssk)
+{
+ struct sk_buff *skb;
+ bool moved = false;
+
+ while ((skb = skb_dequeue(&ssk->sk_error_queue))) {
+ if (sock_queue_err_skb(sk, skb)) {
+ skb_queue_head(&ssk->sk_error_queue, skb);
+ break;
+ }
+ moved = true;
+ }
+
+ return moved;
+}
+
static bool __mptcp_subflow_error_report(struct sock *sk, struct sock *ssk)
{
int ssk_state;
+ bool report;
int err;
+ report = __mptcp_subflow_splice_errqueue(sk, ssk);
+
/* only propagate errors on fallen-back sockets or
* on MPC connect
*/
if (sk->sk_state != TCP_SYN_SENT && !__mptcp_check_fallback(mptcp_sk(sk)))
- return false;
+ goto out;
err = sock_error(ssk);
if (!err)
- return false;
-
+ goto out;
/* We need to propagate only transition to CLOSE state.
* Orphaned socket will see such state change via
* subflow_sched_work_if_closed() and that path will properly
@@ -839,6 +857,11 @@ static bool __mptcp_subflow_error_report(struct sock *sk, struct sock *ssk)
if (ssk_state == TCP_CLOSE && !sock_flag(sk, SOCK_DEAD))
mptcp_set_state(sk, ssk_state);
WRITE_ONCE(sk->sk_err, -err);
+ report = true;
+
+out:
+ if (!report)
+ return false;
/* This barrier is coupled with smp_rmb() in mptcp_poll() */
smp_wmb();
@@ -2295,7 +2318,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
int target;
long timeo;
- /* MSG_ERRQUEUE is really a no-op till we support IP_RECVERR */
if (unlikely(flags & MSG_ERRQUEUE))
return inet_recv_error(sk, msg, len);
@@ -4340,7 +4362,8 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
/* This barrier is coupled with smp_wmb() in __mptcp_error_report() */
smp_rmb();
- if (READ_ONCE(sk->sk_err))
+ if (READ_ONCE(sk->sk_err) ||
+ !skb_queue_empty_lockless(&sk->sk_error_queue))
mask |= EPOLLERR;
return mask;
--
2.53.0
Exercise setsockopt/getsockopt of IP_RECVERR and IPV6_RECVERR on the
MPTCP parent socket, including the empty-errqueue EAGAIN contract on
MSG_ERRQUEUE|MSG_DONTWAIT.
End-to-end errqueue delivery (ICMP, TX timestamps, zerocopy) depends on
subflow-side producers that are out of scope for this series and will be
covered by follow-up work.
Assisted-by: Codex:gpt-5
Signed-off-by: David Carlier <devnexen@gmail.com>
---
.../selftests/net/mptcp/mptcp_sockopt.c | 55 +++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
index b6e58d936ebe..95bb2cc8e2ff 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
@@ -769,6 +769,60 @@ static void test_ip_tos_sockopt(int fd)
xerror("expect socklen_t == -1");
}
+static void test_ip_recverr_sockopt(int fd)
+{
+ struct iovec iov = {
+ .iov_base = &(char){ 0 },
+ .iov_len = 1,
+ };
+ struct msghdr msg = {
+ .msg_iov = &iov,
+ .msg_iovlen = 1,
+ };
+ int one = 1, zero = 0, val = -1;
+ socklen_t s = sizeof(val);
+ int level, optname, r;
+
+ switch (pf) {
+ case AF_INET:
+ level = SOL_IP;
+ optname = IP_RECVERR;
+ break;
+ case AF_INET6:
+ level = SOL_IPV6;
+ optname = IPV6_RECVERR;
+ break;
+ default:
+ xerror("Unknown pf %d\n", pf);
+ }
+
+ r = setsockopt(fd, level, optname, &one, sizeof(one));
+ if (r)
+ die_perror("setsockopt recverr on");
+
+ r = getsockopt(fd, level, optname, &val, &s);
+ if (r)
+ die_perror("getsockopt recverr on");
+ if (s != sizeof(val) || val != one)
+ xerror("recverr on mismatch val=%d len=%u", val, s);
+
+ r = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
+ if (r != -1 || errno != EAGAIN)
+ xerror("expected empty errqueue to return EAGAIN, ret=%d errno=%d", r, errno);
+
+ r = setsockopt(fd, level, optname, &zero, sizeof(zero));
+ if (r)
+ die_perror("setsockopt recverr off");
+
+ val = -1;
+ s = sizeof(val);
+ r = getsockopt(fd, level, optname, &val, &s);
+ if (r)
+ die_perror("getsockopt recverr off");
+ if (s != sizeof(val) || val != zero)
+ xerror("recverr off mismatch val=%d len=%u", val, s);
+}
+
static int client(int pipefd)
{
int fd = -1;
@@ -787,6 +841,7 @@ static int client(int pipefd)
}
test_ip_tos_sockopt(fd);
+ test_ip_recverr_sockopt(fd);
connect_one_server(fd, pipefd);
--
2.53.0
Hi David,

On 22/04/2026 00:33, David Carlier wrote:
> MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
> parent socket does not currently provide usable MSG_ERRQUEUE handling.
>
> This series wires the MPTCP socket up to the IPv4/IPv6 error queue
> paths. It propagates RECVERR-related sockopts to existing and future
> subflows, makes poll() report pending errqueue activity through the
> parent socket, and allows recvmsg(MSG_ERRQUEUE) on the MPTCP socket to
> consume queued errors with the parent socket ABI.
>
> The series also handles mixed-family subflows by applying the matching
> sockopt according to each subflow family, and avoids silently losing an
> error skb if requeueing to the parent socket fails under rmem pressure.
>
> v2 -> v3:

Thank you for the v3.

Do you mind sending max 1 series per day, please? Each version generates
a lot of emails that are sent and need to be triaged, it is then harder
for us to follow, plus a lot of shared resources are used.

If you need CI support, either execute the tests locally with the docker
image (preferred), or send your patches on a public fork on GitHub,
after having enabled "Actions" support there ;)

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi,

On Wed, 22 Apr 2026 at 09:22, Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi David,
>
> On 22/04/2026 00:33, David Carlier wrote:
> > MPTCP already advertises IP_RECVERR/IPV6_RECVERR as supported, but the
> > parent socket does not currently provide usable MSG_ERRQUEUE handling.
> >
> > This series wires the MPTCP socket up to the IPv4/IPv6 error queue
> > paths. It propagates RECVERR-related sockopts to existing and future
> > subflows, makes poll() report pending errqueue activity through the
> > parent socket, and allows recvmsg(MSG_ERRQUEUE) on the MPTCP socket to
> > consume queued errors with the parent socket ABI.
> >
> > The series also handles mixed-family subflows by applying the matching
> > sockopt according to each subflow family, and avoids silently losing an
> > error skb if requeueing to the parent socket fails under rmem pressure.
> >
> > v2 -> v3:
>
> Thank you for the v3.
>
> Do you mind sending max 1 series per day, please? Each version generates
> a lot of emails that are sent and need to be triaged, it is then harder
> for us to follow, plus a lot of shared resources are used.
>

Duly noted.

> If you need CI support, either execute the tests locally with the docker
> image (preferred), or send your patches on a public fork on GitHub,
> after having enabled "Actions" support there ;)

Yes, I realised that only when I did the v3 :) OK, I'll go through all
the remarks later.

Cheers.

>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.
>
Hi David,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_join ⚠️
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_fastopen ⚠️
- KVM Validation: debug (only selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_join ⚠️
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/24750414123
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/3d39e1ac876f
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1084059
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the effort already put into keeping the test
suite stable when executed on a public CI like this one, some reported
issues may not be caused by your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)