When TCP option space is insufficient (e.g., IPv6 with tcp_timestamps
enabled), the original code jumped to out_unlock without clearing the
addr_signal flag. This caused mptcp_pm_add_timer to keep rescheduling
indefinitely without sending ADD_ADDR, preventing the endpoint list from
being traversed.
In a pure ACK scenario (indicated by drop_other_suboptions=true), if
the option space is insufficient to carry the ADD_ADDR suboption, it
is appropriate to drop this address signal to allow the timer handler
to move on to other addresses.
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
---
Seeking feedback on:
When announcing addresses to the peer, MPTCP sends a pure ACK packet
to carry MPTCP options (ADD_ADDR). In this scenario, if the option space
is insufficient for ADD_ADDR, clearing addr_signal would:
- Prevent the timer from retrying infinitely
- Allow the timer to continue traversing and processing other addresses
- Not block other subflow creation or address announcement operations
Is there any scenario where we should retry later instead of clearing
the address signal/echo flag? However, if a pure ACK doesn't have
enough space for the flag, subsequent packets won't either.
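The behaviour described above can be illustrated with a simplified, hypothetical userspace model. This is not kernel code. fake_pm, try_signal() and run_timer() are made-up names. The bit positions are assumed to mirror MPTCP_ADD_ADDR_SIGNAL/ECHO, and the add timer is reduced to a loop of ticks. In the old behaviour, the signal bit of an oversized ADD_ADDR is never cleared, so the loop never reaches the next endpoint. In the patched behaviour, the bit is dropped and the following endpoints get announced.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model, not kernel code. Bit positions are
 * assumed to mirror MPTCP_ADD_ADDR_SIGNAL / MPTCP_ADD_ADDR_ECHO. */
#define BIT(n) (1u << (n))
enum { ADD_ADDR_SIGNAL, ADD_ADDR_ECHO };

struct fake_pm {
	unsigned int addr_signal;	/* pending signal bits */
	int next;			/* endpoint currently being announced */
};

/* One pure-ACK attempt to carry the pending ADD_ADDR. Returns true if the
 * option fits and is sent. When 'fixed' is set, a pure ACK without enough
 * space drops the signal, as the patch does, instead of leaving it set. */
static bool try_signal(struct fake_pm *pm, int needed, int remaining, bool fixed)
{
	unsigned int cleared = pm->addr_signal & ~BIT(ADD_ADDR_SIGNAL);

	if (remaining < needed) {
		if (fixed)
			pm->addr_signal = cleared;
		return false;
	}
	pm->addr_signal = cleared;
	return true;
}

/* Each tick: (re)arm the current endpoint, try to send it, and advance only
 * once the signal bit has been cleared. Returns how many endpoints were
 * actually announced within 'ticks' ticks. */
static int run_timer(const int *lens, int n, int remaining, bool fixed, int ticks)
{
	struct fake_pm pm = { 0, 0 };
	int announced = 0;

	assert(n >= 0 && ticks >= 0);
	for (int t = 0; t < ticks && pm.next < n; t++) {
		pm.addr_signal |= BIT(ADD_ADDR_SIGNAL);	/* no-op if still pending */
		if (try_signal(&pm, lens[pm.next], remaining, fixed))
			announced++;
		if (!(pm.addr_signal & BIT(ADD_ADDR_SIGNAL)))
			pm.next++;	/* sent or dropped: move to the next endpoint */
	}
	return announced;
}
```

For example, with ADD_ADDR lengths {32, 16, 28} and 28 bytes of option space, run_timer() announces nothing without the fix, because it stays stuck on the first endpoint. With the fix, it announces the remaining two.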
---
net/mptcp/pm.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 57a456690406..1d49779c6a1f 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -881,19 +881,18 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
}
*echo = mptcp_pm_should_add_signal_echo(msk);
+ add_addr = msk->pm.addr_signal &
+ ~(*echo ? BIT(MPTCP_ADD_ADDR_ECHO) : BIT(MPTCP_ADD_ADDR_SIGNAL));
port = !!(*echo ? msk->pm.remote.port : msk->pm.local.port);
-
family = *echo ? msk->pm.remote.family : msk->pm.local.family;
- if (remaining < mptcp_add_addr_len(family, *echo, port))
- goto out_unlock;
- if (*echo) {
- *addr = msk->pm.remote;
- add_addr = msk->pm.addr_signal & ~BIT(MPTCP_ADD_ADDR_ECHO);
- } else {
- *addr = msk->pm.local;
- add_addr = msk->pm.addr_signal & ~BIT(MPTCP_ADD_ADDR_SIGNAL);
+ if (remaining < mptcp_add_addr_len(family, *echo, port)) {
+ if (*drop_other_suboptions)
+ WRITE_ONCE(msk->pm.addr_signal, add_addr);
+ goto out_unlock;
}
+
+ *addr = *echo ? msk->pm.remote : msk->pm.local;
WRITE_ONCE(msk->pm.addr_signal, add_addr);
ret = true;
--
2.34.1
Hi Li,
On 18/04/2026 12:00, Li Xiasong wrote:
> When TCP option space is insufficient (e.g., IPv6 with tcp_timestamps
> enabled), the original code jumped to out_unlock without clearing the
> addr_signal flag. This caused mptcp_pm_add_timer to keep rescheduling
> indefinitely without sending ADD_ADDR,
Funny, I was looking at this issue on Friday evening :)
> preventing the endpoint list from being traversed.
It might help to add a bit of context: I guess here you meant that it
prevents other ADD_ADDRs from being advertised, not that it prevents
other subflows from being used when sending data, right?
> In a pure ACK scenario (indicated by drop_other_suboptions=true), if
> the option space is insufficient to carry the ADD_ADDR suboption, it
> is appropriate to drop this address signal to allow the timer handler
> to move on to other addresses.
>
> Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
> Signed-off-by: Li Xiasong <lixiasong1@huawei.com>
> ---
>
> Seeking feedback on:
>
> When announcing addresses to the peer, MPTCP sends a pure ACK packet
> to carry MPTCP options (ADD_ADDR). In this scenario, if the option space
> is insufficient for ADD_ADDR, clearing addr_signal would:
>
> - Prevent the timer from retrying infinitely
> - Allow the timer to continue traversing and processing other addresses
> - Not block other subflow creation or address announcement operations
>
> Is there any scenario where we should retry later instead of clearing
> the address signal/echo flag? However, if a pure ACK doesn't have
> enough space for the flag, subsequent packets won't either.
That's correct: for the moment, if it is a pure ACK and there is not
enough space, no need to retry later because it is not possible to have
more space. It should only happen with an ADD_ADDR containing an IPv6
address and a port number. It might be good to specify this in the
commit message.
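The arithmetic behind this can be sketched as follows. The sketch re-implements the length computation of mptcp_add_addr_len() in userspace. The constant values are as recalled from net/mptcp/protocol.h and include/net/tcp.h, which is an assumption to be checked against the tree being patched. It also assumes that the only other TCP option on the pure ACK is the 12-byte aligned timestamp, with no SACK blocks. pure_ack_remaining() and add_addr_fits_pure_ack() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Assumed values: double-check them against the tree being patched. */
#define MAX_TCP_OPTION_SPACE		40
#define TCPOLEN_TSTAMP_ALIGNED		12
#define TCPOLEN_MPTCP_ADD_ADDR_BASE	8
#define TCPOLEN_MPTCP_ADD_ADDR6_BASE	20
#define MPTCPOPT_THMAC_LEN		8
#define TCPOLEN_MPTCP_PORT_LEN		2
#define TCPOLEN_MPTCP_PORT_ALIGN	2

/* Userspace re-implementation of the mptcp_add_addr_len() logic. */
static int add_addr_len(int family, bool echo, bool port)
{
	int len = family == AF_INET6 ? TCPOLEN_MPTCP_ADD_ADDR6_BASE
				     : TCPOLEN_MPTCP_ADD_ADDR_BASE;

	if (!echo)
		len += MPTCPOPT_THMAC_LEN;	/* truncated HMAC: not sent in an echo */
	if (port)				/* port + 2 trailing NOPs for alignment */
		len += TCPOLEN_MPTCP_PORT_LEN + TCPOLEN_MPTCP_PORT_ALIGN;
	return len;
}

/* Option space left on a pure ACK: all other MPTCP suboptions are dropped,
 * so only TCP's own options (here: timestamps or nothing) are subtracted. */
static int pure_ack_remaining(bool timestamps)
{
	return MAX_TCP_OPTION_SPACE - (timestamps ? TCPOLEN_TSTAMP_ALIGNED : 0);
}

/* Hypothetical helper mirroring the 'remaining < len' check in the patch. */
static bool add_addr_fits_pure_ack(int family, bool echo, bool port,
				   bool timestamps)
{
	int len = add_addr_len(family, echo, port);

	assert(len <= MAX_TCP_OPTION_SPACE);
	return pure_ack_remaining(timestamps) >= len;
}
```

With timestamps, 40 - 12 = 28 bytes remain. Only the non-echo IPv6 ADD_ADDR with a port (20 + 8 + 2 + 2 = 32 bytes) exceeds that. The echo variant (24 bytes) and the IPv6 ADD_ADDR without a port (28 bytes) still fit.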
> ---
> net/mptcp/pm.c | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index 57a456690406..1d49779c6a1f 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -881,19 +881,18 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
> }
>
> *echo = mptcp_pm_should_add_signal_echo(msk);
> + add_addr = msk->pm.addr_signal &
> + ~(*echo ? BIT(MPTCP_ADD_ADDR_ECHO) : BIT(MPTCP_ADD_ADDR_SIGNAL));
> port = !!(*echo ? msk->pm.remote.port : msk->pm.local.port);
> -
> family = *echo ? msk->pm.remote.family : msk->pm.local.family;
nit: while at it, maybe it would be clearer to have a dedicated
'if (*echo)' block instead of 3 lines using '*echo ? ... : ...', no?
if (*echo) {
add_addr = ...
port = ...
family = ...
} else {
add_addr = ...
port = ...
family = ...
}
> - if (remaining < mptcp_add_addr_len(family, *echo, port))
> - goto out_unlock;
>
> - if (*echo) {
> - *addr = msk->pm.remote;
> - add_addr = msk->pm.addr_signal & ~BIT(MPTCP_ADD_ADDR_ECHO);
> - } else {
> - *addr = msk->pm.local;
> - add_addr = msk->pm.addr_signal & ~BIT(MPTCP_ADD_ADDR_SIGNAL);
> + if (remaining < mptcp_add_addr_len(family, *echo, port)) {
> + if (*drop_other_suboptions)
> + WRITE_ONCE(msk->pm.addr_signal, add_addr);
If it is dropped, it would be helpful to increment the ADDADDRTXDROP MIB
counter, and ideally check that in the MPTCP selftests (e.g. adding a
new subtest in mptcp_join.sh, in add_addr_ports_tests()?).
Also, I wonder if it would not be clearer to jump to a new label here...
> + goto out_unlock;
> }
> +
> + *addr = *echo ? msk->pm.remote : msk->pm.local;
> WRITE_ONCE(msk->pm.addr_signal, add_addr);
> ret = true;
... inverting the two lines above, and adding "drop_signal_mark" label?
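Combining the suggestions above (a dedicated 'if (*echo)' block, a new label and a MIB increment), the tail of the function could look roughly like the userspace sketch below. Everything ending in _stub, the struct layout, and the add_addr_tx_drop counter (which stands in for the ADDADDRTXDROP MIB) are hypothetical. Locking is reduced to a comment, and the plain store stands in for WRITE_ONCE(). The length helper uses the same assumed constants as mptcp_add_addr_len().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

#define BIT(n) (1u << (n))
enum { ADD_ADDR_SIGNAL, ADD_ADDR_ECHO };	/* assumed bit positions */

/* Hypothetical userspace stand-ins for the kernel structures. */
struct addr_stub { int family; uint16_t port; };
struct pm_stub {
	unsigned int addr_signal;
	struct addr_stub local, remote;
	unsigned long add_addr_tx_drop;	/* stands in for the ADDADDRTXDROP MIB */
};

/* Same arithmetic as mptcp_add_addr_len(), with assumed constant values. */
static int add_addr_len_stub(int family, bool echo, bool port)
{
	int len = family == AF_INET6 ? 20 : 8;

	return len + (echo ? 0 : 8) + (port ? 4 : 0);
}

static bool add_addr_signal_stub(struct pm_stub *pm, bool echo, int remaining,
				 bool drop_other_suboptions, struct addr_stub *addr)
{
	unsigned int add_addr;
	bool ret = false;
	bool port;
	int family;

	/* dedicated branch instead of three '*echo ? ... : ...' expressions */
	if (echo) {
		add_addr = pm->addr_signal & ~BIT(ADD_ADDR_ECHO);
		port = !!pm->remote.port;
		family = pm->remote.family;
	} else {
		add_addr = pm->addr_signal & ~BIT(ADD_ADDR_SIGNAL);
		port = !!pm->local.port;
		family = pm->local.family;
	}

	if (remaining < add_addr_len_stub(family, echo, port)) {
		if (!drop_other_suboptions)
			goto out_unlock;	/* not a pure ACK: keep it pending */
		pm->add_addr_tx_drop++;		/* single counter in this sketch */
		goto drop_signal_mark;
	}

	*addr = echo ? pm->remote : pm->local;
	ret = true;
drop_signal_mark:
	pm->addr_signal = add_addr;	/* WRITE_ONCE() in the kernel */
out_unlock:
	/* the kernel releases the PM spinlock here */
	return ret;
}
```

Inverting the two lines is what makes the label work: the drop path skips the *addr assignment and 'ret = true', but it still clears the signal bit on its way to out_unlock.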
Apart from the comments above, I think your patch is doing the right thing.
Also, one last request: do you mind sending the v2 only to the MPTCP ML,
please? I have a bunch of related fixes [1], and this one is not urgent.
In fact, except for (urgent) fixes, it might be better to send the first
versions of MPTCP patches only to the MPTCP ML, i.e. to a restricted
number of people: there is enough traffic on Netdev.
[1]
https://lore.kernel.org/20260415-mptcp-inc-limits-v5-0-e54c3bf80e4e@kernel.org
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Li,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/24602264963
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/32c3fb79b0b4
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1082765
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have already been made to have a
stable test suite when executed on a public CI like here, it is possible that
some reported issues are not due to your modifications. Still, do not hesitate
to help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)