... | ... | ||
---|---|---|---|
15 | to force connections with quite a few SYN drops to finally use MPTCP at | 15 | to force connections with quite a few SYN drops to finally use MPTCP at |
16 | the end. | 16 | the end. |
17 | 17 | ||
18 | In this series, we have: | 18 | In this series, we have: |
19 | 19 | ||
20 | - A small fix for the doc. | 20 | - A small fix for the doc. (applied) |
21 | 21 | ||
22 | - A new sysctl to change the number of SYN retransmitted with MPTCP | 22 | - A new sysctl to change the number of SYN retransmitted with MPTCP |
23 | options before falling back to TCP. The modification looks simple | 23 | options before falling back to TCP. The modification looks simple |
24 | enough to still be sent to netdev before the closure I think. | 24 | enough to still be sent to netdev before the closure I think. (applied) |
25 | 25 | ||
26 | - A fix to only turn on the blackhole protection only when the first SYN | 26 | - A fix to only turn on the blackhole protection only when the first SYN |
27 | retransmitted without MPTCP option is accepted, instead of any after. | 27 | retransmitted without MPTCP option is accepted, instead of any after. |
28 | The blackhole feature was supposed to do that from the beginning, but | 28 | The blackhole feature was supposed to do that from the beginning, but |
29 | a check was wrongly placed. I think we should consider this as a fix, | 29 | a check was wrongly placed. I think we should consider this as a fix, |
... | ... | ||
32 | more unlikely for an "MPTCP firewall blackhole", and I guess not all | 32 | more unlikely for an "MPTCP firewall blackhole", and I guess not all |
33 | future MPTCP connections will behave exactly like that. It sounds then | 33 | future MPTCP connections will behave exactly like that. It sounds then |
34 | safer to reduce the possibilities of enabling the blackhole protection | 34 | safer to reduce the possibilities of enabling the blackhole protection |
35 | by accident, and apply this patch. | 35 | by accident, and apply this patch. |
36 | 36 | ||
37 | - A small cleanup to exit early. | ||
38 | |||
39 | Please note that patches 1 and 2 have already been applied in our tree. | ||
40 | So this version only has patches 3 and 4. Patch 3 is a fix, maybe not | ||
41 | patch 4, but it might be easier to squash patch 3 and 4 and send it as a | ||
42 | fix to -net. | ||
43 | |||
37 | Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> | 44 | Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> |
38 | --- | 45 | --- |
39 | Matthieu Baerts (NGI0) (3): | 46 | Changes in v2: |
40 | doc: mptcp: sysctl: blackhole_timeout is per-netns | 47 | - Previous patches 1 & 2 have been applied. |
41 | mptcp: sysctl: add syn_retrans_before_tcp_fallback | 48 | - Patch 1 (ex 3) has not been modified. |
49 | - Patch 2 has been added, it can be squashed to the previous one. | ||
50 | - Link to v1: https://lore.kernel.org/r/20250114-mpc-no-blackhole-v1-0-994bd2a357fb@kernel.org | ||
51 | |||
52 | --- | ||
53 | Matthieu Baerts (NGI0) (2): | ||
42 | mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted | 54 | mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted |
55 | mptcp: blackhole: avoid checking the state twice | ||
43 | 56 | ||
44 | Documentation/networking/mptcp-sysctl.rst | 18 +++++++++++++++++- | 57 | net/mptcp/ctrl.c | 30 +++++++++++++++++------------- |
45 | net/mptcp/ctrl.c | 25 +++++++++++++++++++------ | 58 | 1 file changed, 17 insertions(+), 13 deletions(-) |
46 | 2 files changed, 36 insertions(+), 7 deletions(-) | ||
47 | --- | 59 | --- |
48 | base-commit: 9336324d1aec351496e048ec5b6bbda07944ad16 | 60 | base-commit: a9a670c7f9f94ac66e6a79ecb81e4e7a6e20f001 |
49 | change-id: 20250114-mpc-no-blackhole-526a61ea0334 | 61 | change-id: 20250114-mpc-no-blackhole-526a61ea0334 |
50 | 62 | ||
51 | Best regards, | 63 | Best regards, |
52 | -- | 64 | -- |
53 | Matthieu Baerts (NGI0) <matttbe@kernel.org> | 65 | Matthieu Baerts (NGI0) <matttbe@kernel.org> | diff view generated by jsdifflib |
Deleted patch | |||
---|---|---|---|
1 | All other sysctl entries mention it, and it is a per-namespace sysctl. | ||
2 | 1 | ||
3 | So mention it as well. | ||
4 | |||
5 | Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") | ||
6 | Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> | ||
7 | --- | ||
8 | Documentation/networking/mptcp-sysctl.rst | 2 +- | ||
9 | 1 file changed, 1 insertion(+), 1 deletion(-) | ||
10 | |||
11 | diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst | ||
12 | index XXXXXXX..XXXXXXX 100644 | ||
13 | --- a/Documentation/networking/mptcp-sysctl.rst | ||
14 | +++ b/Documentation/networking/mptcp-sysctl.rst | ||
15 | @@ -XXX,XX +XXX,XX @@ blackhole_timeout - INTEGER (seconds) | ||
16 | MPTCP is re-enabled and will reset to the initial value when the | ||
17 | blackhole issue goes away. | ||
18 | |||
19 | - 0 to disable the blackhole detection. | ||
20 | + 0 to disable the blackhole detection. This is a per-namespace sysctl. | ||
21 | |||
22 | Default: 3600 | ||
23 | |||
24 | |||
25 | -- | ||
26 | 2.47.1 | diff view generated by jsdifflib |
1 | The Fixes commit mentioned this: | 1 | The Fixes commit mentioned this: |
---|---|---|---|
2 | 2 | ||
3 | > An MPTCP firewall blackhole can be detected if the following SYN | 3 | > An MPTCP firewall blackhole can be detected if the following SYN |
4 | > retransmission after a fallback to "plain" TCP is accepted. | 4 | > retransmission after a fallback to "plain" TCP is accepted. |
5 | 5 | ||
6 | But in fact, this blackhole was detected if any following SYN | 6 | But in fact, this blackhole was detected if any following SYN |
7 | retransmissions after a fallback to TCP was accepted. | 7 | retransmissions after a fallback to TCP was accepted. |
8 | 8 | ||
9 | That's because 'mptcp_subflow_early_fallback()' will set 'request_mptcp' | 9 | That's because 'mptcp_subflow_early_fallback()' will set 'request_mptcp' |
10 | to 0, and 'mpc_drop' will never be reset to 0 after. | 10 | to 0, and 'mpc_drop' will never be reset to 0 after. |
11 | 11 | ||
12 | This is an issue, because some not so unusual situations might cause the | 12 | This is an issue, because some not so unusual situations might cause the |
13 | kernel to detect a false-positive blackhole, e.g. a client trying to | 13 | kernel to detect a false-positive blackhole, e.g. a client trying to |
14 | connect to a server while the network is not ready yet, causing a few | 14 | connect to a server while the network is not ready yet, causing a few |
15 | SYN retransmissions, before reaching the end server. | 15 | SYN retransmissions, before reaching the end server. |
16 | 16 | ||
17 | Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") | 17 | Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") |
18 | Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> | 18 | Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> |
19 | --- | 19 | --- |
20 | net/mptcp/ctrl.c | 4 ++-- | 20 | net/mptcp/ctrl.c | 4 ++-- |
21 | 1 file changed, 2 insertions(+), 2 deletions(-) | 21 | 1 file changed, 2 insertions(+), 2 deletions(-) |
22 | 22 | ||
23 | diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c | 23 | diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c |
24 | index XXXXXXX..XXXXXXX 100644 | 24 | index XXXXXXX..XXXXXXX 100644 |
25 | --- a/net/mptcp/ctrl.c | 25 | --- a/net/mptcp/ctrl.c |
26 | +++ b/net/mptcp/ctrl.c | 26 | +++ b/net/mptcp/ctrl.c |
27 | @@ -XXX,XX +XXX,XX @@ void mptcp_active_detect_blackhole(struct sock *ssk, bool expired) | 27 | @@ -XXX,XX +XXX,XX @@ void mptcp_active_detect_blackhole(struct sock *ssk, bool expired) |
28 | MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP); | 28 | MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP); |
29 | subflow->mpc_drop = 1; | 29 | subflow->mpc_drop = 1; |
30 | mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow); | 30 | mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow); |
31 | - } else { | 31 | - } else { |
32 | - subflow->mpc_drop = 0; | 32 | - subflow->mpc_drop = 0; |
33 | } | 33 | } |
34 | + } else if (ssk->sk_state == TCP_SYN_SENT) { | 34 | + } else if (ssk->sk_state == TCP_SYN_SENT) { |
35 | + subflow->mpc_drop = 0; | 35 | + subflow->mpc_drop = 0; |
36 | } | 36 | } |
37 | } | 37 | } |
38 | 38 | ||
39 | 39 | ||
40 | -- | 40 | -- |
41 | 2.47.1 | 41 | 2.47.1 | diff view generated by jsdifflib |
1 | The number of SYN + MPC retransmissions before falling back to TCP was | 1 | A small cleanup, reordering the conditions to avoid checking things |
---|---|---|---|
2 | fixed to 2. This is certainly a good default value, but having a fixed | 2 | twice. |
3 | number can be problem in some environments. | ||
4 | 3 | ||
5 | The current behaviour means that if all packets are dropped, there will | 4 | The code here is called in case of timeout on a TCP connection, before |
6 | be: | 5 | triggering a retransmission. But it only acts on SYN + MPC packets. |
7 | 6 | ||
8 | - The initial SYN + MPC | 7 | So the conditions can be re-order to exit early in case of non-MPTCP |
9 | 8 | SYN + MPC. This also reduce the indentation levels. | |
10 | - 2 retransmissions with MPC | ||
11 | |||
12 | - The next ones will be without MPTCP. | ||
13 | |||
14 | So typically ~3 seconds before falling back to TCP. In some networks | ||
15 | where some temporally blackholes are unfortunately frequent, or when a | ||
16 | client tries to initiate connections while the network is not ready yet, | ||
17 | this can cause new connections not to have MPTCP connections. | ||
18 | |||
19 | In such environments, it is now possible to increase the number of SYN | ||
20 | retransmissions with MPTCP options to make sure MPTCP is used. | ||
21 | |||
22 | Interesting values are: | ||
23 | |||
24 | - 0: the first retransmission will be done without MPTCP options: quite | ||
25 | aggressive, but also a higher risk of detecting false-positive | ||
26 | MPTCP blackholes. | ||
27 | |||
28 | - >= 128: all SYN retransmissions will keep the MPTCP options: back to | ||
29 | the < 6.12 behaviour. | ||
30 | |||
31 | The default behaviour is not changed here. | ||
32 | 9 | ||
33 | Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> | 10 | Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> |
34 | --- | 11 | --- |
35 | Documentation/networking/mptcp-sysctl.rst | 16 ++++++++++++++++ | 12 | Notes: if it is easier, this patch can be squashed in the previous one, |
36 | net/mptcp/ctrl.c | 21 +++++++++++++++++---- | 13 | and sent as a fix to -net. There will be conflicts with the previous |
37 | 2 files changed, 33 insertions(+), 4 deletions(-) | 14 | versions, but not complex to fix -- and the new sysctl could even be |
15 | backported if that's what the stable team prefers. | ||
16 | --- | ||
17 | net/mptcp/ctrl.c | 32 ++++++++++++++++++-------------- | ||
18 | 1 file changed, 18 insertions(+), 14 deletions(-) | ||
38 | 19 | ||
39 | diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst | ||
40 | index XXXXXXX..XXXXXXX 100644 | ||
41 | --- a/Documentation/networking/mptcp-sysctl.rst | ||
42 | +++ b/Documentation/networking/mptcp-sysctl.rst | ||
43 | @@ -XXX,XX +XXX,XX @@ stale_loss_cnt - INTEGER | ||
44 | This is a per-namespace sysctl. | ||
45 | |||
46 | Default: 4 | ||
47 | + | ||
48 | +syn_retrans_before_tcp_fallback - INTEGER | ||
49 | + The number of SYN + MP_CAPABLE retransmissions before falling back to | ||
50 | + TCP, i.e. dropping the MPTCP options. In other words, if all the packets | ||
51 | + are dropped on the way, there will be: | ||
52 | + | ||
53 | + * The initial SYN with MPTCP support | ||
54 | + * This number of SYN retransmitted with MPTCP support | ||
55 | + * The next SYN retransmissions will be without MPTCP support | ||
56 | + | ||
57 | + 0 means the first retransmission will be done without MPTCP options. | ||
58 | + >= 128 means that all SYN retransmissions will keep the MPTCP options. A | ||
59 | + lower number might increase false-positive MPTCP blackholes detections. | ||
60 | + This is a per-namespace sysctl. | ||
61 | + | ||
62 | + Default: 2 | ||
63 | diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c | 20 | diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c |
64 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
65 | --- a/net/mptcp/ctrl.c | 22 | --- a/net/mptcp/ctrl.c |
66 | +++ b/net/mptcp/ctrl.c | 23 | +++ b/net/mptcp/ctrl.c |
67 | @@ -XXX,XX +XXX,XX @@ struct mptcp_pernet { | ||
68 | unsigned int close_timeout; | ||
69 | unsigned int stale_loss_cnt; | ||
70 | atomic_t active_disable_times; | ||
71 | + u8 syn_retrans_before_tcp_fallback; | ||
72 | unsigned long active_disable_stamp; | ||
73 | u8 mptcp_enabled; | ||
74 | u8 checksum_enabled; | ||
75 | @@ -XXX,XX +XXX,XX @@ static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet) | ||
76 | pernet->mptcp_enabled = 1; | ||
77 | pernet->add_addr_timeout = TCP_RTO_MAX; | ||
78 | pernet->blackhole_timeout = 3600; | ||
79 | + pernet->syn_retrans_before_tcp_fallback = 2; | ||
80 | atomic_set(&pernet->active_disable_times, 0); | ||
81 | pernet->close_timeout = TCP_TIMEWAIT_LEN; | ||
82 | pernet->checksum_enabled = 0; | ||
83 | @@ -XXX,XX +XXX,XX @@ static struct ctl_table mptcp_sysctl_table[] = { | ||
84 | .proc_handler = proc_blackhole_detect_timeout, | ||
85 | .extra1 = SYSCTL_ZERO, | ||
86 | }, | ||
87 | + { | ||
88 | + .procname = "syn_retrans_before_tcp_fallback", | ||
89 | + .maxlen = sizeof(u8), | ||
90 | + .mode = 0644, | ||
91 | + .proc_handler = proc_dou8vec_minmax, | ||
92 | + }, | ||
93 | }; | ||
94 | |||
95 | static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet) | ||
96 | @@ -XXX,XX +XXX,XX @@ static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet) | ||
97 | /* table[7] is for available_schedulers which is read-only info */ | ||
98 | table[8].data = &pernet->close_timeout; | ||
99 | table[9].data = &pernet->blackhole_timeout; | ||
100 | + table[10].data = &pernet->syn_retrans_before_tcp_fallback; | ||
101 | |||
102 | hdr = register_net_sysctl_sz(net, MPTCP_SYSCTL_PATH, table, | ||
103 | ARRAY_SIZE(mptcp_sysctl_table)); | ||
104 | @@ -XXX,XX +XXX,XX @@ void mptcp_active_enable(struct sock *sk) | 24 | @@ -XXX,XX +XXX,XX @@ void mptcp_active_enable(struct sock *sk) |
105 | void mptcp_active_detect_blackhole(struct sock *ssk, bool expired) | 25 | void mptcp_active_detect_blackhole(struct sock *ssk, bool expired) |
106 | { | 26 | { |
107 | struct mptcp_subflow_context *subflow; | 27 | struct mptcp_subflow_context *subflow; |
108 | - u32 timeouts; | 28 | + u8 timeouts, to_max; |
109 | 29 | + struct net *net; | |
110 | if (!sk_is_mptcp(ssk)) | 30 | |
31 | - if (!sk_is_mptcp(ssk)) | ||
32 | + /* Only check MPTCP SYN ... */ | ||
33 | + if (likely(!sk_is_mptcp(ssk) || ssk->sk_state != TCP_SYN_SENT)) | ||
111 | return; | 34 | return; |
112 | 35 | ||
113 | - timeouts = inet_csk(ssk)->icsk_retransmits; | ||
114 | subflow = mptcp_subflow_ctx(ssk); | 36 | subflow = mptcp_subflow_ctx(ssk); |
115 | 37 | ||
116 | if (subflow->request_mptcp && ssk->sk_state == TCP_SYN_SENT) { | 38 | - if (subflow->request_mptcp && ssk->sk_state == TCP_SYN_SENT) { |
117 | - if (timeouts == 2 || (timeouts < 2 && expired)) { | 39 | - struct net *net = sock_net(ssk); |
118 | - MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_MPCAPABLEACTIVEDROP); | 40 | - u8 timeouts, to_max; |
119 | + struct net *net = sock_net(ssk); | 41 | - |
120 | + u8 timeouts, to_max; | 42 | - timeouts = inet_csk(ssk)->icsk_retransmits; |
43 | - to_max = mptcp_get_pernet(net)->syn_retrans_before_tcp_fallback; | ||
44 | - | ||
45 | - if (timeouts == to_max || (timeouts < to_max && expired)) { | ||
46 | - MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP); | ||
47 | - subflow->mpc_drop = 1; | ||
48 | - mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow); | ||
49 | - } | ||
50 | - } else if (ssk->sk_state == TCP_SYN_SENT) { | ||
51 | + /* ... + MP_CAPABLE */ | ||
52 | + if (!subflow->request_mptcp) { | ||
53 | + /* Mark as blackhole iif the 1st non-MPTCP SYN is accepted */ | ||
54 | subflow->mpc_drop = 0; | ||
55 | + return; | ||
56 | + } | ||
121 | + | 57 | + |
122 | + timeouts = inet_csk(ssk)->icsk_retransmits; | 58 | + net = sock_net(ssk); |
123 | + to_max = mptcp_get_pernet(net)->syn_retrans_before_tcp_fallback; | 59 | + timeouts = inet_csk(ssk)->icsk_retransmits; |
60 | + to_max = mptcp_get_pernet(net)->syn_retrans_before_tcp_fallback; | ||
124 | + | 61 | + |
125 | + if (timeouts == to_max || (timeouts < to_max && expired)) { | 62 | + if (timeouts == to_max || (timeouts < to_max && expired)) { |
126 | + MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP); | 63 | + MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP); |
127 | subflow->mpc_drop = 1; | 64 | + subflow->mpc_drop = 1; |
128 | mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow); | 65 | + mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow); |
129 | } else { | 66 | } |
67 | } | ||
68 | |||
130 | 69 | ||
131 | -- | 70 | -- |
132 | 2.47.1 | 71 | 2.47.1 | diff view generated by jsdifflib |