...
...
15
to force connections with quite a few SYN drops to finally use MPTCP at
15
to force connections with quite a few SYN drops to finally use MPTCP at
16
the end.
16
the end.
17
17
18
In this series, we have:
18
In this series, we have:
19
19
20
- A small fix for the doc.
20
- A small fix for the doc. (applied)
21
21
22
- A new sysctl to change the number of SYN retransmitted with MPTCP
22
- A new sysctl to change the number of SYN retransmitted with MPTCP
23
options before falling back to TCP. The modification looks simple
23
options before falling back to TCP. The modification looks simple
24
enough to still be sent to netdev before the closure I think.
24
enough to still be sent to netdev before the closure I think. (applied)
25
25
26
- A fix to only turn on the blackhole protection only when the first SYN
26
- A fix to only turn on the blackhole protection only when the first SYN
27
retransmitted without MPTCP option is accepted, instead of any after.
27
retransmitted without MPTCP option is accepted, instead of any after.
28
The blackhole feature was supposed to do that from the beginning, but
28
The blackhole feature was supposed to do that from the beginning, but
29
a check was wrongly placed. I think we should consider this as a fix,
29
a check was wrongly placed. I think we should consider this as a fix,
...
...
32
more unlikely for an "MPTCP firewall blackhole", and I guess not all
32
more unlikely for an "MPTCP firewall blackhole", and I guess not all
33
future MPTCP connections will behave exactly like that. It sounds then
33
future MPTCP connections will behave exactly like that. It sounds then
34
safer to reduce the possibilities of enabling the blackhole protection
34
safer to reduce the possibilities of enabling the blackhole protection
35
by accident, and apply this patch.
35
by accident, and apply this patch.
36
36
37
- A small cleanup to exit early.
38
39
Please note that patches 1 and 2 have already been applied in our tree.
40
So this version only has patches 3 and 4. Patch 3 is a fix, maybe not
41
patch 4, but it might be easier to squash patch 3 and 4 and send it as a
42
fix to -net.
43
37
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
44
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
38
---
45
---
39
Matthieu Baerts (NGI0) (3):
46
Changes in v2:
40
doc: mptcp: sysctl: blackhole_timeout is per-netns
47
- Previous patches 1 & 2 have been applied.
41
mptcp: sysctl: add syn_retrans_before_tcp_fallback
48
- Patch 1 (ex 3) has not been modified.
49
- Patch 2 has been added, it can be squashed to the previous one.
50
- Link to v1: https://lore.kernel.org/r/20250114-mpc-no-blackhole-v1-0-994bd2a357fb@kernel.org
51
52
---
53
Matthieu Baerts (NGI0) (2):
42
mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
54
mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
55
mptcp: blackhole: avoid checking the state twice
43
56
44
Documentation/networking/mptcp-sysctl.rst | 18 +++++++++++++++++-
57
net/mptcp/ctrl.c | 30 +++++++++++++++++-------------
45
net/mptcp/ctrl.c | 25 +++++++++++++++++++------
58
1 file changed, 17 insertions(+), 13 deletions(-)
46
2 files changed, 36 insertions(+), 7 deletions(-)
47
---
59
---
48
base-commit: 9336324d1aec351496e048ec5b6bbda07944ad16
60
base-commit: a9a670c7f9f94ac66e6a79ecb81e4e7a6e20f001
49
change-id: 20250114-mpc-no-blackhole-526a61ea0334
61
change-id: 20250114-mpc-no-blackhole-526a61ea0334
50
62
51
Best regards,
63
Best regards,
52
--
64
--
53
Matthieu Baerts (NGI0) <matttbe@kernel.org>
65
Matthieu Baerts (NGI0) <matttbe@kernel.org>
diff view generated by jsdifflib
Deleted patch
1
All other sysctl entries mention it, and it is a per-namespace sysctl.
2
1
3
So mention it as well.
4
5
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
6
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
7
---
8
Documentation/networking/mptcp-sysctl.rst | 2 +-
9
1 file changed, 1 insertion(+), 1 deletion(-)
10
11
diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
12
index XXXXXXX..XXXXXXX 100644
13
--- a/Documentation/networking/mptcp-sysctl.rst
14
+++ b/Documentation/networking/mptcp-sysctl.rst
15
@@ -XXX,XX +XXX,XX @@ blackhole_timeout - INTEGER (seconds)
16
    MPTCP is re-enabled and will reset to the initial value when the
17
    blackhole issue goes away.
18
19
-    0 to disable the blackhole detection.
20
+    0 to disable the blackhole detection. This is a per-namespace sysctl.
21
22
    Default: 3600
23
24
25
--
26
2.47.1
diff view generated by jsdifflib
1
The Fixes commit mentioned this:
1
The Fixes commit mentioned this:
2
2
3
> An MPTCP firewall blackhole can be detected if the following SYN
3
> An MPTCP firewall blackhole can be detected if the following SYN
4
> retransmission after a fallback to "plain" TCP is accepted.
4
> retransmission after a fallback to "plain" TCP is accepted.
5
5
6
But in fact, this blackhole was detected if any following SYN
6
But in fact, this blackhole was detected if any following SYN
7
retransmissions after a fallback to TCP was accepted.
7
retransmissions after a fallback to TCP was accepted.
8
8
9
That's because 'mptcp_subflow_early_fallback()' will set 'request_mptcp'
9
That's because 'mptcp_subflow_early_fallback()' will set 'request_mptcp'
10
to 0, and 'mpc_drop' will never be reset to 0 after.
10
to 0, and 'mpc_drop' will never be reset to 0 after.
11
11
12
This is an issue, because some not so unusual situations might cause the
12
This is an issue, because some not so unusual situations might cause the
13
kernel to detect a false-positive blackhole, e.g. a client trying to
13
kernel to detect a false-positive blackhole, e.g. a client trying to
14
connect to a server while the network is not ready yet, causing a few
14
connect to a server while the network is not ready yet, causing a few
15
SYN retransmissions, before reaching the end server.
15
SYN retransmissions, before reaching the end server.
16
16
17
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
17
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
18
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
18
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
19
---
19
---
20
net/mptcp/ctrl.c | 4 ++--
20
net/mptcp/ctrl.c | 4 ++--
21
1 file changed, 2 insertions(+), 2 deletions(-)
21
1 file changed, 2 insertions(+), 2 deletions(-)
22
22
23
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
23
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
24
index XXXXXXX..XXXXXXX 100644
24
index XXXXXXX..XXXXXXX 100644
25
--- a/net/mptcp/ctrl.c
25
--- a/net/mptcp/ctrl.c
26
+++ b/net/mptcp/ctrl.c
26
+++ b/net/mptcp/ctrl.c
27
@@ -XXX,XX +XXX,XX @@ void mptcp_active_detect_blackhole(struct sock *ssk, bool expired)
27
@@ -XXX,XX +XXX,XX @@ void mptcp_active_detect_blackhole(struct sock *ssk, bool expired)
28
            MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP);
28
            MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP);
29
            subflow->mpc_drop = 1;
29
            subflow->mpc_drop = 1;
30
            mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow);
30
            mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow);
31
-        } else {
31
-        } else {
32
-            subflow->mpc_drop = 0;
32
-            subflow->mpc_drop = 0;
33
        }
33
        }
34
+    } else if (ssk->sk_state == TCP_SYN_SENT) {
34
+    } else if (ssk->sk_state == TCP_SYN_SENT) {
35
+        subflow->mpc_drop = 0;
35
+        subflow->mpc_drop = 0;
36
    }
36
    }
37
}
37
}
38
38
39
39
40
--
40
--
41
2.47.1
41
2.47.1
diff view generated by jsdifflib
1
The number of SYN + MPC retransmissions before falling back to TCP was
1
A small cleanup, reordering the conditions to avoid checking things
2
fixed to 2. This is certainly a good default value, but having a fixed
2
twice.
3
number can be problem in some environments.
4
3
5
The current behaviour means that if all packets are dropped, there will
4
The code here is called in case of timeout on a TCP connection, before
6
be:
5
triggering a retransmission. But it only acts on SYN + MPC packets.
7
6
8
- The initial SYN + MPC
7
So the conditions can be re-order to exit early in case of non-MPTCP
9
8
SYN + MPC. This also reduce the indentation levels.
10
- 2 retransmissions with MPC
11
12
- The next ones will be without MPTCP.
13
14
So typically ~3 seconds before falling back to TCP. In some networks
15
where some temporally blackholes are unfortunately frequent, or when a
16
client tries to initiate connections while the network is not ready yet,
17
this can cause new connections not to have MPTCP connections.
18
19
In such environments, it is now possible to increase the number of SYN
20
retransmissions with MPTCP options to make sure MPTCP is used.
21
22
Interesting values are:
23
24
- 0: the first retransmission will be done without MPTCP options: quite
25
aggressive, but also a higher risk of detecting false-positive
26
MPTCP blackholes.
27
28
- >= 128: all SYN retransmissions will keep the MPTCP options: back to
29
the < 6.12 behaviour.
30
31
The default behaviour is not changed here.
32
9
33
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
10
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
34
---
11
---
35
Documentation/networking/mptcp-sysctl.rst | 16 ++++++++++++++++
12
Notes: if it is easier, this patch can be squashed in the previous one,
36
net/mptcp/ctrl.c | 21 +++++++++++++++++----
13
and sent as a fix to -net. There will be conflicts with the previous
37
2 files changed, 33 insertions(+), 4 deletions(-)
14
versions, but not complex to fix -- and the new sysctl could even be
15
backported if that's what the stable team prefers.
16
---
17
net/mptcp/ctrl.c | 32 ++++++++++++++++++--------------
18
1 file changed, 18 insertions(+), 14 deletions(-)
38
19
39
diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
40
index XXXXXXX..XXXXXXX 100644
41
--- a/Documentation/networking/mptcp-sysctl.rst
42
+++ b/Documentation/networking/mptcp-sysctl.rst
43
@@ -XXX,XX +XXX,XX @@ stale_loss_cnt - INTEGER
44
    This is a per-namespace sysctl.
45
46
    Default: 4
47
+
48
+syn_retrans_before_tcp_fallback - INTEGER
49
+    The number of SYN + MP_CAPABLE retransmissions before falling back to
50
+    TCP, i.e. dropping the MPTCP options. In other words, if all the packets
51
+    are dropped on the way, there will be:
52
+
53
+    * The initial SYN with MPTCP support
54
+    * This number of SYN retransmitted with MPTCP support
55
+    * The next SYN retransmissions will be without MPTCP support
56
+
57
+    0 means the first retransmission will be done without MPTCP options.
58
+    >= 128 means that all SYN retransmissions will keep the MPTCP options. A
59
+    lower number might increase false-positive MPTCP blackholes detections.
60
+    This is a per-namespace sysctl.
61
+
62
+    Default: 2
63
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
20
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
64
index XXXXXXX..XXXXXXX 100644
21
index XXXXXXX..XXXXXXX 100644
65
--- a/net/mptcp/ctrl.c
22
--- a/net/mptcp/ctrl.c
66
+++ b/net/mptcp/ctrl.c
23
+++ b/net/mptcp/ctrl.c
67
@@ -XXX,XX +XXX,XX @@ struct mptcp_pernet {
68
    unsigned int close_timeout;
69
    unsigned int stale_loss_cnt;
70
    atomic_t active_disable_times;
71
+    u8 syn_retrans_before_tcp_fallback;
72
    unsigned long active_disable_stamp;
73
    u8 mptcp_enabled;
74
    u8 checksum_enabled;
75
@@ -XXX,XX +XXX,XX @@ static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet)
76
    pernet->mptcp_enabled = 1;
77
    pernet->add_addr_timeout = TCP_RTO_MAX;
78
    pernet->blackhole_timeout = 3600;
79
+    pernet->syn_retrans_before_tcp_fallback = 2;
80
    atomic_set(&pernet->active_disable_times, 0);
81
    pernet->close_timeout = TCP_TIMEWAIT_LEN;
82
    pernet->checksum_enabled = 0;
83
@@ -XXX,XX +XXX,XX @@ static struct ctl_table mptcp_sysctl_table[] = {
84
        .proc_handler = proc_blackhole_detect_timeout,
85
        .extra1 = SYSCTL_ZERO,
86
    },
87
+    {
88
+        .procname = "syn_retrans_before_tcp_fallback",
89
+        .maxlen = sizeof(u8),
90
+        .mode = 0644,
91
+        .proc_handler = proc_dou8vec_minmax,
92
+    },
93
};
94
95
static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
96
@@ -XXX,XX +XXX,XX @@ static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
97
    /* table[7] is for available_schedulers which is read-only info */
98
    table[8].data = &pernet->close_timeout;
99
    table[9].data = &pernet->blackhole_timeout;
100
+    table[10].data = &pernet->syn_retrans_before_tcp_fallback;
101
102
    hdr = register_net_sysctl_sz(net, MPTCP_SYSCTL_PATH, table,
103
                 ARRAY_SIZE(mptcp_sysctl_table));
104
@@ -XXX,XX +XXX,XX @@ void mptcp_active_enable(struct sock *sk)
24
@@ -XXX,XX +XXX,XX @@ void mptcp_active_enable(struct sock *sk)
105
void mptcp_active_detect_blackhole(struct sock *ssk, bool expired)
25
void mptcp_active_detect_blackhole(struct sock *ssk, bool expired)
106
{
26
{
107
    struct mptcp_subflow_context *subflow;
27
    struct mptcp_subflow_context *subflow;
108
-    u32 timeouts;
28
+    u8 timeouts, to_max;
109
29
+    struct net *net;
110
    if (!sk_is_mptcp(ssk))
30
31
-    if (!sk_is_mptcp(ssk))
32
+    /* Only check MPTCP SYN ... */
33
+    if (likely(!sk_is_mptcp(ssk) || ssk->sk_state != TCP_SYN_SENT))
111
        return;
34
        return;
112
35
113
-    timeouts = inet_csk(ssk)->icsk_retransmits;
114
    subflow = mptcp_subflow_ctx(ssk);
36
    subflow = mptcp_subflow_ctx(ssk);
115
37
116
    if (subflow->request_mptcp && ssk->sk_state == TCP_SYN_SENT) {
38
-    if (subflow->request_mptcp && ssk->sk_state == TCP_SYN_SENT) {
117
-        if (timeouts == 2 || (timeouts < 2 && expired)) {
39
-        struct net *net = sock_net(ssk);
118
-            MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_MPCAPABLEACTIVEDROP);
40
-        u8 timeouts, to_max;
119
+        struct net *net = sock_net(ssk);
41
-
120
+        u8 timeouts, to_max;
42
-        timeouts = inet_csk(ssk)->icsk_retransmits;
43
-        to_max = mptcp_get_pernet(net)->syn_retrans_before_tcp_fallback;
44
-
45
-        if (timeouts == to_max || (timeouts < to_max && expired)) {
46
-            MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP);
47
-            subflow->mpc_drop = 1;
48
-            mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow);
49
-        }
50
-    } else if (ssk->sk_state == TCP_SYN_SENT) {
51
+    /* ... + MP_CAPABLE */
52
+    if (!subflow->request_mptcp) {
53
+        /* Mark as blackhole iif the 1st non-MPTCP SYN is accepted */
54
        subflow->mpc_drop = 0;
55
+        return;
56
+    }
121
+
57
+
122
+        timeouts = inet_csk(ssk)->icsk_retransmits;
58
+    net = sock_net(ssk);
123
+        to_max = mptcp_get_pernet(net)->syn_retrans_before_tcp_fallback;
59
+    timeouts = inet_csk(ssk)->icsk_retransmits;
60
+    to_max = mptcp_get_pernet(net)->syn_retrans_before_tcp_fallback;
124
+
61
+
125
+        if (timeouts == to_max || (timeouts < to_max && expired)) {
62
+    if (timeouts == to_max || (timeouts < to_max && expired)) {
126
+            MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP);
63
+        MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP);
127
            subflow->mpc_drop = 1;
64
+        subflow->mpc_drop = 1;
128
            mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow);
65
+        mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow);
129
        } else {
66
    }
67
}
68
130
69
131
--
70
--
132
2.47.1
71
2.47.1
diff view generated by jsdifflib