Paolo worked on this RX path refactor for these two main reasons:

- Currently, the MPTCP RX path introduces quite a bit of 'exceptional'
  accounting/locking processing WRT plain TCP, adding to the
  implementation complexity in a miserable way.

- The performance gap WRT plain TCP for single subflow connections is
  quite measurable.

The present refactor addresses both the above items: most of the
additional complexity is dropped, and single stream performance
increases measurably, from 55Gbps to 71Gbps in Paolo's loopback test.
As a reference, plain TCP was around 84Gbps on the same host.

The above comes at a price: the patches are invasive, even in subtle
ways.

Note: patch 5/7 removes the sk_forward_alloc_get() helper, which caused
some trivial modifications in different places in the net tree: sockets,
IPv4, sched. That's why a few more people have been Cc'ed here. Feel
free to only look at patch 5/7.
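
For those Cc'ed only because of that change: the conversion is
mechanical. A representative before/after, mirroring the em_meta.c hunk
of that patch (surrounding context trimmed here):

	/* before: via the soon-to-be-removed helper */
	dst->value = sk_forward_alloc_get(sk);

	/* after: read the field directly; READ_ONCE() matches the
	 * annotation the patch uses for the lockless readers
	 */
	dst->value = READ_ONCE(sk->sk_forward_alloc);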

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Paolo Abeni (7):
      mptcp: consolidate subflow cleanup
      mptcp: drop __mptcp_fastopen_gen_msk_ackseq()
      mptcp: move the whole rx path under msk socket lock protection
      mptcp: cleanup mem accounting
      net: dismiss sk_forward_alloc_get()
      mptcp: dismiss __mptcp_rmem()
      mptcp: micro-optimize __mptcp_move_skb()

 include/net/sock.h   |  13 ---
 net/core/sock.c      |   2 +-
 net/ipv4/af_inet.c   |   2 +-
 net/ipv4/inet_diag.c |   2 +-
 net/mptcp/fastopen.c |  27 +---
 net/mptcp/protocol.c | 317 ++++++++++++++++-----------------------------------
 net/mptcp/protocol.h |  22 ++--
 net/mptcp/subflow.c  |  36 +++---
 net/sched/em_meta.c  |   2 +-
 9 files changed, 134 insertions(+), 289 deletions(-)
---
base-commit: b4cb730862cf4f59ac3dcb83b9ac4eeb29dbfb0e
change-id: 20250106-net-next-mptcp-rx-path-refactor-f44579efb57c

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe@kernel.org>
From: Paolo Abeni <pabeni@redhat.com>

When we move the whole RX path under the msk socket lock, updating the
already-queued skbs for passive fastopen sockets at 3rd-ack time will
be extremely painful and race prone.

The map_seq for already enqueued skbs is used only to allow correct
coalescing with later data; by preventing collapsing to the first skb
of a fastopen connect, we can completely remove the
__mptcp_fastopen_gen_msk_ackseq() helper.

Before dropping this helper, a new item had to be added to the
mptcp_skb_cb structure. Because this item will be frequently tested in
the fast path -- almost on every packet -- and because there is free
space there, a single byte is used instead of a bitfield. This micro
optimisation slightly reduces the number of CPU operations needed to do
the associated check.
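
For clarity, this is the resulting control block layout, condensed from
the protocol.h hunk below; only the last two fields change:

	struct mptcp_skb_cb {
		u64 map_seq;
		u64 end_seq;
		u32 offset;
		u8  has_rxtstamp;	/* was a has_rxtstamp:1 bitfield */
		u8  cant_coalesce;	/* tested on every packet: a plain
					 * byte compare, no mask/shift
					 */
	};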

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/fastopen.c | 24 ++----------------------
 net/mptcp/protocol.c |  4 +++-
 net/mptcp/protocol.h |  5 ++---
 net/mptcp/subflow.c  |  3 ---
 4 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c
index XXXXXXX..XXXXXXX 100644
--- a/net/mptcp/fastopen.c
+++ b/net/mptcp/fastopen.c
@@ -XXX,XX +XXX,XX @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
 	tp->copied_seq += skb->len;
 	subflow->ssn_offset += skb->len;
 
-	/* initialize a dummy sequence number, we will update it at MPC
-	 * completion, if needed
-	 */
+	/* Only the sequence delta is relevant */
 	MPTCP_SKB_CB(skb)->map_seq = -skb->len;
 	MPTCP_SKB_CB(skb)->end_seq = 0;
 	MPTCP_SKB_CB(skb)->offset = 0;
 	MPTCP_SKB_CB(skb)->has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
+	MPTCP_SKB_CB(skb)->cant_coalesce = 1;
 
 	mptcp_data_lock(sk);
 
@@ -XXX,XX +XXX,XX @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
 
 	mptcp_data_unlock(sk);
 }
-
-void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
-				     const struct mptcp_options_received *mp_opt)
-{
-	struct sock *sk = (struct sock *)msk;
-	struct sk_buff *skb;
-
-	skb = skb_peek_tail(&sk->sk_receive_queue);
-	if (skb) {
-		WARN_ON_ONCE(MPTCP_SKB_CB(skb)->end_seq);
-		pr_debug("msk %p moving seq %llx -> %llx end_seq %llx -> %llx\n", sk,
-			 MPTCP_SKB_CB(skb)->map_seq, MPTCP_SKB_CB(skb)->map_seq + msk->ack_seq,
-			 MPTCP_SKB_CB(skb)->end_seq, MPTCP_SKB_CB(skb)->end_seq + msk->ack_seq);
-		MPTCP_SKB_CB(skb)->map_seq += msk->ack_seq;
-		MPTCP_SKB_CB(skb)->end_seq += msk->ack_seq;
-	}
-
-	pr_debug("msk=%p ack_seq=%llx\n", msk, msk->ack_seq);
-}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index XXXXXXX..XXXXXXX 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -XXX,XX +XXX,XX @@ static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
 	bool fragstolen;
 	int delta;
 
-	if (MPTCP_SKB_CB(from)->offset ||
+	if (unlikely(MPTCP_SKB_CB(to)->cant_coalesce) ||
+	    MPTCP_SKB_CB(from)->offset ||
 	    ((to->len + from->len) > (sk->sk_rcvbuf >> 3)) ||
 	    !skb_try_coalesce(to, from, &fragstolen, &delta))
 		return false;
@@ -XXX,XX +XXX,XX @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 	MPTCP_SKB_CB(skb)->end_seq = MPTCP_SKB_CB(skb)->map_seq + copy_len;
 	MPTCP_SKB_CB(skb)->offset = offset;
 	MPTCP_SKB_CB(skb)->has_rxtstamp = has_rxtstamp;
+	MPTCP_SKB_CB(skb)->cant_coalesce = 0;
 
 	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
 		/* in sequence */
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index XXXXXXX..XXXXXXX 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -XXX,XX +XXX,XX @@ struct mptcp_skb_cb {
 	u64 map_seq;
 	u64 end_seq;
 	u32 offset;
-	u8  has_rxtstamp:1;
+	u8  has_rxtstamp;
+	u8  cant_coalesce;
 };
 
 #define MPTCP_SKB_CB(__skb)	((struct mptcp_skb_cb *)&((__skb)->cb[0]))
@@ -XXX,XX +XXX,XX @@ void mptcp_event_pm_listener(const struct sock *ssk,
 			     enum mptcp_event_type event);
 bool mptcp_userspace_pm_active(const struct mptcp_sock *msk);
 
-void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
-				     const struct mptcp_options_received *mp_opt);
 void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subflow,
 					      struct request_sock *req);
 int mptcp_nl_fill_addr(struct sk_buff *skb,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index XXXXXXX..XXXXXXX 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -XXX,XX +XXX,XX @@ void __mptcp_subflow_fully_established(struct mptcp_sock *msk,
 	subflow_set_remote_key(msk, subflow, mp_opt);
 	WRITE_ONCE(subflow->fully_established, true);
 	WRITE_ONCE(msk->fully_established, true);
-
-	if (subflow->is_mptfo)
-		__mptcp_fastopen_gen_msk_ackseq(msk, subflow, mp_opt);
 }
 
 static struct sock *subflow_syn_recv_sock(const struct sock *sk,

-- 
2.47.1
From: Paolo Abeni <pabeni@redhat.com>

After the RX path refactor, the mentioned function is expected to run
frequently; let's optimize it a bit.

Scan for a ready subflow starting from the last processed one, and stop
after traversing the list once or reaching the msk memory limit -
instead of looking for dubious per-subflow conditions.
Also re-order the memory limit checks, to avoid duplicate tests.
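
In skeleton form (condensed from the protocol.c hunks below), the
resulting loop in __mptcp_move_skbs() becomes:

	subflow = list_first_entry(&msk->conn_list,
				   struct mptcp_subflow_context, node);
	for (;;) {
		/* cheap global bound first: stop once the msk rcvbuf
		 * is full, before touching any subflow
		 */
		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
			break;

		/* then look for the next ready subflow, walking the
		 * circular conn_list at most once
		 */
		subflow = __mptcp_first_ready_from(msk, subflow);
		if (!subflow)
			break;

		/* ... lock the subflow and spool its skbs ... */

		subflow = mptcp_next_subflow(msk, subflow);
	}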

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 111 +++++++++++++++++++++++----------------------
 net/mptcp/protocol.h |   2 +
 2 files changed, 52 insertions(+), 61 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index XXXXXXX..XXXXXXX 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -XXX,XX +XXX,XX @@ static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
 }
 
 static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
-					   struct sock *ssk,
-					   unsigned int *bytes)
+					   struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct sock *sk = (struct sock *)msk;
-	unsigned int moved = 0;
 	bool more_data_avail;
 	struct tcp_sock *tp;
-	bool done = false;
+	bool ret = false;
 
 	pr_debug("msk=%p ssk=%p\n", msk, ssk);
 	tp = tcp_sk(ssk);
@@ -XXX,XX +XXX,XX @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		struct sk_buff *skb;
 		bool fin;
 
+		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+			break;
+
 		/* try to move as much data as available */
 		map_remaining = subflow->map_data_len -
 				mptcp_subflow_get_map_offset(subflow);
 
 		skb = skb_peek(&ssk->sk_receive_queue);
-		if (!skb) {
-			/* With racing move_skbs_to_msk() and __mptcp_move_skbs(),
-			 * a different CPU can have already processed the pending
-			 * data, stop here or we can enter an infinite loop
-			 */
-			if (!moved)
-				done = true;
+		if (unlikely(!skb))
 			break;
-		}
 
 		if (__mptcp_check_fallback(msk)) {
 			/* Under fallback skbs have no MPTCP extension and TCP could
@@ -XXX,XX +XXX,XX @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 		offset = seq - TCP_SKB_CB(skb)->seq;
 		fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
-		if (fin) {
-			done = true;
+		if (fin)
 			seq++;
-		}
 
 		if (offset < skb->len) {
 			size_t len = skb->len - offset;
 
-			if (tp->urg_data)
-				done = true;
-
-			if (__mptcp_move_skb(msk, ssk, skb, offset, len))
-				moved += len;
+			ret = __mptcp_move_skb(msk, ssk, skb, offset, len) || ret;
 			seq += len;
 
 			if (unlikely(map_remaining < len)) {
@@ -XXX,XX +XXX,XX @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		}
 
 		sk_eat_skb(ssk, skb);
-		done = true;
 	}
 
 	WRITE_ONCE(tp->copied_seq, seq);
 	more_data_avail = mptcp_subflow_data_available(ssk);
 
-		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) {
-			done = true;
-			break;
-		}
 	} while (more_data_avail);
 
-	if (moved > 0)
+	if (ret)
 		msk->last_data_recv = tcp_jiffies32;
-	*bytes += moved;
-	return done;
+	return ret;
 }
 
 static bool __mptcp_ofo_queue(struct mptcp_sock *msk)
@@ -XXX,XX +XXX,XX @@ void __mptcp_error_report(struct sock *sk)
 static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 {
 	struct sock *sk = (struct sock *)msk;
-	unsigned int moved = 0;
+	bool moved;
 
-	__mptcp_move_skbs_from_subflow(msk, ssk, &moved);
+	moved = __mptcp_move_skbs_from_subflow(msk, ssk);
 	__mptcp_ofo_queue(msk);
 	if (unlikely(ssk->sk_err)) {
 		if (!sock_owned_by_user(sk))
@@ -XXX,XX +XXX,XX @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	 */
 	if (mptcp_pending_data_fin(sk, NULL))
 		mptcp_schedule_work(sk);
-	return moved > 0;
+	return moved;
 }
 
 static void __mptcp_rcvbuf_update(struct sock *sk, struct sock *ssk)
@@ -XXX,XX +XXX,XX @@ static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
 
 	__mptcp_rcvbuf_update(sk, ssk);
 
-	/* over limit? can't append more skbs to msk, Also, no need to wake-up*/
-	if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
-		return;
-
 	/* Wake-up the reader only for in-sequence data */
 	if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
 		sk->sk_data_ready(sk);
@@ -XXX,XX +XXX,XX @@ bool mptcp_schedule_work(struct sock *sk)
 	return false;
 }
 
-static struct sock *mptcp_subflow_recv_lookup(const struct mptcp_sock *msk)
-{
-	struct mptcp_subflow_context *subflow;
-
-	msk_owned_by_me(msk);
-
-	mptcp_for_each_subflow(msk, subflow) {
-		if (READ_ONCE(subflow->data_avail))
-			return mptcp_subflow_tcp_sock(subflow);
-	}
-
-	return NULL;
-}
-
 static bool mptcp_skb_can_collapse_to(u64 write_seq,
 				      const struct sk_buff *skb,
 				      const struct mptcp_ext *mpext)
@@ -XXX,XX +XXX,XX @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	msk->rcvq_space.time = mstamp;
 }
 
+static struct mptcp_subflow_context *
+__mptcp_first_ready_from(struct mptcp_sock *msk,
+			 struct mptcp_subflow_context *subflow)
+{
+	struct mptcp_subflow_context *start_subflow = subflow;
+
+	while (!READ_ONCE(subflow->data_avail)) {
+		subflow = mptcp_next_subflow(msk, subflow);
+		if (subflow == start_subflow)
+			return NULL;
+	}
+	return subflow;
+}
+
 static bool __mptcp_move_skbs(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	unsigned int moved = 0;
-	bool ret, done;
+	bool ret = false;
+
+	if (list_empty(&msk->conn_list))
+		return false;
 
 	/* verify we can move any data from the subflow, eventually updating */
 	if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK))
 		mptcp_for_each_subflow(msk, subflow)
 			__mptcp_rcvbuf_update(sk, subflow->tcp_sock);
 
-	if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
-		return false;
-
-	do {
-		struct sock *ssk = mptcp_subflow_recv_lookup(msk);
+	subflow = list_first_entry(&msk->conn_list,
+				   struct mptcp_subflow_context, node);
+	for (;;) {
+		struct sock *ssk;
 		bool slowpath;
 
-		if (unlikely(!ssk))
+		/*
+		 * As an optimization avoid traversing the subflows list
+		 * and ev. acquiring the subflow socket lock before baling out
+		 */
+		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
 			break;
 
-		slowpath = lock_sock_fast(ssk);
-		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
+		subflow = __mptcp_first_ready_from(msk, subflow);
+		if (!subflow)
+			break;
 
+		ssk = mptcp_subflow_tcp_sock(subflow);
+		slowpath = lock_sock_fast(ssk);
+		ret = __mptcp_move_skbs_from_subflow(msk, ssk) || ret;
 		if (unlikely(ssk->sk_err))
 			__mptcp_error_report(sk);
 		unlock_sock_fast(ssk, slowpath);
-	} while (!done);
 
-	ret = moved > 0 || __mptcp_ofo_queue(msk);
+		subflow = mptcp_next_subflow(msk, subflow);
+	}
+
+	__mptcp_ofo_queue(msk);
 	if (ret)
 		mptcp_check_data_fin((struct sock *)msk);
 	return ret;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index XXXXXXX..XXXXXXX 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -XXX,XX +XXX,XX @@ struct mptcp_sock {
 	list_for_each_entry(__subflow, &((__msk)->conn_list), node)
 #define mptcp_for_each_subflow_safe(__msk, __subflow, __tmp)			\
 	list_for_each_entry_safe(__subflow, __tmp, &((__msk)->conn_list), node)
+#define mptcp_next_subflow(__msk, __subflow)					\
+	list_next_entry_circular(__subflow, &((__msk)->conn_list), node)
 
 extern struct genl_family mptcp_genl_family;

-- 
2.47.1
From: Paolo Abeni <pabeni@redhat.com>

After the RX path refactor, __mptcp_rmem() became a wrapper for
sk_rmem_alloc access, with a slightly misleading name. Just drop it.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c |  8 ++++----
 net/mptcp/protocol.h | 11 ++---------
 2 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index XXXXXXX..XXXXXXX 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -XXX,XX +XXX,XX @@ static void mptcp_cleanup_rbuf(struct mptcp_sock *msk, int copied)
 	bool cleanup, rx_empty;
 
 	cleanup = (space > 0) && (space >= (old_space << 1)) && copied;
-	rx_empty = !__mptcp_rmem(sk) && copied;
+	rx_empty = !sk_rmem_alloc_get(sk) && copied;
 
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
@@ -XXX,XX +XXX,XX @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 	WRITE_ONCE(tp->copied_seq, seq);
 	more_data_avail = mptcp_subflow_data_available(ssk);
 
-	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) {
+	if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) {
 		done = true;
 		break;
 	}
@@ -XXX,XX +XXX,XX @@ static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
 	__mptcp_rcvbuf_update(sk, ssk);
 
 	/* over limit? can't append more skbs to msk, Also, no need to wake-up*/
-	if (__mptcp_rmem(sk) > sk->sk_rcvbuf)
+	if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
 		return;
 
 	/* Wake-up the reader only for in-sequence data */
@@ -XXX,XX +XXX,XX @@ static bool __mptcp_move_skbs(struct sock *sk)
 	mptcp_for_each_subflow(msk, subflow)
 		__mptcp_rcvbuf_update(sk, subflow->tcp_sock);
 
-	if (__mptcp_rmem(sk) > sk->sk_rcvbuf)
+	if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
 		return false;
 
 	do {
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index XXXXXXX..XXXXXXX 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -XXX,XX +XXX,XX @@ static inline void msk_owned_by_me(const struct mptcp_sock *msk)
 #define mptcp_sk(ptr) container_of_const(ptr, struct mptcp_sock, sk.icsk_inet.sk)
 #endif
 
-/* the msk socket don't use the backlog, also account for the bulk
- * free memory
- */
-static inline int __mptcp_rmem(const struct sock *sk)
-{
-	return atomic_read(&sk->sk_rmem_alloc);
-}
-
 static inline int mptcp_win_from_space(const struct sock *sk, int space)
 {
 	return __tcp_win_from_space(mptcp_sk(sk)->scaling_ratio, space);
@@ -XXX,XX +XXX,XX @@ static inline int mptcp_space_from_win(const struct sock *sk, int win)
 
 static inline int __mptcp_space(const struct sock *sk)
 {
-	return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - __mptcp_rmem(sk));
+	return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) -
+				    sk_rmem_alloc_get(sk));
 }
 
 static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)

-- 
2.47.1
From: Paolo Abeni <pabeni@redhat.com>

Consolidate all the cleanup actions requiring the worker in a single
helper and ensure the dummy data fin creation for fallback socket is
performed only when the tcp rx queue is empty.

There are no functional changes intended, but this will simplify the
next patch, when the tcp rx queue spooling could be delayed at release_cb
time.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/subflow.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
...
-	    mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true))
-		mptcp_schedule_work(parent);
 }
 
 void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_ssk)

-- 
2.47.1
From: Paolo Abeni <pabeni@redhat.com>

After commit c2e6048fa1cf ("mptcp: fix race in release_cb") we can
move the whole MPTCP rx path under the socket lock leveraging the
release_cb.

We can drop a bunch of spin_lock pairs in the receive functions, use
...
When the skbs move is delayed at msk release callback time, even the
msk rcvbuf update is delayed; additionally take care of such action in
__mptcp_move_skbs().
21 | __mptcp_move_skbs(). | 19 | __mptcp_move_skbs(). |
22 | 20 | ||
23 | Signed-off-by: Paolo Abeni <pabeni@redhat.com> | 21 | Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
24 | Reviewed-by: Mat Martineau <martineau@kernel.org> | ||
25 | Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> | ||
26 | --- | 22 | --- |
27 | net/mptcp/fastopen.c | 1 + | 23 | v1 -> v2: |
28 | net/mptcp/protocol.c | 123 ++++++++++++++++++++++++--------------------------- | 24 | - cleanup && fixup msk sk_rcvbuf update |
25 | - fix missed wakeup due to bad placed __mptcp_move_skbs(); move it | ||
26 | form __mptcp_recvmsg_mskq into mptcp_recvmsg() | ||
27 | - added missing '\n' in debug message format string | ||
28 | - keep 'snd_una' && friends update under the mptcp data lock | ||
29 | --- | ||
30 | net/mptcp/fastopen.c | 2 + | ||
31 | net/mptcp/protocol.c | 123 ++++++++++++++++++++----------------------- | ||
29 | net/mptcp/protocol.h | 2 +- | 32 | net/mptcp/protocol.h | 2 +- |
30 | 3 files changed, 60 insertions(+), 66 deletions(-) | 33 | 3 files changed, 61 insertions(+), 66 deletions(-) |
31 | 34 | ||
32 | diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c | 35 | diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c |
33 | index XXXXXXX..XXXXXXX 100644 | 36 | index XXXXXXX..XXXXXXX 100644 |
34 | --- a/net/mptcp/fastopen.c | 37 | --- a/net/mptcp/fastopen.c |
35 | +++ b/net/mptcp/fastopen.c | 38 | +++ b/net/mptcp/fastopen.c |
36 | @@ -XXX,XX +XXX,XX @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf | 39 | @@ -XXX,XX +XXX,XX @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf |
37 | MPTCP_SKB_CB(skb)->cant_coalesce = 1; | 40 | MPTCP_SKB_CB(skb)->has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp; |
38 | 41 | ||
39 | mptcp_data_lock(sk); | 42 | mptcp_data_lock(sk); |
40 | + DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk)); | 43 | + DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk)); |
41 | 44 | ||
42 | mptcp_set_owner_r(skb, sk); | 45 | mptcp_set_owner_r(skb, sk); |
43 | __skb_queue_tail(&sk->sk_receive_queue, skb); | 46 | __skb_queue_tail(&sk->sk_receive_queue, skb); |
47 | @@ -XXX,XX +XXX,XX @@ void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflo | ||
48 | struct sock *sk = (struct sock *)msk; | ||
49 | struct sk_buff *skb; | ||
50 | |||
51 | + DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk)); | ||
52 | skb = skb_peek_tail(&sk->sk_receive_queue); | ||
53 | if (skb) { | ||
54 | WARN_ON_ONCE(MPTCP_SKB_CB(skb)->end_seq); | ||
44 | diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c | 55 | diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c |
45 | index XXXXXXX..XXXXXXX 100644 | 56 | index XXXXXXX..XXXXXXX 100644 |
46 | --- a/net/mptcp/protocol.c | 57 | --- a/net/mptcp/protocol.c |
47 | +++ b/net/mptcp/protocol.c | 58 | +++ b/net/mptcp/protocol.c |
48 | @@ -XXX,XX +XXX,XX @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, | 59 | @@ -XXX,XX +XXX,XX @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, |
... | ... | ||
346 | struct rb_root out_of_order_queue; | 357 | struct rb_root out_of_order_queue; |
347 | - struct sk_buff_head receive_queue; | 358 | - struct sk_buff_head receive_queue; |
348 | struct list_head conn_list; | 359 | struct list_head conn_list; |
349 | struct list_head rtx_queue; | 360 | struct list_head rtx_queue; |
350 | struct mptcp_data_frag *first_pending; | 361 | struct mptcp_data_frag *first_pending; |
351 | |||
352 | -- | 362 | -- |
353 | 2.47.1 | 363 | 2.45.2 | diff view generated by jsdifflib |
From: Paolo Abeni <pabeni@redhat.com>

After the previous patch, updating sk_forward_memory is cheap and
we can drop a lot of complexity from the MPTCP memory accounting,
removing the custom fwd mem allocations for rmem.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/fastopen.c |   2 +-
 net/mptcp/protocol.c | 115 ++++++-----------------------------------------
 net/mptcp/protocol.h |   4 +-
 3 files changed, 10 insertions(+), 111 deletions(-)

diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c
index XXXXXXX..XXXXXXX 100644
...
-	return atomic_read(&sk->sk_rmem_alloc) - READ_ONCE(mptcp_sk(sk)->rmem_released);
+	return atomic_read(&sk->sk_rmem_alloc);
 }
 
 static inline int mptcp_win_from_space(const struct sock *sk, int space)

-- 
2.47.1
From: Paolo Abeni <pabeni@redhat.com>

After the previous patch we can remove the forward_alloc_get
proto callback, basically reverting commit 292e6077b040 ("net: introduce
sk_forward_alloc_get()") and commit 66d58f046c9d ("net: use
sk_forward_alloc_get() in sk_get_meminfo()").

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 include/net/sock.h   | 13 -------------
 net/core/sock.c      |  2 +-
 net/ipv4/af_inet.c   |  2 +-
 net/ipv4/inet_diag.c |  2 +-
...
-	dst->value = sk_forward_alloc_get(sk);
+	dst->value = READ_ONCE(sk->sk_forward_alloc);
 }
 
 META_COLLECTOR(int_sk_sndbuf)

-- 
2.47.1