[Patch mptcp-net 2/3] mptcp: fix receive stalls when 'ack_seq' in backlog_list

Gang Yan posted 3 patches 4 days, 7 hours ago
[Patch mptcp-net 2/3] mptcp: fix receive stalls when 'ack_seq' in backlog_list
Posted by Gang Yan 4 days, 7 hours ago
From: Gang Yan <yangang@kylinos.cn>

This patch fixes the first backlog_list issue, which can be 100%
reproduced by the multi_chunk.sh test case.

The problem occurs when out-of-order (OFO) queue data accumulates,
causing sk_rmem_alloc to exceed sk_rcvbuf. In this condition, skbs
in backlog_list fail to move to sk->receive_queue even when the
receive queue is empty, leading to transmission stalls.

This patch adds an empty check for sk->receieve_queue, when it is empty,
the skb should moved too.

Co-developed-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
 net/mptcp/protocol.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ff06bbee7781..b6bafc37eea4 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2176,7 +2176,8 @@ static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delt
 	*delta = 0;
 	while (1) {
 		/* If the msk recvbuf is full stop, don't drop */
-		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf &&
+		    !skb_queue_empty(&sk->sk_receive_queue))
 			break;
 
 		prefetch(skb->next);
@@ -2208,7 +2209,8 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
 
 	/* Don't spool the backlog if the rcvbuf is full. */
 	if (list_empty(&msk->backlog_list) ||
-	    sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+	    (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf &&
+	     !skb_queue_empty(&sk->sk_receive_queue)))
 		return false;
 
 	INIT_LIST_HEAD(skbs);
-- 
2.43.0
Re: [Patch mptcp-net 2/3] mptcp: fix receive stalls when 'ack_seq' in backlog_list
Posted by Paolo Abeni 4 days, 4 hours ago
On 2/5/26 7:41 AM, Gang Yan wrote:
> From: Gang Yan <yangang@kylinos.cn>
> 
> This patch fixes the first backlog_list issue, which can be 100%
> reproduced by the multi_chunk.sh test case.
> 
> The problem occurs when out-of-order (OFO) queue data accumulates,
> causing sk_rmem_alloc to exceed sk_rcvbuf. In this condition, skbs
> in backlog_list fail to move to sk->receive_queue even when the
> receive queue is empty, leading to transmission stalls.

Very good catch! How did you hit the issue?

> 
> This patch adds an empty check for sk->receieve_queue, when it is empty,
> the skb should moved too.
> 
> Co-developed-by: Geliang Tang <tanggeliang@kylinos.cn>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> Signed-off-by: Gang Yan <yangang@kylinos.cn>
> ---
>  net/mptcp/protocol.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index ff06bbee7781..b6bafc37eea4 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2176,7 +2176,8 @@ static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delt
>  	*delta = 0;
>  	while (1) {
>  		/* If the msk recvbuf is full stop, don't drop */
> -		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> +		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf &&
> +		    !skb_queue_empty(&sk->sk_receive_queue))

I think we need to be more conservative, otherwise this could be
exploited to allow unbound increase of the rcvbuf.

Unfortunately it does not look easy, but I think it would be doable:

- convert the backlog to a rbtree, sorted by MPTCP-level sequence
number; that would likely require:
  - factoring out a slightly more generic helper out of mptcp_data_queue_ofo
  - use it for both mptcp_data_queue_ofo() and enqueue to backlog
- implement an helper to check if we can dequeue from the backlog; it
should check that we are below memory bounds or that the first skb in
the backlog is in-sequence.
- use such helper here and in mptcp_can_spool_backlog()

I see it's not easy, but I think is necessary. Simpler alternatives more
than welcome!

(side note, on top of my head I think that even plain TCP stumbled upon
a similar situation, i.e. could overrun some limits for in-sequence skbs).

/P