Fwd: Re: [PATCH mptcp-net v5 4/5] mptcp: fix the dead_lock in mptcp_data_ready

gang.yan@linux.dev posted 1 patch 1 week, 1 day ago
Failed in applying to current master (apply log)
Fwd: Re: [PATCH mptcp-net v5 4/5] mptcp: fix the dead_lock in mptcp_data_ready
Posted by gang.yan@linux.dev 1 week, 1 day ago
-------- Forwarded message -------

From: gang.yan@linux.dev mailto:gang.yan@linux.dev 
To: "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E >
Sent: April 3, 2026 at 2:44 PM
Subject: Re: [PATCH mptcp-net v5 4/5] mptcp: fix the dead_lock in mptcp_data_ready

April 2, 2026 at 4:02 AM, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:



> 
> On 4/1/26 10:54 AM, Gang Yan wrote:
>  
>  
>  From: Gang Yan <yangang@kylinos.cn>
>  
>  This patch defers mptcp_check_data_fin from __mptcp_move_skbs to avoid
>  deadlock.
>  
>  When processing backlogged data in softirq context, __mptcp_move_skbs
>  directly calls mptcp_check_data_fin, which can lead to a deadlock if
>  the msk socket is in TCP_FIN_WAIT2 state. In that case,
>  mptcp_check_data_fin calls mptcp_shutdown_subflows, which attempts to
>  lock the subflow socket with lock_sock_fast() - a sleeping function
>  that cannot be called in atomic context (softirq).
>  
>  The fix for such issue should be squashed inside the patch generating
>  the issue itself.
>  
>  
>  The correct pattern is already used in move_skbs_to_msk: if a data fin
>  is pending, schedule the work to be processed later in process context
>  rather than handling it directly.
>  
>  Reported-by: Geliang Tang <geliang@kernel.org>
>  Co-developed-by: Geliang Tang <geliang@kernel.org>
>  Signed-off-by: Geliang Tang <geliang@kernel.org>
>  Signed-off-by: Gang Yan <yangang@kylinos.cn>
>  ---
>  Notes:
>  
>  Hi Matt,Paolo:
>  
>  Here is the response of AI review for this patch:
>  
>  - Typo problem in subject line:
>  Already fixed in v5.
>  
>  - Remove else branch:
>  It demonstrates a TOCTOU race, though the race window is extremely
>  narrow. The else branch is useless even without the race. So I
>  removed it in v5. WDYT?
>  
>  Thanks,
>  Gang
>  
>  Changelog:
>  v5:
>  - Fix typo in title.
>  - Remove the else branch.
>  
>  ---
>  net/mptcp/protocol.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>  
>  diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>  index 326b5d4d79e7..88e19fa61b84 100644
>  --- a/net/mptcp/protocol.c
>  +++ b/net/mptcp/protocol.c
>  @@ -2247,8 +2247,9 @@ static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delt
>  }
>  
>  __mptcp_ofo_queue(msk);
>  - if (moved)
>  - mptcp_check_data_fin((struct sock *)msk);
>  + if (moved && mptcp_pending_data_fin(sk, NULL))
>  + mptcp_schedule_work(sk);
>  
>  Scheduling the worker when we are not in BH could introduce problematic
>  delays. Instead you could factor out a ____mptcp_move_skbs() helper (a
>  better name would be nice!) not including the mptcp_check_data_fin()
>  call, use the latter in mptcp_data_ready() and in such function
>  eventually schedule the mptcp work as needed.
> 
Hi Paolo,

Thanks for your review. 
I'm a little bit confused. Does this mean the 'move_skbs_to_msk' should be
fixed too? like:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 6ce0d18c86b6..e90c8621683c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -869,13 +869,6 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 if (unlikely(ssk->sk_err))
 __mptcp_subflow_error_report(sk, ssk);
 
- /* If the moves have caught up with the DATA_FIN sequence number
- * it's time to ack the DATA_FIN and change socket state, but
- * this is not a good place to change state. Let the workqueue
- * do it.
- */
- if (mptcp_pending_data_fin(sk, NULL))
- mptcp_schedule_work(sk);
 return moved;
 }
 
@@ -909,6 +902,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 struct mptcp_sock *msk = mptcp_sk(sk);
+ bool moved = false;
 
 /* The peer can send data while we are shutting down this
 * subflow at subflow destruction time, but we must avoid enqueuing
@@ -920,13 +914,17 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 mptcp_data_lock(sk);
 mptcp_rcv_rtt_update(msk, subflow);
 if (!sock_owned_by_user(sk)) {
+ moved = move_skbs_to_msk(msk, ssk); 
 /* Wake-up the reader only for in-sequence data */
- if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
+ if (moved && mptcp_epollin_ready(sk))
 sk->sk_data_ready(sk);
 } else {
 __mptcp_move_skbs_from_subflow(msk, ssk, false);
 }
 mptcp_data_unlock(sk);
+ 
+ if (mptcp_pending_data_fin(sk, NULL))
+ mptcp_check_data_fin(sk);
 }

Looking forward for your reply

Very thanks
Gang

> 
> /P
>