[PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Paolo Abeni 4 days, 17 hours ago
This series includes RX path improvements built around backlog processing.

The main goals are improving RX performance _and_ increasing the
long-term maintainability.

Patches 1-3 prepare the stack for backlog processing, removing
assumptions that will no longer hold true after the backlog introduction.

Patch 4 fixes a long-standing issue that is quite hard to reproduce
with the current implementation but will become very apparent with
backlog usage.

Patches 5, 6 and 8 are more cleanups that will make the backlog patch a
little less huge.

Patch 7 is a somewhat unrelated cleanup, included here before I forget
about it.

The real work is done by patches 9 and 10. Patch 9 introduces the helpers
needed to manipulate the msk-level backlog, and the data struct itself,
without any actual functional change. Patch 10 finally uses the backlog
for RX skb processing. Note that MPTCP can't use the sk_backlog, as
the mptcp release callback can also release and re-acquire the msk-level
spinlock, while core backlog processing works under the assumption that
such an event is not possible.
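
For illustration only, a minimal sketch of the pattern described above;
struct example_msk and example_process_skbs() are invented names, not
the actual helpers introduced by patch 9:

struct example_msk {
	spinlock_t	 data_lock;	/* stands in for the msk-level spinlock */
	struct list_head backlog_list;	/* skbs queued by the subflow RX path */
};

static void example_drain_backlog(struct example_msk *msk)
{
	LIST_HEAD(batch);

	spin_lock_bh(&msk->data_lock);
	list_splice_init(&msk->backlog_list, &batch);
	spin_unlock_bh(&msk->data_lock);

	/*
	 * Unlike core sk_backlog processing, the code running here (e.g.
	 * the mptcp release callback) is allowed to drop and re-acquire
	 * the msk-level lock while handling the batch.
	 */
	example_process_skbs(msk, &batch);
}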

Other relevant points are:
- skbs in the backlog are _not_ accounted. TCP does the same, and we
  can't update the fwd mem while enqueuing to the backlog, as the caller
  neither owns the msk-level socket lock nor can acquire it.
- skbs in the backlog still use the incoming ssk rmem. This allows
  backpressure and implicitly prevents excessive memory usage for the
  backlog itself.
- [this is possibly the most critical point]: when the msk rx buf is
  full, we don't add more packets there even when the caller owns the
  msk socket lock. Instead, packets are added to the backlog. Note that
  the amount of memory used there is still limited by the above. Also
  note that this implicitly means that such packets could stay in the
  backlog until the receiver flushes the rx buffer - an unbounded amount
  of time. That is not supposed to happen for a backlog, hence the
  criticality here. (A rough sketch of this enqueue policy follows the
  list.)
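
Again purely as an illustration of the enqueue policy above, reusing the
example_msk placeholder from the earlier sketch; example_rcvbuf_full(),
EXAMPLE_SKB_CB() and example_move_to_receive_queue() are invented names,
not the actual patch code:

static void example_rx_enqueue(struct example_msk *msk, struct sk_buff *skb,
			       bool owns_msk_lock)
{
	/*
	 * The skb keeps charging the incoming subflow rmem, so the
	 * backlog size stays implicitly bounded even while packets
	 * sit there.
	 */
	if (!owns_msk_lock || example_rcvbuf_full(msk)) {
		spin_lock_bh(&msk->data_lock);
		list_add_tail(&EXAMPLE_SKB_CB(skb)->node, &msk->backlog_list);
		spin_unlock_bh(&msk->data_lock);
		return;
	}

	/* lock owned and room available: go straight to the rx queue */
	example_move_to_receive_queue(msk, skb);
}
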
---
This should address the issues reported by the CI on the previous
iteration (at least here), and features some more patch splits to make
the last one less big. See the individual patch changelogs for the
details.

Side note: local testing hinted we have some unrelated/pre-existing
issues with mptcp-level rcvwin management that I think deserve a better
investigation. Specifically, I observe, especially in the peek tests,
RCVWNDSHARED events even with a single flow - and that is quite
unexpected.

Paolo Abeni (10):
  mptcp: borrow forward memory from subflow
  mptcp: cleanup fallback data fin reception
  mptcp: cleanup fallback dummy mapping generation
  mptcp: fix MSG_PEEK stream corruption
  mptcp: ensure the kernel PM does not take action too late
  mptcp: do not miss early first subflow close event notification.
  mptcp: make mptcp_destroy_common() static
  mptcp: drop the __mptcp_data_ready() helper
  mptcp: introduce mptcp-level backlog
  mptcp: leverage the backlog for RX packet processing

 net/mptcp/pm.c        |   4 +-
 net/mptcp/pm_kernel.c |   2 +
 net/mptcp/protocol.c  | 323 ++++++++++++++++++++++++++++--------------
 net/mptcp/protocol.h  |   8 +-
 net/mptcp/subflow.c   |  12 +-
 5 files changed, 233 insertions(+), 116 deletions(-)

-- 
2.51.0
Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Matthieu Baerts 4 days, 8 hours ago
Hi Paolo,

On 06/10/2025 10:11, Paolo Abeni wrote:
> This series includes RX path improvement built around backlog processing

Thank you for the new version! This is not a review, but just a note to
tell you patchew didn't manage to apply the patches due to the same
conflict that was already there with the v4 (mptcp_init_skb() parameters
have been moved to the previous line). I just applied the patches
manually. While at it, I also used this test branch for syzkaller to
validate them.

(Also, on patch "mptcp: drop the __mptcp_data_ready() helper", git
complained that there is a trailing whitespace.)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Geliang Tang 2 days, 22 hours ago
Hi Paolo, Matt,

On Mon, 2025-10-06 at 19:07 +0200, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 06/10/2025 10:11, Paolo Abeni wrote:
> > This series includes RX path improvement built around backlog
> > processing
> Thank you for the new version! This is not a review, but just a note
> to
> tell you patchew didn't manage to apply the patches due to the same
> conflict that was already there with the v4 (mptcp_init_skb()
> parameters
> have been moved to the previous line). I just applied the patches
> manually. While at it, I also used this test branch for syzkaller to
> validate them.
> 
> (Also, on patch "mptcp: drop the __mptcp_data_ready() helper", git
> complained that there is a trailing whitespace.)

Sorry, patches 9-10 break my "implement mptcp read_sock" v12 series. I
rebased that series on top of patches 1-8 and it works well. But after
applying patches 9-10, I changed mptcp_recv_skb() in [1] from

static struct sk_buff *mptcp_recv_skb(struct sock *sk, u32 *off)
{
        struct mptcp_sock *msk = mptcp_sk(sk);
        struct sk_buff *skb;
        u32 offset;

        if (skb_queue_empty(&sk->sk_receive_queue))
                __mptcp_move_skbs(sk);

        while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
                offset = MPTCP_SKB_CB(skb)->offset;
                if (offset < skb->len) {
                        *off = offset;
                        return skb; 
                }    
                mptcp_eat_recv_skb(sk, skb);
        }    
        return NULL;
}

to

static struct sk_buff *mptcp_recv_skb(struct sock *sk, u32 *off)
{
        struct mptcp_sock *msk = mptcp_sk(sk);
        struct sk_buff *skb;
        u32 offset;

        if (!list_empty(&msk->backlog_list))
                mptcp_move_skbs(sk);

        while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
                offset = MPTCP_SKB_CB(skb)->offset;
                if (offset < skb->len) {
                        *off = offset;
                        return skb; 
                }    
                mptcp_eat_recv_skb(sk, skb);
        }    
        return NULL;
}

The splice tests (mptcp_connect_splice.sh) have a low probability
(approximately 1 in 100) of reporting timeout failures:


	=== Attempt: 158 (Wed, 08 Oct 2025 02:35:45 +0000) ===


Selftest Test: ./mptcp_connect_splice.sh
TAP version 13
1..1
# INFO: set ns3-0wY081 dev ns3eth2: ethtool -K  gso off gro off
# INFO: set ns4-MjBWza dev ns4eth3: ethtool -K tso off gro off
# Created /tmp/tmp.rxe4DwYW9E (size 5136 B) containing data sent by
client
# Created /tmp/tmp.0H0GbllUo9 (size 7193203 B) containing data sent by
server
# 01 New MPTCP socket can be blocked via sysctl                       
[ OK ]
# 02 Validating network environment with pings                        
[ OK ]
# INFO: Using loss of 0.07% delay 21 ms reorder 99% 66% with delay 5ms
on ns3eth4
# INFO: extra options:  -m splice
# 03 ns1 MPTCP -> ns1 (10.0.1.1:10000      ) MPTCP     (duration  
152ms) [ OK ]
# 04 ns1 MPTCP -> ns1 (10.0.1.1:10001      ) TCP       (duration  
152ms) [ OK ]
# 05 ns1 TCP   -> ns1 (10.0.1.1:10002      ) MPTCP     (duration  
149ms) [ OK ]
# 06 ns1 MPTCP -> ns1 (dead:beef:1::1:10003) MPTCP     (duration  
151ms) [ OK ]
# 07 ns1 MPTCP -> ns1 (dead:beef:1::1:10004) TCP       (duration  
169ms) [ OK ]
# 08 ns1 TCP   -> ns1 (dead:beef:1::1:10005) MPTCP     (duration  
152ms) [ OK ]
# 09 ns1 MPTCP -> ns2 (10.0.1.2:10006      ) MPTCP     (duration  
172ms) [ OK ]
# 10 ns1 MPTCP -> ns2 (dead:beef:1::2:10007) MPTCP     (duration  
172ms) [ OK ]
# 11 ns1 MPTCP -> ns2 (10.0.2.1:10008      ) MPTCP     (duration  
157ms) [ OK ]
# 12 ns1 MPTCP -> ns2 (dead:beef:2::1:10009) MPTCP     (duration  
157ms) [ OK ]
# 13 ns1 MPTCP -> ns3 (10.0.2.2:10010      ) MPTCP     (duration  
497ms) [ OK ]
# 14 ns1 MPTCP -> ns3 (dead:beef:2::2:10011) MPTCP     (duration  
500ms) [ OK ]
# 15 ns1 MPTCP -> ns3 (10.0.3.2:10012      ) MPTCP     (duration  
602ms) [ OK ]
# 16 ns1 MPTCP -> ns3 (dead:beef:3::2:10013) MPTCP     (duration  
571ms) [ OK ]
# 17 ns1 MPTCP -> ns4 (10.0.3.1:10014      ) MPTCP     (duration  
544ms) [ OK ]
# 18 ns1 MPTCP -> ns4 (dead:beef:3::1:10015) MPTCP     (duration  
627ms) [ OK ]
# 19 ns2 MPTCP -> ns1 (10.0.1.1:10016      ) MPTCP     (duration  
136ms) [ OK ]
# 20 ns2 MPTCP -> ns1 (dead:beef:1::1:10017) MPTCP     (duration  
181ms) [ OK ]
# 21 ns2 MPTCP -> ns3 (10.0.2.2:10018      ) MPTCP     (duration  
415ms) [ OK ]
# 22 ns2 MPTCP -> ns3 (dead:beef:2::2:10019) MPTCP     (duration  
490ms) [ OK ]
# 23 ns2 MPTCP -> ns3 (10.0.3.2:10020      ) MPTCP     (duration  
438ms) [ OK ]
# 24 ns2 MPTCP -> ns3 (dead:beef:3::2:10021) MPTCP     (duration  
498ms) [ OK ]
# 25 ns2 MPTCP -> ns4 (10.0.3.1:10022      ) MPTCP     (duration  
602ms) [ OK ]
# 26 ns2 MPTCP -> ns4 (dead:beef:3::1:10023) MPTCP     (duration  
559ms) [ OK ]
# 27 ns3 MPTCP -> ns1 (10.0.1.1:10024      ) MPTCP     (duration  
580ms) [ OK ]
# 28 ns3 MPTCP -> ns1 (dead:beef:1::1:10025) MPTCP     (duration  
603ms) [ OK ]
# 29 ns3 MPTCP -> ns2 (10.0.1.2:10026      ) MPTCP     (duration  
628ms) [ OK ]
# 30 ns3 MPTCP -> ns2 (dead:beef:1::2:10027) MPTCP     (duration  
451ms) [ OK ]
# 31 ns3 MPTCP -> ns2 (10.0.2.1:10028      ) MPTCP     (duration  
416ms) [ OK ]
# 32 ns3 MPTCP -> ns2 (dead:beef:2::1:10029) MPTCP     (duration  
497ms) [ OK ]
# 33 ns3 MPTCP -> ns4 (10.0.3.1:10030      ) MPTCP     (duration  
159ms) [ OK ]
# 34 ns3 MPTCP -> ns4 (dead:beef:3::1:10031) MPTCP     (duration  
156ms) [ OK ]
# 35 ns4 MPTCP -> ns1 (10.0.1.1:10032      ) MPTCP     (duration  
574ms) [ OK ]
# 36 ns4 MPTCP -> ns1 (dead:beef:1::1:10033) MPTCP     (duration  
863ms) [ OK ]
# 37 ns4 MPTCP -> ns2 (10.0.1.2:10034      ) MPTCP     (duration  
471ms) [ OK ]
# 38 ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP     (duration  
538ms) [ OK ]
# 39 ns4 MPTCP -> ns2 (10.0.2.1:10036      ) MPTCP     (duration  
520ms) [ OK ]
# 40 ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP     (duration  
511ms) [ OK ]
# 41 ns4 MPTCP -> ns3 (10.0.2.2:10038      ) MPTCP     (duration  
137ms) [ OK ]
# 42 ns4 MPTCP -> ns3 (dead:beef:2::2:10039) MPTCP     (duration  
155ms) [ OK ]
# 43 ns4 MPTCP -> ns3 (10.0.3.2:10040      ) MPTCP     (duration  
563ms) [ OK ]
# 44 ns4 MPTCP -> ns3 (dead:beef:3::2:10041) MPTCP     (duration  
152ms) [ OK ]
# INFO: with peek mode: saveWithPeek
# 45 ns1 MPTCP -> ns1 (10.0.1.1:10042      ) MPTCP     (duration  
150ms) [ OK ]
# 46 ns1 MPTCP -> ns1 (10.0.1.1:10043      ) TCP       (duration  
184ms) [ OK ]
# 47 ns1 TCP   -> ns1 (10.0.1.1:10044      ) MPTCP     (duration  
153ms) [ OK ]
# 48 ns1 MPTCP -> ns1 (dead:beef:1::1:10045) MPTCP     (duration  
154ms) [ OK ]
# 49 ns1 MPTCP -> ns1 (dead:beef:1::1:10046) TCP       (duration  
148ms) [ OK ]
# 50 ns1 TCP   -> ns1 (dead:beef:1::1:10047) MPTCP     (duration  
175ms) [ OK ]
# INFO: with peek mode: saveAfterPeek
# 51 ns1 MPTCP -> ns1 (10.0.1.1:10048      ) MPTCP     (duration  
175ms) [ OK ]
# 52 ns1 MPTCP -> ns1 (10.0.1.1:10049      ) TCP       (duration  
155ms) [ OK ]
# 53 ns1 TCP   -> ns1 (10.0.1.1:10050      ) MPTCP     (duration  
146ms) [ OK ]
# 54 ns1 MPTCP -> ns1 (dead:beef:1::1:10051) MPTCP     (duration  
153ms) [ OK ]
# 55 ns1 MPTCP -> ns1 (dead:beef:1::1:10052) TCP       (duration  
153ms) [ OK ]
# 56 ns1 TCP   -> ns1 (dead:beef:1::1:10053) MPTCP     (duration  
151ms) [ OK ]
# INFO: with MPTFO start
# 57 ns2 MPTCP -> ns1 (10.0.1.1:10054      ) MPTCP     (duration
60989ms) [FAIL] client exit code 0, server 124
# 
# netns ns1-RqXF2p (listener) socket stat for 10054:
# Failed to find cgroup2 mount
# Failed to find cgroup2 mount
# Failed to find cgroup2 mount
# Netid State    Recv-Q Send-Q Local Address:Port  Peer Address:Port  
# tcp   ESTAB    0      0           10.0.1.1:10054     10.0.1.2:55516
ino:2064372 sk:1 cgroup:unreachable:1 <->
# 	 skmem:(r0,rb131072,t0,tb340992,f0,w0,o0,bl0,d0) sack cubic
wscale:8,8 rto:206 rtt:5.026/10.034 ato:40 mss:1460 pmtu:1500
rcvmss:1436 advmss:1460 cwnd:10 bytes_sent:115312 bytes_retrans:1560
bytes_acked:113752 bytes_received:5136 segs_out:85 segs_in:16
data_segs_out:83 data_segs_in:4 send 23239156bps lastsnd:60939
lastrcv:61035 lastack:60912 pacing_rate 343879640bps delivery_rate
1994680bps delivered:84 busy:123ms sndbuf_limited:41ms(33.3%)
retrans:0/2 dsack_dups:2 rcv_space:14600 rcv_ssthresh:75432
minrtt:0.003 rcv_wnd:75520 tcp-ulp-mptcp flags:Mec
token:0000(id:0)/32ed0950(id:0) seq:2946228641406205031 sfseq:1
ssnoff:1349223625 maplen:5136
# mptcp LAST-ACK 0      0           10.0.1.1:10054     10.0.1.2:55516
timer:(keepalive,59sec,0) ino:0 sk:2 cgroup:unreachable:1 ---
# 	 skmem:(r0,rb131072,t0,tb345088,f4088,w352264,o0,bl0,d0)
subflows_max:2 remote_key token:32ed0950 write_seq:6317574787800720824
snd_una:6317574787800376423 rcv_nxt:2946228641406210168
bytes_sent:113752 bytes_received:5136 bytes_acked:113752
subflows_total:1 last_data_sent:60954 last_data_recv:61036
last_ack_recv:60913                                                   
# TcpPassiveOpens                 1                  0.0
# TcpInSegs                       13                 0.0
# TcpOutSegs                      84                 0.0
# TcpRetransSegs                  2                  0.0
# TcpExtTCPPureAcks               11                 0.0
# TcpExtTCPLossProbes             3                  0.0
# TcpExtTCPDSACKRecv              2                  0.0
# TcpExtTCPDSACKIgnoredNoUndo     2                  0.0
# TcpExtTCPFastOpenCookieReqd     1                  0.0
# TcpExtTCPOrigDataSent           81                 0.0
# TcpExtTCPDelivered              83                 0.0
# TcpExtTCPDSACKRecvSegs          2                  0.0
# MPTcpExtMPCapableSYNRX          1                  0.0
# MPTcpExtMPCapableACKRX          1                  0.0
# 
# netns ns2-xZI1rh (connector) socket stat for 10054:
# Failed to find cgroup2 mount
# Failed to find cgroup2 mount
# Failed to find cgroup2 mount
# Netid State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
# tcp   ESTAB      0      0           10.0.1.2:55516     10.0.1.1:10054
ino:2065678 sk:3 cgroup:unreachable:1 <->
# 	 skmem:(r0,rb131072,t0,tb46080,f12288,w0,o0,bl0,d2) sack cubic
wscale:8,8 rto:201 rtt:0.029/0.016 ato:80 mss:1460 pmtu:1500
rcvmss:1432 advmss:1460 cwnd:10 bytes_sent:5136 bytes_acked:5137
bytes_received:113752 segs_out:16 segs_in:86 data_segs_out:4
data_segs_in:83 send 4027586207bps lastsnd:61068 lastrcv:60986
lastack:60972 pacing_rate 7852100840bps delivery_rate 6674285712bps
delivered:5 rcv_rtt:0.043 rcv_space:14600 rcv_ssthresh:114691
minrtt:0.007 snd_wnd:75520 tcp-ulp-mptcp flags:Mmec
token:0000(id:0)/73d713b3(id:0) seq:6317574787800368999 sfseq:106329
ssnoff:821551077 maplen:7424
# mptcp FIN-WAIT-2 124504 0           10.0.1.2:55516     10.0.1.1:10054
timer:(keepalive,,0) ino:0 sk:4 cgroup:unreachable:1 ---
# 	 skmem:(r124504,rb131072,t0,tb50176,f6568,w0,o0,bl0,d0)
subflows_max:2 remote_key token:73d713b3 write_seq:2946228641406210168
snd_una:2946228641406210168 rcv_nxt:6317574787800376423 bytes_sent:5136
bytes_received:113752 bytes_acked:5137 subflows_total:1
last_data_sent:61068 last_data_recv:60986 last_ack_recv:60972         
# TcpActiveOpens                  1                  0.0
# TcpInSegs                       17                 0.0
# TcpOutSegs                      16                 0.0
# TcpExtDelayedACKs               3                  0.0
# TcpExtDelayedACKLost            2                  0.0
# TcpExtTCPPureAcks               2                  0.0
# TcpExtTCPDSACKOldSent           2                  0.0
# TcpExtTCPToZeroWindowAdv        1                  0.0
# TcpExtTCPOrigDataSent           4                  0.0
# TcpExtTCPDelivered              5                  0.0
# MPTcpExtMPCapableSYNTX          1                  0.0
# MPTcpExtMPCapableSYNACKRX       1                  0.0
# 
# 58 ns2 MPTCP -> ns1 (10.0.1.1:10055      ) MPTCP     (duration
60992ms) [FAIL] client exit code 0, server 124
# 
# netns ns1-RqXF2p (listener) socket stat for 10055:
# Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
# TcpPassiveOpens                 1                  0.0
# TcpEstabResets                  2                  0.0
# TcpInSegs                       28                 0.0
# TcpOutSegs                      142                0.0
# TcpRetransSegs                  22                 0.0
# TcpExtTCPPureAcks               23                 0.0
# TcpExtTCPLostRetransmit         8                  0.0
# TcpExtTCPSlowStartRetrans       13                 0.0
# TcpExtTCPTimeouts               1                  0.0
# TcpExtTCPLossProbes             1                  0.0
# TcpExtTCPBacklogCoalesce        1                  0.0
# TcpExtTCPFastOpenPassive        1                  0.0
# TcpExtTCPOrigDataSent           138                0.0
# TcpExtTCPDelivered              83                 0.0
# TcpExtTcpTimeoutRehash          1                  0.0
# MPTcpExtMPCapableSYNRX          1                  0.0
# MPTcpExtMPCapableACKRX          1                  0.0
# MPTcpExtMPFastcloseRx           2                  0.0
# MPTcpExtMPRstRx                 2                  0.0
# MPTcpExtSndWndShared            5                  0.0
# 
# netns ns2-xZI1rh (connector) socket stat for 10055:
# Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
# TcpActiveOpens                  1                  0.0
# TcpEstabResets                  2                  0.0
# TcpInSegs                       32                 0.0
# TcpOutSegs                      30                 0.0
# TcpOutRsts                      2                  0.0
# TcpExtBeyondWindow              4                  0.0
# TcpExtDelayedACKs               2                  0.0
# TcpExtTCPPureAcks               3                  0.0
# TcpExtTCPFastOpenActive         1                  0.0
# TcpExtTCPToZeroWindowAdv        1                  0.0
# TcpExtTCPOrigDataSent           4                  0.0
# TcpExtTCPDelivered              5                  0.0
# TcpExtTCPZeroWindowDrop         10                 0.0
# MPTcpExtMPCapableSYNTX          1                  0.0
# MPTcpExtMPCapableSYNACKRX       1                  0.0
# MPTcpExtMPFastcloseTx           2                  0.0
# MPTcpExtMPRstTx                 2                  0.0
# 
# 59 ns2 MPTCP -> ns1 (dead:beef:1::1:10056) MPTCP     (duration
60983ms) [FAIL] client exit code 0, server 124
# 
# netns ns1-RqXF2p (listener) socket stat for 10056:
# Failed to find cgroup2 mount
# Failed to find cgroup2 mount
# Failed to find cgroup2 mount
# Netid State    Recv-Q Send-Q    Local Address:Port      Peer
Address:Port                                                          
# tcp   ESTAB    0      0      [dead:beef:1::1]:10056
[dead:beef:1::2]:51008 ino:2066517 sk:5 cgroup:unreachable:1 <->
# 	 skmem:(r0,rb131072,t0,tb354816,f0,w0,o0,bl0,d0) sack cubic
wscale:8,8 rto:206 rtt:5.142/10.26 ato:40 mss:1440 pmtu:1500
rcvmss:1416 advmss:1440 cwnd:10 bytes_sent:116192 bytes_retrans:1860
bytes_acked:114332 bytes_received:5136 segs_out:88 segs_in:16
data_segs_out:86 data_segs_in:4 send 22403734bps lastsnd:60928
lastrcv:61025 lastack:60901 pacing_rate 345009112bps delivery_rate
1967640bps delivered:87 busy:123ms sndbuf_limited:41ms(33.3%)
retrans:0/2 dsack_dups:2 rcv_space:14400 rcv_ssthresh:74532
minrtt:0.003 rcv_wnd:74752 tcp-ulp-mptcp flags:Mec
token:0000(id:0)/dfc0f4f3(id:0) seq:4063451370598395855 sfseq:1
ssnoff:3788096358 maplen:5136
# mptcp LAST-ACK 0      0      [dead:beef:1::1]:10056
[dead:beef:1::2]:51008 timer:(keepalive,59sec,0) ino:0 sk:6
cgroup:unreachable:1 ---
# 	 skmem:(r0,rb131072,t0,tb358912,f316,w351940,o0,bl0,d0)
subflows_max:2 remote_key token:dfc0f4f3 write_seq:2127521061748173342
snd_una:2127521061747829521 rcv_nxt:4063451370598400992
bytes_sent:114332 bytes_received:5136 bytes_acked:114332
subflows_total:1 last_data_sent:60942 last_data_recv:61025
last_ack_recv:60901                                                   
# TcpPassiveOpens                 1                  0.0
# TcpInSegs                       13                 0.0
# TcpOutSegs                      87                 0.0
# TcpRetransSegs                  2                  0.0
# TcpExtTCPPureAcks               11                 0.0
# TcpExtTCPLossProbes             3                  0.0
# TcpExtTCPDSACKRecv              2                  0.0
# TcpExtTCPDSACKIgnoredNoUndo     2                  0.0
# TcpExtTCPFastOpenCookieReqd     1                  0.0
# TcpExtTCPOrigDataSent           84                 0.0
# TcpExtTCPDelivered              86                 0.0
# TcpExtTCPDSACKRecvSegs          2                  0.0
# MPTcpExtMPCapableSYNRX          1                  0.0
# MPTcpExtMPCapableACKRX          1                  0.0
# 
# netns ns2-xZI1rh (connector) socket stat for 10056:
# Failed to find cgroup2 mount
# Failed to find cgroup2 mount
# Failed to find cgroup2 mount
# Netid State      Recv-Q Send-Q    Local Address:Port      Peer
Address:Port                                                          
# tcp   ESTAB      0      0      [dead:beef:1::2]:51008
[dead:beef:1::1]:10056 ino:2065857 sk:7 cgroup:unreachable:1 <->
# 	 skmem:(r0,rb131072,t0,tb46080,f12288,w0,o0,bl0,d2) sack cubic
wscale:8,8 rto:201 rtt:0.032/0.018 ato:80 mss:1440 pmtu:1500
rcvmss:1412 advmss:1440 cwnd:10 bytes_sent:5136 bytes_acked:5137
bytes_received:114332 segs_out:16 segs_in:89 data_segs_out:4
data_segs_in:86 send 3600000000bps lastsnd:61060 lastrcv:60977
lastack:60963 pacing_rate 7116602312bps delivery_rate 6582857136bps
delivered:5 rcv_rtt:0.051 rcv_space:14400 rcv_ssthresh:115128
minrtt:0.007 snd_wnd:74752 tcp-ulp-mptcp flags:Mmec
token:0000(id:0)/45f63d89(id:0) seq:2127521061747821841 sfseq:106653
ssnoff:320893875 maplen:7680
# mptcp FIN-WAIT-2 124188 0      [dead:beef:1::2]:51008
[dead:beef:1::1]:10056 timer:(keepalive,,0) ino:0 sk:8
cgroup:unreachable:1 ---
# 	 skmem:(r124188,rb131072,t0,tb50176,f6884,w0,o0,bl0,d0)
subflows_max:2 remote_key token:45f63d89 write_seq:4063451370598400992
snd_una:4063451370598400992 rcv_nxt:2127521061747829521 bytes_sent:5136
bytes_received:114332 bytes_acked:5137 subflows_total:1
last_data_sent:61060 last_data_recv:60977 last_ack_recv:60963         
# TcpActiveOpens                  1                  0.0
# TcpInSegs                       17                 0.0
# TcpOutSegs                      16                 0.0
# TcpExtDelayedACKs               3                  0.0
# TcpExtDelayedACKLost            2                  0.0
# TcpExtTCPPureAcks               2                  0.0
# TcpExtTCPDSACKOldSent           2                  0.0
# TcpExtTCPToZeroWindowAdv        1                  0.0
# TcpExtTCPOrigDataSent           4                  0.0
# TcpExtTCPDelivered              5                  0.0
# MPTcpExtMPCapableSYNTX          1                  0.0
# MPTcpExtMPCapableSYNACKRX       1                  0.0
# 
# 60 ns2 MPTCP -> ns1 (dead:beef:1::1:10057) MPTCP     (duration
60988ms) [FAIL] client exit code 0, server 124
# 
# netns ns1-RqXF2p (listener) socket stat for 10057:
# Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
# TcpPassiveOpens                 1                  0.0
# TcpEstabResets                  2                  0.0
# TcpInSegs                       29                 0.0
# TcpOutSegs                      144                0.0
# TcpRetransSegs                  22                 0.0
# TcpExtTCPPureAcks               23                 0.0
# TcpExtTCPLostRetransmit         8                  0.0
# TcpExtTCPSlowStartRetrans       13                 0.0
# TcpExtTCPTimeouts               1                  0.0
# TcpExtTCPLossProbes             1                  0.0
# TcpExtTCPBacklogCoalesce        2                  0.0
# TcpExtTCPFastOpenPassive        1                  0.0
# TcpExtTCPOrigDataSent           140                0.0
# TcpExtTCPDelivered              84                 0.0
# TcpExtTcpTimeoutRehash          1                  0.0
# MPTcpExtMPCapableSYNRX          1                  0.0
# MPTcpExtMPCapableACKRX          1                  0.0
# MPTcpExtMPFastcloseRx           2                  0.0
# MPTcpExtMPRstRx                 2                  0.0
# MPTcpExtSndWndShared            5                  0.0
# 
# netns ns2-xZI1rh (connector) socket stat for 10057:
# Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
# TcpActiveOpens                  1                  0.0
# TcpEstabResets                  2                  0.0
# TcpInSegs                       32                 0.0
# TcpOutSegs                      31                 0.0
# TcpOutRsts                      2                  0.0
# TcpExtBeyondWindow              4                  0.0
# TcpExtDelayedACKs               3                  0.0
# TcpExtTCPPureAcks               3                  0.0
# TcpExtTCPFastOpenActive         1                  0.0
# TcpExtTCPToZeroWindowAdv        1                  0.0
# TcpExtTCPOrigDataSent           4                  0.0
# TcpExtTCPDelivered              5                  0.0
# TcpExtTCPZeroWindowDrop         10                 0.0
# MPTcpExtMPCapableSYNTX          1                  0.0
# MPTcpExtMPCapableSYNACKRX       1                  0.0
# MPTcpExtMPFastcloseTx           2                  0.0
# MPTcpExtMPRstTx                 2                  0.0
# 
# INFO: with MPTFO end
# [FAIL] Tests with MPTFO have failed
# INFO: test tproxy ipv4
# 61 ns1 MPTCP -> ns2 (10.0.3.1:20000      ) MPTCP     (duration  
161ms) [ OK ]
# INFO: tproxy ipv4 pass
# INFO: test tproxy ipv6
# 62 ns1 MPTCP -> ns2 (dead:beef:3::1:20000) MPTCP     (duration  
163ms) [ OK ]
# INFO: tproxy ipv6 pass
# INFO: disconnect
# 63 ns1 MPTCP -> ns1 (10.0.1.1:20001      ) MPTCP     (duration   
54ms) [ OK ]
# 64 ns1 MPTCP -> ns1 (10.0.1.1:20002      ) TCP       (duration   
56ms) [ OK ]
# 65 ns1 TCP   -> ns1 (10.0.1.1:20003      ) MPTCP     (duration   
59ms) [ OK ]
# 66 ns1 MPTCP -> ns1 (dead:beef:1::1:20004) MPTCP     (duration   
60ms) [ OK ]
# 67 ns1 MPTCP -> ns1 (dead:beef:1::1:20005) TCP       (duration   
56ms) [ OK ]
# 68 ns1 TCP   -> ns1 (dead:beef:1::1:20006) MPTCP     (duration   
55ms) [ OK ]
# Time: 288 seconds
not ok 1 test: selftest_mptcp_connect_splice # FAIL
# time=288

=== ERROR after 158 attempts (Wed, 08 Oct 2025 02:40:34 +0000) ===

	Stopped after 158 attempts


I'm not sure whether this error indicates a bug in patches 9-10, or an
issue with my implementation of mptcp_recv_skb(). I'm still unsure how
to resolve it. Could you please give me some suggestions?


But patches 1-8 look good to me indeed:

Reviewed-by: Geliang Tang <geliang@kernel.org>

I'm wondering if we can merge patches 1-8 into the export branch first.
I changed their status to "Queued" on patchwork.


Besides, I have one minor comment on patch 9, which I'll reply with
directly on that patch.


Thanks,
-Geliang

[1]
https://patchwork.kernel.org/project/mptcp/patch/2f159972f4aac7002a46ebc03b9d3898ece4c081.1758975929.git.tanggeliang@kylinos.cn/

> 
> Cheers,
> Matt
Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Paolo Abeni 2 days, 18 hours ago
On 10/8/25 5:07 AM, Geliang Tang wrote:
> On Mon, 2025-10-06 at 19:07 +0200, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 06/10/2025 10:11, Paolo Abeni wrote:
>>> This series includes RX path improvement built around backlog
>>> processing
>> Thank you for the new version! This is not a review, but just a note
>> to
>> tell you patchew didn't manage to apply the patches due to the same
>> conflict that was already there with the v4 (mptcp_init_skb()
>> parameters
>> have been moved to the previous line). I just applied the patches
>> manually. While at it, I also used this test branch for syzkaller to
>> validate them.
>>
>> (Also, on patch "mptcp: drop the __mptcp_data_ready() helper", git
>> complained that there is a trailing whitespace.)
> 
> Sorry, patches 9-10 break my "implement mptcp read_sock" v12 series. I
> rebased this series on patches 1-8, it works well. But after applying
> patches 9-10, I changed mptcp_recv_skb() in [1] from

Thanks for the feedback, the applied delta looks good to me.

> # INFO: with MPTFO start
> # 57 ns2 MPTCP -> ns1 (10.0.1.1:10054      ) MPTCP     (duration
> 60989ms) [FAIL] client exit code 0, server 124
> # 
> # netns ns1-RqXF2p (listener) socket stat for 10054:
> # Failed to find cgroup2 mount
> # Failed to find cgroup2 mount
> # Failed to find cgroup2 mount
> # Netid State    Recv-Q Send-Q Local Address:Port  Peer Address:Port  
> # tcp   ESTAB    0      0           10.0.1.1:10054     10.0.1.2:55516
> ino:2064372 sk:1 cgroup:unreachable:1 <->
> # 	 skmem:(r0,rb131072,t0,tb340992,f0,w0,o0,bl0,d0) sack cubic
> wscale:8,8 rto:206 rtt:5.026/10.034 ato:40 mss:1460 pmtu:1500
> rcvmss:1436 advmss:1460 cwnd:10 bytes_sent:115312 bytes_retrans:1560
> bytes_acked:113752 bytes_received:5136 segs_out:85 segs_in:16
> data_segs_out:83 data_segs_in:4 send 23239156bps lastsnd:60939
> lastrcv:61035 lastack:60912 pacing_rate 343879640bps delivery_rate
> 1994680bps delivered:84 busy:123ms sndbuf_limited:41ms(33.3%)
> retrans:0/2 dsack_dups:2 rcv_space:14600 rcv_ssthresh:75432
> minrtt:0.003 rcv_wnd:75520 tcp-ulp-mptcp flags:Mec
> token:0000(id:0)/32ed0950(id:0) seq:2946228641406205031 sfseq:1
> ssnoff:1349223625 maplen:5136
> # mptcp LAST-ACK 0      0           10.0.1.1:10054     10.0.1.2:55516
> timer:(keepalive,59sec,0) ino:0 sk:2 cgroup:unreachable:1 ---
> # 	 skmem:(r0,rb131072,t0,tb345088,f4088,w352264,o0,bl0,d0)
> subflows_max:2 remote_key token:32ed0950 write_seq:6317574787800720824
> snd_una:6317574787800376423 rcv_nxt:2946228641406210168
> bytes_sent:113752 bytes_received:5136 bytes_acked:113752
> subflows_total:1 last_data_sent:60954 last_data_recv:61036
> last_ack_recv:60913                                       

bytes_sent == bytes_acked: possibly we are missing a window-open event,
which in turn should be triggered by mptcp_cleanup_rbuf(), which AFAICS
is correctly invoked in the splice code. TL;DR: I can't find anything
obviously wrong :-P
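
(For illustration, a tiny sketch of the cleanup_rbuf idea above, with
invented names - example_rcvbuf_space() and example_send_ack() are
placeholders, not the actual mptcp_cleanup_rbuf() implementation:)

static void example_cleanup_rbuf(struct sock *sk, u32 last_adv_window,
				 u32 rcv_mss)
{
	u32 new_window = example_rcvbuf_space(sk);

	/*
	 * After the reader (here, the splice path) has freed receive
	 * buffer space, send a pure ack advertising the larger window so
	 * that a sender stuck on a zero window can resume transmitting.
	 */
	if (new_window >= last_adv_window + rcv_mss)
		example_send_ack(sk);
}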

Also the default rx buf size is suspect.

Can you reproduce the issue while capturing the traffic with tcpdump? If
so, could you please share the capture?

Are the TFO cases the only ones failing?

Thanks,

Paolo
Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Geliang Tang 1 day, 19 hours ago
Hi Paolo,

On Wed, 2025-10-08 at 09:30 +0200, Paolo Abeni wrote:
> On 10/8/25 5:07 AM, Geliang Tang wrote:
> > On Mon, 2025-10-06 at 19:07 +0200, Matthieu Baerts wrote:
> > > Hi Paolo,
> > > 
> > > On 06/10/2025 10:11, Paolo Abeni wrote:
> > > > This series includes RX path improvement built around backlog
> > > > processing
> > > Thank you for the new version! This is not a review, but just a
> > > note
> > > to
> > > tell you patchew didn't manage to apply the patches due to the
> > > same
> > > conflict that was already there with the v4 (mptcp_init_skb()
> > > parameters
> > > have been moved to the previous line). I just applied the patches
> > > manually. While at it, I also used this test branch for syzkaller
> > > to
> > > validate them.
> > > 
> > > (Also, on patch "mptcp: drop the __mptcp_data_ready() helper",
> > > git
> > > complained that there is a trailing whitespace.)
> > 
> > Sorry, patches 9-10 break my "implement mptcp read_sock" v12
> > series. I
> > rebased this series on patches 1-8, it works well. But after
> > applying
> > patches 9-10, I changed mptcp_recv_skb() in [1] from
> 
> Thanks for the feedback, the applied delta looks good to me.
> 
> > # INFO: with MPTFO start
> > # 57 ns2 MPTCP -> ns1 (10.0.1.1:10054      ) MPTCP     (duration
> > 60989ms) [FAIL] client exit code 0, server 124
> > # 
> > # netns ns1-RqXF2p (listener) socket stat for 10054:
> > # Failed to find cgroup2 mount
> > # Failed to find cgroup2 mount
> > # Failed to find cgroup2 mount
> > # Netid State    Recv-Q Send-Q Local Address:Port  Peer
> > Address:Port  
> > # tcp   ESTAB    0      0           10.0.1.1:10054    
> > 10.0.1.2:55516
> > ino:2064372 sk:1 cgroup:unreachable:1 <->
> > # 	 skmem:(r0,rb131072,t0,tb340992,f0,w0,o0,bl0,d0) sack
> > cubic
> > wscale:8,8 rto:206 rtt:5.026/10.034 ato:40 mss:1460 pmtu:1500
> > rcvmss:1436 advmss:1460 cwnd:10 bytes_sent:115312
> > bytes_retrans:1560
> > bytes_acked:113752 bytes_received:5136 segs_out:85 segs_in:16
> > data_segs_out:83 data_segs_in:4 send 23239156bps lastsnd:60939
> > lastrcv:61035 lastack:60912 pacing_rate 343879640bps delivery_rate
> > 1994680bps delivered:84 busy:123ms sndbuf_limited:41ms(33.3%)
> > retrans:0/2 dsack_dups:2 rcv_space:14600 rcv_ssthresh:75432
> > minrtt:0.003 rcv_wnd:75520 tcp-ulp-mptcp flags:Mec
> > token:0000(id:0)/32ed0950(id:0) seq:2946228641406205031 sfseq:1
> > ssnoff:1349223625 maplen:5136
> > # mptcp LAST-ACK 0      0           10.0.1.1:10054    
> > 10.0.1.2:55516
> > timer:(keepalive,59sec,0) ino:0 sk:2 cgroup:unreachable:1 ---
> > # 	 skmem:(r0,rb131072,t0,tb345088,f4088,w352264,o0,bl0,d0)
> > subflows_max:2 remote_key token:32ed0950
> > write_seq:6317574787800720824
> > snd_una:6317574787800376423 rcv_nxt:2946228641406210168
> > bytes_sent:113752 bytes_received:5136 bytes_acked:113752
> > subflows_total:1 last_data_sent:60954 last_data_recv:61036
> > last_ack_recv:60913                                       
> 
> bytes_sent == bytes_sent, possibly we are missing a window-open
> event,
> which in turn should be triggered by a mptcp_cleanp_rbuf(), which
> AFAICS
> are correctly invoked in the splice code. TL;DR: I can't find
> anything
> obviously wrong :-P
> 
> Also the default rx buf size is suspect.
> 
> Can you reproduce the issue while capturing the traffic with tcpdump?
> if
> so, could you please share the capture?

Thank you for your suggestion. I've attached several tcpdump logs from
when the tests failed.

> 
> Are TFO cases the only one failing?

Not all failures occurred in TFO cases.

Thanks,
-Geliang

> 
> Thanks,
> 
> Paolo
> 

Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Paolo Abeni 1 day, 18 hours ago
On 10/9/25 8:54 AM, Geliang Tang wrote:
> On Wed, 2025-10-08 at 09:30 +0200, Paolo Abeni wrote:
>> On 10/8/25 5:07 AM, Geliang Tang wrote:
>>> On Mon, 2025-10-06 at 19:07 +0200, Matthieu Baerts wrote:
>>>> Hi Paolo,
>>>>
>>>> On 06/10/2025 10:11, Paolo Abeni wrote:
>>>>> This series includes RX path improvement built around backlog
>>>>> processing
>>>> Thank you for the new version! This is not a review, but just a
>>>> note
>>>> to
>>>> tell you patchew didn't manage to apply the patches due to the
>>>> same
>>>> conflict that was already there with the v4 (mptcp_init_skb()
>>>> parameters
>>>> have been moved to the previous line). I just applied the patches
>>>> manually. While at it, I also used this test branch for syzkaller
>>>> to
>>>> validate them.
>>>>
>>>> (Also, on patch "mptcp: drop the __mptcp_data_ready() helper",
>>>> git
>>>> complained that there is a trailing whitespace.)
>>>
>>> Sorry, patches 9-10 break my "implement mptcp read_sock" v12
>>> series. I
>>> rebased this series on patches 1-8, it works well. But after
>>> applying
>>> patches 9-10, I changed mptcp_recv_skb() in [1] from
>>
>> Thanks for the feedback, the applied delta looks good to me.
>>
>>> # INFO: with MPTFO start
>>> # 57 ns2 MPTCP -> ns1 (10.0.1.1:10054      ) MPTCP     (duration
>>> 60989ms) [FAIL] client exit code 0, server 124
>>> # 
>>> # netns ns1-RqXF2p (listener) socket stat for 10054:
>>> # Failed to find cgroup2 mount
>>> # Failed to find cgroup2 mount
>>> # Failed to find cgroup2 mount
>>> # Netid State    Recv-Q Send-Q Local Address:Port  Peer
>>> Address:Port  
>>> # tcp   ESTAB    0      0           10.0.1.1:10054    
>>> 10.0.1.2:55516
>>> ino:2064372 sk:1 cgroup:unreachable:1 <->
>>> # 	 skmem:(r0,rb131072,t0,tb340992,f0,w0,o0,bl0,d0) sack
>>> cubic
>>> wscale:8,8 rto:206 rtt:5.026/10.034 ato:40 mss:1460 pmtu:1500
>>> rcvmss:1436 advmss:1460 cwnd:10 bytes_sent:115312
>>> bytes_retrans:1560
>>> bytes_acked:113752 bytes_received:5136 segs_out:85 segs_in:16
>>> data_segs_out:83 data_segs_in:4 send 23239156bps lastsnd:60939
>>> lastrcv:61035 lastack:60912 pacing_rate 343879640bps delivery_rate
>>> 1994680bps delivered:84 busy:123ms sndbuf_limited:41ms(33.3%)
>>> retrans:0/2 dsack_dups:2 rcv_space:14600 rcv_ssthresh:75432
>>> minrtt:0.003 rcv_wnd:75520 tcp-ulp-mptcp flags:Mec
>>> token:0000(id:0)/32ed0950(id:0) seq:2946228641406205031 sfseq:1
>>> ssnoff:1349223625 maplen:5136
>>> # mptcp LAST-ACK 0      0           10.0.1.1:10054    
>>> 10.0.1.2:55516
>>> timer:(keepalive,59sec,0) ino:0 sk:2 cgroup:unreachable:1 ---
>>> # 	 skmem:(r0,rb131072,t0,tb345088,f4088,w352264,o0,bl0,d0)
>>> subflows_max:2 remote_key token:32ed0950
>>> write_seq:6317574787800720824
>>> snd_una:6317574787800376423 rcv_nxt:2946228641406210168
>>> bytes_sent:113752 bytes_received:5136 bytes_acked:113752
>>> subflows_total:1 last_data_sent:60954 last_data_recv:61036
>>> last_ack_recv:60913                                       
>>
>> bytes_sent == bytes_sent, possibly we are missing a window-open
>> event,
>> which in turn should be triggered by a mptcp_cleanp_rbuf(), which
>> AFAICS
>> are correctly invoked in the splice code. TL;DR: I can't find
>> anything
>> obviously wrong :-P
>>
>> Also the default rx buf size is suspect.
>>
>> Can you reproduce the issue while capturing the traffic with tcpdump?
>> if
>> so, could you please share the capture?
> 
> Thank you for your suggestion. I've attached several tcpdump logs from
> when the tests failed.

Oh wow! The receiver actually sends the window-open notification
(packets 527 and 528 in the trace), but the sender does not react at all.

I have no idea yet/I haven't dug into why the sender did not try a zero
window probe (it should!), but it looks like we have some old bug in the
sender wakeup since the MPTCP_DEQUEUE introduction (which is very
surprising - why did we not catch/observe this earlier?!). That could
also explain sporadic mptcp_join failures.

Could you please try the attached patch?

/P

p.s. AFAICS the backlog introduction should just increase the frequency
of an already possible event...
Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Geliang Tang 1 day, 17 hours ago
Hi Paolo,

On Thu, 2025-10-09 at 09:52 +0200, Paolo Abeni wrote:
> On 10/9/25 8:54 AM, Geliang Tang wrote:
> > On Wed, 2025-10-08 at 09:30 +0200, Paolo Abeni wrote:
> > > On 10/8/25 5:07 AM, Geliang Tang wrote:
> > > > On Mon, 2025-10-06 at 19:07 +0200, Matthieu Baerts wrote:
> > > > > Hi Paolo,
> > > > > 
> > > > > On 06/10/2025 10:11, Paolo Abeni wrote:
> > > > > > This series includes RX path improvement built around
> > > > > > backlog
> > > > > > processing
> > > > > Thank you for the new version! This is not a review, but just
> > > > > a
> > > > > note
> > > > > to
> > > > > tell you patchew didn't manage to apply the patches due to
> > > > > the
> > > > > same
> > > > > conflict that was already there with the v4 (mptcp_init_skb()
> > > > > parameters
> > > > > have been moved to the previous line). I just applied the
> > > > > patches
> > > > > manually. While at it, I also used this test branch for
> > > > > syzkaller
> > > > > to
> > > > > validate them.
> > > > > 
> > > > > (Also, on patch "mptcp: drop the __mptcp_data_ready()
> > > > > helper",
> > > > > git
> > > > > complained that there is a trailing whitespace.)
> > > > 
> > > > Sorry, patches 9-10 break my "implement mptcp read_sock" v12
> > > > series. I
> > > > rebased this series on patches 1-8, it works well. But after
> > > > applying
> > > > patches 9-10, I changed mptcp_recv_skb() in [1] from
> > > 
> > > Thanks for the feedback, the applied delta looks good to me.
> > > 
> > > > # INFO: with MPTFO start
> > > > # 57 ns2 MPTCP -> ns1 (10.0.1.1:10054      ) MPTCP    
> > > > (duration
> > > > 60989ms) [FAIL] client exit code 0, server 124
> > > > # 
> > > > # netns ns1-RqXF2p (listener) socket stat for 10054:
> > > > # Failed to find cgroup2 mount
> > > > # Failed to find cgroup2 mount
> > > > # Failed to find cgroup2 mount
> > > > # Netid State    Recv-Q Send-Q Local Address:Port  Peer
> > > > Address:Port  
> > > > # tcp   ESTAB    0      0           10.0.1.1:10054    
> > > > 10.0.1.2:55516
> > > > ino:2064372 sk:1 cgroup:unreachable:1 <->
> > > > # 	 skmem:(r0,rb131072,t0,tb340992,f0,w0,o0,bl0,d0) sack
> > > > cubic
> > > > wscale:8,8 rto:206 rtt:5.026/10.034 ato:40 mss:1460 pmtu:1500
> > > > rcvmss:1436 advmss:1460 cwnd:10 bytes_sent:115312
> > > > bytes_retrans:1560
> > > > bytes_acked:113752 bytes_received:5136 segs_out:85 segs_in:16
> > > > data_segs_out:83 data_segs_in:4 send 23239156bps lastsnd:60939
> > > > lastrcv:61035 lastack:60912 pacing_rate 343879640bps
> > > > delivery_rate
> > > > 1994680bps delivered:84 busy:123ms sndbuf_limited:41ms(33.3%)
> > > > retrans:0/2 dsack_dups:2 rcv_space:14600 rcv_ssthresh:75432
> > > > minrtt:0.003 rcv_wnd:75520 tcp-ulp-mptcp flags:Mec
> > > > token:0000(id:0)/32ed0950(id:0) seq:2946228641406205031 sfseq:1
> > > > ssnoff:1349223625 maplen:5136
> > > > # mptcp LAST-ACK 0      0           10.0.1.1:10054    
> > > > 10.0.1.2:55516
> > > > timer:(keepalive,59sec,0) ino:0 sk:2 cgroup:unreachable:1 ---
> > > > # 	
> > > > skmem:(r0,rb131072,t0,tb345088,f4088,w352264,o0,bl0,d0)
> > > > subflows_max:2 remote_key token:32ed0950
> > > > write_seq:6317574787800720824
> > > > snd_una:6317574787800376423 rcv_nxt:2946228641406210168
> > > > bytes_sent:113752 bytes_received:5136 bytes_acked:113752
> > > > subflows_total:1 last_data_sent:60954 last_data_recv:61036
> > > > last_ack_recv:60913                                       
> > > 
> > > bytes_sent == bytes_sent, possibly we are missing a window-open
> > > event,
> > > which in turn should be triggered by a mptcp_cleanp_rbuf(), which
> > > AFAICS
> > > are correctly invoked in the splice code. TL;DR: I can't find
> > > anything
> > > obviously wrong :-P
> > > 
> > > Also the default rx buf size is suspect.
> > > 
> > > Can you reproduce the issue while capturing the traffic with
> > > tcpdump?
> > > if
> > > so, could you please share the capture?
> > 
> > Thank you for your suggestion. I've attached several tcpdump logs
> > from
> > when the tests failed.
> 
> Oh wow! the receiver actually sends the window open notification
> (packets 527 and 528 in the trace), but the sender does not react at
> all.
> 
> I have no idea/I haven't digged yet why the sender did not try a zero
> window probe (it should!), but it looks like we have some old bug in
> sender wakeup since MPTCP_DEQUEUE introduction (which is very
> surprising, why we did not catch/observe this earlier ?!?). That
> could
> explain also sporadic mptcp_join failures.
> 
> Could you please try the attached patch?

Thank you very much. I just tested this patch, but it doesn't work. The
splice test still fails and reports the same error.

-Geliang

> 
> /P
> 
> p.s. AFAICS the backlog introduction should just increase the
> frequency
> of an already possible event...

Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Paolo Abeni 1 day, 15 hours ago
On 10/9/25 11:02 AM, Geliang Tang wrote:
> On Thu, 2025-10-09 at 09:52 +0200, Paolo Abeni wrote:
>> On 10/9/25 8:54 AM, Geliang Tang wrote:
>>> On Wed, 2025-10-08 at 09:30 +0200, Paolo Abeni wrote:
>>>> On 10/8/25 5:07 AM, Geliang Tang wrote:
>>>>> On Mon, 2025-10-06 at 19:07 +0200, Matthieu Baerts wrote:
>>>>>> Hi Paolo,
>>>>>>
>>>>>> On 06/10/2025 10:11, Paolo Abeni wrote:
>>>>>>> This series includes RX path improvement built around
>>>>>>> backlog
>>>>>>> processing
>>>>>> Thank you for the new version! This is not a review, but just
>>>>>> a
>>>>>> note
>>>>>> to
>>>>>> tell you patchew didn't manage to apply the patches due to
>>>>>> the
>>>>>> same
>>>>>> conflict that was already there with the v4 (mptcp_init_skb()
>>>>>> parameters
>>>>>> have been moved to the previous line). I just applied the
>>>>>> patches
>>>>>> manually. While at it, I also used this test branch for
>>>>>> syzkaller
>>>>>> to
>>>>>> validate them.
>>>>>>
>>>>>> (Also, on patch "mptcp: drop the __mptcp_data_ready()
>>>>>> helper",
>>>>>> git
>>>>>> complained that there is a trailing whitespace.)
>>>>>
>>>>> Sorry, patches 9-10 break my "implement mptcp read_sock" v12
>>>>> series. I
>>>>> rebased this series on patches 1-8, it works well. But after
>>>>> applying
>>>>> patches 9-10, I changed mptcp_recv_skb() in [1] from
>>>>
>>>> Thanks for the feedback, the applied delta looks good to me.
>>>>
>>>>> # INFO: with MPTFO start
>>>>> # 57 ns2 MPTCP -> ns1 (10.0.1.1:10054      ) MPTCP    
>>>>> (duration
>>>>> 60989ms) [FAIL] client exit code 0, server 124
>>>>> # 
>>>>> # netns ns1-RqXF2p (listener) socket stat for 10054:
>>>>> # Failed to find cgroup2 mount
>>>>> # Failed to find cgroup2 mount
>>>>> # Failed to find cgroup2 mount
>>>>> # Netid State    Recv-Q Send-Q Local Address:Port  Peer
>>>>> Address:Port  
>>>>> # tcp   ESTAB    0      0           10.0.1.1:10054    
>>>>> 10.0.1.2:55516
>>>>> ino:2064372 sk:1 cgroup:unreachable:1 <->
>>>>> # 	 skmem:(r0,rb131072,t0,tb340992,f0,w0,o0,bl0,d0) sack
>>>>> cubic
>>>>> wscale:8,8 rto:206 rtt:5.026/10.034 ato:40 mss:1460 pmtu:1500
>>>>> rcvmss:1436 advmss:1460 cwnd:10 bytes_sent:115312
>>>>> bytes_retrans:1560
>>>>> bytes_acked:113752 bytes_received:5136 segs_out:85 segs_in:16
>>>>> data_segs_out:83 data_segs_in:4 send 23239156bps lastsnd:60939
>>>>> lastrcv:61035 lastack:60912 pacing_rate 343879640bps
>>>>> delivery_rate
>>>>> 1994680bps delivered:84 busy:123ms sndbuf_limited:41ms(33.3%)
>>>>> retrans:0/2 dsack_dups:2 rcv_space:14600 rcv_ssthresh:75432
>>>>> minrtt:0.003 rcv_wnd:75520 tcp-ulp-mptcp flags:Mec
>>>>> token:0000(id:0)/32ed0950(id:0) seq:2946228641406205031 sfseq:1
>>>>> ssnoff:1349223625 maplen:5136
>>>>> # mptcp LAST-ACK 0      0           10.0.1.1:10054    
>>>>> 10.0.1.2:55516
>>>>> timer:(keepalive,59sec,0) ino:0 sk:2 cgroup:unreachable:1 ---
>>>>> # 	
>>>>> skmem:(r0,rb131072,t0,tb345088,f4088,w352264,o0,bl0,d0)
>>>>> subflows_max:2 remote_key token:32ed0950
>>>>> write_seq:6317574787800720824
>>>>> snd_una:6317574787800376423 rcv_nxt:2946228641406210168
>>>>> bytes_sent:113752 bytes_received:5136 bytes_acked:113752
>>>>> subflows_total:1 last_data_sent:60954 last_data_recv:61036
>>>>> last_ack_recv:60913                                       
>>>>
>>>> bytes_sent == bytes_sent, possibly we are missing a window-open
>>>> event,
>>>> which in turn should be triggered by a mptcp_cleanp_rbuf(), which
>>>> AFAICS
>>>> are correctly invoked in the splice code. TL;DR: I can't find
>>>> anything
>>>> obviously wrong :-P
>>>>
>>>> Also the default rx buf size is suspect.
>>>>
>>>> Can you reproduce the issue while capturing the traffic with
>>>> tcpdump?
>>>> if
>>>> so, could you please share the capture?
>>>
>>> Thank you for your suggestion. I've attached several tcpdump logs
>>> from
>>> when the tests failed.
>>
>> Oh wow! the receiver actually sends the window open notification
>> (packets 527 and 528 in the trace), but the sender does not react at
>> all.
>>
>> I have no idea/I haven't digged yet why the sender did not try a zero
>> window probe (it should!), but it looks like we have some old bug in
>> sender wakeup since MPTCP_DEQUEUE introduction (which is very
>> surprising, why we did not catch/observe this earlier ?!?). That
>> could
>> explain also sporadic mptcp_join failures.
>>
>> Could you please try the attached patch?
> 
> Thank you very much. I just tested this patch, but it doesn't work. The
> splice test still fails and reports the same error.

Uhmmm... right, in the pcap trace you shared, the relevant ack opened the
(mptcp-level) window without changing the msk-level ack seq.

So we need something similar for __mptcp_check_push(). I can't do it
right now. Could you please have a look?

Otherwise I'll try to share a v2 patch later/tomorrow.

Cheers,

Paolo


Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Paolo Abeni 1 day, 12 hours ago
On 10/9/25 12:23 PM, Paolo Abeni wrote:
> On 10/9/25 11:02 AM, Geliang Tang wrote:
>> On Thu, 2025-10-09 at 09:52 +0200, Paolo Abeni wrote:
>>> On 10/9/25 8:54 AM, Geliang Tang wrote:
>>>> On Wed, 2025-10-08 at 09:30 +0200, Paolo Abeni wrote:
>>>>> On 10/8/25 5:07 AM, Geliang Tang wrote:
>>>>>> On Mon, 2025-10-06 at 19:07 +0200, Matthieu Baerts wrote:
>>>>>>> Hi Paolo,
>>>>>>>
>>>>>>> On 06/10/2025 10:11, Paolo Abeni wrote:
>>>>>>>> This series includes RX path improvement built around
>>>>>>>> backlog
>>>>>>>> processing
>>>>>>> Thank you for the new version! This is not a review, but just
>>>>>>> a
>>>>>>> note
>>>>>>> to
>>>>>>> tell you patchew didn't manage to apply the patches due to
>>>>>>> the
>>>>>>> same
>>>>>>> conflict that was already there with the v4 (mptcp_init_skb()
>>>>>>> parameters
>>>>>>> have been moved to the previous line). I just applied the
>>>>>>> patches
>>>>>>> manually. While at it, I also used this test branch for
>>>>>>> syzkaller
>>>>>>> to
>>>>>>> validate them.
>>>>>>>
>>>>>>> (Also, on patch "mptcp: drop the __mptcp_data_ready()
>>>>>>> helper",
>>>>>>> git
>>>>>>> complained that there is a trailing whitespace.)
>>>>>>
>>>>>> Sorry, patches 9-10 break my "implement mptcp read_sock" v12
>>>>>> series. I
>>>>>> rebased this series on patches 1-8, it works well. But after
>>>>>> applying
>>>>>> patches 9-10, I changed mptcp_recv_skb() in [1] from
>>>>>
>>>>> Thanks for the feedback, the applied delta looks good to me.
>>>>>
>>>>>> # INFO: with MPTFO start
>>>>>> # 57 ns2 MPTCP -> ns1 (10.0.1.1:10054      ) MPTCP    
>>>>>> (duration
>>>>>> 60989ms) [FAIL] client exit code 0, server 124
>>>>>> # 
>>>>>> # netns ns1-RqXF2p (listener) socket stat for 10054:
>>>>>> # Failed to find cgroup2 mount
>>>>>> # Failed to find cgroup2 mount
>>>>>> # Failed to find cgroup2 mount
>>>>>> # Netid State    Recv-Q Send-Q Local Address:Port  Peer
>>>>>> Address:Port  
>>>>>> # tcp   ESTAB    0      0           10.0.1.1:10054    
>>>>>> 10.0.1.2:55516
>>>>>> ino:2064372 sk:1 cgroup:unreachable:1 <->
>>>>>> # 	 skmem:(r0,rb131072,t0,tb340992,f0,w0,o0,bl0,d0) sack
>>>>>> cubic
>>>>>> wscale:8,8 rto:206 rtt:5.026/10.034 ato:40 mss:1460 pmtu:1500
>>>>>> rcvmss:1436 advmss:1460 cwnd:10 bytes_sent:115312
>>>>>> bytes_retrans:1560
>>>>>> bytes_acked:113752 bytes_received:5136 segs_out:85 segs_in:16
>>>>>> data_segs_out:83 data_segs_in:4 send 23239156bps lastsnd:60939
>>>>>> lastrcv:61035 lastack:60912 pacing_rate 343879640bps
>>>>>> delivery_rate
>>>>>> 1994680bps delivered:84 busy:123ms sndbuf_limited:41ms(33.3%)
>>>>>> retrans:0/2 dsack_dups:2 rcv_space:14600 rcv_ssthresh:75432
>>>>>> minrtt:0.003 rcv_wnd:75520 tcp-ulp-mptcp flags:Mec
>>>>>> token:0000(id:0)/32ed0950(id:0) seq:2946228641406205031 sfseq:1
>>>>>> ssnoff:1349223625 maplen:5136
>>>>>> # mptcp LAST-ACK 0      0           10.0.1.1:10054    
>>>>>> 10.0.1.2:55516
>>>>>> timer:(keepalive,59sec,0) ino:0 sk:2 cgroup:unreachable:1 ---
>>>>>> # 	
>>>>>> skmem:(r0,rb131072,t0,tb345088,f4088,w352264,o0,bl0,d0)
>>>>>> subflows_max:2 remote_key token:32ed0950
>>>>>> write_seq:6317574787800720824
>>>>>> snd_una:6317574787800376423 rcv_nxt:2946228641406210168
>>>>>> bytes_sent:113752 bytes_received:5136 bytes_acked:113752
>>>>>> subflows_total:1 last_data_sent:60954 last_data_recv:61036
>>>>>> last_ack_recv:60913                                       
>>>>>
>>>>> bytes_sent == bytes_sent, possibly we are missing a window-open
>>>>> event,
>>>>> which in turn should be triggered by a mptcp_cleanp_rbuf(), which
>>>>> AFAICS
>>>>> are correctly invoked in the splice code. TL;DR: I can't find
>>>>> anything
>>>>> obviously wrong :-P
>>>>>
>>>>> Also the default rx buf size is suspect.
>>>>>
>>>>> Can you reproduce the issue while capturing the traffic with
>>>>> tcpdump?
>>>>> if
>>>>> so, could you please share the capture?
>>>>
>>>> Thank you for your suggestion. I've attached several tcpdump logs
>>>> from
>>>> when the tests failed.
>>>
>>> Oh wow! the receiver actually sends the window open notification
>>> (packets 527 and 528 in the trace), but the sender does not react at
>>> all.
>>>
>>> I have no idea/I haven't digged yet why the sender did not try a zero
>>> window probe (it should!), but it looks like we have some old bug in
>>> sender wakeup since MPTCP_DEQUEUE introduction (which is very
>>> surprising, why we did not catch/observe this earlier ?!?). That
>>> could
>>> explain also sporadic mptcp_join failures.
>>>
>>> Could you please try the attached patch?
>>
>> Thank you very much. I just tested this patch, but it doesn't work. The
>> splice test still fails and reports the same error.
> 
> Uhmmm... right, in the pcap trace you shared the relevant ack opened the
> (mptcp-level) window, without changing the msk-level ack seq.
> 
> So we need something similar for __mptcp_check_push(). I can't do it
> right now. Could you please have a look?

I reviewed the relevant code again and my initial assessment was wrong,
i.e. there is no need for additional wake-ups.

@Geliang: if you reproduce the issue multiple times, are there any
common patterns? E.g. the sender file being considerably larger than the
client one, or only a specific subset of all the test cases failing, or ...

Thanks,

Paolo

Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Paolo Abeni 17 hours ago
On 10/9/25 3:58 PM, Paolo Abeni wrote:
> @Geliang: if you reproduce the issue multiple times, are there any
> common patterns ? i.e. sender files considerably larger than the client
> one, or only a specific subsets of all the test-cases failing, or ...

Other questions:
- Can you please share your setup details (VM vs baremetal, debug config
vs non-debug config, vng vs plain qemu, number of [v]cores...)? I can't
repro the issue locally.
- Can you please share a pcap capture _and_ the selftest text output for
the same failing test?

In the log shared previously the sender had data queued at the
mptcp level, but not at the TCP level. In the shared pcap capture the
receiver sends a couple of acks opening the tcp-level and mptcp-level
window, but the sender never replies.

In such a scenario the incoming ack should reach ack_update_msk() ->
__mptcp_check_push() -> __mptcp_subflow_push_pending() (or
mptcp_release_cb -> __mptcp_push_pending()) -> mptcp_sendmsg_frag(), but
that chain is apparently broken somewhere in the failing scenario. Could
you please add probe points to the mentioned functions and perf record
the test, to try to see where the chain is interrupted?

Thanks,

Paolo
Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by Geliang Tang 13 hours ago
Hi Paolo,

On Fri, 2025-10-10 at 10:21 +0200, Paolo Abeni wrote:
> On 10/9/25 3:58 PM, Paolo Abeni wrote:
> > @Geliang: if you reproduce the issue multiple times, are there any
> > common patterns ? i.e. sender files considerably larger than the
> > client
> > one, or only a specific subsets of all the test-cases failing, or
> > ...
> 
> Other questions:
> - Can you please share your setup details (VM vs baremetal, debug
> config
> vs non debug, vmg vs plain qemu, number of [v]cores...)? I can't
> repro
> the issue locally.

Here are my modifications:

https://git.kernel.org/pub/scm/linux/kernel/git/geliang/mptcp_net-next.git/log/?h=splice_new

I used mptcp-upstream-virtme-docker normal config to reproduce it:

docker run \
	-e INPUT_NO_BLOCK=1 \
	-e INPUT_PACKETDRILL_NO_SYNC=1 \
	-v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
	--pull always ghcr.io/multipath-tcp/mptcp-upstream-virtme-docker:latest \
	auto-normal

$ cat .virtme-exec-run 
run_loop run_selftest_one ./mptcp_connect_splice.sh

Running mptcp_connect_splice.sh in a loop dozens of times should
reproduce the test failure.

> - Can you please share a pcap capture _and_ the selftest text output
> for
> the same failing  test?
> 
> In the log shared previously the sender had data queued at the
> mptcp-level, but not at TCP-level. In the shared pcap capture the
> receiver sends a couple of acks opening the tcp-level and mptcp-level
> window, but the sender never replies.
> 
> In such scenario the incoming ack should reach ack_update_msk() ->
> __mptcp_check_push() -> __mptcp_subflow_push_pending() (or
> mptcp_release_cb -> __mptcp_push_pending() ) -> mptcp_sendmsg_frag()
> but
> such chain is apparently broken somewhere in the failing scenario.
> Could
> you please add probe points the the mentioned funtions and perf
> record
> the test, to try to see where the mentioned chain is interrupted?

Thank you for your suggestion. I will proceed with testing accordingly.

-Geliang

> 
> Thanks,
> 
> Paolo
> 

Re: [PATCH v5 mptcp-next 00/10] mptcp: introduce backlog processing
Posted by MPTCP CI 4 days, 8 hours ago
Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/18288523358

Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/5641b16abf48
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1008615


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have already been made to have a
stable test suite when executed on a public CI like this one, it is possible
that some reported issues are not due to your modifications. Still, do not
hesitate to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)