[PATCH mptcp-next v3 0/3] send() fails with EAGAIN in blocking IO mode #487

Geliang Tang posted 3 patches 6 months, 2 weeks ago
Patches applied successfully
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/cover.1748484770.git.tanggeliang@kylinos.cn
[PATCH mptcp-next v3 0/3] send() fails with EAGAIN in blocking IO mode #487
Posted by Geliang Tang 6 months, 2 weeks ago
From: Geliang Tang <tanggeliang@kylinos.cn>

Good news! I finally solved the instability of the MPTCP BPF sched selftests
that I reported a year ago, #487 "send() fails with EAGAIN in blocking IO mode".

The fix is simple: explicitly set the SO_SNDBUF sockopt, but be sure not to
set SO_RCVBUF at the same time (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in
mptcp_rcv_space_adjust()).
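
For reference, a minimal sketch of the kind of helper change this implies;
the helper name and the buffer size are illustrative assumptions, not the
exact code from the patch:

	#include <sys/socket.h>

	/* Illustrative only: cap the send buffer so data leaves the sender
	 * at a steady rate. SO_RCVBUF is deliberately left untouched so
	 * that SOCK_RCVBUF_LOCK stays clear and mptcp_rcv_space_adjust()
	 * can keep auto-tuning the receive buffer.
	 */
	static int set_sndbuf(int fd, int bytes)
	{
		return setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
				  &bytes, sizeof(bytes));
	}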

With this fix, the BPF sched selftests are now very stable. I ran loop
testing using mptcp-upstream-virtme-docker (run_loop run_bpftest_all) and it
passed hundreds of iterations without error:


	=== Attempt: 465 (Thu, 29 May 2025 04:04:09 +0000) ===

BPF Test: test_progs -t mptcp
TAP version 13
1..1
# #198/1   mptcp/base:OK
# #198/2   mptcp/mptcpify:OK
# #198/3   mptcp/subflow:OK
# #198/4   mptcp/iters_subflow:OK
# #198/5   mptcp/default:OK
# #198/6   mptcp/first:OK
# #198/7   mptcp/bkup:OK
# #198/8   mptcp/rr:OK
# #198/9   mptcp/red:OK
# #198/10  mptcp/burst:OK
# #198     mptcp:OK
# Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED
ok 1 test: bpftest_test_progs_mptcp
# time=3
BPF Test: test_progs-cpuv4 -t mptcp
TAP version 13
1..1
# #198/1   mptcp/base:OK
# #198/2   mptcp/mptcpify:OK
# #198/3   mptcp/subflow:OK
# #198/4   mptcp/iters_subflow:OK
# #198/5   mptcp/default:OK
# #198/6   mptcp/first:OK
# #198/7   mptcp/bkup:OK
# #198/8   mptcp/rr:OK
# #198/9   mptcp/red:OK
# #198/10  mptcp/burst:OK
# #198     mptcp:OK
# Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED
ok 1 test: bpftest_test_progs-cpuv4_mptcp
# time=4
BPF Test: test_progs-no_alu32 -t mptcp
TAP version 13
1..1
# #198/1   mptcp/base:OK
# #198/2   mptcp/mptcpify:OK
# #198/3   mptcp/subflow:OK
# #198/4   mptcp/iters_subflow:OK
# #198/5   mptcp/default:OK
# #198/6   mptcp/first:OK
# #198/7   mptcp/bkup:OK
# #198/8   mptcp/rr:OK
# #198/9   mptcp/red:OK
# #198/10  mptcp/burst:OK
# #198     mptcp:OK
# Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED
ok 1 test: bpftest_test_progs-no_alu32_mptcp
# time=3

	=== Attempt: 466 (Thu, 29 May 2025 04:04:19 +0000) ===


This set also makes the following series, "add io thread mode tests",
obsolete:
https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/

v2:
 - mptcp: fix the default value of scaling_ratio
 https://patchwork.kernel.org/project/mptcp/patch/0ccc1c26d27d6ee7be22806a97983d37c6ca548c.1715053270.git.tanggeliang@kylinos.cn/

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487

Geliang Tang (3):
  selftests/bpf: Add sndbuf for send_recv_data
  Squash to "selftests/bpf: Add bpf scheduler test"
  DO-NOT-MERGE: selftests/bpf: Increase total_bytes of bpf sched tests

 tools/testing/selftests/bpf/network_helpers.c | 20 ++++++++++++++++++-
 tools/testing/selftests/bpf/network_helpers.h |  2 +-
 .../selftests/bpf/prog_tests/bpf_qdisc.c      |  2 +-
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     |  4 ++--
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 13 ++++++++++--
 5 files changed, 34 insertions(+), 7 deletions(-)

-- 
2.43.0
Re: [PATCH mptcp-next v3 0/3] send() fails with EAGAIN in blocking IO mode #487
Posted by Mat Martineau 6 months ago
On Thu, 29 May 2025, Geliang Tang wrote:

> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> Good news! I finally solved the unstable issue of MPTCP BPF sched selftests
> I reported a year ago, #487 "send() fails with EAGAIN in blocking IO mode".
>
> The fix is simple, it can be solved by explicitly setting SO_SNDBUF
> sockopt, but be sure not to set SO_RCVBUF at the same time
> (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in mptcp_rcv_space_adjust()).
>
> With this fix, BPF sched selftests are now very stable, I run loop testing
> using mptcp-upstream-virtme-docker (run_loop run_bpftest_all), and can run
> it normally for hundreds of times without error:
>

Hi Geliang -

I can see how changing SO_SNDBUF on the sending socket side would shift
timing behavior in a way that affects the test outcome, but it doesn't
address the root issue with bug #487:

It is either OK to get an EAGAIN from a blocking send(), or it's not OK.


If it's not ok to ever return EAGAIN from a blocking send, the existing 
test code is a reproducer for a bug, and changing the test is hiding that 
bug.

If EAGAIN is ok, then we should change the code in send_recv_server() to 
allow it.


As Matthieu noted in the comment here 
(https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2083123666), 
it may be due to the BPF context that the socket is really non-blocking. 
Did we ever figure out if that was the case? I couldn't find the
discussion on the mailing list.


- Mat


>
> 	=== Attempt: 465 (Thu, 29 May 2025 04:04:09 +0000) ===
>
> BPF Test: test_progs -t mptcp
> TAP version 13
> 1..1
> # #198/1   mptcp/base:OK
> # #198/2   mptcp/mptcpify:OK
> # #198/3   mptcp/subflow:OK
> # #198/4   mptcp/iters_subflow:OK
> # #198/5   mptcp/default:OK
> # #198/6   mptcp/first:OK
> # #198/7   mptcp/bkup:OK
> # #198/8   mptcp/rr:OK
> # #198/9   mptcp/red:OK
> # #198/10  mptcp/burst:OK
> # #198     mptcp:OK
> # Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED
> ok 1 test: bpftest_test_progs_mptcp
> # time=3
> BPF Test: test_progs-cpuv4 -t mptcp
> TAP version 13
> 1..1
> # #198/1   mptcp/base:OK
> # #198/2   mptcp/mptcpify:OK
> # #198/3   mptcp/subflow:OK
> # #198/4   mptcp/iters_subflow:OK
> # #198/5   mptcp/default:OK
> # #198/6   mptcp/first:OK
> # #198/7   mptcp/bkup:OK
> # #198/8   mptcp/rr:OK
> # #198/9   mptcp/red:OK
> # #198/10  mptcp/burst:OK
> # #198     mptcp:OK
> # Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED
> ok 1 test: bpftest_test_progs-cpuv4_mptcp
> # time=4
> BPF Test: test_progs-no_alu32 -t mptcp
> TAP version 13
> 1..1
> # #198/1   mptcp/base:OK
> # #198/2   mptcp/mptcpify:OK
> # #198/3   mptcp/subflow:OK
> # #198/4   mptcp/iters_subflow:OK
> # #198/5   mptcp/default:OK
> # #198/6   mptcp/first:OK
> # #198/7   mptcp/bkup:OK
> # #198/8   mptcp/rr:OK
> # #198/9   mptcp/red:OK
> # #198/10  mptcp/burst:OK
> # #198     mptcp:OK
> # Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED
> ok 1 test: bpftest_test_progs-no_alu32_mptcp
> # time=3
>
> 	=== Attempt: 466 (Thu, 29 May 2025 04:04:19 +0000) ===
>
>
> This set also invalidates the following set named "add io thread mode
> tests":
> https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/
>
> v2:
> - mptcp: fix the default value of scaling_ratio
> https://patchwork.kernel.org/project/mptcp/patch/0ccc1c26d27d6ee7be22806a97983d37c6ca548c.1715053270.git.tanggeliang@kylinos.cn/
>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
>
> Geliang Tang (3):
>  selftests/bpf: Add sndbuf for send_recv_data
>  Squash to "selftests/bpf: Add bpf scheduler test"
>  DO-NOT-MERGE: selftests/bpf: Increase total_bytes of bpf sched tests
>
> tools/testing/selftests/bpf/network_helpers.c | 20 ++++++++++++++++++-
> tools/testing/selftests/bpf/network_helpers.h |  2 +-
> .../selftests/bpf/prog_tests/bpf_qdisc.c      |  2 +-
> .../selftests/bpf/prog_tests/bpf_tcp_ca.c     |  4 ++--
> .../testing/selftests/bpf/prog_tests/mptcp.c  | 13 ++++++++++--
> 5 files changed, 34 insertions(+), 7 deletions(-)
>
> -- 
> 2.43.0
>
>
>
Re: [PATCH mptcp-next v3 0/3] send() fails with EAGAIN in blocking IO mode #487
Posted by Matthieu Baerts 6 months ago
Hi Mat, Geliang,

On 14/06/2025 01:11, Mat Martineau wrote:
> On Thu, 29 May 2025, Geliang Tang wrote:
> 
>> From: Geliang Tang <tanggeliang@kylinos.cn>
>>
>> Good news! I finally solved the unstable issue of MPTCP BPF sched
>> selftests
>> I reported a year ago, #487 "send() fails with EAGAIN in blocking IO
>> mode".
>>
>> The fix is simple, it can be solved by explicitly setting SO_SNDBUF
>> sockopt, but be sure not to set SO_RCVBUF at the same time
>> (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in mptcp_rcv_space_adjust()).
>>
>> With this fix, BPF sched selftests are now very stable, I run loop
>> testing
>> using mptcp-upstream-virtme-docker (run_loop run_bpftest_all), and can
>> run
>> it normally for hundreds of times without error:
>>
> 
> Hi Geliang -
> 
> I can see how changing SO_SNDBUF on the sending socket side would shift
> timing behavior in a way that affect the test outcome, but it doesn't
> address the root issue with bug #487:
> 
> It is either OK to get an EAGAIN from a blocking send(), or it's not OK.
> 
> 
> If it's not ok to ever return EAGAIN from a blocking send, the existing
> test code is a reproducer for a bug, and changing the test is hiding
> that bug.
> 
> If EAGAIN is ok, then we should change the code in send_recv_server() to
> allow it.

It is now a bit hidden in the middle of #487, but if I'm not mistaken,
it is OK to get EAGAIN with a blocking send() **if** SO_SNDTIMEO is
used, and in case of timeout.

See:
https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2485577676
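
As a reminder, the behaviour referred to comes from setting a send timeout,
roughly like below; the 3-second value is only an example, not necessarily
what the selftests use:

	#include <sys/socket.h>
	#include <sys/time.h>

	/* With a send timeout set, a blocking send() may return -1 with
	 * errno set to EAGAIN/EWOULDBLOCK once the timeout expires,
	 * instead of blocking indefinitely.
	 */
	static int set_send_timeout(int fd)
	{
		struct timeval tv = { .tv_sec = 3, .tv_usec = 0 };

		return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO,
				  &tv, sizeof(tv));
	}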

So I think the question should be: is it normal to block for longer than
the timeout period (which is a "long" period, no?)? If yes, then
limiting the send buffer might be a solution, but as Mat said, it looks
better to understand the root cause than to hide a bug :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next v3 0/3] send() fails with EAGAIN in blocking IO mode #487
Posted by Geliang Tang 6 months ago
Hi Matt, Mat,

On Sun, 2025-06-15 at 23:29 +0200, Matthieu Baerts wrote:
> Hi Mat, Geliang,
> 
> On 14/06/2025 01:11, Mat Martineau wrote:
> > On Thu, 29 May 2025, Geliang Tang wrote:
> > 
> > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > > 
> > > Good news! I finally solved the unstable issue of MPTCP BPF sched
> > > selftests
> > > I reported a year ago, #487 "send() fails with EAGAIN in blocking
> > > IO
> > > mode".
> > > 
> > > The fix is simple, it can be solved by explicitly setting
> > > SO_SNDBUF
> > > sockopt, but be sure not to set SO_RCVBUF at the same time
> > > (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in
> > > mptcp_rcv_space_adjust()).
> > > 
> > > With this fix, BPF sched selftests are now very stable, I run
> > > loop
> > > testing
> > > using mptcp-upstream-virtme-docker (run_loop run_bpftest_all),
> > > and can
> > > run
> > > it normally for hundreds of times without error:
> > > 
> > 
> > Hi Geliang -
> > 
> > I can see how changing SO_SNDBUF on the sending socket side would
> > shift
> > timing behavior in a way that affect the test outcome, but it
> > doesn't
> > address the root issue with bug #487:
> > 
> > It is either OK to get an EAGAIN from a blocking send(), or it's
> > not OK.
> > 
> > 
> > If it's not ok to ever return EAGAIN from a blocking send, the
> > existing
> > test code is a reproducer for a bug, and changing the test is
> > hiding
> > that bug.
> > 
> > If EAGAIN is ok, then we should change the code in
> > send_recv_server() to
> > allow it.

I did try to handle EAGAIN in send_recv_server() but it didn't work.
MPTCP BPF sched selftests still fail. Test code and results are
attached.

I added this in send_recv_server():

	if (errno == EAGAIN && again < 5) {
		again++;
		continue;
	}
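
For context, a self-contained sketch of roughly how this retry sits in the
helper's send loop; the loop structure, the helper name and where the
"again" counter is declared are assumptions, not the exact network_helpers.c
code:

	#include <errno.h>
	#include <sys/types.h>
	#include <sys/socket.h>

	/* Sketch: send 'total' bytes, tolerating a few EAGAINs even though
	 * the socket is in blocking mode.
	 */
	static int send_all(int fd, const char *buf, size_t total)
	{
		size_t sent = 0;
		int again = 0;

		while (sent < total) {
			ssize_t n = send(fd, buf + sent, total - sent, 0);

			if (n < 0) {
				if (errno == EAGAIN && again < 5) {
					again++;
					continue;
				}
				return -errno;
			}
			sent += n;
		}
		return 0;
	}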

And still got the EAGAIN error:

# (network_helpers.c:728: errno: Resource temporarily unavailable) send 7867500 expected 10485760
# (network_helpers.c:782: errno: Resource temporarily unavailable) recv 3469500 expected 10485760
# (network_helpers.c:790: errno: Resource temporarily unavailable) Failed in thread_ret -11
# send_data_and_verify:FAIL:send_recv_data unexpected error: -11 (errno 11)

In addition, the BPF selftests have added a new watchdog mechanism that does
not allow any test case to run for more than 10 seconds. Otherwise, the
following error is reported:

# WATCHDOG: test case mptcp/default executes for 10 seconds...

In my testing, I have not found any other solution besides limiting the
send buffer. This allows data to be sent at a constant rate, which
ensures the stability of the MPTCP BPF sched selftests.

In order to avoid hiding this bug, we could add a dedicated test case for it
in the mptcp selftests in the future, like in [1].

WDYT?

Thanks,
-Geliang

[1]
https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/

> 
> It is now a bit hidden in the middle of #487, but if I'm not
> mistaken,
> it is OK to get EAGAIN with a blocking send() **if** SO_SNDTIMEO is
> used, and in case of timeout.
> 
> See:
> https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2485577676
> 
> So I think the question should be: is it normal to block for longer
> than
> the timeout period (which is a "long" period, no?)? If yes, then
> limiting the send buffer might be a solution, but as Mat said, it
> looks
> better to understand the root cause than hiding a bug :)
> 
> Cheers,
> Matt
Re: [PATCH mptcp-next v3 0/3] send() fails with EAGAIN in blocking IO mode #487
Posted by Geliang Tang 5 months, 2 weeks ago
Hi Mat, Matt,

On Mon, 2025-06-16 at 14:34 +0800, Geliang Tang wrote:
> Hi Matt, Mat,
> 
> On Sun, 2025-06-15 at 23:29 +0200, Matthieu Baerts wrote:
> > Hi Mat, Geliang,
> > 
> > On 14/06/2025 01:11, Mat Martineau wrote:
> > > On Thu, 29 May 2025, Geliang Tang wrote:
> > > 
> > > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > > > 
> > > > Good news! I finally solved the unstable issue of MPTCP BPF
> > > > sched
> > > > selftests
> > > > I reported a year ago, #487 "send() fails with EAGAIN in
> > > > blocking
> > > > IO
> > > > mode".
> > > > 
> > > > The fix is simple, it can be solved by explicitly setting
> > > > SO_SNDBUF
> > > > sockopt, but be sure not to set SO_RCVBUF at the same time
> > > > (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in
> > > > mptcp_rcv_space_adjust()).
> > > > 
> > > > With this fix, BPF sched selftests are now very stable, I run
> > > > loop
> > > > testing
> > > > using mptcp-upstream-virtme-docker (run_loop run_bpftest_all),
> > > > and can
> > > > run
> > > > it normally for hundreds of times without error:
> > > > 
> > > 
> > > Hi Geliang -
> > > 
> > > I can see how changing SO_SNDBUF on the sending socket side would
> > > shift
> > > timing behavior in a way that affect the test outcome, but it
> > > doesn't
> > > address the root issue with bug #487:
> > > 
> > > It is either OK to get an EAGAIN from a blocking send(), or it's
> > > not OK.
> > > 
> > > 
> > > If it's not ok to ever return EAGAIN from a blocking send, the
> > > existing
> > > test code is a reproducer for a bug, and changing the test is
> > > hiding
> > > that bug.
> > > 
> > > If EAGAIN is ok, then we should change the code in
> > > send_recv_server() to
> > > allow it.
> 
> I did try to handle EAGAIN in send_recv_server() but it didn't work.
> MPTCP BPF sched selftests still fail. Test code and results are
> attached.
> 
> I added this in send_recv_server():
> 
> 	if (errno == EAGAIN && again < 5) {
> 		again++;
> 		continue;
> 	}
> 
> And still got the EAGAIN error:
> 
> # (network_helpers.c:728: errno: Resource temporarily unavailable)
> send
> 7867500 expected 10485760
> # (network_helpers.c:782: errno: Resource temporarily unavailable)
> recv
> 3469500 expected 10485760
> # (network_helpers.c:790: errno: Resource temporarily unavailable)
> Failed in thread_ret -11
> # send_data_and_verify:FAIL:send_recv_data unexpected error: -11
> (errno
> 11)
> 
> In addition, BPF selftests adds a new mechanism that does not allow
> any
> test item to run for more than 10 seconds. Otherwise, the following
> error will be reported:
> 
> # WATCHDOG: test case mptcp/default executes for 10 seconds...
> 
> In my testing, I have not found any other solution besides limiting
> the
> send buffer. This allows data to be sent at a constant rate, which
> ensures the stability of MPTCP BPF sched selftests.
> 
> In order to avoid hiding this bug, we can add a test item for this in
> mptcp selftest in the future, like in [1].

Last week I debugged this issue further and found something (thanks to
Gang Yan for his help): when the MPTCP BPF sched selftests fail, the memory
limit check (if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)) is always true
in __mptcp_move_skbs_from_subflow(). At that point sk->sk_receive_queue is
empty, but no skb is moved from this subflow to sk->sk_receive_queue, which
causes the transmission to fail.

One fix is to also consider the case where sk->sk_receive_queue is
empty when doing the memory limit check, and skip the break in that case:

  -               if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
  +               if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf &&
  +                   !skb_queue_empty(&sk->sk_receive_queue))
                          break;

In addition, this memory limit check was moved from the end of the do {}
while (more_data_avail) loop to the front in commit e0ca4057e0ec
("mptcp: micro-optimize __mptcp_move_skb()"), so another, perhaps better,
fix is to move this check back to the end of the do {} while () loop, so
that moving skbs from this subflow to sk->sk_receive_queue always gets a
chance to run at least once:

@@ -587,9 +587,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
                struct sk_buff *skb;
                bool fin;
 
-               if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
-                       break;
-
                /* try to move as much data as available */
                map_remaining = subflow->map_data_len -
                                mptcp_subflow_get_map_offset(subflow);
@@ -634,6 +631,8 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
                WRITE_ONCE(tp->copied_seq, seq);
                more_data_avail = mptcp_subflow_data_available(ssk);
 
+               if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+                       break;
        } while (more_data_avail);

I think this is the root cause of #487. I would like to hear your
opinions on which fix is better.

Thanks,
-Geliang

> 
> WDYT?
> 
> Thanks,
> -Geliang
> 
> [1]
> https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/
> 
> > 
> > It is now a bit hidden in the middle of #487, but if I'm not
> > mistaken,
> > it is OK to get EAGAIN with a blocking send() **if** SO_SNDTIMEO is
> > used, and in case of timeout.
> > 
> > See:
> > https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2485577676
> > 
> > So I think the question should be: is it normal to block for longer
> > than
> > the timeout period (which is a "long" period, no?)? If yes, then
> > limiting the send buffer might be a solution, but as Mat said, it
> > looks
> > better to understand the root cause than hiding a bug :)
> > 
> > Cheers,
> > Matt
Re: [PATCH mptcp-next v3 0/3] send() fails with EAGAIN in blocking IO mode #487
Posted by Mat Martineau 5 months ago
Hi Geliang -

On Sun, 29 Jun 2025, Geliang Tang wrote:

> Hi Mat, Matt,
>
> On Mon, 2025-06-16 at 14:34 +0800, Geliang Tang wrote:
>> Hi Matt, Mat,
>>
>> On Sun, 2025-06-15 at 23:29 +0200, Matthieu Baerts wrote:
>>> Hi Mat, Geliang,
>>>
>>> On 14/06/2025 01:11, Mat Martineau wrote:
>>>> On Thu, 29 May 2025, Geliang Tang wrote:
>>>>
>>>>> From: Geliang Tang <tanggeliang@kylinos.cn>
>>>>>
>>>>> Good news! I finally solved the unstable issue of MPTCP BPF
>>>>> sched
>>>>> selftests
>>>>> I reported a year ago, #487 "send() fails with EAGAIN in
>>>>> blocking
>>>>> IO
>>>>> mode".
>>>>>
>>>>> The fix is simple, it can be solved by explicitly setting
>>>>> SO_SNDBUF
>>>>> sockopt, but be sure not to set SO_RCVBUF at the same time
>>>>> (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in
>>>>> mptcp_rcv_space_adjust()).
>>>>>
>>>>> With this fix, BPF sched selftests are now very stable, I run
>>>>> loop
>>>>> testing
>>>>> using mptcp-upstream-virtme-docker (run_loop run_bpftest_all),
>>>>> and can
>>>>> run
>>>>> it normally for hundreds of times without error:
>>>>>
>>>>
>>>> Hi Geliang -
>>>>
>>>> I can see how changing SO_SNDBUF on the sending socket side would
>>>> shift
>>>> timing behavior in a way that affect the test outcome, but it
>>>> doesn't
>>>> address the root issue with bug #487:
>>>>
>>>> It is either OK to get an EAGAIN from a blocking send(), or it's
>>>> not OK.
>>>>
>>>>
>>>> If it's not ok to ever return EAGAIN from a blocking send, the
>>>> existing
>>>> test code is a reproducer for a bug, and changing the test is
>>>> hiding
>>>> that bug.
>>>>
>>>> If EAGAIN is ok, then we should change the code in
>>>> send_recv_server() to
>>>> allow it.
>>
>> I did try to handle EAGAIN in send_recv_server() but it didn't work.
>> MPTCP BPF sched selftests still fail. Test code and results are
>> attached.
>>
>> I added this in send_recv_server():
>>
>> 	if (errno == EAGAIN && again < 5) {
>> 		again++;
>> 		continue;
>> 	}
>>
>> And still got the EAGAIN error:
>>
>> # (network_helpers.c:728: errno: Resource temporarily unavailable)
>> send
>> 7867500 expected 10485760
>> # (network_helpers.c:782: errno: Resource temporarily unavailable)
>> recv
>> 3469500 expected 10485760
>> # (network_helpers.c:790: errno: Resource temporarily unavailable)
>> Failed in thread_ret -11
>> # send_data_and_verify:FAIL:send_recv_data unexpected error: -11
>> (errno
>> 11)
>>
>> In addition, BPF selftests adds a new mechanism that does not allow
>> any
>> test item to run for more than 10 seconds. Otherwise, the following
>> error will be reported:
>>
>> # WATCHDOG: test case mptcp/default executes for 10 seconds...
>>
>> In my testing, I have not found any other solution besides limiting
>> the
>> send buffer. This allows data to be sent at a constant rate, which
>> ensures the stability of MPTCP BPF sched selftests.
>>
>> In order to avoid hiding this bug, we can add a test item for this in
>> mptcp selftest in the future, like in [1].
>
> Last week I debugged this issue further and found something (Thanks to
> Gang Yan for his help): when mptcp bpf sched selftests fail, the memory
> limit check (if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)) is always true
> in __mptcp_move_skbs_from_subflow(), at this time sk->sk_receive_queue
> is empty,

Thanks for tracking down some more details!

> but no skb is moved from this subflow to sk->
> sk_receive_queue, which causes the transmission to fail.
>

I'm trying to understand this part (above) better.
__mptcp_move_skbs_from_subflow() is on the receive path. What is the path
that causes the transmission to fail? Does exiting the loop early on the rx
side leave ssk->sk_err set, so an error propagates in the msk and
sendmsg() returns early with EAGAIN?

> One fix is ​​to also consider the case where sk->sk_receive_queue is
> empty when doing the memory limit check:
>
>  -               if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
>  +               if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf &&
>  +                   !skb_queue_empty(&sk->sk_receive_queue))
>                          break;

The msk sk_receive_queue is probably empty a lot of the time; does it look
like the empty queue is related to the failure?

>
> In addition, this memory limit check was moved from the end of do {}
> while (more_data_avail) to the front in the commit e0ca4057e0ec
> ("mptcp: micro-optimize __mptcp_move_skb()"), so another better fix is
> ​​to restore this check to the end of do {} while () so that move skbs
> from this subflow to sk->sk_receive_queue always has a chance to do at
> least once:
>
> @@ -587,9 +587,6 @@ static bool __mptcp_move_skbs_from_subflow(struct
> mptcp_sock *msk,
>                struct sk_buff *skb;
>                bool fin;
>
> -               if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> -                       break;
> -
>                /* try to move as much data as available */
>                map_remaining = subflow->map_data_len -
>                                mptcp_subflow_get_map_offset(subflow);
> @@ -634,6 +631,8 @@ static bool __mptcp_move_skbs_from_subflow(struct
> mptcp_sock *msk,
>                WRITE_ONCE(tp->copied_seq, seq);
>                more_data_avail = mptcp_subflow_data_available(ssk);
>
> +               if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> +                       break;
>        } while (more_data_avail);
>
> I think this is the root cause of #487, I would like to hear your
> opinions on which fix is ​​better.

I have a slight preference for the second possible fix since it still 
provides an exit from the loop even when the sk_receive_queue is empty. 
Can you send that patch?

Before we merge a fix I'm still hoping to understand how letting this loop 
body run once on the rx path is changing the tx side behavior!

Thanks,
Mat

>
> Thanks,
> -Geliang
>
>>
>> WDYT?
>>
>> Thanks,
>> -Geliang
>>
>> [1]
>> https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/
>>
>>>
>>> It is now a bit hidden in the middle of #487, but if I'm not
>>> mistaken,
>>> it is OK to get EAGAIN with a blocking send() **if** SO_SNDTIMEO is
>>> used, and in case of timeout.
>>>
>>> See:
>>> https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2485577676
>>>
>>> So I think the question should be: is it normal to block for longer
>>> than
>>> the timeout period (which is a "long" period, no?)? If yes, then
>>> limiting the send buffer might be a solution, but as Mat said, it
>>> looks
>>> better to understand the root cause than hiding a bug :)
>>>
>>> Cheers,
>>> Matt
>
Re: [PATCH mptcp-next v3 0/3] send() fails with EAGAIN in blocking IO mode #487
Posted by MPTCP CI 6 months, 2 weeks ago
Hi Geliang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/15316193959

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8bae6145adeb
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=967210


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)