tools/testing/selftests/bpf/network_helpers.c | 20 ++++++++++++++++++- tools/testing/selftests/bpf/network_helpers.h | 2 +- .../selftests/bpf/prog_tests/bpf_qdisc.c | 2 +- .../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 ++-- .../testing/selftests/bpf/prog_tests/mptcp.c | 13 ++++++++++-- 5 files changed, 34 insertions(+), 7 deletions(-)
From: Geliang Tang <tanggeliang@kylinos.cn> Good news! I finally solved the unstable issue of MPTCP BPF sched selftests I reported a year ago, #487 "send() fails with EAGAIN in blocking IO mode". The fix is simple, it can be solved by explicitly setting SO_SNDBUF sockopt, but be sure not to set SO_RCVBUF at the same time (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in mptcp_rcv_space_adjust()). With this fix, BPF sched selftests are now very stable, I run loop testing using mptcp-upstream-virtme-docker (run_loop run_bpftest_all), and can run it normally for hundreds of times without error: === Attempt: 465 (Thu, 29 May 2025 04:04:09 +0000) === BPF Test: test_progs -t mptcp TAP version 13 1..1 # #198/1 mptcp/base:OK # #198/2 mptcp/mptcpify:OK # #198/3 mptcp/subflow:OK # #198/4 mptcp/iters_subflow:OK # #198/5 mptcp/default:OK # #198/6 mptcp/first:OK # #198/7 mptcp/bkup:OK # #198/8 mptcp/rr:OK # #198/9 mptcp/red:OK # #198/10 mptcp/burst:OK # #198 mptcp:OK # Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED ok 1 test: bpftest_test_progs_mptcp # time=3 BPF Test: test_progs-cpuv4 -t mptcp TAP version 13 1..1 # #198/1 mptcp/base:OK # #198/2 mptcp/mptcpify:OK # #198/3 mptcp/subflow:OK # #198/4 mptcp/iters_subflow:OK # #198/5 mptcp/default:OK # #198/6 mptcp/first:OK # #198/7 mptcp/bkup:OK # #198/8 mptcp/rr:OK # #198/9 mptcp/red:OK # #198/10 mptcp/burst:OK # #198 mptcp:OK # Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED ok 1 test: bpftest_test_progs-cpuv4_mptcp # time=4 BPF Test: test_progs-no_alu32 -t mptcp TAP version 13 1..1 # #198/1 mptcp/base:OK # #198/2 mptcp/mptcpify:OK # #198/3 mptcp/subflow:OK # #198/4 mptcp/iters_subflow:OK # #198/5 mptcp/default:OK # #198/6 mptcp/first:OK # #198/7 mptcp/bkup:OK # #198/8 mptcp/rr:OK # #198/9 mptcp/red:OK # #198/10 mptcp/burst:OK # #198 mptcp:OK # Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED ok 1 test: bpftest_test_progs-no_alu32_mptcp # time=3 === Attempt: 466 (Thu, 29 May 2025 04:04:19 +0000) === This set also invalidates the following set named "add io thread mode tests": https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/ v2: - mptcp: fix the default value of scaling_ratio https://patchwork.kernel.org/project/mptcp/patch/0ccc1c26d27d6ee7be22806a97983d37c6ca548c.1715053270.git.tanggeliang@kylinos.cn/ Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487 Geliang Tang (3): selftests/bpf: Add sndbuf for send_recv_data Squash to "selftests/bpf: Add bpf scheduler test" DO-NOT-MERGE: selftests/bpf: Increase total_bytes of bpf sched tests tools/testing/selftests/bpf/network_helpers.c | 20 ++++++++++++++++++- tools/testing/selftests/bpf/network_helpers.h | 2 +- .../selftests/bpf/prog_tests/bpf_qdisc.c | 2 +- .../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 ++-- .../testing/selftests/bpf/prog_tests/mptcp.c | 13 ++++++++++-- 5 files changed, 34 insertions(+), 7 deletions(-) -- 2.43.0
On Thu, 29 May 2025, Geliang Tang wrote: > From: Geliang Tang <tanggeliang@kylinos.cn> > > Good news! I finally solved the unstable issue of MPTCP BPF sched selftests > I reported a year ago, #487 "send() fails with EAGAIN in blocking IO mode". > > The fix is simple, it can be solved by explicitly setting SO_SNDBUF > sockopt, but be sure not to set SO_RCVBUF at the same time > (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in mptcp_rcv_space_adjust()). > > With this fix, BPF sched selftests are now very stable, I run loop testing > using mptcp-upstream-virtme-docker (run_loop run_bpftest_all), and can run > it normally for hundreds of times without error: > Hi Geliang - I can see how changing SO_SNDBUF on the sending socket side would shift timing behavior in a way that affect the test outcome, but it doesn't address the root issue with bug #487: It is either OK to get an EAGAIN from a blocking send(), or it's not OK. If it's not ok to ever return EAGAIN from a blocking send, the existing test code is a reproducer for a bug, and changing the test is hiding that bug. If EAGAIN is ok, then we should change the code in send_recv_server() to allow it. As Matthieu noted in the comment here (https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2083123666), it may be due to the BPF context that the socket is really non-blocking. Did we ever figure out if that was the case - I couldn't find the discussion on the mailing list? - Mat > > === Attempt: 465 (Thu, 29 May 2025 04:04:09 +0000) === > > BPF Test: test_progs -t mptcp > TAP version 13 > 1..1 > # #198/1 mptcp/base:OK > # #198/2 mptcp/mptcpify:OK > # #198/3 mptcp/subflow:OK > # #198/4 mptcp/iters_subflow:OK > # #198/5 mptcp/default:OK > # #198/6 mptcp/first:OK > # #198/7 mptcp/bkup:OK > # #198/8 mptcp/rr:OK > # #198/9 mptcp/red:OK > # #198/10 mptcp/burst:OK > # #198 mptcp:OK > # Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED > ok 1 test: bpftest_test_progs_mptcp > # time=3 > BPF Test: test_progs-cpuv4 -t mptcp > TAP version 13 > 1..1 > # #198/1 mptcp/base:OK > # #198/2 mptcp/mptcpify:OK > # #198/3 mptcp/subflow:OK > # #198/4 mptcp/iters_subflow:OK > # #198/5 mptcp/default:OK > # #198/6 mptcp/first:OK > # #198/7 mptcp/bkup:OK > # #198/8 mptcp/rr:OK > # #198/9 mptcp/red:OK > # #198/10 mptcp/burst:OK > # #198 mptcp:OK > # Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED > ok 1 test: bpftest_test_progs-cpuv4_mptcp > # time=4 > BPF Test: test_progs-no_alu32 -t mptcp > TAP version 13 > 1..1 > # #198/1 mptcp/base:OK > # #198/2 mptcp/mptcpify:OK > # #198/3 mptcp/subflow:OK > # #198/4 mptcp/iters_subflow:OK > # #198/5 mptcp/default:OK > # #198/6 mptcp/first:OK > # #198/7 mptcp/bkup:OK > # #198/8 mptcp/rr:OK > # #198/9 mptcp/red:OK > # #198/10 mptcp/burst:OK > # #198 mptcp:OK > # Summary: 1/10 PASSED, 0 SKIPPED, 0 FAILED > ok 1 test: bpftest_test_progs-no_alu32_mptcp > # time=3 > > === Attempt: 466 (Thu, 29 May 2025 04:04:19 +0000) === > > > This set also invalidates the following set named "add io thread mode > tests": > https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/ > > v2: > - mptcp: fix the default value of scaling_ratio > https://patchwork.kernel.org/project/mptcp/patch/0ccc1c26d27d6ee7be22806a97983d37c6ca548c.1715053270.git.tanggeliang@kylinos.cn/ > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487 > > Geliang Tang (3): > selftests/bpf: Add sndbuf for send_recv_data > Squash to "selftests/bpf: Add bpf scheduler test" > DO-NOT-MERGE: selftests/bpf: Increase total_bytes of bpf sched tests > > tools/testing/selftests/bpf/network_helpers.c | 20 ++++++++++++++++++- > tools/testing/selftests/bpf/network_helpers.h | 2 +- > .../selftests/bpf/prog_tests/bpf_qdisc.c | 2 +- > .../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 ++-- > .../testing/selftests/bpf/prog_tests/mptcp.c | 13 ++++++++++-- > 5 files changed, 34 insertions(+), 7 deletions(-) > > -- > 2.43.0 > > >
Hi Mat, Geliang, On 14/06/2025 01:11, Mat Martineau wrote: > On Thu, 29 May 2025, Geliang Tang wrote: > >> From: Geliang Tang <tanggeliang@kylinos.cn> >> >> Good news! I finally solved the unstable issue of MPTCP BPF sched >> selftests >> I reported a year ago, #487 "send() fails with EAGAIN in blocking IO >> mode". >> >> The fix is simple, it can be solved by explicitly setting SO_SNDBUF >> sockopt, but be sure not to set SO_RCVBUF at the same time >> (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in mptcp_rcv_space_adjust()). >> >> With this fix, BPF sched selftests are now very stable, I run loop >> testing >> using mptcp-upstream-virtme-docker (run_loop run_bpftest_all), and can >> run >> it normally for hundreds of times without error: >> > > Hi Geliang - > > I can see how changing SO_SNDBUF on the sending socket side would shift > timing behavior in a way that affect the test outcome, but it doesn't > address the root issue with bug #487: > > It is either OK to get an EAGAIN from a blocking send(), or it's not OK. > > > If it's not ok to ever return EAGAIN from a blocking send, the existing > test code is a reproducer for a bug, and changing the test is hiding > that bug. > > If EAGAIN is ok, then we should change the code in send_recv_server() to > allow it. It is now a bit hidden in the middle of #487, but if I'm not mistaken, it is OK to get EAGAIN with a blocking send() **if** SO_SNDTIMEO is used, and in case of timeout. See: https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2485577676 So I think the question should be: is it normal to block for longer than the timeout period (which is a "long" period, no?)? If yes, then limiting the send buffer might be a solution, but as Mat said, it looks better to understand the root cause than hiding a bug :) Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Matt, Mat,
On Sun, 2025-06-15 at 23:29 +0200, Matthieu Baerts wrote:
> Hi Mat, Geliang,
>
> On 14/06/2025 01:11, Mat Martineau wrote:
> > On Thu, 29 May 2025, Geliang Tang wrote:
> >
> > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > >
> > > Good news! I finally solved the unstable issue of MPTCP BPF sched
> > > selftests
> > > I reported a year ago, #487 "send() fails with EAGAIN in blocking
> > > IO
> > > mode".
> > >
> > > The fix is simple, it can be solved by explicitly setting
> > > SO_SNDBUF
> > > sockopt, but be sure not to set SO_RCVBUF at the same time
> > > (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in
> > > mptcp_rcv_space_adjust()).
> > >
> > > With this fix, BPF sched selftests are now very stable, I run
> > > loop
> > > testing
> > > using mptcp-upstream-virtme-docker (run_loop run_bpftest_all),
> > > and can
> > > run
> > > it normally for hundreds of times without error:
> > >
> >
> > Hi Geliang -
> >
> > I can see how changing SO_SNDBUF on the sending socket side would
> > shift
> > timing behavior in a way that affect the test outcome, but it
> > doesn't
> > address the root issue with bug #487:
> >
> > It is either OK to get an EAGAIN from a blocking send(), or it's
> > not OK.
> >
> >
> > If it's not ok to ever return EAGAIN from a blocking send, the
> > existing
> > test code is a reproducer for a bug, and changing the test is
> > hiding
> > that bug.
> >
> > If EAGAIN is ok, then we should change the code in
> > send_recv_server() to
> > allow it.
I did try to handle EAGAIN in send_recv_server() but it didn't work.
MPTCP BPF sched selftests still fail. Test code and results are
attached.
I added this in send_recv_server():
if (errno == EAGAIN && again < 5) {
again++;
continue;
}
And still got the EAGAIN error:
# (network_helpers.c:728: errno: Resource temporarily unavailable) send
7867500 expected 10485760
# (network_helpers.c:782: errno: Resource temporarily unavailable) recv
3469500 expected 10485760
# (network_helpers.c:790: errno: Resource temporarily unavailable)
Failed in thread_ret -11
# send_data_and_verify:FAIL:send_recv_data unexpected error: -11 (errno
11)
In addition, BPF selftests adds a new mechanism that does not allow any
test item to run for more than 10 seconds. Otherwise, the following
error will be reported:
# WATCHDOG: test case mptcp/default executes for 10 seconds...
In my testing, I have not found any other solution besides limiting the
send buffer. This allows data to be sent at a constant rate, which
ensures the stability of MPTCP BPF sched selftests.
In order to avoid hiding this bug, we can add a test item for this in
mptcp selftest in the future, like in [1].
WDYT?
Thanks,
-Geliang
[1]
https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/
>
> It is now a bit hidden in the middle of #487, but if I'm not
> mistaken,
> it is OK to get EAGAIN with a blocking send() **if** SO_SNDTIMEO is
> used, and in case of timeout.
>
> See:
> https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2485577676
>
> So I think the question should be: is it normal to block for longer
> than
> the timeout period (which is a "long" period, no?)? If yes, then
> limiting the send buffer might be a solution, but as Mat said, it
> looks
> better to understand the root cause than hiding a bug :)
>
> Cheers,
> Matt
Hi Matt, Mat,
On Sun, 2025-06-15 at 23:29 +0200, Matthieu Baerts wrote:
> Hi Mat, Geliang,
>
> On 14/06/2025 01:11, Mat Martineau wrote:
> > On Thu, 29 May 2025, Geliang Tang wrote:
> >
> > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > >
> > > Good news! I finally solved the unstable issue of MPTCP BPF sched
> > > selftests
> > > I reported a year ago, #487 "send() fails with EAGAIN in blocking
> > > IO
> > > mode".
> > >
> > > The fix is simple, it can be solved by explicitly setting
> > > SO_SNDBUF
> > > sockopt, but be sure not to set SO_RCVBUF at the same time
> > > (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in
> > > mptcp_rcv_space_adjust()).
> > >
> > > With this fix, BPF sched selftests are now very stable, I run
> > > loop
> > > testing
> > > using mptcp-upstream-virtme-docker (run_loop run_bpftest_all),
> > > and can
> > > run
> > > it normally for hundreds of times without error:
> > >
> >
> > Hi Geliang -
> >
> > I can see how changing SO_SNDBUF on the sending socket side would
> > shift
> > timing behavior in a way that affect the test outcome, but it
> > doesn't
> > address the root issue with bug #487:
> >
> > It is either OK to get an EAGAIN from a blocking send(), or it's
> > not OK.
> >
> >
> > If it's not ok to ever return EAGAIN from a blocking send, the
> > existing
> > test code is a reproducer for a bug, and changing the test is
> > hiding
> > that bug.
> >
> > If EAGAIN is ok, then we should change the code in
> > send_recv_server() to
> > allow it.
I did try to handle EAGAIN in send_recv_server() but it didn't work.
MPTCP BPF sched selftests still fail. Test code and results are
attached.
I added this in send_recv_server():
if (errno == EAGAIN && again < 5) {
again++;
continue;
}
And still got the EAGAIN error:
# (network_helpers.c:728: errno: Resource temporarily unavailable) send
7867500 expected 10485760
# (network_helpers.c:782: errno: Resource temporarily unavailable) recv
3469500 expected 10485760
# (network_helpers.c:790: errno: Resource temporarily unavailable)
Failed in thread_ret -11
# send_data_and_verify:FAIL:send_recv_data unexpected error: -11 (errno
11)
In addition, BPF selftests adds a new mechanism that does not allow any
test item to run for more than 10 seconds. Otherwise, the following
error will be reported:
# WATCHDOG: test case mptcp/default executes for 10 seconds...
In my testing, I have not found any other solution besides limiting the
send buffer. This allows data to be sent at a constant rate, which
ensures the stability of MPTCP BPF sched selftests.
In order to avoid hiding this bug, we can add a test item for this in
mptcp selftest in the future, like in [1].
WDYT?
Thanks,
-Geliang
[1]
https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/
>
> It is now a bit hidden in the middle of #487, but if I'm not
> mistaken,
> it is OK to get EAGAIN with a blocking send() **if** SO_SNDTIMEO is
> used, and in case of timeout.
>
> See:
> https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2485577676
>
> So I think the question should be: is it normal to block for longer
> than
> the timeout period (which is a "long" period, no?)? If yes, then
> limiting the send buffer might be a solution, but as Mat said, it
> looks
> better to understand the root cause than hiding a bug :)
>
> Cheers,
> Matt
Hi Mat, Matt,
On Mon, 2025-06-16 at 14:34 +0800, Geliang Tang wrote:
> Hi Matt, Mat,
>
> On Sun, 2025-06-15 at 23:29 +0200, Matthieu Baerts wrote:
> > Hi Mat, Geliang,
> >
> > On 14/06/2025 01:11, Mat Martineau wrote:
> > > On Thu, 29 May 2025, Geliang Tang wrote:
> > >
> > > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > > >
> > > > Good news! I finally solved the unstable issue of MPTCP BPF
> > > > sched
> > > > selftests
> > > > I reported a year ago, #487 "send() fails with EAGAIN in
> > > > blocking
> > > > IO
> > > > mode".
> > > >
> > > > The fix is simple, it can be solved by explicitly setting
> > > > SO_SNDBUF
> > > > sockopt, but be sure not to set SO_RCVBUF at the same time
> > > > (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in
> > > > mptcp_rcv_space_adjust()).
> > > >
> > > > With this fix, BPF sched selftests are now very stable, I run
> > > > loop
> > > > testing
> > > > using mptcp-upstream-virtme-docker (run_loop run_bpftest_all),
> > > > and can
> > > > run
> > > > it normally for hundreds of times without error:
> > > >
> > >
> > > Hi Geliang -
> > >
> > > I can see how changing SO_SNDBUF on the sending socket side would
> > > shift
> > > timing behavior in a way that affect the test outcome, but it
> > > doesn't
> > > address the root issue with bug #487:
> > >
> > > It is either OK to get an EAGAIN from a blocking send(), or it's
> > > not OK.
> > >
> > >
> > > If it's not ok to ever return EAGAIN from a blocking send, the
> > > existing
> > > test code is a reproducer for a bug, and changing the test is
> > > hiding
> > > that bug.
> > >
> > > If EAGAIN is ok, then we should change the code in
> > > send_recv_server() to
> > > allow it.
>
> I did try to handle EAGAIN in send_recv_server() but it didn't work.
> MPTCP BPF sched selftests still fail. Test code and results are
> attached.
>
> I added this in send_recv_server():
>
> if (errno == EAGAIN && again < 5) {
> again++;
> continue;
> }
>
> And still got the EAGAIN error:
>
> # (network_helpers.c:728: errno: Resource temporarily unavailable)
> send
> 7867500 expected 10485760
> # (network_helpers.c:782: errno: Resource temporarily unavailable)
> recv
> 3469500 expected 10485760
> # (network_helpers.c:790: errno: Resource temporarily unavailable)
> Failed in thread_ret -11
> # send_data_and_verify:FAIL:send_recv_data unexpected error: -11
> (errno
> 11)
>
> In addition, BPF selftests adds a new mechanism that does not allow
> any
> test item to run for more than 10 seconds. Otherwise, the following
> error will be reported:
>
> # WATCHDOG: test case mptcp/default executes for 10 seconds...
>
> In my testing, I have not found any other solution besides limiting
> the
> send buffer. This allows data to be sent at a constant rate, which
> ensures the stability of MPTCP BPF sched selftests.
>
> In order to avoid hiding this bug, we can add a test item for this in
> mptcp selftest in the future, like in [1].
Last week I debugged this issue further and found something (Thanks to
Gang Yan for his help): when mptcp bpf sched selftests fail, the memory
limit check (if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)) is always true
in __mptcp_move_skbs_from_subflow(), at this time sk->sk_receive_queue
is empty, but no skb is moved from this subflow to sk-
>sk_receive_queue, which causes the transmission to fail.
One fix is to also consider the case where sk->sk_receive_queue is
empty when doing the memory limit check:
- if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+ if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf &&
+ !skb_queue_empty(&sk->sk_receive_queue))
break;
In addition, this memory limit check was moved from the end of do {}
while (more_data_avail) to the front in the commit e0ca4057e0ec
("mptcp: micro-optimize __mptcp_move_skb()"), so another better fix is
to restore this check to the end of do {} while () so that move skbs
from this subflow to sk->sk_receive_queue always has a chance to do at
least once:
@@ -587,9 +587,6 @@ static bool __mptcp_move_skbs_from_subflow(struct
mptcp_sock *msk,
struct sk_buff *skb;
bool fin;
- if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
- break;
-
/* try to move as much data as available */
map_remaining = subflow->map_data_len -
mptcp_subflow_get_map_offset(subflow);
@@ -634,6 +631,8 @@ static bool __mptcp_move_skbs_from_subflow(struct
mptcp_sock *msk,
WRITE_ONCE(tp->copied_seq, seq);
more_data_avail = mptcp_subflow_data_available(ssk);
+ if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+ break;
} while (more_data_avail);
I think this is the root cause of #487, I would like to hear your
opinions on which fix is better.
Thanks,
-Geliang
>
> WDYT?
>
> Thanks,
> -Geliang
>
> [1]
> https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/
>
> >
> > It is now a bit hidden in the middle of #487, but if I'm not
> > mistaken,
> > it is OK to get EAGAIN with a blocking send() **if** SO_SNDTIMEO is
> > used, and in case of timeout.
> >
> > See:
> > https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2485577676
> >
> > So I think the question should be: is it normal to block for longer
> > than
> > the timeout period (which is a "long" period, no?)? If yes, then
> > limiting the send buffer might be a solution, but as Mat said, it
> > looks
> > better to understand the root cause than hiding a bug :)
> >
> > Cheers,
> > Matt
Hi Geliang -
On Sun, 29 Jun 2025, Geliang Tang wrote:
> Hi Mat, Matt,
>
> On Mon, 2025-06-16 at 14:34 +0800, Geliang Tang wrote:
>> Hi Matt, Mat,
>>
>> On Sun, 2025-06-15 at 23:29 +0200, Matthieu Baerts wrote:
>>> Hi Mat, Geliang,
>>>
>>> On 14/06/2025 01:11, Mat Martineau wrote:
>>>> On Thu, 29 May 2025, Geliang Tang wrote:
>>>>
>>>>> From: Geliang Tang <tanggeliang@kylinos.cn>
>>>>>
>>>>> Good news! I finally solved the unstable issue of MPTCP BPF
>>>>> sched
>>>>> selftests
>>>>> I reported a year ago, #487 "send() fails with EAGAIN in
>>>>> blocking
>>>>> IO
>>>>> mode".
>>>>>
>>>>> The fix is simple, it can be solved by explicitly setting
>>>>> SO_SNDBUF
>>>>> sockopt, but be sure not to set SO_RCVBUF at the same time
>>>>> (see sk->sk_userlocks & SOCK_RCVBUF_LOCK in
>>>>> mptcp_rcv_space_adjust()).
>>>>>
>>>>> With this fix, BPF sched selftests are now very stable, I run
>>>>> loop
>>>>> testing
>>>>> using mptcp-upstream-virtme-docker (run_loop run_bpftest_all),
>>>>> and can
>>>>> run
>>>>> it normally for hundreds of times without error:
>>>>>
>>>>
>>>> Hi Geliang -
>>>>
>>>> I can see how changing SO_SNDBUF on the sending socket side would
>>>> shift
>>>> timing behavior in a way that affect the test outcome, but it
>>>> doesn't
>>>> address the root issue with bug #487:
>>>>
>>>> It is either OK to get an EAGAIN from a blocking send(), or it's
>>>> not OK.
>>>>
>>>>
>>>> If it's not ok to ever return EAGAIN from a blocking send, the
>>>> existing
>>>> test code is a reproducer for a bug, and changing the test is
>>>> hiding
>>>> that bug.
>>>>
>>>> If EAGAIN is ok, then we should change the code in
>>>> send_recv_server() to
>>>> allow it.
>>
>> I did try to handle EAGAIN in send_recv_server() but it didn't work.
>> MPTCP BPF sched selftests still fail. Test code and results are
>> attached.
>>
>> I added this in send_recv_server():
>>
>> if (errno == EAGAIN && again < 5) {
>> again++;
>> continue;
>> }
>>
>> And still got the EAGAIN error:
>>
>> # (network_helpers.c:728: errno: Resource temporarily unavailable)
>> send
>> 7867500 expected 10485760
>> # (network_helpers.c:782: errno: Resource temporarily unavailable)
>> recv
>> 3469500 expected 10485760
>> # (network_helpers.c:790: errno: Resource temporarily unavailable)
>> Failed in thread_ret -11
>> # send_data_and_verify:FAIL:send_recv_data unexpected error: -11
>> (errno
>> 11)
>>
>> In addition, BPF selftests adds a new mechanism that does not allow
>> any
>> test item to run for more than 10 seconds. Otherwise, the following
>> error will be reported:
>>
>> # WATCHDOG: test case mptcp/default executes for 10 seconds...
>>
>> In my testing, I have not found any other solution besides limiting
>> the
>> send buffer. This allows data to be sent at a constant rate, which
>> ensures the stability of MPTCP BPF sched selftests.
>>
>> In order to avoid hiding this bug, we can add a test item for this in
>> mptcp selftest in the future, like in [1].
>
> Last week I debugged this issue further and found something (Thanks to
> Gang Yan for his help): when mptcp bpf sched selftests fail, the memory
> limit check (if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)) is always true
> in __mptcp_move_skbs_from_subflow(), at this time sk->sk_receive_queue
> is empty,
Thanks for tracking down some more details!
> but no skb is moved from this subflow to sk->
> sk_receive_queue, which causes the transmission to fail.
>
I'm trying to understand this part (above) better.
__mptcp_move_skbs_from_subflow() is on the receive path. What is the path
to causing transmission to fail? Does exiting the loop early on the rx
side leave ssk->sk_err set, so an error propagates in the msk and
sendmsg() returns early with EAGAIN?
> One fix is to also consider the case where sk->sk_receive_queue is
> empty when doing the memory limit check:
>
> - if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> + if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf &&
> + !skb_queue_empty(&sk->sk_receive_queue))
> break;
The msk sk_receive_queue is probably empty a lot of the time, does it look
like the empty queue is related to the failure?
>
> In addition, this memory limit check was moved from the end of do {}
> while (more_data_avail) to the front in the commit e0ca4057e0ec
> ("mptcp: micro-optimize __mptcp_move_skb()"), so another better fix is
> to restore this check to the end of do {} while () so that move skbs
> from this subflow to sk->sk_receive_queue always has a chance to do at
> least once:
>
> @@ -587,9 +587,6 @@ static bool __mptcp_move_skbs_from_subflow(struct
> mptcp_sock *msk,
> struct sk_buff *skb;
> bool fin;
>
> - if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> - break;
> -
> /* try to move as much data as available */
> map_remaining = subflow->map_data_len -
> mptcp_subflow_get_map_offset(subflow);
> @@ -634,6 +631,8 @@ static bool __mptcp_move_skbs_from_subflow(struct
> mptcp_sock *msk,
> WRITE_ONCE(tp->copied_seq, seq);
> more_data_avail = mptcp_subflow_data_available(ssk);
>
> + if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
> + break;
> } while (more_data_avail);
>
> I think this is the root cause of #487, I would like to hear your
> opinions on which fix is better.
I have a slight preference for the second possible fix since it still
provides an exit from the loop even when the sk_receive_queue is empty.
Can you send that patch?
Before we merge a fix I'm still hoping to understand how letting this loop
body run once on the rx path is changing the tx side behavior!
Thanks,
Mat
>
> Thanks,
> -Geliang
>
>>
>> WDYT?
>>
>> Thanks,
>> -Geliang
>>
>> [1]
>> https://patchwork.kernel.org/project/mptcp/cover/cover.1722502941.git.tanggeliang@kylinos.cn/
>>
>>>
>>> It is now a bit hidden in the middle of #487, but if I'm not
>>> mistaken,
>>> it is OK to get EAGAIN with a blocking send() **if** SO_SNDTIMEO is
>>> used, and in case of timeout.
>>>
>>> See:
>>> https://github.com/multipath-tcp/mptcp_net-next/issues/487#issuecomment-2485577676
>>>
>>> So I think the question should be: is it normal to block for longer
>>> than
>>> the timeout period (which is a "long" period, no?)? If yes, then
>>> limiting the send buffer might be a solution, but as Mat said, it
>>> looks
>>> better to understand the root cause than hiding a bug :)
>>>
>>> Cheers,
>>> Matt
>
Hi Geliang,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/15316193959
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8bae6145adeb
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=967210
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
© 2016 - 2025 Red Hat, Inc.