[PATCH mptcp-net] mptcp: fix EAGAIN errors in MPTCP BPF tests

Geliang Tang posted 1 patch 2 weeks, 3 days ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/4538e7bca21fdb7f997c026d74f886aedb547d39.1714654631.git.tanggeliang@kylinos.cn
net/ipv4/tcp_input.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
[PATCH mptcp-net] mptcp: fix EAGAIN errors in MPTCP BPF tests
Posted by Geliang Tang 2 weeks, 3 days ago
From: Geliang Tang <tanggeliang@kylinos.cn>

BPF tests fail sometimes with "bytes != total_bytes" errors:

 test_default:PASS:sched_init:default 0 nsec
 send_data:PASS:pthread_create 0 nsec
 send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11
 default: 3041 ms
 server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11
 send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \
				has_bytes_sent addr_1 0 nsec
 test_default:PASS:has_bytes_sent addr_2 0 nsec
 close_netns:PASS:setns 0 nsec

In this case mptcp_recvmsg gets EAGAIN errors. This issue introduces
by commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale").
This patch fixes it by adding sk_is_mptcp() check defore update
scaling_ratio in tcp_measure_rcv_mss(). Do not update scaling_ratio if
this is a MPTCP socket.

Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/ipv4/tcp_input.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ad8fa129fcfe..39237d6713ee 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -235,8 +235,10 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 		/* Note: divides are still a bit expensive.
 		 * For the moment, only adjust scaling_ratio
 		 * when we update icsk_ack.rcv_mss.
+		 *
+		 * This breaks MPTCP BPF tests, skip it.
 		 */
-		if (unlikely(len != icsk->icsk_ack.rcv_mss)) {
+		if (unlikely(len != icsk->icsk_ack.rcv_mss && !sk_is_mptcp(sk))) {
 			u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE;
 
 			do_div(val, skb->truesize);
-- 
2.43.0
Re: [PATCH mptcp-net] mptcp: fix EAGAIN errors in MPTCP BPF tests
Posted by Mat Martineau 1 week, 5 days ago
On Thu, 2 May 2024, Geliang Tang wrote:

> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> BPF tests fail sometimes with "bytes != total_bytes" errors:
>
> test_default:PASS:sched_init:default 0 nsec
> send_data:PASS:pthread_create 0 nsec
> send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11
> default: 3041 ms
> server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11
> send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \
> 				has_bytes_sent addr_1 0 nsec
> test_default:PASS:has_bytes_sent addr_2 0 nsec
> close_netns:PASS:setns 0 nsec
>
> In this case mptcp_recvmsg gets EAGAIN errors. This issue introduces
> by commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale").
> This patch fixes it by adding sk_is_mptcp() check defore update
> scaling_ratio in tcp_measure_rcv_mss(). Do not update scaling_ratio if
> this is a MPTCP socket.
>
> Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
> net/ipv4/tcp_input.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index ad8fa129fcfe..39237d6713ee 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -235,8 +235,10 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> 		/* Note: divides are still a bit expensive.
> 		 * For the moment, only adjust scaling_ratio
> 		 * when we update icsk_ack.rcv_mss.
> +		 *
> +		 * This breaks MPTCP BPF tests, skip it.
> 		 */
> -		if (unlikely(len != icsk->icsk_ack.rcv_mss)) {
> +		if (unlikely(len != icsk->icsk_ack.rcv_mss && !sk_is_mptcp(sk))) {


Hi Geliang -

It's not clear to me why avoiding scaling ratio changes when

     len > icsk->icsk_ack.rcv_mss

is the right solution to the problem. From the conditional right above 
this diff, it looks like scaling_ratio still could change when

     len == icsk->icsk_ack.rcv_mss

Does the MPTCP code handle that case, or is that not exercised by the 
tests? Are you sure there isn't an issue with the MPTCP code where it 
should be able to handle the TCP layer updating scaling_ratios on 
different subflows?

Paolo do you have any suggestions or notes?


Thanks,
Mat

> 			u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE;
>
> 			do_div(val, skb->truesize);
> -- 
> 2.43.0
>
>
>
Re: [PATCH mptcp-net] mptcp: fix EAGAIN errors in MPTCP BPF tests
Posted by Geliang Tang 1 week, 5 days ago
On Mon, 2024-05-06 at 18:02 -0700, Mat Martineau wrote:
> On Thu, 2 May 2024, Geliang Tang wrote:
> 
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > BPF tests fail sometimes with "bytes != total_bytes" errors:
> > 
> > test_default:PASS:sched_init:default 0 nsec
> > send_data:PASS:pthread_create 0 nsec
> > send_data:FAIL:recv 936000 != 10485760 nr_recv:-1 errno:11
> > default: 3041 ms
> > server:FAIL:send 7579500 != 10485760 nr_sent:-1 errno:11
> > send_data:FAIL:pthread_join thread_ret:-11 test_default:PASS: \
> > 				has_bytes_sent addr_1 0 nsec
> > test_default:PASS:has_bytes_sent addr_2 0 nsec
> > close_netns:PASS:setns 0 nsec
> > 
> > In this case mptcp_recvmsg gets EAGAIN errors. This issue
> > introduces
> > by commit dfa2f0483360 ("tcp: get rid of
> > sysctl_tcp_adv_win_scale").
> > This patch fixes it by adding sk_is_mptcp() check defore update
> > scaling_ratio in tcp_measure_rcv_mss(). Do not update scaling_ratio
> > if
> > this is a MPTCP socket.
> > 
> > Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> > net/ipv4/tcp_input.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index ad8fa129fcfe..39237d6713ee 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -235,8 +235,10 @@ static void tcp_measure_rcv_mss(struct sock
> > *sk, const struct sk_buff *skb)
> > 		/* Note: divides are still a bit expensive.
> > 		 * For the moment, only adjust scaling_ratio
> > 		 * when we update icsk_ack.rcv_mss.
> > +		 *
> > +		 * This breaks MPTCP BPF tests, skip it.
> > 		 */
> > -		if (unlikely(len != icsk->icsk_ack.rcv_mss)) {
> > +		if (unlikely(len != icsk->icsk_ack.rcv_mss &&
> > !sk_is_mptcp(sk))) {
> 
> 
> Hi Geliang -
> 
> It's not clear to me why avoiding scaling ratio changes when
> 
>      len > icsk->icsk_ack.rcv_mss
> 
> is the right solution to the problem. From the conditional right
> above 
> this diff, it looks like scaling_ratio still could change when
> 
>      len == icsk->icsk_ack.rcv_mss
> 
> Does the MPTCP code handle that case, or is that not exercised by the
> tests? Are you sure there isn't an issue with the MPTCP code where it
> should be able to handle the TCP layer updating scaling_ratios on 
> different subflows?
> 
> Paolo do you have any suggestions or notes?

Thanks Mat, I finally found the root cause of this issue. It is indeed
an MPTCP bug. And v2 has just sent out, renamed as "fix the default
value of scaling_ratio".

-Geliang

> 
> 
> Thanks,
> Mat
> 
> > 			u64 val = (u64)skb->len <<
> > TCP_RMEM_TO_WIN_SCALE;
> > 
> > 			do_div(val, skb->truesize);
> > -- 
> > 2.43.0
> > 
> > 
> > 
>