When CONFIG_HZ defaults to 1000 (one jiffy is 1ms) and the network
transmission time is less than 1ms, tp->lsndtime and
icsk->icsk_ack.lrcvtime are likely to be equal. The before() check then
fails, so it can take hundreds of interactions before the socket enters
pingpong mode.
Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: LemmyHuang <hlm3280@163.com>
---
v2:
* Use !after() wrapping the values. (Jakub Kicinski)
v1: https://lore.kernel.org/netdev/20220719130136.11907-1-hlm3280@163.com/
---
net/ipv4/tcp_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 858a15cc2..c1c95dc40 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -172,7 +172,7 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	 * and it is a reply for ato after last received packet,
 	 * increase pingpong count.
 	 */
-	if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
+	if (!after(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
 	    (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
 		inet_csk_inc_pingpong_cnt(sk);
--
2.27.0
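
To illustrate the one-line change, here is a minimal user-space sketch of the wrap-safe comparison. The helpers ts_before() and ts_after() are hypothetical names modeled on the kernel's before() and after(), and the jiffies values are made up; this is not the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is earlier than b" on 32-bit counters, modeled on the
 * kernel's before() helper: the difference is reinterpreted as signed. */
static bool ts_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Wrap-safe "a is later than b". */
static bool ts_after(uint32_t a, uint32_t b)
{
	return ts_before(b, a);
}

/* Illustrative checks; the jiffies values are made up. */
static void check_same_jiffy(void)
{
	uint32_t lrcvtime = 5000;	/* request received in jiffy 5000 */
	uint32_t lsndtime = 5000;	/* previous send, same jiffy */

	/* Old condition: equal timestamps are not "before", so no count. */
	assert(!ts_before(lsndtime, lrcvtime));
	/* New condition: equal timestamps are not "after", so it counts. */
	assert(!ts_after(lsndtime, lrcvtime));

	/* Both helpers stay correct across counter wraparound. */
	assert(ts_before(UINT32_MAX, 2));
	assert(ts_after(2, UINT32_MAX));
}
```

The only case where before(a, b) and !after(a, b) disagree is a == b, which is exactly the sub-millisecond case described in the commit message.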
On Wed, Jul 20, 2022 at 3:25 AM LemmyHuang <hlm3280@163.com> wrote:
>
> When CONFIG_HZ defaults to 1000Hz and the network transmission time is
> less than 1ms, lsndtime and lrcvtime are likely to be equal, which will
> lead to hundreds of interactions before entering pingpong mode.
>
> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: LemmyHuang <hlm3280@163.com>
> ---
> v2:
> * Use !after() wrapping the values. (Jakub Kicinski)
>
> v1: https://lore.kernel.org/netdev/20220719130136.11907-1-hlm3280@163.com/
> ---
> net/ipv4/tcp_output.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 858a15cc2..c1c95dc40 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -172,7 +172,7 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
> * and it is a reply for ato after last received packet,
> * increase pingpong count.
> */
> - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
> + if (!after(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
> (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
> inet_csk_inc_pingpong_cnt(sk);
>
> --
Thanks for pointing out this problem!
AFAICT this patch would result in incorrect behavior.
With this patch, we could have cases where tp->lsndtime ==
icsk->icsk_ack.lrcvtime and (u32)(now - icsk->icsk_ack.lrcvtime) <
icsk->icsk_ack.ato and yet we do not really have a ping-pong exchange.
For example, with this patch we could have:
T1: jiffies=J1; host B receives RPC request from host A
T2: jiffies=J1; host B sends first RPC response data packet to host A;
    -> calls inet_csk_inc_pingpong_cnt()
T3: jiffies=J1; host B sends second RPC response data packet to host A;
    -> calls inet_csk_inc_pingpong_cnt()
In this scenario there is only one ping-pong exchange but the code
calls inet_csk_inc_pingpong_cnt() twice.
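
The T1..T3 timeline can be reproduced with a small user-space model. This is only a sketch under stated assumptions: struct conn_model, model_data_sent() and pingpong_after_one_request() are hypothetical names, the ato value of 40 jiffies and the jiffies values are made up, and the model keeps only the condition from the hunk plus the lsndtime update that follows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe comparisons modeled on the kernel's before()/after(). */
static bool ts_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool ts_after(uint32_t a, uint32_t b) { return ts_before(b, a); }

/* Hypothetical, reduced stand-in for the socket fields in the hunk. */
struct conn_model {
	uint32_t lsndtime;	/* jiffy of the last data send */
	uint32_t lrcvtime;	/* jiffy of the last data receive */
	uint32_t ato;		/* delayed-ACK timeout, in jiffies */
	unsigned int pingpong;	/* counted ping-pong exchanges */
};

/* Models the check in tcp_event_data_sent(), then records the send. */
static void model_data_sent(struct conn_model *c, uint32_t now, bool patched)
{
	bool is_reply = patched ? !ts_after(c->lsndtime, c->lrcvtime)
				: ts_before(c->lsndtime, c->lrcvtime);

	if (is_reply && (uint32_t)(now - c->lrcvtime) < c->ato)
		c->pingpong++;
	c->lsndtime = now;
}

/* One request, two response packets, all within jiffy J1. */
static unsigned int pingpong_after_one_request(bool patched)
{
	const uint32_t j1 = 5000;	/* made-up jiffies value */
	struct conn_model c = { .lsndtime = j1 - 10, .lrcvtime = 0,
				.ato = 40, .pingpong = 0 };

	c.lrcvtime = j1;			/* T1: request arrives */
	model_data_sent(&c, j1, patched);	/* T2: first response packet */
	model_data_sent(&c, j1, patched);	/* T3: second response packet */
	return c.pingpong;
}

/* Self-check of the two outcomes described above. */
static void check_double_count(void)
{
	assert(pingpong_after_one_request(false) == 1);	/* before() */
	assert(pingpong_after_one_request(true) == 2);	/* !after() */
}
```

In this model the before() condition counts the exchange once, while the !after() condition counts it twice, because the second response packet still sees lsndtime == lrcvtime.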
So I'm hoping we can come up with a better fix.
A simpler approach might be to simplify the model and go back to
having a single ping-pong interaction cause delayed ACKs to be enabled
on a connection endpoint. Our team has been seeing good results for a
while with the simpler approach. What do folks think?
neal
At 2022-07-21 02:49:35, "Neal Cardwell" <ncardwell@google.com> wrote:
> On Wed, Jul 20, 2022 at 3:25 AM LemmyHuang <hlm3280@163.com> wrote:
>>
>> When CONFIG_HZ defaults to 1000Hz and the network transmission time is
>> less than 1ms, lsndtime and lrcvtime are likely to be equal, which will
>> lead to hundreds of interactions before entering pingpong mode.
>>
>> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>> Signed-off-by: LemmyHuang <hlm3280@163.com>
>> ---
>> v2:
>> * Use !after() wrapping the values. (Jakub Kicinski)
>>
>> v1: https://lore.kernel.org/netdev/20220719130136.11907-1-hlm3280@163.com/
>> ---
>> net/ipv4/tcp_output.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 858a15cc2..c1c95dc40 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -172,7 +172,7 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
>> * and it is a reply for ato after last received packet,
>> * increase pingpong count.
>> */
>> - if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
>> + if (!after(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
>> (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
>> inet_csk_inc_pingpong_cnt(sk);
>>
>> --
>
> Thanks for pointing out this problem!
>
> AFAICT this patch would result in incorrect behavior.
>
> With this patch, we could have cases where tp->lsndtime ==
> icsk->icsk_ack.lrcvtime and (u32)(now - icsk->icsk_ack.lrcvtime) <
> icsk->icsk_ack.ato and yet we do not really have a ping-pong exchange.
>
> For example, with this patch we could have:
>
> T1: jiffies=J1; host B receives RPC request from host A
> T2: jiffies=J1; host B sends first RPC response data packet to host A;
> -> calls inet_csk_inc_pingpong_cnt()
> T3: jiffies=J1; host B sends second RPC response data packet to host A;
> -> calls inet_csk_inc_pingpong_cnt()
>
> In this scenario there is only one ping-pong exchange but the code
> calls inet_csk_inc_pingpong_cnt() twice.
>
> So I'm hoping we can come up with a better fix.
>
> A simpler approach might be to simplify the model and go back to
> having a single ping-pong interaction cause delayed ACKs to be enabled
> on a connection endpoint. Our team has been seeing good results for a
> while with the simpler approach. What do folks think?
>
>
> neal
It seems better to go back to a single ping-pong interaction, as you suggest.
See this revert patch:
https://lore.kernel.org/netdev/20220720233156.295074-1-weiwan@google.com/