Packets with the same 5-tuple may be processed by different CPUs, so two
CPUs may receive different ACK packets at the same time while the
request socket is in the TCP_NEW_SYN_RECV state.
In that case, req->ts_recent in tcp_check_req may be changed concurrently,
which can leave the newsk's ts_recent incorrectly large, so that
tcp_validate_incoming will fail.
cpu1                                    cpu2
tcp_check_req
                                        tcp_check_req
 req->ts_recent = rcv_tsval = t1
                                         req->ts_recent = rcv_tsval = t2

 syn_recv_sock
  newsk->ts_recent = req->ts_recent = t2 // t1 < t2
tcp_child_process
 tcp_rcv_state_process
  tcp_validate_incoming
   tcp_paws_check
    if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win)
        // t2 - t1 > paws_win, failed
In tcp_check_req, defer the ts_recent change until this skb has taken
ownership of req to fix this bug.
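To make the failing comparison concrete, here is a minimal userspace sketch of the window test quoted in the trace above. It is not the kernel's tcp_paws_check: the helper names paws_window_ok and cpu1_packet_accepted are hypothetical, the other conditions the kernel checks are left out, and the paws_win value used in the note below is an assumed one chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the comparison shown in the trace (hypothetical
 * helper, not the kernel function). The subtraction is done on unsigned
 * 32-bit values and then viewed as signed, so it stays meaningful when
 * the timestamp clock wraps around.
 */
static bool paws_window_ok(uint32_t ts_recent, uint32_t rcv_tsval,
			   int32_t paws_win)
{
	return (int32_t)(ts_recent - rcv_tsval) <= paws_win;
}

/*
 * cpu1's ACK carries tsval t1. Whether it passes depends on the value
 * the child's ts_recent was seeded with: t1 (its own packet) or t2
 * (the value cpu2 wrote into the shared request in the meantime).
 */
static bool cpu1_packet_accepted(uint32_t seeded_ts_recent, uint32_t t1,
				 int32_t paws_win)
{
	assert(paws_win >= 0);
	return paws_window_ok(seeded_ts_recent, t1, paws_win);
}
```

With an assumed paws_win of 1, cpu1_packet_accepted(1000, 1000, 1) is true, while cpu1_packet_accepted(1005, 1000, 1) is false because t2 - t1 = 5 exceeds the window. The second call corresponds to the race in the trace.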
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
---
v1->v2: Modified the fix logic based on Eric's suggestion. Also modified the msg
net/ipv4/tcp_minisocks.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index b089b08e9617..53700206f498 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -815,12 +815,6 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
/* In sequence, PAWS is OK. */
- /* TODO: We probably should defer ts_recent change once
- * we take ownership of @req.
- */
- if (tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
- WRITE_ONCE(req->ts_recent, tmp_opt.rcv_tsval);
-
if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn) {
/* Truncate SYN, it is out of window starting
at tcp_rsk(req)->rcv_isn + 1. */
@@ -869,6 +863,9 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
if (!child)
goto listen_overflow;
+ if (own_req && tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
+ tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
+
if (own_req && rsk_drop_req(req)) {
reqsk_queue_removed(&inet_csk(req->rsk_listener)->icsk_accept_queue, req);
inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);
--
2.17.1
On Sat, Feb 22, 2025 at 11:41 AM Wang Hai <wanghai38@huawei.com> wrote:
>
> The same 5-tuple packet may be processed by different CPUSs, so two
> CPUs may receive different ack packets at the same time when the
> state is TCP_NEW_SYN_RECV.
>
> In that case, req->ts_recent in tcp_check_req may be changed concurrently,
> which will probably cause the newsk's ts_recent to be incorrectly large.
> So that tcp_validate_incoming will fail.
>
> cpu1 cpu2
> tcp_check_req
> tcp_check_req
> req->ts_recent = rcv_tsval = t1
> req->ts_recent = rcv_tsval = t2
>
> syn_recv_sock
> newsk->ts_recent = req->ts_recent = t2 // t1 < t2
> tcp_child_process
> tcp_rcv_state_process
> tcp_validate_incoming
> tcp_paws_check
> if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win)
> // t2 - t1 > paws_win, failed
>
> In tcp_check_req, Defer ts_recent changes to this skb's to fix this bug.
I think this sentence is a bit misleading.
What your patch does is no longer change req->ts_recent,
but instead conditionally update tcp_sk(child)->rx_opt.ts_recent.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Wang Hai <wanghai38@huawei.com>
> ---
> v1->v2: Modified the fix logic based on Eric's suggestion. Also modified the msg
> net/ipv4/tcp_minisocks.c | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index b089b08e9617..53700206f498 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -815,12 +815,6 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>
> /* In sequence, PAWS is OK. */
>
> - /* TODO: We probably should defer ts_recent change once
> - * we take ownership of @req.
> - */
> - if (tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
> - WRITE_ONCE(req->ts_recent, tmp_opt.rcv_tsval);
> -
> if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn) {
> /* Truncate SYN, it is out of window starting
> at tcp_rsk(req)->rcv_isn + 1. */
> @@ -869,6 +863,9 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
> if (!child)
> goto listen_overflow;
>
> + if (own_req && tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
> + tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
> +
Please split this long line.
if (own_req && tmp_opt.saw_tstamp &&
!after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
On Sun, Feb 23, 2025 at 8:36 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Feb 22, 2025 at 11:41 AM Wang Hai <wanghai38@huawei.com> wrote:
> >
> > The same 5-tuple packet may be processed by different CPUSs, so two
> > CPUs may receive different ack packets at the same time when the
> > state is TCP_NEW_SYN_RECV.
> >
> > In that case, req->ts_recent in tcp_check_req may be changed concurrently,
> > which will probably cause the newsk's ts_recent to be incorrectly large.
> > So that tcp_validate_incoming will fail.
> >
> > cpu1 cpu2
> > tcp_check_req
> > tcp_check_req
> > req->ts_recent = rcv_tsval = t1
> > req->ts_recent = rcv_tsval = t2
> >
> > syn_recv_sock
> > newsk->ts_recent = req->ts_recent = t2 // t1 < t2
> > tcp_child_process
> > tcp_rcv_state_process
> > tcp_validate_incoming
> > tcp_paws_check
> > if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win)
> > // t2 - t1 > paws_win, failed
> >
> > In tcp_check_req, Defer ts_recent changes to this skb's to fix this bug.
>
> I think this sentence is a bit misleading.
>
> What your patch does is to no longer change req->ts_recent,
> but conditionally update tcp_sk(child)->rx_opt.ts_recent

Also please change the patch title.

The fix is about not changing req->ts_recent at all.
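
As a rough illustration of that point, the sketch below replays the interleaving from the commit message in plain userspace C, once with the old behavior and once with the patched one. Everything in it is a simplified assumption: struct model_req, struct model_child and the replay_* helpers are hypothetical stand-ins, the two CPUs are simulated sequentially in a fixed order, and the !after(seq, rcv_nxt) condition from the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the request and child sockets. */
struct model_req   { uint32_t ts_recent; };
struct model_child { uint32_t ts_recent; };

/* Child creation copies whatever the request holds at that moment. */
static void model_create_child(const struct model_req *req,
			       struct model_child *child)
{
	child->ts_recent = req->ts_recent;
}

/* Old behavior: every CPU handling an ACK overwrites the shared field. */
static void old_check_req(struct model_req *req, uint32_t rcv_tsval)
{
	req->ts_recent = rcv_tsval;
}

/* Patched behavior: req is untouched; only the owner updates its child. */
static void new_check_req(struct model_child *child, bool own_req,
			  uint32_t rcv_tsval)
{
	assert(child != NULL);
	if (own_req)
		child->ts_recent = rcv_tsval;
}

/* Returns the ts_recent that cpu1's child ends up with (old code). */
static uint32_t replay_old(uint32_t syn_ts, uint32_t t1, uint32_t t2)
{
	struct model_req req = { .ts_recent = syn_ts };
	struct model_child child;

	old_check_req(&req, t1);		/* cpu1 */
	old_check_req(&req, t2);		/* cpu2 races in */
	model_create_child(&req, &child);	/* cpu1 creates the child */
	return child.ts_recent;
}

/* Returns the ts_recent that cpu1's child ends up with (patched code). */
static uint32_t replay_new(uint32_t syn_ts, uint32_t t1, uint32_t t2)
{
	struct model_req req = { .ts_recent = syn_ts };
	struct model_child child, loser;

	model_create_child(&req, &child);
	new_check_req(&child, true, t1);	/* cpu1 owns req */
	model_create_child(&req, &loser);
	new_check_req(&loser, false, t2);	/* cpu2 lost, child discarded */
	return child.ts_recent;
}
```

In this model, replay_old(900, 1000, 1005) returns 1005, the value from the other CPU's packet, while replay_new(900, 1000, 1005) returns 1000, the timestamp of the packet that actually created the child.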
On Sat, Feb 22, 2025 at 6:41 PM Wang Hai <wanghai38@huawei.com> wrote:
>
> The same 5-tuple packet may be processed by different CPUSs, so two
> CPUs may receive different ack packets at the same time when the
> state is TCP_NEW_SYN_RECV.
>
> In that case, req->ts_recent in tcp_check_req may be changed concurrently,
> which will probably cause the newsk's ts_recent to be incorrectly large.
> So that tcp_validate_incoming will fail.
>
> cpu1 cpu2
> tcp_check_req
> tcp_check_req
> req->ts_recent = rcv_tsval = t1
> req->ts_recent = rcv_tsval = t2
>
> syn_recv_sock
> newsk->ts_recent = req->ts_recent = t2 // t1 < t2
> tcp_child_process
> tcp_rcv_state_process
> tcp_validate_incoming
> tcp_paws_check
> if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win)
> // t2 - t1 > paws_win, failed
>
> In tcp_check_req, Defer ts_recent changes to this skb's to fix this bug.
Honestly, from my perspective, the commit message doesn't reflect the
real problem you encountered or what the potential bad result could
be. Your previous reply was good and detailed, and at least gave
readers enough information to revisit or analyze the issue in the
future.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Wang Hai <wanghai38@huawei.com>
Otherwise, it looks good to me. Thanks!
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
> ---
> v1->v2: Modified the fix logic based on Eric's suggestion. Also modified the msg
> net/ipv4/tcp_minisocks.c | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index b089b08e9617..53700206f498 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -815,12 +815,6 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>
> /* In sequence, PAWS is OK. */
>
> - /* TODO: We probably should defer ts_recent change once
> - * we take ownership of @req.
> - */
> - if (tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
> - WRITE_ONCE(req->ts_recent, tmp_opt.rcv_tsval);
> -
> if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn) {
> /* Truncate SYN, it is out of window starting
> at tcp_rsk(req)->rcv_isn + 1. */
> @@ -869,6 +863,9 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
> if (!child)
> goto listen_overflow;
>
> + if (own_req && tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
> + tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
> +
nit: I would suggest using the following format if a re-spin is necessary:
+ if (own_req && tmp_opt.saw_tstamp &&
+ !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt))
+ tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
+
Thanks,
Jason
> if (own_req && rsk_drop_req(req)) {
> reqsk_queue_removed(&inet_csk(req->rsk_listener)->icsk_accept_queue, req);
> inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);
> --
> 2.17.1
>