From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev,
	dsahern@kernel.org, horms@kernel.org, kuniyu@google.com,
	andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com,
	jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net,
	matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org,
	rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 07/14] tcp: honor the maximum advertised window after live retraction
Date: Sat, 14 Mar 2026 14:13:41 -0600
Message-ID: <20260314201348.1786972-8-atwellwea@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>
References: <20260314201348.1786972-1-atwellwea@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Wesley Atwell <atwellwea@gmail.com>

If receive-side accounting retracts the live rwnd below a larger
sender-visible window that was already advertised, allow one in-order
skb within that historical bound to repair its backing and reach the
normal receive path.

Hard receive-memory admission is still enforced through the existing
prune and collapse path.
The rescue only changes how data already inside sender-visible sequence
space is classified and backed.

Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
---
 net/ipv4/tcp_input.c | 92 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 86 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d76e4e4c0e57..4b9309c37e99 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5376,24 +5376,86 @@ static void tcp_ofo_queue(struct sock *sk)
 static bool tcp_prune_ofo_queue(struct sock *sk, const struct sk_buff *in_skb);
 static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb);
 
+/* Sequence checks run against the sender-visible receive window before this
+ * point. If later receive-side accounting retracts the live receive window
+ * below the maximum right edge we already advertised, allow one in-order skb
+ * which still fits inside that sender-visible bound to reach the normal
+ * receive queue path.
+ *
+ * Keep receive-memory admission itself on the legacy hard-cap path so prune
+ * and collapse behavior stay aligned with the established retracted-window
+ * handling.
+ */
+static bool tcp_skb_in_retracted_window(const struct tcp_sock *tp,
+					const struct sk_buff *skb)
+{
+	u32 live_end = tp->rcv_nxt + tcp_receive_window(tp);
+	u32 max_end = tp->rcv_nxt + tcp_max_receive_window(tp);
+
+	return after(max_end, live_end) &&
+	       after(TCP_SKB_CB(skb)->end_seq, live_end) &&
+	       !after(TCP_SKB_CB(skb)->end_seq, max_end);
+}
+
 static bool tcp_can_ingest(const struct sock *sk, const struct sk_buff *skb)
 {
-	unsigned int rmem = atomic_read(&sk->sk_rmem_alloc);
+	return tcp_rmem_used(sk) <= READ_ONCE(sk->sk_rcvbuf);
+}
+
+/* Caller already established that @skb extends into the retracted-but-still-
+ * valid sender-visible window. For in-order progress, regrow sk_rcvbuf before
+ * falling into prune/forced-mem handling.
+ *
+ * This path intentionally repairs backing for one in-order skb that is already
+ * within sender-visible sequence space, rather than treating it like ordinary
+ * receive-buffer autotuning.
+ *
+ * Keep this rescue bounded to the span accepted by this skb instead of the
+ * full historical tp->rcv_mwnd_seq. However, never grow below skb->truesize,
+ * because sk_rmem_schedule() still charges hard memory, not sender-visible
+ * window bytes.
+ */
+static void tcp_try_grow_retracted_skb(struct sock *sk,
+				       const struct sk_buff *skb)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	int needed = skb->truesize;
+	int span_space;
+	u32 span_win;
+
+	if (TCP_SKB_CB(skb)->seq != tp->rcv_nxt)
+		return;
+
+	span_win = TCP_SKB_CB(skb)->end_seq - tp->rcv_nxt;
+	if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+		span_win--;
+
+	if (tcp_space_from_rcv_mwnd(tp, span_win, &span_space))
+		needed = max_t(int, needed, span_space);
 
-	return rmem <= sk->sk_rcvbuf;
+	tcp_try_grow_rcvbuf(sk, needed);
 }
 
+/* Sender-visible window rescue does not relax hard receive-memory admission.
+ * If growth did not make room, fall back to the established prune/collapse
+ * path.
+ */
 static int tcp_try_rmem_schedule(struct sock *sk, const struct sk_buff *skb,
 				 unsigned int size)
 {
-	if (!tcp_can_ingest(sk, skb) ||
-	    !sk_rmem_schedule(sk, skb, size)) {
+	bool can_ingest = tcp_can_ingest(sk, skb);
+	bool scheduled = can_ingest && sk_rmem_schedule(sk, skb, size);
+
+	if (!scheduled) {
+		int pruned = tcp_prune_queue(sk, skb);
 
-		if (tcp_prune_queue(sk, skb) < 0)
+		if (pruned < 0)
 			return -1;
 
 		while (!sk_rmem_schedule(sk, skb, size)) {
-			if (!tcp_prune_ofo_queue(sk, skb))
+			bool pruned_ofo = tcp_prune_ofo_queue(sk, skb);
+
+			if (!pruned_ofo)
 				return -1;
 		}
 	}
@@ -5629,6 +5691,7 @@ void tcp_data_ready(struct sock *sk)
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	bool retracted;
 	enum skb_drop_reason reason;
 	bool fragstolen;
 	int eaten;
@@ -5647,6 +5710,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	}
 	tcp_cleanup_skb(skb);
 	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
+	retracted = skb->len && tcp_skb_in_retracted_window(tp, skb);
 
 	reason = SKB_DROP_REASON_NOT_SPECIFIED;
 	tp->rx_opt.dsack = 0;
@@ -5667,6 +5731,9 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 			    (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
 				goto queue_and_out;
 
+			if (retracted)
+				goto queue_and_out;
+
 			reason = SKB_DROP_REASON_TCP_ZEROWINDOW;
 			NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP);
 			goto out_of_window;
@@ -5674,7 +5741,20 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 
 		/* Ok. In sequence. In window. */
 queue_and_out:
+		if (unlikely(retracted))
+			tcp_try_grow_retracted_skb(sk, skb);
+
 		if (tcp_try_rmem_schedule(sk, skb, skb->truesize)) {
+			/* If the live rwnd collapsed to zero while rescuing an
+			 * skb that still fit in sender-visible sequence space,
+			 * report zero-window rather than generic proto-mem.
+			 */
+			if (unlikely(!tcp_receive_window(tp) && retracted)) {
+				reason = SKB_DROP_REASON_TCP_ZEROWINDOW;
+				NET_INC_STATS(sock_net(sk),
+					      LINUX_MIB_TCPZEROWINDOWDROP);
+				goto out_of_window;
+			}
 			/* TODO: maybe ratelimit these WIN 0 ACK ? */
 			inet_csk(sk)->icsk_ack.pending |=
 					(ICSK_ACK_NOMEM | ICSK_ACK_NOW);
-- 
2.43.0
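
For readers who want to experiment with the classification outside the
kernel, the check in tcp_skb_in_retracted_window() can be modelled in
user space. This is only a rough sketch, not part of the patch: the names
seq_after(), in_retracted_window() and retracted_window_selftest() are
hypothetical, and the live_win and max_win parameters merely stand in for
the values that tcp_receive_window() and tcp_max_receive_window() would
return. The comparison is assumed to follow the usual wrap-safe signed
difference used for TCP sequence numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "seq1 is after seq2" for 32-bit TCP sequence numbers.
 * Hypothetical user-space stand-in for the kernel's after().
 */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* Hypothetical model of the check in the patch: true when the live
 * window was retracted below the maximum advertised right edge and
 * end_seq lands in the gap between the two edges.
 * live_win and max_win are assumed inputs, not kernel state.
 */
static bool in_retracted_window(uint32_t rcv_nxt, uint32_t live_win,
				uint32_t max_win, uint32_t end_seq)
{
	uint32_t live_end = rcv_nxt + live_win;
	uint32_t max_end = rcv_nxt + max_win;

	return seq_after(max_end, live_end) &&
	       seq_after(end_seq, live_end) &&
	       !seq_after(end_seq, max_end);
}

/* A few illustrative cases, including one that wraps past 2^32. */
static void retracted_window_selftest(void)
{
	/* Live window is zero, 3000 bytes were advertised earlier. */
	assert(in_retracted_window(1000, 0, 3000, 2448));
	/* Beyond the maximum advertised edge: not rescued. */
	assert(!in_retracted_window(1000, 0, 3000, 5000));
	/* No retraction happened: nothing to rescue. */
	assert(!in_retracted_window(1000, 3000, 3000, 2448));
	/* Right edge wraps around the sequence space. */
	assert(in_retracted_window(0xFFFFFF00u, 0, 0x200, 0x80));
}
```

Calling retracted_window_selftest() from a small main() is enough to
exercise the four cases. The model deliberately leaves out the skb->len
and in-order checks that the patch applies around the helper.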