Always initialize tp->snd_cwnd_clamp from the corresponding dst entry.
There are two issues with setting it from the TCP metrics:
1. If the cwnd option is changed in the routing table, the new value is not
used as long as there is a cached TCP metric for the destination.
2. After evicting the cached TCP metric entry, the next connection will use
the default value (i.e. no limit). Only after this connection has
finished is a new entry created, and only this entry copies the cwnd
value, together with its lock flag, from the routing table.
As a result, the following shenanigans are required to set a new locked cwnd
value:
- update the route (``ip route replace ... cwnd lock $value``)
- flush any existing TCP metric entry (``ip tcp_metrics flush $dest``)
- create and finish a dummy connection to the destination to create a TCP
metric entry with the new value
- *next* connection to this destination will use the new value
This does not seem to be intentional.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
---
net/ipv4/tcp_metrics.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 4251670e328c8..dd8f3457bd72e 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
 	if (!dst)
 		goto reset;
 
+	if (dst_metric_locked(dst, RTAX_CWND))
+		tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
+
 	rcu_read_lock();
 	tm = tcp_get_metrics(sk, dst, false);
 	if (!tm) {
@@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
 		goto reset;
 	}
 
-	if (tcp_metric_locked(tm, TCP_METRIC_CWND))
-		tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
-
 	val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
 	      0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
 	if (val) {
--
2.49.0
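The following sketch makes the two issues in the commit message concrete. It models where the clamp comes from before and after the patch. It is a userspace sketch under stated assumptions, not kernel code:

- `struct model_route`, `struct model_metrics`, `old_init_clamp()`, `new_init_clamp()` and `MODEL_NO_CLAMP` are hypothetical names.
- The lock flag is reduced to a `bool`.
- "No limit" is represented by `UINT32_MAX`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "no limit" on the congestion window. */
#define MODEL_NO_CLAMP UINT32_MAX

/* Hypothetical model of a route's cwnd attribute ("cwnd [lock] N"). */
struct model_route {
	uint32_t cwnd;     /* value given on the route */
	bool cwnd_locked;  /* true if the "lock" flag was given */
};

/* Hypothetical model of a cached tcp_metrics entry for one destination. */
struct model_metrics {
	uint32_t cwnd;     /* value copied when the entry was created */
	bool cwnd_locked;
};

/* Pre-patch: the clamp is taken only from a cached entry, if one exists. */
static uint32_t old_init_clamp(const struct model_route *rt,
			       const struct model_metrics *tm)
{
	(void)rt; /* the route itself is not consulted */
	if (tm && tm->cwnd_locked)
		return tm->cwnd;
	return MODEL_NO_CLAMP;
}

/* Patched: the clamp is taken from the route, whatever the cache holds. */
static uint32_t new_init_clamp(const struct model_route *rt,
			       const struct model_metrics *tm)
{
	(void)tm; /* the cached entry no longer affects the clamp */
	if (rt->cwnd_locked)
		return rt->cwnd;
	return MODEL_NO_CLAMP;
}
```

Suppose the route is set to `cwnd lock 20` and a stale cached entry still holds a locked 10. The old path returns 10 (issue 1). With no cached entry at all, the old path returns no clamp (issue 2). The patched path returns 20 in both cases.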
On 6/13/25 12:20 PM, Petr Tesarik wrote:
> diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
> index 4251670e328c8..dd8f3457bd72e 100644
> --- a/net/ipv4/tcp_metrics.c
> +++ b/net/ipv4/tcp_metrics.c
> @@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
> if (!dst)
> goto reset;
>
> + if (dst_metric_locked(dst, RTAX_CWND))
> + tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
> +
> rcu_read_lock();
> tm = tcp_get_metrics(sk, dst, false);
> if (!tm) {
> @@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
> goto reset;
> }
>
> - if (tcp_metric_locked(tm, TCP_METRIC_CWND))
> - tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
> -
> val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
> 0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
> if (val) {

It's unclear to me why you drop the tcp_metric_get() here. It looks like
the above will cause a functional regression, with unlocked cached
metrics no longer taking effect?

/P
On Tue, 17 Jun 2025 13:00:53 +0200
Paolo Abeni <pabeni@redhat.com> wrote:

> On 6/13/25 12:20 PM, Petr Tesarik wrote:
> > diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
> [...]
> > - if (tcp_metric_locked(tm, TCP_METRIC_CWND))
> > - tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
> > -
> [...]
>
> It's unclear to me why you drop the tcp_metric_get() here. It looks like
> the above will cause a functional regression, with unlocked cached
> metrics no longer taking effect?

Unlocked cached TCP_METRIC_CWND has never taken effect. As you can
see, tcp_metric_get() was executed only if the metric was locked.

In fact, an unlocked cwnd parameter in the route does not have any
effect either. It's even documented in the manual page of ip-route(8):

       cwnd NUMBER (Linux 2.3.15+ only)
              the clamp for congestion window. It is ignored if
              the lock flag is not used.

Note that there is also an initcwnd parameter, and I'm not changing
anything about the handling of that one.

Now, if you think that this TCP_METRIC_CWND is quite useless, then I
wholeheartedly agree with you, but we cannot simply remove it, as it
has become part of uapi, defined in include/uapi/linux/tcp_metrics.h.

Petr T
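The ip-route(8) remark depends on how a route metric is marked as locked. As far as we can tell, `dst_metric_locked()` in include/net/dst.h tests bit `metric` of the `RTAX_LOCK` metric. The sketch below models that check in userspace. The `MODEL_*` indices, `struct model_dst` and `model_route_clamp()` are hypothetical and do not reproduce the kernel's actual `RTAX_*` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical route-metric indices; the kernel's RTAX_* values differ. */
enum { MODEL_RTAX_LOCK, MODEL_RTAX_CWND, MODEL_RTAX_MAX };

/* Hypothetical stand-in for "no limit". */
#define MODEL_NO_CLAMP UINT32_MAX

struct model_dst {
	/* metrics[MODEL_RTAX_LOCK] is a bitmask: bit N locks metric N */
	uint32_t metrics[MODEL_RTAX_MAX];
};

/* Same idea as dst_metric_locked(): test one bit of the lock metric. */
static bool model_dst_metric_locked(const struct model_dst *dst, int metric)
{
	return dst->metrics[MODEL_RTAX_LOCK] & (1u << metric);
}

/* "cwnd N" alone yields no clamp; only "cwnd lock N" does. */
static uint32_t model_route_clamp(const struct model_dst *dst)
{
	if (model_dst_metric_locked(dst, MODEL_RTAX_CWND))
		return dst->metrics[MODEL_RTAX_CWND];
	return MODEL_NO_CLAMP;
}
```

Keeping the lock flags as a bitmask inside the metrics array means one extra metric slot can record the lock state of every other metric.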
On Tue, 17 Jun 2025 13:39:35 +0200
Petr Tesarik <ptesarik@suse.com> wrote:

> On Tue, 17 Jun 2025 13:00:53 +0200
> Paolo Abeni <pabeni@redhat.com> wrote:
>
> > On 6/13/25 12:20 PM, Petr Tesarik wrote:
> [...]
> Now, if you think that this TCP_METRIC_CWND is quite useless, then I
> wholeheartedly agree with you, but we cannot simply remove it, as it
> has become part of uapi, defined in include/uapi/linux/tcp_metrics.h.

As an afterthought, I'm not quite sure about the semantics of this
metric. The value calculated in tcp_update_metrics() has never been
used for anything since it was introduced in 2.3.15. So there is:

- either a locked cwnd value, which is used to clamp cwnd on a route and
  never updated,
- or an unlocked cwnd value, which is updated upon connection termination
  but never used for anything by the kernel.

OK, the unlocked value can be read by userspace, but what is it supposed
to mean? The manual page for route-tcp_metrics(8) says: "CWND metric
value", which sounds like the author did not have a clue either.

Unless someone here _has_ a clue, I'll just leave it as is, except that
the clamp value will be taken from the routing table. It makes no sense
to wait until the very same value propagates to a tcp_metrics_block
(where it is then never updated).

Petr T
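The asymmetry described above can be sketched as a simplified, hypothetical model. The real `tcp_update_metrics()` applies its own conditions before storing a new cwnd; those conditions are not reproduced here, and the model simply overwrites an unlocked value. `struct model_tcpm`, `model_update()` and `model_init_clamp()` are illustrative names. The lock bitmask mirrors the role of `tcp_metric_locked()`, not its exact code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache-side index; the real TCP_METRIC_CWND value may differ. */
enum { MODEL_TCP_METRIC_CWND, MODEL_TCP_METRIC_MAX };

/* Hypothetical stand-in for "no limit". */
#define MODEL_NO_CLAMP UINT32_MAX

struct model_tcpm {
	uint32_t lock;                        /* bit N set => metric N is locked */
	uint32_t vals[MODEL_TCP_METRIC_MAX];
};

static bool model_tcpm_locked(const struct model_tcpm *tm, int idx)
{
	return tm->lock & (1u << idx);
}

/* Connection end (simplified): only an unlocked cwnd is ever written. */
static void model_update(struct model_tcpm *tm, uint32_t observed_cwnd)
{
	if (!model_tcpm_locked(tm, MODEL_TCP_METRIC_CWND))
		tm->vals[MODEL_TCP_METRIC_CWND] = observed_cwnd;
}

/* Connection start (pre-patch): only a locked cwnd is ever read. */
static uint32_t model_init_clamp(const struct model_tcpm *tm)
{
	if (model_tcpm_locked(tm, MODEL_TCP_METRIC_CWND))
		return tm->vals[MODEL_TCP_METRIC_CWND];
	return MODEL_NO_CLAMP;
}
```

After `model_update()`, an unlocked entry holds the new value, yet `model_init_clamp()` still reports no clamp. A locked entry keeps its original value and is the only one that clamps.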
On 6/18/25 7:01 PM, Petr Tesarik wrote:
> On Tue, 17 Jun 2025 13:39:35 +0200 Petr Tesarik <ptesarik@suse.com> wrote:
>> On Tue, 17 Jun 2025 13:00:53 +0200 Paolo Abeni <pabeni@redhat.com> wrote:
>>> On 6/13/25 12:20 PM, Petr Tesarik wrote:
[...]
>>> It's unclear to me why you drop the tcp_metric_get() here. It looks like
>>> the above will cause a functional regression, with unlocked cached
>>> metrics no longer taking effect?
>>
>> Unlocked cached TCP_METRIC_CWND has never taken effect. As you can
>> see, tcp_metric_get() was executed only if the metric was locked.

Uhm... the propagation of the lock flag from the dst to the TCP metrics
storage was not so straightforward to me; I missed it. Please be a
little more verbose about this part in the commit message.

Thanks,

Paolo
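The propagation Paolo missed happens when a cache entry is created. At that point the route's lock bits are translated into the cache entry's own lock bitmask, and the values are copied once. In the kernel this appears to be done by `tcpm_suck_dst()` in net/ipv4/tcp_metrics.c. The sketch below is a hypothetical userspace model of that step, with made-up `MODEL_*` indices. The two numberings differ in the model, as the real `RTAX_*` and `TCP_METRIC_*` ones do, so each lock bit has to be translated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical route-side indices (kernel RTAX_* values differ). */
enum { MODEL_RTAX_LOCK, MODEL_RTAX_SSTHRESH, MODEL_RTAX_CWND, MODEL_RTAX_MAX };
/* Hypothetical cache-side indices (kernel TCP_METRIC_* values differ). */
enum { MODEL_TCP_METRIC_SSTHRESH, MODEL_TCP_METRIC_CWND, MODEL_TCP_METRIC_MAX };

struct model_dst {
	uint32_t metrics[MODEL_RTAX_MAX];     /* metrics[MODEL_RTAX_LOCK] is a bitmask */
};

struct model_tcpm {
	uint32_t lock;                        /* bit N set => cache metric N is locked */
	uint32_t vals[MODEL_TCP_METRIC_MAX];
};

static bool model_dst_metric_locked(const struct model_dst *dst, int metric)
{
	return dst->metrics[MODEL_RTAX_LOCK] & (1u << metric);
}

/*
 * Runs once, when the cache entry is created: translate each route lock
 * bit into the cache numbering and copy the values. Later route changes
 * are not propagated to an existing entry.
 */
static void model_suck_dst(struct model_tcpm *tm, const struct model_dst *dst)
{
	uint32_t lock = 0;

	if (model_dst_metric_locked(dst, MODEL_RTAX_SSTHRESH))
		lock |= 1u << MODEL_TCP_METRIC_SSTHRESH;
	if (model_dst_metric_locked(dst, MODEL_RTAX_CWND))
		lock |= 1u << MODEL_TCP_METRIC_CWND;
	tm->lock = lock;

	tm->vals[MODEL_TCP_METRIC_SSTHRESH] = dst->metrics[MODEL_RTAX_SSTHRESH];
	tm->vals[MODEL_TCP_METRIC_CWND] = dst->metrics[MODEL_RTAX_CWND];
}
```

The copy is taken only at creation time, so changing the route afterwards does not affect the cached entry. This is the stale-value problem described in the commit message.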