[PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry

Petr Tesarik posted 2 patches 3 months, 4 weeks ago
There is a newer version of this series
[PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
Posted by Petr Tesarik 3 months, 4 weeks ago
Always initialize tp->snd_cwnd_clamp from the corresponding dst entry.
There are two issues with setting it from the TCP metrics:

1. If the cwnd option is changed in the routing table, the new value is not
   used as long as there is a cached TCP metric for the destination.

2. After evicting the cached TCP metric entry, the next connection will use
   the default value (i.e. no limit). Only after this connection is
   finished, a new entry is created, and this entry gets the locked value
   from the routing table.

As a result, the following shenanigan is required to set a new locked cwnd
value:

- update the route (``ip route replace ... cwnd lock $value``)
- flush any existing TCP metric entry (``ip tcp_metrics flush $dest``)
- create and finish a dummy connection to the destination to create a TCP
  metric entry with the new value
- *next* connection to this destination will use the new value

It does not seem to be intentional.

Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
---
 net/ipv4/tcp_metrics.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 4251670e328c8..dd8f3457bd72e 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
 	if (!dst)
 		goto reset;
 
+	if (dst_metric_locked(dst, RTAX_CWND))
+		tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
+
 	rcu_read_lock();
 	tm = tcp_get_metrics(sk, dst, false);
 	if (!tm) {
@@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
 		goto reset;
 	}
 
-	if (tcp_metric_locked(tm, TCP_METRIC_CWND))
-		tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
-
 	val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
 	      0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
 	if (val) {
-- 
2.49.0
Re: [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
Posted by Paolo Abeni 3 months, 3 weeks ago
On 6/13/25 12:20 PM, Petr Tesarik wrote:
> diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
> index 4251670e328c8..dd8f3457bd72e 100644
> --- a/net/ipv4/tcp_metrics.c
> +++ b/net/ipv4/tcp_metrics.c
> @@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
>  	if (!dst)
>  		goto reset;
>  
> +	if (dst_metric_locked(dst, RTAX_CWND))
> +		tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
> +
>  	rcu_read_lock();
>  	tm = tcp_get_metrics(sk, dst, false);
>  	if (!tm) {
> @@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
>  		goto reset;
>  	}
>  
> -	if (tcp_metric_locked(tm, TCP_METRIC_CWND))
> -		tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
> -
>  	val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
>  	      0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
>  	if (val) {

It's unclear to me why you drop the tcp_metric_get() here. It looks like
the above will cause a functional regression, with unlocked cached
metrics no longer taking effects?

/P
Re: [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
Posted by Petr Tesarik 3 months, 3 weeks ago
On Tue, 17 Jun 2025 13:00:53 +0200
Paolo Abeni <pabeni@redhat.com> wrote:

> On 6/13/25 12:20 PM, Petr Tesarik wrote:
> > diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
> > index 4251670e328c8..dd8f3457bd72e 100644
> > --- a/net/ipv4/tcp_metrics.c
> > +++ b/net/ipv4/tcp_metrics.c
> > @@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
> >  	if (!dst)
> >  		goto reset;
> >  
> > +	if (dst_metric_locked(dst, RTAX_CWND))
> > +		tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
> > +
> >  	rcu_read_lock();
> >  	tm = tcp_get_metrics(sk, dst, false);
> >  	if (!tm) {
> > @@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
> >  		goto reset;
> >  	}
> >  
> > -	if (tcp_metric_locked(tm, TCP_METRIC_CWND))
> > -		tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
> > -
> >  	val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
> >  	      0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
> >  	if (val) {  
> 
> It's unclear to me why you drop the tcp_metric_get() here. It looks like
> the above will cause a functional regression, with unlocked cached
> metrics no longer taking effects?

Unlocked cached TCP_METRIC_CWND has never taken effects. As you can
see, tcp_metric_get() was executed only if the metric was locked. 

In fact, the cwnd parameter in the route does not have any effect
either. It's even documented in the manual page of ip-route(8):

              cwnd NUMBER (Linux 2.3.15+ only)
                     the clamp for congestion window. It is ignored if
                     the lock flag is not used.

Note that here is also an initcwnd parameter, and I'm not changing
anything about the handling of that one.

Now, if you think that this TCP_METRIC_CWND is quite useless, then I
wholeheartedly agree with you, but we cannot simply remove it, as it
has become part of uapi, defined in include/uapi/linux/tcp_metrics.h.

Petr T
Re: [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
Posted by Petr Tesarik 3 months, 3 weeks ago
On Tue, 17 Jun 2025 13:39:35 +0200
Petr Tesarik <ptesarik@suse.com> wrote:

> On Tue, 17 Jun 2025 13:00:53 +0200
> Paolo Abeni <pabeni@redhat.com> wrote:
> 
> > On 6/13/25 12:20 PM, Petr Tesarik wrote:  
> > > diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
> > > index 4251670e328c8..dd8f3457bd72e 100644
> > > --- a/net/ipv4/tcp_metrics.c
> > > +++ b/net/ipv4/tcp_metrics.c
> > > @@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
> > >  	if (!dst)
> > >  		goto reset;
> > >  
> > > +	if (dst_metric_locked(dst, RTAX_CWND))
> > > +		tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
> > > +
> > >  	rcu_read_lock();
> > >  	tm = tcp_get_metrics(sk, dst, false);
> > >  	if (!tm) {
> > > @@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
> > >  		goto reset;
> > >  	}
> > >  
> > > -	if (tcp_metric_locked(tm, TCP_METRIC_CWND))
> > > -		tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
> > > -
> > >  	val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
> > >  	      0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
> > >  	if (val) {    
> > 
> > It's unclear to me why you drop the tcp_metric_get() here. It looks like
> > the above will cause a functional regression, with unlocked cached
> > metrics no longer taking effects?  
> 
> Unlocked cached TCP_METRIC_CWND has never taken effects. As you can
> see, tcp_metric_get() was executed only if the metric was locked. 
> 
> In fact, the cwnd parameter in the route does not have any effect
> either. It's even documented in the manual page of ip-route(8):
> 
>               cwnd NUMBER (Linux 2.3.15+ only)
>                      the clamp for congestion window. It is ignored if
>                      the lock flag is not used.
> 
> Note that here is also an initcwnd parameter, and I'm not changing
> anything about the handling of that one.
> 
> Now, if you think that this TCP_METRIC_CWND is quite useless, then I
> wholeheartedly agree with you, but we cannot simply remove it, as it
> has become part of uapi, defined in include/uapi/linux/tcp_metrics.h.

As an afterthought, I'm not quite sure about the semantics of this
metric. The value calculated in tcp_update_metrics() has never been
used for anything since it was introduced in 2.3.15. So there is:

- either a locked cwnd value, which is used to clamp cwnd on a
  route and never updated,
- or an unlocked cwnd value, which is updated upon connection
  termination but never used for anything by the kernel.

OK, the unlocked value can be read by userspace, but what is it
supposed to mean? The manual page for route-tcp_metrics(8) says: “CWND
metric value”, which sounds like the author did not have a clue either.

Unless someone here _has_ a clue, I'll just leave it as is, except the
clamp value will be taken from the routing table, as it makes no sense
to wait until the very same value propagates to a tcp_metrics_block
(where it is then never updated).

Petr T
Re: [PATCH net 1/2] tcp_metrics: set maximum cwnd from the dst entry
Posted by Paolo Abeni 3 months, 3 weeks ago
On 6/18/25 7:01 PM, Petr Tesarik wrote:
> On Tue, 17 Jun 2025 13:39:35 +0200 Petr Tesarik <ptesarik@suse.com> wrote:
>> On Tue, 17 Jun 2025 13:00:53 +0200 Paolo Abeni <pabeni@redhat.com> wrote:
>>> On 6/13/25 12:20 PM, Petr Tesarik wrote:  
>>>> diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
>>>> index 4251670e328c8..dd8f3457bd72e 100644
>>>> --- a/net/ipv4/tcp_metrics.c
>>>> +++ b/net/ipv4/tcp_metrics.c
>>>> @@ -477,6 +477,9 @@ void tcp_init_metrics(struct sock *sk)
>>>>  	if (!dst)
>>>>  		goto reset;
>>>>  
>>>> +	if (dst_metric_locked(dst, RTAX_CWND))
>>>> +		tp->snd_cwnd_clamp = dst_metric(dst, RTAX_CWND);
>>>> +
>>>>  	rcu_read_lock();
>>>>  	tm = tcp_get_metrics(sk, dst, false);
>>>>  	if (!tm) {
>>>> @@ -484,9 +487,6 @@ void tcp_init_metrics(struct sock *sk)
>>>>  		goto reset;
>>>>  	}
>>>>  
>>>> -	if (tcp_metric_locked(tm, TCP_METRIC_CWND))
>>>> -		tp->snd_cwnd_clamp = tcp_metric_get(tm, TCP_METRIC_CWND);
>>>> -
>>>>  	val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ?
>>>>  	      0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH);
>>>>  	if (val) {    
>>>
>>> It's unclear to me why you drop the tcp_metric_get() here. It looks like
>>> the above will cause a functional regression, with unlocked cached
>>> metrics no longer taking effects?  
>>
>> Unlocked cached TCP_METRIC_CWND has never taken effects. As you can
>> see, tcp_metric_get() was executed only if the metric was locked. 

Uhm... the locking propagation from dst to tcp storage was not so
straight forward to me, I missed it. Please be a little more verbose
about this part in the commit message.

Thanks,

Paolo