[PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow

Tariq Toukan posted 3 patches 3 weeks, 1 day ago
[PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
Posted by Tariq Toukan 3 weeks, 1 day ago
From: Shahar Shitrit <shshitrit@nvidia.com>

When a netdev issues an RX async resync request, the TLS module
increments rcd_delta for each new record that arrives. This tracks
how far the current record is from the point where synchronization
was lost.

When rcd_delta reaches its threshold, it indicates that the device
response is either excessively delayed or unlikely to arrive at all
(at that point, tcp_sn may have wrapped around, so a match would no
longer be valid anyway).

Previous patch introduced tls_offload_rx_resync_async_request_cancel()
to explicitly cancel resync requests when a device response failure
is detected.

This patch adds a final safeguard: cancel the async resync request when
rcd_delta crosses its threshold, as reaching this point implies that
earlier cancellation did not occur.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 net/tls/tls_device.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index f672a62a9a52..56c14f1647a4 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
 		/* shouldn't get to wraparound:
 		 * too long in async stage, something bad happened
 		 */
-		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
+		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
+			/* cancel resync request */
+			atomic64_set(&resync_async->req, 0);
 			return false;
+		}
 
 		/* asynchronous stage: log all headers seq such that
 		 * req_seq <= seq <= end_seq, and wait for real resync request
-- 
2.31.1
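
For readers following the thread, the behaviour the commit message describes can be modelled in userspace. This is only a sketch, not kernel code: struct resync_async_model and model_track_record() are hypothetical stand-ins, C11 atomics replace atomic64_set(), and the sequence-number logging done by the real function is left out.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct tls_offload_resync_async (assumed layout). */
struct resync_async_model {
	_Atomic uint64_t req;	/* non-zero while a resync request is pending */
	uint16_t rcd_delta;	/* records seen since the request was issued */
};

/*
 * Called once per received record while an async request is pending.
 * Returns true if the request is still pending afterwards, false if it
 * was cancelled because rcd_delta reached USHRT_MAX.
 */
bool model_track_record(struct resync_async_model *ra)
{
	if (ra->rcd_delta == USHRT_MAX) {
		/* the device never answered: drop the request */
		atomic_store(&ra->req, 0);
		return false;
	}

	ra->rcd_delta++;
	return true;
}
```

In this model the request survives exactly USHRT_MAX records; the next record cancels it instead of letting the counter wrap.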
Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
Posted by Jakub Kicinski 2 weeks, 4 days ago
On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
> When a netdev issues an RX async resync request, the TLS module
> increments rcd_delta for each new record that arrives. This tracks
> how far the current record is from the point where synchronization
> was lost.
> 
> When rcd_delta reaches its threshold, it indicates that the device
> response is either excessively delayed or unlikely to arrive at all
> (at that point, tcp_sn may have wrapped around, so a match would no
> longer be valid anyway).
> 
> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> to explicitly cancel resync requests when a device response failure
> is detected.
> 
> This patch adds a final safeguard: cancel the async resync request when
> rcd_delta crosses its threshold, as reaching this point implies that
> earlier cancellation did not occur.

Missing a Fixes tag

> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index f672a62a9a52..56c14f1647a4 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>  		/* shouldn't get to wraparound:
>  		 * too long in async stage, something bad happened
>  		 */
> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> +			/* cancel resync request */
> +			atomic64_set(&resync_async->req, 0);

we should probably use the helper added by the previous patch (I'd
probably squash them TBH)
Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
Posted by Shahar Shitrit 1 week, 3 days ago

On 14/09/2025 21:53, Jakub Kicinski wrote:
> On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
>> When a netdev issues an RX async resync request, the TLS module
>> increments rcd_delta for each new record that arrives. This tracks
>> how far the current record is from the point where synchronization
>> was lost.
>>
>> When rcd_delta reaches its threshold, it indicates that the device
>> response is either excessively delayed or unlikely to arrive at all
>> (at that point, tcp_sn may have wrapped around, so a match would no
>> longer be valid anyway).
>>
>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>> to explicitly cancel resync requests when a device response failure
>> is detected.
>>
>> This patch adds a final safeguard: cancel the async resync request when
>> rcd_delta crosses its threshold, as reaching this point implies that
>> earlier cancellation did not occur.
> 
> Missing a Fixes tag
Will add
> 
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index f672a62a9a52..56c14f1647a4 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>>  		/* shouldn't get to wraparound:
>>  		 * too long in async stage, something bad happened
>>  		 */
>> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
>> +			/* cancel resync request */
>> +			atomic64_set(&resync_async->req, 0);
> 
> we should probably use the helper added by the previous patch (I'd
> probably squash them TBH)
It's not trivial to use the helper here, since we don't have the socket.
We could perhaps add an inner helper that sets req to 0, and call it
both here and from the helper introduced in the previous patch.
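
The two-level arrangement suggested here might look roughly like the following userspace sketch. All type and function names are hypothetical stand-ins for the kernel ones (the real helper reaches the resync state through the TLS context of the socket), and C11 atomics replace atomic64_set().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel structures (assumed, heavily reduced). */
struct resync_async_model {
	_Atomic uint64_t req;
};

struct rx_ctx_model {
	struct resync_async_model *resync_async;
};

struct sock_model {
	struct rx_ctx_model *rx_ctx;
};

/* Hypothetical inner helper: needs only the resync state, not the socket. */
void model_cancel_inner(struct resync_async_model *ra)
{
	atomic_store(&ra->req, 0);
}

/* Socket-based wrapper, in the style of the helper from the previous patch. */
void model_request_cancel(struct sock_model *sk)
{
	model_cancel_inner(sk->rx_ctx->resync_async);
}
```

A driver would keep calling the socket-based wrapper, while the overflow check, which only has resync_async at hand, would call the inner helper directly.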
Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
Posted by Sabrina Dubroca 1 week, 3 days ago
2025-09-22, 10:18:52 +0300, Shahar Shitrit wrote:
> 
> 
> On 14/09/2025 21:53, Jakub Kicinski wrote:
> > On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
> >> When a netdev issues an RX async resync request, the TLS module
> >> increments rcd_delta for each new record that arrives. This tracks
> >> how far the current record is from the point where synchronization
> >> was lost.
> >>
> >> When rcd_delta reaches its threshold, it indicates that the device
> >> response is either excessively delayed or unlikely to arrive at all
> >> (at that point, tcp_sn may have wrapped around, so a match would no
> >> longer be valid anyway).
> >>
> >> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> >> to explicitly cancel resync requests when a device response failure
> >> is detected.
> >>
> >> This patch adds a final safeguard: cancel the async resync request when
> >> rcd_delta crosses its threshold, as reaching this point implies that
> >> earlier cancellation did not occur.
> > 
> > Missing a Fixes tag
> Will add
> > 
> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >> index f672a62a9a52..56c14f1647a4 100644
> >> --- a/net/tls/tls_device.c
> >> +++ b/net/tls/tls_device.c
> >> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> >>  		/* shouldn't get to wraparound:
> >>  		 * too long in async stage, something bad happened
> >>  		 */
> >> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> >> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> >> +			/* cancel resync request */
> >> +			atomic64_set(&resync_async->req, 0);
> > 
> > we should probably use the helper added by the previous patch (I'd
> > probably squash them TBH)
>
> It's not trivial to use the helper here, since we don't have the socket.

tls_device_rx_resync_async doesn't currently get the socket, but it
has only one caller, tls_device_rx_resync_new_rec, which does. So
tls_device_rx_resync_async could easily get the socket. Or just pass
resync_async to tls_offload_rx_resync_async_request_cancel, since
that's what it really needs?

-- 
Sabrina
Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
Posted by Shahar Shitrit 4 days, 16 hours ago

On 22/09/2025 18:54, Sabrina Dubroca wrote:
> 2025-09-22, 10:18:52 +0300, Shahar Shitrit wrote:
>>
>>
>> On 14/09/2025 21:53, Jakub Kicinski wrote:
>>> On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
>>>> When a netdev issues an RX async resync request, the TLS module
>>>> increments rcd_delta for each new record that arrives. This tracks
>>>> how far the current record is from the point where synchronization
>>>> was lost.
>>>>
>>>> When rcd_delta reaches its threshold, it indicates that the device
>>>> response is either excessively delayed or unlikely to arrive at all
>>>> (at that point, tcp_sn may have wrapped around, so a match would no
>>>> longer be valid anyway).
>>>>
>>>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>>>> to explicitly cancel resync requests when a device response failure
>>>> is detected.
>>>>
>>>> This patch adds a final safeguard: cancel the async resync request when
>>>> rcd_delta crosses its threshold, as reaching this point implies that
>>>> earlier cancellation did not occur.
>>>
>>> Missing a Fixes tag
>> Will add
>>>
>>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>>>> index f672a62a9a52..56c14f1647a4 100644
>>>> --- a/net/tls/tls_device.c
>>>> +++ b/net/tls/tls_device.c
>>>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>>>>  		/* shouldn't get to wraparound:
>>>>  		 * too long in async stage, something bad happened
>>>>  		 */
>>>> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>>>> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
>>>> +			/* cancel resync request */
>>>> +			atomic64_set(&resync_async->req, 0);
>>>
>>> we should probably use the helper added by the previous patch (I'd
>>> probably squash them TBH)
>>
>> It's not trivial to use the helper here, since we don't have the socket.
> 
> tls_device_rx_resync_async doesn't currently get the socket, but it
> has only one caller, tls_device_rx_resync_new_rec, which does. So
> tls_device_rx_resync_async could easily get the socket. Or just pass
> resync_async to tls_offload_rx_resync_async_request_cancel, since
> that's what it really needs?
> 
Yes, these are options, but we'd rather not pass the socket to
tls_device_rx_resync_new_rec() merely for this. We also wanted to keep
tls_offload_rx_resync_async_request_cancel in the same format as
tls_offload_rx_resync_async_request_start/end, that is, taking the
socket as a parameter.
Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
Posted by Sabrina Dubroca 3 days, 12 hours ago
2025-09-28, 09:35:48 +0300, Shahar Shitrit wrote:
> 
> 
> On 22/09/2025 18:54, Sabrina Dubroca wrote:
> > 2025-09-22, 10:18:52 +0300, Shahar Shitrit wrote:
> >>
> >>
> >> On 14/09/2025 21:53, Jakub Kicinski wrote:
> >>> On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
> >>>> When a netdev issues an RX async resync request, the TLS module
> >>>> increments rcd_delta for each new record that arrives. This tracks
> >>>> how far the current record is from the point where synchronization
> >>>> was lost.
> >>>>
> >>>> When rcd_delta reaches its threshold, it indicates that the device
> >>>> response is either excessively delayed or unlikely to arrive at all
> >>>> (at that point, tcp_sn may have wrapped around, so a match would no
> >>>> longer be valid anyway).
> >>>>
> >>>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> >>>> to explicitly cancel resync requests when a device response failure
> >>>> is detected.
> >>>>
> >>>> This patch adds a final safeguard: cancel the async resync request when
> >>>> rcd_delta crosses its threshold, as reaching this point implies that
> >>>> earlier cancellation did not occur.
> >>>
> >>> Missing a Fixes tag
> >> Will add
> >>>
> >>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >>>> index f672a62a9a52..56c14f1647a4 100644
> >>>> --- a/net/tls/tls_device.c
> >>>> +++ b/net/tls/tls_device.c
> >>>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> >>>>  		/* shouldn't get to wraparound:
> >>>>  		 * too long in async stage, something bad happened
> >>>>  		 */
> >>>> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> >>>> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> >>>> +			/* cancel resync request */
> >>>> +			atomic64_set(&resync_async->req, 0);
> >>>
> >>> we should probably use the helper added by the previous patch (I'd
> >>> probably squash them TBH)
> >>
> >> It's not trivial to use the helper here, since we don't have the socket.
> > 
> > tls_device_rx_resync_async doesn't currently get the socket, but it
> > has only one caller, tls_device_rx_resync_new_rec, which does. So
> > tls_device_rx_resync_async could easily get the socket. Or just pass
> > resync_async to tls_offload_rx_resync_async_request_cancel, since
> > that's what it really needs?
> > 
> Yes, these are options, but we'd rather not pass the socket to
> tls_device_rx_resync_new_rec() merely for this.

Why not? If you felt the need to add a comment saying we're canceling
the request, using a helper instead that says it does the canceling is
a pretty decent reason to add whatever argument
tls_device_rx_resync_async needs (or swap resync_async for the socket
if you don't want to add another argument).

> We also wanted to keep tls_offload_rx_resync_async_request_cancel in
> the same format as tls_offload_rx_resync_async_request_start/end,
> that is, taking the socket as a parameter.

Then they could easily be changed to make the 3 helpers consistent
(all taking resync_async), since
tls_offload_rx_resync_async_request_start/end are used exactly once
each.
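
As a rough illustration of what three consistent helpers could look like, here is a userspace sketch in which start, end and cancel all take the resync state directly. The model_* names, the struct layout and the MODEL_REQ* flag bits are assumptions made for the example; the packing of seq and len into req is modelled loosely on the existing helpers and should be checked against include/net/tls.h.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_REQ	(1ULL << 0)	/* a resync request is pending */
#define MODEL_REQ_ASYNC	(1ULL << 1)	/* request is still in the async stage */

/* Simplified stand-in for struct tls_offload_resync_async (assumed layout). */
struct resync_async_model {
	_Atomic uint64_t req;
	uint16_t loglen;
	uint16_t rcd_delta;
};

/* Begin an async request covering len bytes starting at seq. */
void model_request_start(struct resync_async_model *ra, uint32_t seq,
			 uint16_t len)
{
	atomic_store(&ra->req, ((uint64_t)seq << 32) | ((uint64_t)len << 16) |
			       MODEL_REQ | MODEL_REQ_ASYNC);
	ra->loglen = 0;
	ra->rcd_delta = 0;
}

/* The device answered: leave the async stage with the final seq. */
void model_request_end(struct resync_async_model *ra, uint32_t seq)
{
	atomic_store(&ra->req, ((uint64_t)seq << 32) | MODEL_REQ);
}

/* The device will not answer: drop the request entirely. */
void model_request_cancel(struct resync_async_model *ra)
{
	atomic_store(&ra->req, 0);
}
```

With this shape the overflow path could call the cancel helper without needing the socket, at the cost of making each driver call site look up resync_async itself.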

-- 
Sabrina
Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
Posted by Sabrina Dubroca 2 weeks, 6 days ago
2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
> 
> When a netdev issues an RX async resync request, the TLS module
> increments rcd_delta for each new record that arrives. This tracks
> how far the current record is from the point where synchronization
> was lost.
> 
> When rcd_delta reaches its threshold, it indicates that the device
> response is either excessively delayed or unlikely to arrive at all
> (at that point, tcp_sn may have wrapped around, so a match would no
> longer be valid anyway).
> 
> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> to explicitly cancel resync requests when a device response failure
> is detected.
> 
> This patch adds a final safeguard: cancel the async resync request when
> rcd_delta crosses its threshold, as reaching this point implies that
> earlier cancellation did not occur.
> 
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  net/tls/tls_device.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index f672a62a9a52..56c14f1647a4 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>  		/* shouldn't get to wraparound:
>  		 * too long in async stage, something bad happened
>  		 */
> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {

Do we still need to WARN here? It's a condition that can actually
happen (even if it's rare), and that the stack can handle, so maybe
not?

> +			/* cancel resync request */
> +			atomic64_set(&resync_async->req, 0);
>  			return false;
> +		}
>  
>  		/* asynchronous stage: log all headers seq such that
>  		 * req_seq <= seq <= end_seq, and wait for real resync request
> -- 
> 2.31.1
> 

-- 
Sabrina
Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
Posted by Shahar Shitrit 1 week, 3 days ago

On 12/09/2025 18:14, Sabrina Dubroca wrote:
> 2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
>> From: Shahar Shitrit <shshitrit@nvidia.com>
>>
>> When a netdev issues an RX async resync request, the TLS module
>> increments rcd_delta for each new record that arrives. This tracks
>> how far the current record is from the point where synchronization
>> was lost.
>>
>> When rcd_delta reaches its threshold, it indicates that the device
>> response is either excessively delayed or unlikely to arrive at all
>> (at that point, tcp_sn may have wrapped around, so a match would no
>> longer be valid anyway).
>>
>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>> to explicitly cancel resync requests when a device response failure
>> is detected.
>>
>> This patch adds a final safeguard: cancel the async resync request when
>> rcd_delta crosses its threshold, as reaching this point implies that
>> earlier cancellation did not occur.
>>
>> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>  net/tls/tls_device.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index f672a62a9a52..56c14f1647a4 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>>  		/* shouldn't get to wraparound:
>>  		 * too long in async stage, something bad happened
>>  		 */
>> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> 
> Do we still need to WARN here? It's a condition that can actually
> happen (even if it's rare), and that the stack can handle, so maybe
> not?
> 
You are right that the stack now handles this, but removing the WARN
without any alternative would remove any indication that something went
wrong, and would prevent us from tracking down the error flow in which
we failed to cancel the request before getting here. We could perhaps
replace the WARN with a counter. What do you think?

>> +			/* cancel resync request */
>> +			atomic64_set(&resync_async->req, 0);
>>  			return false;
>> +		}
>>  
>>  		/* asynchronous stage: log all headers seq such that
>>  		 * req_seq <= seq <= end_seq, and wait for real resync request
>> -- 
>> 2.31.1
>>
>
Re: [PATCH net 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow
Posted by Sabrina Dubroca 1 week, 2 days ago
2025-09-22, 10:16:21 +0300, Shahar Shitrit wrote:
> 
> 
> On 12/09/2025 18:14, Sabrina Dubroca wrote:
> > 2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
> >> From: Shahar Shitrit <shshitrit@nvidia.com>
> >>
> >> When a netdev issues an RX async resync request, the TLS module
> >> increments rcd_delta for each new record that arrives. This tracks
> >> how far the current record is from the point where synchronization
> >> was lost.
> >>
> >> When rcd_delta reaches its threshold, it indicates that the device
> >> response is either excessively delayed or unlikely to arrive at all
> >> (at that point, tcp_sn may have wrapped around, so a match would no
> >> longer be valid anyway).
> >>
> >> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> >> to explicitly cancel resync requests when a device response failure
> >> is detected.
> >>
> >> This patch adds a final safeguard: cancel the async resync request when
> >> rcd_delta crosses its threshold, as reaching this point implies that
> >> earlier cancellation did not occur.
> >>
> >> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> >> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> >> ---
> >>  net/tls/tls_device.c | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >> index f672a62a9a52..56c14f1647a4 100644
> >> --- a/net/tls/tls_device.c
> >> +++ b/net/tls/tls_device.c
> >> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> >>  		/* shouldn't get to wraparound:
> >>  		 * too long in async stage, something bad happened
> >>  		 */
> >> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> >> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> > 
> > Do we still need to WARN here? It's a condition that can actually
> > happen (even if it's rare), and that the stack can handle, so maybe
> > not?
> > 
> You are right that the stack now handles this, but removing the WARN
> without any alternative would remove any indication that something went
> wrong, and would prevent us from tracking down the error flow in which
> we failed to cancel the request before getting here. We could perhaps
> replace the WARN with a counter. What do you think?

Do you use CONFIG_DEBUG_NET in your devel/test kernels? If so,
DEBUG_NET_WARN_ONCE would be an option. Or is the goal rather that
users/customers can report the problem (i.e. on production kernels
without CONFIG_DEBUG_NET)? In that case, the counter would work
better.
But if you really think that this condition indicates a driver bug,
maybe the WARN is still appropriate. Jakub, what do you think?
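
To make the counter option concrete, here is a userspace sketch in which the overflow path bumps a statistic instead of warning. Everything in it is hypothetical: the counter name, where it lives, and the model_* functions are assumptions, and a real kernel change would more likely use an existing per-netns or per-device statistics mechanism.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct tls_offload_resync_async (assumed layout). */
struct resync_async_model {
	_Atomic uint64_t req;
	uint16_t rcd_delta;
};

/* Hypothetical statistic: async requests abandoned on rcd_delta overflow. */
static _Atomic unsigned long model_overflow_cnt;

unsigned long model_overflow_count(void)
{
	return atomic_load(&model_overflow_cnt);
}

/* Per-record tracking that counts, rather than warns about, an overflow. */
bool model_track_record_counted(struct resync_async_model *ra)
{
	if (ra->rcd_delta == USHRT_MAX) {
		atomic_fetch_add(&model_overflow_cnt, 1);
		atomic_store(&ra->req, 0);	/* cancel the stale request */
		return false;
	}

	ra->rcd_delta++;
	return true;
}
```

Assuming the caller skips the async path once req is zero, the counter would increase once per abandoned request, which gives production kernels a signal without a stack trace.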


BTW, I was also thinking that the documentation
(Documentation/networking/tls-offload.rst) could maybe be improved a
bit with a description of how async resync works and how the driver is
expected to use the tls_offload_rx_resync_async_request_{start,end}
(and now _cancel) helpers. The section on "Stream scan
resynchronization" is pretty abstract.

-- 
Sabrina