[v2] tls: Introduce and use RX async resync request cancel function

[PATCH net V2 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow

Posted by Tariq Toukan 3 months, 3 weeks ago

From: Shahar Shitrit <shshitrit@nvidia.com>

When a netdev issues a RX async resync request for a TLS connection,
the TLS module handles it by logging record headers and attempting to
match them to the tcp_sn provided by the device. If a match is found,
the TLS module approves the tcp_sn for resynchronization.

While waiting for a device response, the TLS module also increments
rcd_delta each time a new TLS record is received, tracking the distance
from the original resync request.

However, if the device response is delayed or fails (e.g due to
unstable connection and device getting out of tracking, hardware
errors, resource exhaustion etc.), the TLS module keeps logging and
incrementing, which can lead to a WARN() when rcd_delta exceeds the
threshold.

To address this, introduce tls_offload_rx_resync_async_request_cancel()
to explicitly cancel resync requests when a device response failure is
detected. Call this helper also as a final safeguard when rcd_delta
crosses its threshold, as reaching this point implies that earlier
cancellation did not occur.

Fixes: 138559b9f99d ("net/tls: Fix wrong record sn in async mode of device resync")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 include/net/tls.h    | 6 ++++++
 net/tls/tls_device.c | 4 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index b90f3b675c3c..c7bcdb3afad7 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -467,6 +467,12 @@ tls_offload_rx_resync_async_request_end(struct tls_offload_resync_async *resync_
 	atomic64_set(&resync_async->req, ((u64)ntohl(seq) << 32) | RESYNC_REQ);
 }
 
+static inline void
+tls_offload_rx_resync_async_request_cancel(struct tls_offload_resync_async *resync_async)
+{
+	atomic64_set(&resync_async->req, 0);
+}
+
 static inline void
 tls_offload_rx_resync_set_type(struct sock *sk, enum tls_offload_sync_type type)
 {
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index a64ae15b1a60..71734411ff4c 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -723,8 +723,10 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
 		/* shouldn't get to wraparound:
 		 * too long in async stage, something bad happened
 		 */
-		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
+		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
+			tls_offload_rx_resync_async_request_cancel(resync_async);
 			return false;
+		}
 
 		/* asynchronous stage: log all headers seq such that
 		 * req_seq <= seq <= end_seq, and wait for real resync request
-- 
2.31.1

Re: [PATCH net V2 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow

Posted by Sabrina Dubroca 3 months, 2 weeks ago

nit if you end up respinning, there's a typo in the subject:
s/rdc_delta/rcd_delta/


2025-10-20, 10:05:53 +0300, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
> 
> When a netdev issues a RX async resync request for a TLS connection,
> the TLS module handles it by logging record headers and attempting to
> match them to the tcp_sn provided by the device. If a match is found,
> the TLS module approves the tcp_sn for resynchronization.
> 
> While waiting for a device response, the TLS module also increments
> rcd_delta each time a new TLS record is received, tracking the distance
> from the original resync request.
> 
> However, if the device response is delayed or fails (e.g due to
> unstable connection and device getting out of tracking, hardware
> errors, resource exhaustion etc.), the TLS module keeps logging and
> incrementing, which can lead to a WARN() when rcd_delta exceeds the
> threshold.
> 
> To address this, introduce tls_offload_rx_resync_async_request_cancel()
> to explicitly cancel resync requests when a device response failure is
> detected. Call this helper also as a final safeguard when rcd_delta
> crosses its threshold, as reaching this point implies that earlier
> cancellation did not occur.
> 
> Fixes: 138559b9f99d ("net/tls: Fix wrong record sn in async mode of device resync")

The patch itself looks good, but what issue is fixed within this
patch? The helper will be useful in the next patch, but right now
we're only resetting the resync_async status. The only change I see
(without patch 3) is that we won't call tls_device_rx_resync_async()
next time we decrypt a record in SW, but it wouldn't have done
anything.

Actually, also in patch 1/3, there is no "fix" is in that patch.

-- 
Sabrina

Re: [PATCH net V2 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow

Posted by Shahar Shitrit 3 months, 2 weeks ago


On 21/10/2025 18:28, Sabrina Dubroca wrote:
> nit if you end up respinning, there's a typo in the subject:
> s/rdc_delta/rcd_delta/
> 
> 
> 2025-10-20, 10:05:53 +0300, Tariq Toukan wrote:
>> From: Shahar Shitrit <shshitrit@nvidia.com>
>>
>> When a netdev issues a RX async resync request for a TLS connection,
>> the TLS module handles it by logging record headers and attempting to
>> match them to the tcp_sn provided by the device. If a match is found,
>> the TLS module approves the tcp_sn for resynchronization.
>>
>> While waiting for a device response, the TLS module also increments
>> rcd_delta each time a new TLS record is received, tracking the distance
>> from the original resync request.
>>
>> However, if the device response is delayed or fails (e.g due to
>> unstable connection and device getting out of tracking, hardware
>> errors, resource exhaustion etc.), the TLS module keeps logging and
>> incrementing, which can lead to a WARN() when rcd_delta exceeds the
>> threshold.
>>
>> To address this, introduce tls_offload_rx_resync_async_request_cancel()
>> to explicitly cancel resync requests when a device response failure is
>> detected. Call this helper also as a final safeguard when rcd_delta
>> crosses its threshold, as reaching this point implies that earlier
>> cancellation did not occur.
>>
>> Fixes: 138559b9f99d ("net/tls: Fix wrong record sn in async mode of device resync")
> 
> The patch itself looks good, but what issue is fixed within this
> patch? The helper will be useful in the next patch, but right now
> we're only resetting the resync_async status. The only change I see
> (without patch 3) is that we won't call tls_device_rx_resync_async()
> next time we decrypt a record in SW, but it wouldn't have done
> anything.
> 
> Actually, also in patch 1/3, there is no "fix" is in that patch.
> 

I agree about patch 1/3 so I'll remove the fixes tag.

For this patch, indeed at this point the WARN() was already fired,
however, the bug being addressed is the unnecessary work the TLS module
continues to do. For my liking, the wasted CPU cycles and resources
alone justify the fix, even if we've already issued a warning.
What do you think?

Re: [PATCH net V2 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow

Posted by Sabrina Dubroca 3 months, 2 weeks ago

2025-10-22, 14:38:17 +0300, Shahar Shitrit wrote:
> 
> 
> On 21/10/2025 18:28, Sabrina Dubroca wrote:
> > nit if you end up respinning, there's a typo in the subject:
> > s/rdc_delta/rcd_delta/
> > 
> > 
> > 2025-10-20, 10:05:53 +0300, Tariq Toukan wrote:
> >> From: Shahar Shitrit <shshitrit@nvidia.com>
> >>
> >> When a netdev issues a RX async resync request for a TLS connection,
> >> the TLS module handles it by logging record headers and attempting to
> >> match them to the tcp_sn provided by the device. If a match is found,
> >> the TLS module approves the tcp_sn for resynchronization.
> >>
> >> While waiting for a device response, the TLS module also increments
> >> rcd_delta each time a new TLS record is received, tracking the distance
> >> from the original resync request.
> >>
> >> However, if the device response is delayed or fails (e.g due to
> >> unstable connection and device getting out of tracking, hardware
> >> errors, resource exhaustion etc.), the TLS module keeps logging and
> >> incrementing, which can lead to a WARN() when rcd_delta exceeds the
> >> threshold.
> >>
> >> To address this, introduce tls_offload_rx_resync_async_request_cancel()
> >> to explicitly cancel resync requests when a device response failure is
> >> detected. Call this helper also as a final safeguard when rcd_delta
> >> crosses its threshold, as reaching this point implies that earlier
> >> cancellation did not occur.
> >>
> >> Fixes: 138559b9f99d ("net/tls: Fix wrong record sn in async mode of device resync")
> > 
> > The patch itself looks good, but what issue is fixed within this
> > patch? The helper will be useful in the next patch, but right now
> > we're only resetting the resync_async status. The only change I see
> > (without patch 3) is that we won't call tls_device_rx_resync_async()
> > next time we decrypt a record in SW, but it wouldn't have done
> > anything.
> > 
> > Actually, also in patch 1/3, there is no "fix" is in that patch.
> > 
> 
> I agree about patch 1/3 so I'll remove the fixes tag.
> 
> For this patch, indeed at this point the WARN() was already fired,
> however, the bug being addressed is the unnecessary work the TLS module
> continues to do. For my liking, the wasted CPU cycles and resources
> alone justify the fix, even if we've already issued a warning.
> What do you think?

Is there any work being done/avoided other than calling
tls_device_rx_resync_async and returning immediately?

With or without the patch, tls_device_rx_resync_new_rec will be called
during stream parsing.

Currently, resync_async->req doesn't get reset so we'll call
tls_device_rx_resync_async. We're still in async phase, rcd_delta is
still USHRT_MAX, and we're done, tls_device_rx_resync_new_rec returns.

With the patch, we'll see that resync_async->req is 0 and avoid
calling tls_device_rx_resync_async.

Did I miss something else?

-- 
Sabrina

Re: [PATCH net V2 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow

Posted by Shahar Shitrit 3 months, 2 weeks ago


On 22/10/2025 15:47, Sabrina Dubroca wrote:
> 2025-10-22, 14:38:17 +0300, Shahar Shitrit wrote:
>>
>>
>> On 21/10/2025 18:28, Sabrina Dubroca wrote:
>>> nit if you end up respinning, there's a typo in the subject:
>>> s/rdc_delta/rcd_delta/
>>>
>>>
>>> 2025-10-20, 10:05:53 +0300, Tariq Toukan wrote:
>>>> From: Shahar Shitrit <shshitrit@nvidia.com>
>>>>
>>>> When a netdev issues a RX async resync request for a TLS connection,
>>>> the TLS module handles it by logging record headers and attempting to
>>>> match them to the tcp_sn provided by the device. If a match is found,
>>>> the TLS module approves the tcp_sn for resynchronization.
>>>>
>>>> While waiting for a device response, the TLS module also increments
>>>> rcd_delta each time a new TLS record is received, tracking the distance
>>>> from the original resync request.
>>>>
>>>> However, if the device response is delayed or fails (e.g due to
>>>> unstable connection and device getting out of tracking, hardware
>>>> errors, resource exhaustion etc.), the TLS module keeps logging and
>>>> incrementing, which can lead to a WARN() when rcd_delta exceeds the
>>>> threshold.
>>>>
>>>> To address this, introduce tls_offload_rx_resync_async_request_cancel()
>>>> to explicitly cancel resync requests when a device response failure is
>>>> detected. Call this helper also as a final safeguard when rcd_delta
>>>> crosses its threshold, as reaching this point implies that earlier
>>>> cancellation did not occur.
>>>>
>>>> Fixes: 138559b9f99d ("net/tls: Fix wrong record sn in async mode of device resync")
>>>
>>> The patch itself looks good, but what issue is fixed within this
>>> patch? The helper will be useful in the next patch, but right now
>>> we're only resetting the resync_async status. The only change I see
>>> (without patch 3) is that we won't call tls_device_rx_resync_async()
>>> next time we decrypt a record in SW, but it wouldn't have done
>>> anything.
>>>
>>> Actually, also in patch 1/3, there is no "fix" is in that patch.
>>>
>>
>> I agree about patch 1/3 so I'll remove the fixes tag.
>>
>> For this patch, indeed at this point the WARN() was already fired,
>> however, the bug being addressed is the unnecessary work the TLS module
>> continues to do. For my liking, the wasted CPU cycles and resources
>> alone justify the fix, even if we've already issued a warning.
>> What do you think?
> 
> Is there any work being done/avoided other than calling
> tls_device_rx_resync_async and returning immediately?
> 
> With or without the patch, tls_device_rx_resync_new_rec will be called
> during stream parsing.
> 
> Currently, resync_async->req doesn't get reset so we'll call
> tls_device_rx_resync_async. We're still in async phase, rcd_delta is
> still USHRT_MAX, and we're done, tls_device_rx_resync_new_rec returns.
> 
> With the patch, we'll see that resync_async->req is 0 and avoid
> calling tls_device_rx_resync_async.
> 
> Did I miss something else?
> 
My bad, you are right. The unnecessary work the invocation of
tls_device_rx_resync_async.
OK so there are some options; I can either simply remove the fixes tag
and leave the patch as is, or I can also remove the call to
tls_offload_rx_resync_async_request_cancel() at that point so the patch
only introduces the helper (and then submit a patch to net-next that
adds the call to tls_offload_rx_resync_async_request_cancel when
rcd_delta == USHRT_MAX to improve the behavior).

what do you think it's the best to do?

Re: [PATCH net V2 2/3] net: tls: Cancel RX async resync request on rdc_delta overflow

Posted by Sabrina Dubroca 3 months, 2 weeks ago

2025-10-23, 13:44:54 +0300, Shahar Shitrit wrote:
> On 22/10/2025 15:47, Sabrina Dubroca wrote:
> > 2025-10-22, 14:38:17 +0300, Shahar Shitrit wrote:
> >> On 21/10/2025 18:28, Sabrina Dubroca wrote:
> >>> 2025-10-20, 10:05:53 +0300, Tariq Toukan wrote:
> >>>> Fixes: 138559b9f99d ("net/tls: Fix wrong record sn in async mode of device resync")
> >>>
> >>> The patch itself looks good, but what issue is fixed within this
> >>> patch? The helper will be useful in the next patch, but right now
> >>> we're only resetting the resync_async status. The only change I see
> >>> (without patch 3) is that we won't call tls_device_rx_resync_async()
> >>> next time we decrypt a record in SW, but it wouldn't have done
> >>> anything.
> >>>
> >>> Actually, also in patch 1/3, there is no "fix" is in that patch.
> >>>
> >>
> >> I agree about patch 1/3 so I'll remove the fixes tag.
> >>
> >> For this patch, indeed at this point the WARN() was already fired,
> >> however, the bug being addressed is the unnecessary work the TLS module
> >> continues to do. For my liking, the wasted CPU cycles and resources
> >> alone justify the fix, even if we've already issued a warning.
> >> What do you think?
> > 
> > Is there any work being done/avoided other than calling
> > tls_device_rx_resync_async and returning immediately?
> > 
> > With or without the patch, tls_device_rx_resync_new_rec will be called
> > during stream parsing.
> > 
> > Currently, resync_async->req doesn't get reset so we'll call
> > tls_device_rx_resync_async. We're still in async phase, rcd_delta is
> > still USHRT_MAX, and we're done, tls_device_rx_resync_new_rec returns.
> > 
> > With the patch, we'll see that resync_async->req is 0 and avoid
> > calling tls_device_rx_resync_async.
> > 
> > Did I miss something else?
> > 
> My bad, you are right. The unnecessary work the invocation of
> tls_device_rx_resync_async.
> OK so there are some options; I can either simply remove the fixes tag
> and leave the patch as is, or I can also remove the call to
> tls_offload_rx_resync_async_request_cancel() at that point so the patch
> only introduces the helper (and then submit a patch to net-next that
> adds the call to tls_offload_rx_resync_async_request_cancel when
> rcd_delta == USHRT_MAX to improve the behavior).
> 
> what do you think it's the best to do?

I'd leave the patch as is, just without the Fixes tag.

With the Subject typo fixed and the Fixes tag removed:
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>

-- 
Sabrina