From: Shahar Shitrit <shshitrit@nvidia.com>

When a netdev issues an RX async resync request, the TLS module
increments rcd_delta for each new record that arrives. This tracks
how far the current record is from the point where synchronization
was lost.

When rcd_delta reaches its threshold, it indicates that the device
response is either excessively delayed or unlikely to arrive at all
(at that point, tcp_sn may have wrapped around, so a match would no
longer be valid anyway).

Previous patch introduced tls_offload_rx_resync_async_request_cancel()
to explicitly cancel resync requests when a device response failure
is detected.

This patch adds a final safeguard: cancel the async resync request when
rcd_delta crosses its threshold, as reaching this point implies that
earlier cancellation did not occur.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 net/tls/tls_device.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index f672a62a9a52..56c14f1647a4 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
 		/* shouldn't get to wraparound:
 		 * too long in async stage, something bad happened
 		 */
-		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
+		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
+			/* cancel resync request */
+			atomic64_set(&resync_async->req, 0);
 			return false;
+		}
 
 		/* asynchronous stage: log all headers seq such that
 		 * req_seq <= seq <= end_seq, and wait for real resync request
--
2.31.1
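
The safeguard in the hunk above can be modelled in userspace. This is only a sketch under stated assumptions: struct resync_async_model, model_track_record() and model_demo() are hypothetical names, C11 atomics stand in for the kernel's atomic64_set(), and the WARN is left out. It shows the per-record check clearing the pending request once rcd_delta reaches USHRT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct tls_offload_resync_async. */
struct resync_async_model {
	_Atomic uint64_t req;		/* non-zero while a request is pending */
	unsigned short rcd_delta;	/* records seen since the request */
};

/* Per-record check: returns true while the request is still tracked,
 * false once it has been cancelled at the threshold.
 */
static bool model_track_record(struct resync_async_model *ra)
{
	if (ra->rcd_delta == USHRT_MAX) {
		/* too long in async stage: cancel the resync request */
		atomic_store(&ra->req, 0);
		return false;
	}

	ra->rcd_delta++;
	return true;
}

/* Usage: feed records until the threshold trips, then check the state. */
static void model_demo(void)
{
	struct resync_async_model ra;

	atomic_init(&ra.req, 1);
	ra.rcd_delta = 0;

	while (model_track_record(&ra))
		;

	assert(ra.rcd_delta == USHRT_MAX);
	assert(atomic_load(&ra.req) == 0);
}
```

Without the store to req, the model would keep reporting a pending request after the threshold, which is the situation the patch is meant to close.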

On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
> When a netdev issues an RX async resync request, the TLS module
> increments rcd_delta for each new record that arrives. This tracks
> how far the current record is from the point where synchronization
> was lost.
>
> When rcd_delta reaches its threshold, it indicates that the device
> response is either excessively delayed or unlikely to arrive at all
> (at that point, tcp_sn may have wrapped around, so a match would no
> longer be valid anyway).
>
> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> to explicitly cancel resync requests when a device response failure
> is detected.
>
> This patch adds a final safeguard: cancel the async resync request when
> rcd_delta crosses its threshold, as reaching this point implies that
> earlier cancellation did not occur.

Missing a Fixes tag

> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index f672a62a9a52..56c14f1647a4 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>  		/* shouldn't get to wraparound:
>  		 * too long in async stage, something bad happened
>  		 */
> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> +			/* cancel resync request */
> +			atomic64_set(&resync_async->req, 0);

we should probably use the helper added by the previous patch (I'd
probably squash them TBH)

On 14/09/2025 21:53, Jakub Kicinski wrote:
> On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
>> When a netdev issues an RX async resync request, the TLS module
>> increments rcd_delta for each new record that arrives. This tracks
>> how far the current record is from the point where synchronization
>> was lost.
>>
>> When rcd_delta reaches its threshold, it indicates that the device
>> response is either excessively delayed or unlikely to arrive at all
>> (at that point, tcp_sn may have wrapped around, so a match would no
>> longer be valid anyway).
>>
>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>> to explicitly cancel resync requests when a device response failure
>> is detected.
>>
>> This patch adds a final safeguard: cancel the async resync request when
>> rcd_delta crosses its threshold, as reaching this point implies that
>> earlier cancellation did not occur.
>
> Missing a Fixes tag

Will add

>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index f672a62a9a52..56c14f1647a4 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>>  		/* shouldn't get to wraparound:
>>  		 * too long in async stage, something bad happened
>>  		 */
>> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
>> +			/* cancel resync request */
>> +			atomic64_set(&resync_async->req, 0);
>
> we should probably use the helper added by the previous patch (I'd
> probably squash them TBH)

It's not trivial to use the helper here, since we don't have the socket.
We can maybe add another inner helper that performs the 0 setting and
call it here and inside the helper introduced in previous patch.
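
The "inner helper" idea from the reply above might look roughly like the following userspace sketch. All names here (resync_async_model, sock_model, model_resync_async_cancel(), model_request_cancel()) are hypothetical, and the real helper from the previous patch is not shown in this thread, so its body is an assumption. The point is only the layering: one function clears req, and both the socket-based wrapper and the threshold path call it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the async resync state. */
struct resync_async_model {
	_Atomic uint64_t req;	/* non-zero while a request is pending */
};

/* Hypothetical stand-in for a socket that can reach the resync state. */
struct sock_model {
	struct resync_async_model *resync_async;
};

/* Inner helper: the single place that clears the request. */
static void model_resync_async_cancel(struct resync_async_model *ra)
{
	atomic_store(&ra->req, 0);
}

/* Socket-based wrapper, assumed shape of the driver-facing helper. */
static void model_request_cancel(struct sock_model *sk)
{
	model_resync_async_cancel(sk->resync_async);
}

/* Usage: both entry points end up clearing the same field. */
static void model_demo(void)
{
	struct resync_async_model ra;
	struct sock_model sk = { .resync_async = &ra };

	atomic_init(&ra.req, 1);
	model_request_cancel(&sk);		/* driver path, has the socket */
	assert(atomic_load(&ra.req) == 0);

	atomic_store(&ra.req, 1);
	model_resync_async_cancel(&ra);		/* threshold path, no socket */
	assert(atomic_load(&ra.req) == 0);
}
```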

2025-09-22, 10:18:52 +0300, Shahar Shitrit wrote:
>
>
> On 14/09/2025 21:53, Jakub Kicinski wrote:
> > On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
> >> When a netdev issues an RX async resync request, the TLS module
> >> increments rcd_delta for each new record that arrives. This tracks
> >> how far the current record is from the point where synchronization
> >> was lost.
> >>
> >> When rcd_delta reaches its threshold, it indicates that the device
> >> response is either excessively delayed or unlikely to arrive at all
> >> (at that point, tcp_sn may have wrapped around, so a match would no
> >> longer be valid anyway).
> >>
> >> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> >> to explicitly cancel resync requests when a device response failure
> >> is detected.
> >>
> >> This patch adds a final safeguard: cancel the async resync request when
> >> rcd_delta crosses its threshold, as reaching this point implies that
> >> earlier cancellation did not occur.
> >
> > Missing a Fixes tag
> Will add
> >
> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >> index f672a62a9a52..56c14f1647a4 100644
> >> --- a/net/tls/tls_device.c
> >> +++ b/net/tls/tls_device.c
> >> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> >>  		/* shouldn't get to wraparound:
> >>  		 * too long in async stage, something bad happened
> >>  		 */
> >> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> >> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> >> +			/* cancel resync request */
> >> +			atomic64_set(&resync_async->req, 0);
> >
> > we should probably use the helper added by the previous patch (I'd
> > probably squash them TBH)
>
> It's not trivial to use the helper here, since we don't have the socket.

tls_device_rx_resync_async doesn't currently get the socket, but it
has only one caller, tls_device_rx_resync_new_rec, which does. So
tls_device_rx_resync_async could easily get the socket. Or just pass
resync_async to tls_offload_rx_resync_async_request_cancel, since
that's what it really needs?

--
Sabrina

On 22/09/2025 18:54, Sabrina Dubroca wrote:
> 2025-09-22, 10:18:52 +0300, Shahar Shitrit wrote:
>>
>>
>> On 14/09/2025 21:53, Jakub Kicinski wrote:
>>> On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
>>>> When a netdev issues an RX async resync request, the TLS module
>>>> increments rcd_delta for each new record that arrives. This tracks
>>>> how far the current record is from the point where synchronization
>>>> was lost.
>>>>
>>>> When rcd_delta reaches its threshold, it indicates that the device
>>>> response is either excessively delayed or unlikely to arrive at all
>>>> (at that point, tcp_sn may have wrapped around, so a match would no
>>>> longer be valid anyway).
>>>>
>>>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>>>> to explicitly cancel resync requests when a device response failure
>>>> is detected.
>>>>
>>>> This patch adds a final safeguard: cancel the async resync request when
>>>> rcd_delta crosses its threshold, as reaching this point implies that
>>>> earlier cancellation did not occur.
>>>
>>> Missing a Fixes tag
>> Will add
>>>
>>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>>>> index f672a62a9a52..56c14f1647a4 100644
>>>> --- a/net/tls/tls_device.c
>>>> +++ b/net/tls/tls_device.c
>>>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>>>>  		/* shouldn't get to wraparound:
>>>>  		 * too long in async stage, something bad happened
>>>>  		 */
>>>> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>>>> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
>>>> +			/* cancel resync request */
>>>> +			atomic64_set(&resync_async->req, 0);
>>>
>>> we should probably use the helper added by the previous patch (I'd
>>> probably squash them TBH)
>>
>> It's not trivial to use the helper here, since we don't have the socket.
>
> tls_device_rx_resync_async doesn't currently get the socket, but it
> has only one caller, tls_device_rx_resync_new_rec, which does. So
> tls_device_rx_resync_async could easily get the socket. Or just pass
> resync_async to tls_offload_rx_resync_async_request_cancel, since
> that's what it really needs?
>

yes these are options, but we don't like too much passing the socket to
tls_device_rx_resync_new_rec() merely for this matter. Also we wanted to
keep tls_offload_rx_resync_async_request_cancel in the same format of
tls_offload_rx_resync_async_request_start/end meaning to have the socket
as a parameter.

2025-09-28, 09:35:48 +0300, Shahar Shitrit wrote:
>
>
> On 22/09/2025 18:54, Sabrina Dubroca wrote:
> > 2025-09-22, 10:18:52 +0300, Shahar Shitrit wrote:
> >>
> >>
> >> On 14/09/2025 21:53, Jakub Kicinski wrote:
> >>> On Wed, 10 Sep 2025 09:47:40 +0300 Tariq Toukan wrote:
> >>>> When a netdev issues an RX async resync request, the TLS module
> >>>> increments rcd_delta for each new record that arrives. This tracks
> >>>> how far the current record is from the point where synchronization
> >>>> was lost.
> >>>>
> >>>> When rcd_delta reaches its threshold, it indicates that the device
> >>>> response is either excessively delayed or unlikely to arrive at all
> >>>> (at that point, tcp_sn may have wrapped around, so a match would no
> >>>> longer be valid anyway).
> >>>>
> >>>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> >>>> to explicitly cancel resync requests when a device response failure
> >>>> is detected.
> >>>>
> >>>> This patch adds a final safeguard: cancel the async resync request when
> >>>> rcd_delta crosses its threshold, as reaching this point implies that
> >>>> earlier cancellation did not occur.
> >>>
> >>> Missing a Fixes tag
> >> Will add
> >>>
> >>>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >>>> index f672a62a9a52..56c14f1647a4 100644
> >>>> --- a/net/tls/tls_device.c
> >>>> +++ b/net/tls/tls_device.c
> >>>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> >>>>  		/* shouldn't get to wraparound:
> >>>>  		 * too long in async stage, something bad happened
> >>>>  		 */
> >>>> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> >>>> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> >>>> +			/* cancel resync request */
> >>>> +			atomic64_set(&resync_async->req, 0);
> >>>
> >>> we should probably use the helper added by the previous patch (I'd
> >>> probably squash them TBH)
> >>
> >> It's not trivial to use the helper here, since we don't have the socket.
> >
> > tls_device_rx_resync_async doesn't currently get the socket, but it
> > has only one caller, tls_device_rx_resync_new_rec, which does. So
> > tls_device_rx_resync_async could easily get the socket. Or just pass
> > resync_async to tls_offload_rx_resync_async_request_cancel, since
> > that's what it really needs?
> >
> yes these are options, but we don't like too much passing the socket to
> tls_device_rx_resync_new_rec() merely for this matter.

Why not? If you felt the need to add a comment saying we're canceling
the request, using a helper instead that says it does the canceling is
a pretty decent reason to add whatever argument
tls_device_rx_resync_async needs (or swap resync_async for the socket
if you don't want to add another argument).

> Also we wanted to
> keep tls_offload_rx_resync_async_request_cancel in the same format of
> tls_offload_rx_resync_async_request_start/end meaning to have the socket
> as a parameter.

Then they could easily be changed to make the 3 helpers consistent
(all taking resync_async), since
tls_offload_rx_resync_async_request_start/end are used exactly once
each.

--
Sabrina
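
The alternative suggested above, with all three helpers taking the resync state directly, could be sketched in userspace as follows. Everything here is an assumption for illustration: the model_* names, the struct, and in particular the MODEL_REQ/MODEL_REQ_ASYNC flag bits and the way the sequence number is packed into req are invented and are not claimed to match the kernel's encoding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative flag bits only; not the kernel's actual values. */
#define MODEL_REQ	1ULL
#define MODEL_REQ_ASYNC	2ULL

/* Hypothetical stand-in for the async resync state. */
struct resync_async_model {
	_Atomic uint64_t req;
	unsigned short rcd_delta;
};

/* Begin an async request at sequence number seq and reset tracking. */
static void model_request_start(struct resync_async_model *ra, uint32_t seq)
{
	atomic_store(&ra->req,
		     ((uint64_t)seq << 32) | MODEL_REQ | MODEL_REQ_ASYNC);
	ra->rcd_delta = 0;
}

/* Device answered: leave the async stage but keep the request pending. */
static void model_request_end(struct resync_async_model *ra, uint32_t seq)
{
	atomic_store(&ra->req, ((uint64_t)seq << 32) | MODEL_REQ);
}

/* Drop the request entirely. */
static void model_request_cancel(struct resync_async_model *ra)
{
	atomic_store(&ra->req, 0);
}

/* Usage: the three helpers share one signature style. */
static void model_demo(void)
{
	struct resync_async_model ra;

	atomic_init(&ra.req, 0);
	ra.rcd_delta = 9;

	model_request_start(&ra, 7);
	assert(atomic_load(&ra.req) == ((7ULL << 32) | 3ULL));
	assert(ra.rcd_delta == 0);

	model_request_end(&ra, 7);
	assert(atomic_load(&ra.req) == ((7ULL << 32) | 1ULL));

	model_request_cancel(&ra);
	assert(atomic_load(&ra.req) == 0);
}
```

With this shape the threshold path can call the cancel helper without needing the socket, which is the trade-off being debated in the thread.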

2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
>
> When a netdev issues an RX async resync request, the TLS module
> increments rcd_delta for each new record that arrives. This tracks
> how far the current record is from the point where synchronization
> was lost.
>
> When rcd_delta reaches its threshold, it indicates that the device
> response is either excessively delayed or unlikely to arrive at all
> (at that point, tcp_sn may have wrapped around, so a match would no
> longer be valid anyway).
>
> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> to explicitly cancel resync requests when a device response failure
> is detected.
>
> This patch adds a final safeguard: cancel the async resync request when
> rcd_delta crosses its threshold, as reaching this point implies that
> earlier cancellation did not occur.
>
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  net/tls/tls_device.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index f672a62a9a52..56c14f1647a4 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>  		/* shouldn't get to wraparound:
>  		 * too long in async stage, something bad happened
>  		 */
> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {

Do we still need to WARN here? It's a condition that can actually
happen (even if it's rare), and that the stack can handle, so maybe
not?

> +			/* cancel resync request */
> +			atomic64_set(&resync_async->req, 0);
>  			return false;
> +		}
>
>  		/* asynchronous stage: log all headers seq such that
>  		 * req_seq <= seq <= end_seq, and wait for real resync request
> --
> 2.31.1
>

--
Sabrina

On 12/09/2025 18:14, Sabrina Dubroca wrote:
> 2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
>> From: Shahar Shitrit <shshitrit@nvidia.com>
>>
>> When a netdev issues an RX async resync request, the TLS module
>> increments rcd_delta for each new record that arrives. This tracks
>> how far the current record is from the point where synchronization
>> was lost.
>>
>> When rcd_delta reaches its threshold, it indicates that the device
>> response is either excessively delayed or unlikely to arrive at all
>> (at that point, tcp_sn may have wrapped around, so a match would no
>> longer be valid anyway).
>>
>> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
>> to explicitly cancel resync requests when a device response failure
>> is detected.
>>
>> This patch adds a final safeguard: cancel the async resync request when
>> rcd_delta crosses its threshold, as reaching this point implies that
>> earlier cancellation did not occur.
>>
>> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>  net/tls/tls_device.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index f672a62a9a52..56c14f1647a4 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
>>  		/* shouldn't get to wraparound:
>>  		 * too long in async stage, something bad happened
>>  		 */
>> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
>> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
>
> Do we still need to WARN here? It's a condition that can actually
> happen (even if it's rare), and that the stack can handle, so maybe
> not?
>

You are right that now the stack handles this, but removing the WARN
without any alternative, will remove any indication that something went
wrong and will prevent us from improving by searching the error flow
where we didn't cancel the request before reaching here. We can maybe
replace the WARN with a counter. what do you think?

>> +			/* cancel resync request */
>> +			atomic64_set(&resync_async->req, 0);
>>  			return false;
>> +		}
>>
>>  		/* asynchronous stage: log all headers seq such that
>>  		 * req_seq <= seq <= end_seq, and wait for real resync request
>> --
>> 2.31.1
>>
>

2025-09-22, 10:16:21 +0300, Shahar Shitrit wrote:
>
>
> On 12/09/2025 18:14, Sabrina Dubroca wrote:
> > 2025-09-10, 09:47:40 +0300, Tariq Toukan wrote:
> >> From: Shahar Shitrit <shshitrit@nvidia.com>
> >>
> >> When a netdev issues an RX async resync request, the TLS module
> >> increments rcd_delta for each new record that arrives. This tracks
> >> how far the current record is from the point where synchronization
> >> was lost.
> >>
> >> When rcd_delta reaches its threshold, it indicates that the device
> >> response is either excessively delayed or unlikely to arrive at all
> >> (at that point, tcp_sn may have wrapped around, so a match would no
> >> longer be valid anyway).
> >>
> >> Previous patch introduced tls_offload_rx_resync_async_request_cancel()
> >> to explicitly cancel resync requests when a device response failure
> >> is detected.
> >>
> >> This patch adds a final safeguard: cancel the async resync request when
> >> rcd_delta crosses its threshold, as reaching this point implies that
> >> earlier cancellation did not occur.
> >>
> >> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> >> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> >> ---
> >>  net/tls/tls_device.c | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >> index f672a62a9a52..56c14f1647a4 100644
> >> --- a/net/tls/tls_device.c
> >> +++ b/net/tls/tls_device.c
> >> @@ -721,8 +721,11 @@ tls_device_rx_resync_async(struct tls_offload_resync_async *resync_async,
> >>  		/* shouldn't get to wraparound:
> >>  		 * too long in async stage, something bad happened
> >>  		 */
> >> -		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX))
> >> +		if (WARN_ON_ONCE(resync_async->rcd_delta == USHRT_MAX)) {
> >
> > Do we still need to WARN here? It's a condition that can actually
> > happen (even if it's rare), and that the stack can handle, so maybe
> > not?
> >
> You are right that now the stack handles this, but removing the WARN
> without any alternative, will remove any indication that something went
> wrong and will prevent us from improving by searching the error flow
> where we didn't cancel the request before reaching here. We can maybe
> replace the WARN with a counter. what do you think?

Do you use CONFIG_DEBUG_NET in your devel/test kernels? If so,
DEBUG_NET_WARN_ONCE would be an option. Or is it more so that
users/customers can report the problem (ie on production kernels
without CONFIG_DEBUG_NET) - in that case, the counter would work
better.

But if you really think that this condition indicates a driver bug,
maybe the WARN is still appropriate.

Jakub, what do you think?

BTW, I was also thinking that the documentation
(Documentation/networking/tls-offload.rst) could maybe be improved a
bit with a description of how async resync works and how the driver is
expected to use the
tls_offload_rx_resync_async_request_{start,end} (and now _cancel)
helpers. The section on "Stream scan resynchronization" is pretty
abstract.

--
Sabrina
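
The "counter instead of WARN" option discussed above could look roughly like this userspace sketch. It is an assumption-laden illustration, not a proposal from the thread: resync_stats_model, its async_req_timeout field and model_check_threshold() are hypothetical names, and where such a counter would actually live or be exported is left open.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the async resync state. */
struct resync_async_model {
	_Atomic uint64_t req;
	unsigned short rcd_delta;
};

/* Hypothetical statistics block; the counter name is made up. */
struct resync_stats_model {
	_Atomic uint64_t async_req_timeout;
};

/* Returns true when the threshold was hit: the event is counted and the
 * request cancelled, with no warning printed.
 */
static bool model_check_threshold(struct resync_async_model *ra,
				  struct resync_stats_model *st)
{
	if (ra->rcd_delta != USHRT_MAX)
		return false;

	atomic_fetch_add(&st->async_req_timeout, 1);
	atomic_store(&ra->req, 0);
	return true;
}

/* Usage: one request below the threshold, one at it. */
static void model_demo(void)
{
	struct resync_async_model ra;
	struct resync_stats_model st;

	atomic_init(&ra.req, 1);
	atomic_init(&st.async_req_timeout, 0);

	ra.rcd_delta = 5;
	assert(!model_check_threshold(&ra, &st));
	assert(atomic_load(&st.async_req_timeout) == 0);

	ra.rcd_delta = USHRT_MAX;
	assert(model_check_threshold(&ra, &st));
	assert(atomic_load(&st.async_req_timeout) == 1);
	assert(atomic_load(&ra.req) == 0);
}
```

A counter keeps the event visible on production kernels, while a debug-only warning would surface it only in kernels built with the debug option mentioned in the reply.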