[PATCH net-next 1/3] net: Introduce netif_xmit_time_out_duration() helper

Tariq Toukan posted 3 patches 6 days, 16 hours ago
[PATCH net-next 1/3] net: Introduce netif_xmit_time_out_duration() helper
Posted by Tariq Toukan 6 days, 16 hours ago
From: Shahar Shitrit <shshitrit@nvidia.com>

Introduce a new helper function netif_xmit_time_out_duration() to
check if a TX queue has timed out and report the timeout duration.
This helper consolidates the logic that is duplicated in several
locations and also encapsulates the check for whether the TX queue
is stopped.

As the first user, convert dev_watchdog() to use this helper.

Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 include/linux/netdevice.h | 15 +++++++++++++++
 net/sched/sch_generic.c   |  7 +++----
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e808071dbb7d..3cd73769fcfa 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3680,6 +3680,21 @@ static inline bool netif_xmit_stopped(const struct netdev_queue *dev_queue)
 	return dev_queue->state & QUEUE_STATE_ANY_XOFF;
 }
 
+static inline unsigned int
+netif_xmit_timeout_ms(struct netdev_queue *txq, unsigned long *trans_start)
+{
+	unsigned long txq_trans_start = READ_ONCE(txq->trans_start);
+
+	if (trans_start)
+		*trans_start = txq_trans_start;
+
+	if (netif_xmit_stopped(txq) &&
+	    time_after(jiffies, txq_trans_start + txq->dev->watchdog_timeo))
+		return jiffies_to_msecs(jiffies - txq_trans_start);
+
+	return 0;
+}
+
 static inline bool
 netif_xmit_frozen_or_stopped(const struct netdev_queue *dev_queue)
 {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 852e603c1755..aa6192781a24 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -523,10 +523,9 @@ static void dev_watchdog(struct timer_list *t)
 				 * netdev_tx_sent_queue() and netif_tx_stop_queue().
 				 */
 				smp_mb();
-				trans_start = READ_ONCE(txq->trans_start);
-
-				if (time_after(jiffies, trans_start + dev->watchdog_timeo)) {
-					timedout_ms = jiffies_to_msecs(jiffies - trans_start);
+				timedout_ms = netif_xmit_timeout_ms(txq,
+								    &trans_start);
+				if (timedout_ms) {
 					atomic_long_inc(&txq->trans_timeout);
 					break;
 				}
-- 
2.31.1
Re: [PATCH net-next 1/3] net: Introduce netif_xmit_time_out_duration() helper
Posted by Jakub Kicinski 3 days, 22 hours ago
On Tue, 25 Nov 2025 09:12:54 +0200 Tariq Toukan wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e808071dbb7d..3cd73769fcfa 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h

include/net/net_queue.h seems like a better place for new code

> @@ -3680,6 +3680,21 @@ static inline bool netif_xmit_stopped(const struct netdev_queue *dev_queue)
>  	return dev_queue->state & QUEUE_STATE_ANY_XOFF;
>  }
>  
> +static inline unsigned int
> +netif_xmit_timeout_ms(struct netdev_queue *txq, unsigned long *trans_start)
> +{
> +	unsigned long txq_trans_start = READ_ONCE(txq->trans_start);
> +
> +	if (trans_start)
> +		*trans_start = txq_trans_start;

The drivers don't really care about this, AFAICT hns3 uses this
to calculate the stall length (return value of this func.

> +	if (netif_xmit_stopped(txq) &&
> +	    time_after(jiffies, txq_trans_start + txq->dev->watchdog_timeo))
> +		return jiffies_to_msecs(jiffies - txq_trans_start);
> +
> +	return 0;
> +}
> +
>  static inline bool
>  netif_xmit_frozen_or_stopped(const struct netdev_queue *dev_queue)
>  {
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 852e603c1755..aa6192781a24 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -523,10 +523,9 @@ static void dev_watchdog(struct timer_list *t)
>  				 * netdev_tx_sent_queue() and netif_tx_stop_queue().
>  				 */
>  				smp_mb();
> -				trans_start = READ_ONCE(txq->trans_start);
> -
> -				if (time_after(jiffies, trans_start + dev->watchdog_timeo)) {
> -					timedout_ms = jiffies_to_msecs(jiffies - trans_start);
> +				timedout_ms = netif_xmit_timeout_ms(txq,
> +								    &trans_start);
> +				if (timedout_ms) {

The use of the new helper in the core feels a bit forced, I'd leave 
the core as is. Otherwise you need the awkward output param, and
core now duplicates the netif_xmit_stopped(txq) check

>  					atomic_long_inc(&txq->trans_timeout);
>  					break;
>  				}
Re: [PATCH net-next 1/3] net: Introduce netif_xmit_time_out_duration() helper
Posted by Paolo Abeni 4 days, 13 hours ago
On 11/25/25 8:12 AM, Tariq Toukan wrote:
> From: Shahar Shitrit <shshitrit@nvidia.com>
> 
> Introduce a new helper function netif_xmit_time_out_duration() to
> check if a TX queue has timed out and report the timeout duration.
> This helper consolidates the logic that is duplicated in several
> locations and also encapsulates the check for whether the TX queue
> is stopped.
> 
> As the first user, convert dev_watchdog() to use this helper.
> 
> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
> Reviewed-by: Yael Chemla <ychemla@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  include/linux/netdevice.h | 15 +++++++++++++++
>  net/sched/sch_generic.c   |  7 +++----
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e808071dbb7d..3cd73769fcfa 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3680,6 +3680,21 @@ static inline bool netif_xmit_stopped(const struct netdev_queue *dev_queue)
>  	return dev_queue->state & QUEUE_STATE_ANY_XOFF;
>  }
>  
> +static inline unsigned int
> +netif_xmit_timeout_ms(struct netdev_queue *txq, unsigned long *trans_start)
> +{
> +	unsigned long txq_trans_start = READ_ONCE(txq->trans_start);
> +
> +	if (trans_start)
> +		*trans_start = txq_trans_start;

What about making this argument mandatory?

> +
> +	if (netif_xmit_stopped(txq) &&

Why restricting to the <queue stopped> case? AFAICS the watchdog is
intended to additionally catch the scenarios where the rx ring is not
full but the H/W is stuck for whatever reasons, and this change will not
catch them anymore.

/P
Re: [PATCH net-next 1/3] net: Introduce netif_xmit_time_out_duration() helper
Posted by Shahar Shitrit 4 days, 10 hours ago

On 27/11/2025 12:53, Paolo Abeni wrote:
> On 11/25/25 8:12 AM, Tariq Toukan wrote:
>> From: Shahar Shitrit <shshitrit@nvidia.com>
>>
>> Introduce a new helper function netif_xmit_time_out_duration() to
>> check if a TX queue has timed out and report the timeout duration.
>> This helper consolidates the logic that is duplicated in several
>> locations and also encapsulates the check for whether the TX queue
>> is stopped.
>>
>> As the first user, convert dev_watchdog() to use this helper.
>>
>> Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
>> Reviewed-by: Yael Chemla <ychemla@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>  include/linux/netdevice.h | 15 +++++++++++++++
>>  net/sched/sch_generic.c   |  7 +++----
>>  2 files changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index e808071dbb7d..3cd73769fcfa 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -3680,6 +3680,21 @@ static inline bool netif_xmit_stopped(const struct netdev_queue *dev_queue)
>>  	return dev_queue->state & QUEUE_STATE_ANY_XOFF;
>>  }
>>  
>> +static inline unsigned int
>> +netif_xmit_timeout_ms(struct netdev_queue *txq, unsigned long *trans_start)
>> +{
>> +	unsigned long txq_trans_start = READ_ONCE(txq->trans_start);
>> +
>> +	if (trans_start)
>> +		*trans_start = txq_trans_start;
> 
> What about making this argument mandatory?

Since not all callers are interested in this return value, as in the
case of mlx5, it would be nice to allow them pass NULL.>
>> +
>> +	if (netif_xmit_stopped(txq) &&
> 
> Why restricting to the <queue stopped> case? AFAICS the watchdog is
> intended to additionally catch the scenarios where the rx ring is not
> full but the H/W is stuck for whatever reasons, and this change will not
> catch them anymore.
> 
> /P
>

dev_watchdog catches only the cases where tx queue is both full (queue
stopped) and timed out (last transmit timestamp was longer ago than the
watchdog timeout period). We wanted to preserve the same conditions.

By the way, I notice now that the new helper name in the title doesn't
match the one in the code. We will fix.