RE: [PATCH] net: usb: r8152: fix transmit queue timeout

Hayes Wang posted 1 patch 3 weeks, 2 days ago
RE: [PATCH] net: usb: r8152: fix transmit queue timeout
Posted by Hayes Wang 3 weeks, 2 days ago
lu lu <insyelu@gmail.com>
> Sent: Thursday, January 15, 2026 9:37 AM
[...]
> To reduce the performance impact on the tx_tl tasklet’s transmit path,
> netif_trans_update() has been moved from the main transmit path into
> write_bulk_callback (the USB transfer completion callback).
> The main considerations are as follows:
> 1. Reduce frequent tasklet overhead
> netif_trans_update() is invoked frequently under high-throughput
> conditions. Calling it directly in the main transmit path continuously
> introduces a small but noticeable CPU overhead, degrading the
> scheduling efficiency of the tx_tl tasklet.
> 2. Move non-critical operations out of the critical path
> By deferring netif_trans_update() to the USB callback thread—and
> ensuring it executes after tasklet_schedule(&tp->tx_tl)—the timestamp
> update is removed from the critical transmit scheduling path, further
> reducing the burden on tx_tl.

Excuse me, I do not fully understand the reasoning above.
It seems that this change merely shifts the time (or effort) from tx_tl to the TX completion callback.

While the intention is to make tx_tl run faster, this also delays the completion of the callback,
which in turn may delay both the next callback execution and the next scheduling of tx_tl.

From this perspective, it is unclear what is actually being saved.

Have you observed a measurable difference based on testing?

If you want to reduce the frequency of calling netif_trans_update(),
you could try something like the following. This way,
netif_trans_update() would not be executed on every transmission.

--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -2432,9 +2432,12 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)

        netif_tx_lock(tp->netdev);

-       if (netif_queue_stopped(tp->netdev) &&
-           skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
+       if (netif_queue_stopped(tp->netdev)) {
+           if (skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
                netif_wake_queue(tp->netdev);
+           else
+               netif_trans_update(tp->netdev);
+       }

        netif_tx_unlock(tp->netdev);

Best Regards,
Hayes

Re: [PATCH] net: usb: r8152: fix transmit queue timeout
Posted by lu lu 3 weeks, 1 day ago
Hayes Wang <hayeswang@realtek.com> 于2026年1月15日周四 19:42写道:
>
> lu lu <insyelu@gmail.com>
> > Sent: Thursday, January 15, 2026 9:37 AM
> [...]
> > To reduce the performance impact on the tx_tl tasklet’s transmit path,
> > netif_trans_update() has been moved from the main transmit path into
> > write_bulk_callback (the USB transfer completion callback).
> > The main considerations are as follows:
> > 1. Reduce frequent tasklet overhead
> > netif_trans_update() is invoked frequently under high-throughput
> > conditions. Calling it directly in the main transmit path continuously
> > introduces a small but noticeable CPU overhead, degrading the
> > scheduling efficiency of the tx_tl tasklet.
> > 2. Move non-critical operations out of the critical path
> > By deferring netif_trans_update() to the USB callback thread—and
> > ensuring it executes after tasklet_schedule(&tp->tx_tl)—the timestamp
> > update is removed from the critical transmit scheduling path, further
> > reducing the burden on tx_tl.
>
> Excuse me, I do not fully understand the reasoning above.
> It seems that this change merely shifts the time (or effort) from tx_tl to the TX completion callback.
>
> While the intention is to make tx_tl run faster, this also delays the completion of the callback,
> which in turn may delay both the next callback execution and the next scheduling of tx_tl.
>
> From this perspective, it is unclear what is actually being saved.
>
> Have you observed a measurable difference based on testing?

HW: RK3399-based development board (dual-core Cortex-A72 + quad-core Cortex-A53)
OS: LEDE (OpenWrt)
NIC: 2.5G USB Ethernet adapter connected via USB 3.0

Due to platform performance limitations, it cannot reach 2.5G. When
processing in the write_bulk_callback and during USB submission,
iperf3 stress testing shows a difference of approximately 8Mbit/s.
This may be related to the platform's big.LITTLE core configuration.


>
> If you want to reduce the frequency of calling netif_trans_update(),
> you could try something like the following. This way,
> netif_trans_update() would not be executed on every transmission.
>
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -2432,9 +2432,12 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
>
>         netif_tx_lock(tp->netdev);
>
> -       if (netif_queue_stopped(tp->netdev) &&
> -           skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
> +       if (netif_queue_stopped(tp->netdev)) {
> +           if (skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
>                 netif_wake_queue(tp->netdev);
> +           else
> +               netif_trans_update(tp->netdev);
> +       }
The queue was stopped because it exceeded the threshold. Attempting to
refresh the time at this point is clearly too late.

Thank you for your suggestion. I will submit v2 (immediately updating
the transfer start time after successful USB submission).

Best Regards,
insyelu

>
>         netif_tx_unlock(tp->netdev);
>
> Best Regards,
> Hayes
>
RE: [PATCH] net: usb: r8152: fix transmit queue timeout
Posted by Hayes Wang 3 weeks, 1 day ago
lu lu <insyelu@gmail.com>
> Sent: Friday, January 16, 2026 10:11 AM
[...]
> >         netif_tx_lock(tp->netdev);
> >
> > -       if (netif_queue_stopped(tp->netdev) &&
> > -           skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
> > +       if (netif_queue_stopped(tp->netdev)) {
> > +           if (skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
> >                 netif_wake_queue(tp->netdev);
> > +           else
> > +               netif_trans_update(tp->netdev);
> > +       }
> The queue was stopped because it exceeded the threshold. Attempting to
> refresh the time at this point is clearly too late.

Why would this be considered too late?
Based on RTL8152_TX_TIMEOUT, there are about 5 seconds to
wake the queue or update the timestamp before a TX timeout occurs.
I believe 5 seconds should be sufficient.

If there is no TX submission for 5 seconds after the driver stops the queue,
then something is already wrong.

Best Regards,
Hayes

Re: [PATCH] net: usb: r8152: fix transmit queue timeout
Posted by lu lu 3 weeks, 1 day ago
Hayes Wang <hayeswang@realtek.com> 于2026年1月16日周五 11:11写道:
>
> lu lu <insyelu@gmail.com>
> > Sent: Friday, January 16, 2026 10:11 AM
> [...]
> > >         netif_tx_lock(tp->netdev);
> > >
> > > -       if (netif_queue_stopped(tp->netdev) &&
> > > -           skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
> > > +       if (netif_queue_stopped(tp->netdev)) {
> > > +           if (skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
> > >                 netif_wake_queue(tp->netdev);
> > > +           else
> > > +               netif_trans_update(tp->netdev);
> > > +       }
> > The queue was stopped because it exceeded the threshold. Attempting to
> > refresh the time at this point is clearly too late.
>
> Why would this be considered too late?

if (netif_queue_stopped(tp->netdev)) {
    if (skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
        netif_wake_queue(tp->netdev);
    else
        netif_trans_update(tp->netdev);
}
The first time xmit stops the transmit queue, the queue is not full,
and it is successfully woken up afterward — OK.
The second time xmit stops the transmit queue, the network watchdog
times out immediately because the transmit timestamp was not refreshed
when the queue was last resumed — FAIL.
This scenario is logically possible.

There is no clear evidence that netif_trans_update imposes a
significant CPU load.
Please help me review:
https://patchwork.kernel.org/project/linux-usb/patch/20260116023725.8095-1-insyelu@gmail.com

> Based on RTL8152_TX_TIMEOUT, there are about 5 seconds to
> wake the queue or update the timestamp before a TX timeout occurs.
> I believe 5 seconds should be sufficient.

>
> If there is no TX submission for 5 seconds after the driver stops the queue,
> then something is already wrong.
>
> Best Regards,
> Hayes
>
RE: [PATCH] net: usb: r8152: fix transmit queue timeout
Posted by Hayes Wang 2 weeks, 5 days ago
Original Message-----
> From: lu lu <insyelu@gmail.com>
[...]
> if (netif_queue_stopped(tp->netdev)) {
>     if (skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
>         netif_wake_queue(tp->netdev);
>     else
>         netif_trans_update(tp->netdev);
> }
> The first time xmit stops the transmit queue, the queue is not full,
> and it is successfully woken up afterward — OK.
> The second time xmit stops the transmit queue, the network watchdog
> times out immediately because the transmit timestamp was not refreshed
> when the queue was last resumed — FAIL.
> This scenario is logically possible.

This situation should not happen, because trans_start is also updated when the driver stops the TX queue.

https://elixir.bootlin.com/linux/v6.18.6/source/include/linux/netdevice.h#L3629

A TX timeout occurs only if the TX queue has been stopped for longer than RTL8152_TX_TIMEOUT.
It should not occur immediately when the driver stops the TX queue.

Therefore, what needs to be done is to update the timestamp when the TX queue is stopped.
Updating trans_start while the TX queue is not stopped is useless.

Best Regards,
Hayes

Re: [PATCH] net: usb: r8152: fix transmit queue timeout
Posted by lu lu 2 weeks, 5 days ago
Hayes Wang <hayeswang@realtek.com> 于2026年1月19日周一 10:51写道:
>
> Original Message-----
> > From: lu lu <insyelu@gmail.com>
> [...]
> > if (netif_queue_stopped(tp->netdev)) {
> >     if (skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
> >         netif_wake_queue(tp->netdev);
> >     else
> >         netif_trans_update(tp->netdev);
> > }
> > The first time xmit stops the transmit queue, the queue is not full,
> > and it is successfully woken up afterward — OK.
> > The second time xmit stops the transmit queue, the network watchdog
> > times out immediately because the transmit timestamp was not refreshed
> > when the queue was last resumed — FAIL.
> > This scenario is logically possible.
>
> This situation should not happen, because trans_start is also updated when the driver stops the TX queue.
>
> https://elixir.bootlin.com/linux/v6.18.6/source/include/linux/netdevice.h#L3629
>
> A TX timeout occurs only if the TX queue has been stopped for longer than RTL8152_TX_TIMEOUT.
> It should not occur immediately when the driver stops the TX queue.
Thank you for the correction! Upon review, I confirmed that
netif_tx_stop_queue() already updates trans_start,
so the timeout scenario I originally envisioned does not actually occur.

>
> Therefore, what needs to be done is to update the timestamp when the TX queue is stopped.
> Updating trans_start while the TX queue is not stopped is useless.
if (netif_queue_stopped(tp->netdev)) {
    if (skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
        netif_wake_queue(tp->netdev);
    else
        netif_trans_update(tp->netdev);
}
This change continuously updates the trans_start value, even when the
TX queue has been stopped and its length exceeds the threshold.
This may prevent the watchdog timer from ever timing out, thereby
masking potential transmission stall issues.

The timestamp should be updated only upon successful URB submission to
accurately reflect that the transport layer is still operational.

Best regards,
insyelu
RE: [PATCH] net: usb: r8152: fix transmit queue timeout
Posted by Hayes Wang 2 weeks, 5 days ago
lu lu <insyelu@gmail.com>
> Sent: Monday, January 19, 2026 2:58 PM
[...]
> > Therefore, what needs to be done is to update the timestamp when the TX queue is stopped.
> > Updating trans_start while the TX queue is not stopped is useless.
> if (netif_queue_stopped(tp->netdev)) {
>     if (skb_queue_len(&tp->tx_queue) < tp->tx_qlen)
>         netif_wake_queue(tp->netdev);
>     else
>         netif_trans_update(tp->netdev);
> }
> This change continuously updates the trans_start value, even when the
> TX queue has been stopped and its length exceeds the threshold.
> This may prevent the watchdog timer from ever timing out, thereby
> masking potential transmission stall issues.
> 
> The timestamp should be updated only upon successful URB submission to
> accurately reflect that the transport layer is still operational.

Although I think a URB error and a transmission stall are different,
I am fine with the simpler approach in v2.

Best Regards,
Hayes