[PATCH net-next 0/6] net/mlx5e: Speedup channel configuration operations
Posted by Tariq Toukan 2 months, 3 weeks ago
Hi,

This series significantly improves the latency of channel configuration
operations, like interface up (create channels), interface down (destroy
channels), and channels reconfiguration (create new set, destroy old
one).

This is achieved by reducing the default number of SQs in a channel
from 4 down to 2.

The first four patches by William do the refactoring needed to avoid
using the async ICOSQ by default.

The following two patches by me remove the egress xdp-redirect SQ by
default. It can still be created by loading a dummy XDP program.
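
For reference, even a trivial pass-through program is enough to bring
it back. Illustrative snippet (not part of the series):

/* xdp_dummy.c -- build with: clang -O2 -g -target bpf -c xdp_dummy.c */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_dummy(struct xdp_md *ctx)
{
	/* Pass every packet through untouched; merely having a program
	 * attached makes the driver create the XDP SQs again.
	 */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Attach it with, e.g.: ip link set dev <dev> xdp obj xdp_dummy.o sec xdp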

The two remaining default SQs per channel are:
1 TXQ SQ (for traffic), and 1 ICOSQ (for internal communication
operations with the device).
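
Schematically, the per-channel SQ set after the series looks like this
(illustrative sketch with simplified names, not the actual
mlx5e_channel layout; see the individual patches for details):

/* Sketch only: one TC, simplified fields. */
struct mlx5e_channel_sqs_sketch {
	struct mlx5e_txqsq txqsq;        /* always: traffic */
	struct mlx5e_icosq icosq;        /* always: internal device ops */
	struct mlx5e_icosq *async_icosq; /* dynamically allocated, created
					  * only when a feature (e.g.
					  * kTLS RX) needs it */
	struct mlx5e_xdpsq *xdpsq;       /* created only when an XDP
					  * program is loaded */
};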

Perf numbers:
NIC: ConnectX-7.
Setup: 248 channels.

Interface up + down:

Before:		2.605 secs
Patch 4:	2.246 secs (1.16x faster)
Patch 6:	1.798 secs (1.25x faster)

Overall: 1.45x faster in our example.

Regards,
Tariq

Tariq Toukan (2):
  net/mlx5e: Update XDP features in switch channels
  net/mlx5e: Support XDP target xmit with dummy program

William Tu (4):
  net/mlx5e: Move async ICOSQ lock into ICOSQ struct
  net/mlx5e: Use regular ICOSQ for triggering NAPI
  net/mlx5e: Move async ICOSQ to dynamic allocation
  net/mlx5e: Conditionally create async ICOSQ

 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  49 ++++++-
 .../mellanox/mlx5/core/en/reporter_tx.c       |   1 +
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |   3 +
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |   6 +-
 .../mellanox/mlx5/core/en_accel/ktls.c        |  10 +-
 .../mellanox/mlx5/core/en_accel/ktls_rx.c     |  26 ++--
 .../mellanox/mlx5/core/en_accel/ktls_txrx.h   |   4 +-
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 137 ++++++++++++------
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |   3 +
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c |   8 +-
 12 files changed, 179 insertions(+), 80 deletions(-)


base-commit: 8da7bea7db692e786165b71729fb68b7ff65ee56
-- 
2.31.1
Re: [PATCH net-next 0/6] net/mlx5e: Speedup channel configuration operations
Posted by Toke Høiland-Jørgensen 2 months, 3 weeks ago
Tariq Toukan <tariqt@nvidia.com> writes:

> Hi,
>
> This series significantly improves the latency of channel configuration
> operations, like interface up (create channels), interface down (destroy
> channels), and channels reconfiguration (create new set, destroy old
> one).

On the topic of improving ifup/ifdown times, I noticed at some point
that mlx5 will call synchronize_net() once for every queue when they are
deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
that to amortise the sync latency over the full interface bringdown? :)
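
I.e., something like this (completely untested sketch; the helper name
and parameters are made up, only MLX5E_SQ_STATE_ENABLED and
synchronize_net() are real):

/* Sketch: batch the deactivation of all txqsqs so that a single RCU
 * grace period covers every queue, instead of one synchronize_net()
 * per queue as mlx5e_deactivate_txqsq() does today.
 */
static void mlx5e_deactivate_txqsqs_sketch(struct mlx5e_txqsq **sqs, int n)
{
	int i;

	/* Stop all SQs from posting first... */
	for (i = 0; i < n; i++)
		clear_bit(MLX5E_SQ_STATE_ENABLED, &sqs[i]->state);

	/* ...then wait once: O(1) grace periods for the whole
	 * bringdown instead of O(n).
	 */
	synchronize_net();

	/* ...per-queue drain/disable would follow here... */
}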

-Toke
Re: [PATCH net-next 0/6] net/mlx5e: Speedup channel configuration operations
Posted by Tariq Toukan 2 months, 3 weeks ago

On 12/11/2025 12:54, Toke Høiland-Jørgensen wrote:
> Tariq Toukan <tariqt@nvidia.com> writes:
> 
>> Hi,
>>
>> This series significantly improves the latency of channel configuration
>> operations, like interface up (create channels), interface down (destroy
>> channels), and channels reconfiguration (create new set, destroy old
>> one).
> 
> On the topic of improving ifup/ifdown times, I noticed at some point
> that mlx5 will call synchronize_net() once for every queue when they are
> deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
> that to amortise the sync latency over the full interface bringdown? :)
> 
> -Toke
> 
> 

Correct!
This can be improved, and I actually have WIP patches for it, as I've 
been revisiting this code area recently.
Re: [PATCH net-next 0/6] net/mlx5e: Speedup channel configuration operations
Posted by Toke Høiland-Jørgensen 2 months, 3 weeks ago
Tariq Toukan <ttoukan.linux@gmail.com> writes:

> On 12/11/2025 12:54, Toke Høiland-Jørgensen wrote:
>> Tariq Toukan <tariqt@nvidia.com> writes:
>> 
>>> Hi,
>>>
>>> This series significantly improves the latency of channel configuration
>>> operations, like interface up (create channels), interface down (destroy
>>> channels), and channels reconfiguration (create new set, destroy old
>>> one).
>> 
>> On the topic of improving ifup/ifdown times, I noticed at some point
>> that mlx5 will call synchronize_net() once for every queue when they are
>> deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
>> that to amortise the sync latency over the full interface bringdown? :)
>> 
>> -Toke
>> 
>> 
>
> Correct!
> This can be improved and I actually have WIP patches for this, as I'm 
> revisiting this code area recently.

Excellent! We ran into some issues with this a while back, so would be
great to see this improved.

-Toke
Re: [PATCH net-next 0/6] net/mlx5e: Speedup channel configuration operations
Posted by Tariq Toukan 2 months, 3 weeks ago

On 12/11/2025 18:33, Toke Høiland-Jørgensen wrote:
> Tariq Toukan <ttoukan.linux@gmail.com> writes:
> 
>> On 12/11/2025 12:54, Toke Høiland-Jørgensen wrote:
>>> Tariq Toukan <tariqt@nvidia.com> writes:
>>>
>>>> Hi,
>>>>
>>>> This series significantly improves the latency of channel configuration
>>>> operations, like interface up (create channels), interface down (destroy
>>>> channels), and channels reconfiguration (create new set, destroy old
>>>> one).
>>>
>>> On the topic of improving ifup/ifdown times, I noticed at some point
>>> that mlx5 will call synchronize_net() once for every queue when they are
>>> deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
>>> that to amortise the sync latency over the full interface bringdown? :)
>>>
>>> -Toke
>>>
>>>
>>
>> Correct!
>> This can be improved and I actually have WIP patches for this, as I'm
>> revisiting this code area recently.
> 
> Excellent! We ran into some issues with this a while back, so would be
> great to see this improved.
> 
> -Toke
> 

Can you elaborate on the test case and the issues you encountered, so I
can make sure I'm addressing them?
Re: [PATCH net-next 0/6] net/mlx5e: Speedup channel configuration operations
Posted by Toke Høiland-Jørgensen 2 months, 3 weeks ago
Tariq Toukan <ttoukan.linux@gmail.com> writes:

> On 12/11/2025 18:33, Toke Høiland-Jørgensen wrote:
>> Tariq Toukan <ttoukan.linux@gmail.com> writes:
>> 
>>> On 12/11/2025 12:54, Toke Høiland-Jørgensen wrote:
>>>> Tariq Toukan <tariqt@nvidia.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> This series significantly improves the latency of channel configuration
>>>>> operations, like interface up (create channels), interface down (destroy
>>>>> channels), and channels reconfiguration (create new set, destroy old
>>>>> one).
>>>>
>>>> On the topic of improving ifup/ifdown times, I noticed at some point
>>>> that mlx5 will call synchronize_net() once for every queue when they are
>>>> deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
>>>> that to amortise the sync latency over the full interface bringdown? :)
>>>>
>>>> -Toke
>>>>
>>>>
>>>
>>> Correct!
>>> This can be improved and I actually have WIP patches for this, as I'm
>>> revisiting this code area recently.
>> 
>> Excellent! We ran into some issues with this a while back, so would be
>> great to see this improved.
>> 
>> -Toke
>> 
>
> Can you elaborate on the test case and issues encountered?
> To make sure I'm addressing them.

Sure, thanks for taking a look!

The high-level issue we've been seeing involves long delays when
creating and tearing down OpenShift (Kubernetes) pods that have SR-IOV
devices assigned to them. The worst example involved a test that
basically reboots an application (tearing down its pods and immediately
recreating them), which takes up to ~10 minutes for ~100 pods.

Because a lot of the wait happens with the RTNL held, we also get
cascading errors in other parts of the system. That is how I ended up
digging into what the mlx5 driver was doing while holding the RTNL,
which is where I noticed the "synchronize_net() in a loop" behaviour.

We're working on reducing the blast radius of the RTNL in general, but
the setup/teardown time seems to be driver-specific, so any
improvements here would be welcome, I guess :)

-Toke