Hi,

This series significantly improves the latency of channel configuration
operations, like interface up (create channels), interface down (destroy
channels), and channels reconfiguration (create new set, destroy old one).

This is achieved by reducing the default number of SQs in a channel from
4 down to 2.

The first four patches by William do the needed refactoring to avoid
using the async ICOSQ by default. The following two patches by me remove
the egress xdp-redirect SQ by default. It can still be created by
loading a dummy XDP program.

The two remaining default SQs per channel: 1 TXQ SQ (for traffic), and
1 ICOSQ (for internal communication operations with the device).

Perf numbers:
NIC: ConnectX-7.
Setup: 248 channels.

Interface up + down:
Before:  2.605 secs
Patch 4: 2.246 secs (1.16x faster)
Patch 6: 1.798 secs (1.25x faster)

Overall: 1.45x faster in our example.

Regards,
Tariq

Tariq Toukan (2):
  net/mlx5e: Update XDP features in switch channels
  net/mlx5e: Support XDP target xmit with dummy program

William Tu (4):
  net/mlx5e: Move async ICOSQ lock into ICOSQ struct
  net/mlx5e: Use regular ICOSQ for triggering NAPI
  net/mlx5e: Move async ICOSQ to dynamic allocation
  net/mlx5e: Conditionally create async ICOSQ

 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  49 ++++++-
 .../mellanox/mlx5/core/en/reporter_tx.c       |   1 +
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |   3 +
 .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |   6 +-
 .../mellanox/mlx5/core/en_accel/ktls.c        |  10 +-
 .../mellanox/mlx5/core/en_accel/ktls_rx.c     |  26 ++--
 .../mellanox/mlx5/core/en_accel/ktls_txrx.h   |   4 +-
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 137 ++++++++++++------
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |   3 +
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c |   8 +-
 12 files changed, 179 insertions(+), 80 deletions(-)

base-commit: 8da7bea7db692e786165b71729fb68b7ff65ee56
--
2.31.1
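For readers who want a feel for the shape of the change before reading the
patches, here is a rough sketch of the direction patches 3-4 take. This is
illustrative only: the struct and function names below (everything with a
*_sketch suffix) are made up for the example and are not the actual mlx5e
code. The point is that the async ICOSQ becomes a dynamically allocated,
on-demand object instead of being opened unconditionally in every channel.

#include <linux/types.h>
#include <linux/slab.h>
#include <linux/errno.h>

/* Illustrative only -- not the actual mlx5e structures. */
struct icosq_sketch {
	u32 sqn;				/* stand-in for WQ/CQ/lock/... */
};

struct channel_sketch {
	struct icosq_sketch *async_icosq;	/* NULL unless created */
	/* regular ICOSQ, TXQ SQ, RQ, ... elided */
};

/* Open the async ICOSQ on demand, e.g. when a feature like kTLS RX needs it. */
static int channel_sketch_open_async_icosq(struct channel_sketch *c)
{
	if (c->async_icosq)
		return 0;			/* already open */

	c->async_icosq = kvzalloc(sizeof(*c->async_icosq), GFP_KERNEL);
	if (!c->async_icosq)
		return -ENOMEM;

	/* the real code would create the HW SQ object here */
	return 0;
}

static void channel_sketch_close_async_icosq(struct channel_sketch *c)
{
	if (!c->async_icosq)
		return;

	/* the real code would destroy the HW SQ object here */
	kvfree(c->async_icosq);
	c->async_icosq = NULL;
}

With this shape, a plain channel carries only the TXQ SQ and the regular
ICOSQ, which is where the create/destroy latency win comes from.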
Tariq Toukan <tariqt@nvidia.com> writes:

> Hi,
>
> This series significantly improves the latency of channel configuration
> operations, like interface up (create channels), interface down (destroy
> channels), and channels reconfiguration (create new set, destroy old
> one).

On the topic of improving ifup/ifdown times, I noticed at some point
that mlx5 will call synchronize_net() once for every queue when they are
deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
that to amortise the sync latency over the full interface bringdown? :)

-Toke
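To make the suggestion concrete, here is the rough pattern being proposed.
This is illustrative only: the *_sketch types and helpers are invented for
the example and are not the actual mlx5e functions; only synchronize_net()
is real. The idea is to stop paying one RCU grace period per queue and pay
a single one for the whole teardown instead.

#include <linux/netdevice.h>	/* synchronize_net() */

struct sketch_txqsq {
	unsigned long state;	/* stand-in for the real SQ state */
};

struct sketch_channels {
	struct sketch_txqsq *sq;
	int num;
};

static void sketch_txqsq_stop(struct sketch_txqsq *sq)
{
	/* real code: clear the ENABLED state bit and stop the netdev txq */
}

static void sketch_txqsq_finish_close(struct sketch_txqsq *sq)
{
	/* real code: wait for completions, free SKBs, destroy the HW SQ */
}

static void sketch_deactivate_all_txqsqs(struct sketch_channels *chs)
{
	int i;

	/* Phase 1: mark every SQ disabled; do not wait per queue. */
	for (i = 0; i < chs->num; i++)
		sketch_txqsq_stop(&chs->sq[i]);

	/* Phase 2: a single grace period covers all queues stopped above,
	 * instead of one synchronize_net() per queue.
	 */
	synchronize_net();

	/* Phase 3: finish the per-queue teardown. */
	for (i = 0; i < chs->num; i++)
		sketch_txqsq_finish_close(&chs->sq[i]);
}

On a 248-channel setup like the one in the cover letter, that turns 248
grace-period waits into one.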
On 12/11/2025 12:54, Toke Høiland-Jørgensen wrote:
> Tariq Toukan <tariqt@nvidia.com> writes:
>
>> Hi,
>>
>> This series significantly improves the latency of channel configuration
>> operations, like interface up (create channels), interface down (destroy
>> channels), and channels reconfiguration (create new set, destroy old
>> one).
>
> On the topic of improving ifup/ifdown times, I noticed at some point
> that mlx5 will call synchronize_net() once for every queue when they are
> deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
> that to amortise the sync latency over the full interface bringdown? :)
>
> -Toke

Correct!
This can be improved, and I actually have WIP patches for this, as I've
been revisiting this code area recently.
Tariq Toukan <ttoukan.linux@gmail.com> writes:

> On 12/11/2025 12:54, Toke Høiland-Jørgensen wrote:
>> Tariq Toukan <tariqt@nvidia.com> writes:
>>
>>> Hi,
>>>
>>> This series significantly improves the latency of channel configuration
>>> operations, like interface up (create channels), interface down (destroy
>>> channels), and channels reconfiguration (create new set, destroy old
>>> one).
>>
>> On the topic of improving ifup/ifdown times, I noticed at some point
>> that mlx5 will call synchronize_net() once for every queue when they are
>> deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
>> that to amortise the sync latency over the full interface bringdown? :)
>>
>> -Toke
>
> Correct!
> This can be improved, and I actually have WIP patches for this, as I've
> been revisiting this code area recently.

Excellent! We ran into some issues with this a while back, so it would be
great to see this improved.

-Toke
On 12/11/2025 18:33, Toke Høiland-Jørgensen wrote:
> Tariq Toukan <ttoukan.linux@gmail.com> writes:
>
>> On 12/11/2025 12:54, Toke Høiland-Jørgensen wrote:
>>> Tariq Toukan <tariqt@nvidia.com> writes:
>>>
>>>> Hi,
>>>>
>>>> This series significantly improves the latency of channel configuration
>>>> operations, like interface up (create channels), interface down (destroy
>>>> channels), and channels reconfiguration (create new set, destroy old
>>>> one).
>>>
>>> On the topic of improving ifup/ifdown times, I noticed at some point
>>> that mlx5 will call synchronize_net() once for every queue when they are
>>> deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
>>> that to amortise the sync latency over the full interface bringdown? :)
>>>
>>> -Toke
>>
>> Correct!
>> This can be improved, and I actually have WIP patches for this, as I've
>> been revisiting this code area recently.
>
> Excellent! We ran into some issues with this a while back, so it would be
> great to see this improved.
>
> -Toke

Can you elaborate on the test case and issues encountered?
To make sure I'm addressing them.
Tariq Toukan <ttoukan.linux@gmail.com> writes:

> On 12/11/2025 18:33, Toke Høiland-Jørgensen wrote:
>> Tariq Toukan <ttoukan.linux@gmail.com> writes:
>>
>>> On 12/11/2025 12:54, Toke Høiland-Jørgensen wrote:
>>>> Tariq Toukan <tariqt@nvidia.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> This series significantly improves the latency of channel configuration
>>>>> operations, like interface up (create channels), interface down (destroy
>>>>> channels), and channels reconfiguration (create new set, destroy old
>>>>> one).
>>>>
>>>> On the topic of improving ifup/ifdown times, I noticed at some point
>>>> that mlx5 will call synchronize_net() once for every queue when they are
>>>> deactivated (in mlx5e_deactivate_txqsq()). Have you considered changing
>>>> that to amortise the sync latency over the full interface bringdown? :)
>>>>
>>>> -Toke
>>>
>>> Correct!
>>> This can be improved, and I actually have WIP patches for this, as I've
>>> been revisiting this code area recently.
>>
>> Excellent! We ran into some issues with this a while back, so it would be
>> great to see this improved.
>>
>> -Toke
>
> Can you elaborate on the test case and issues encountered?
> To make sure I'm addressing them.

Sure, thanks for taking a look!

The high-level issue we've been seeing involves long delays creating and
tearing down OpenShift (Kubernetes) pods that have SR-IOV devices
assigned to them. The worst example involved a test that basically
reboots an application (tearing down its pods and immediately recreating
them), which takes up to ~10 minutes for ~100 pods.

Because a lot of the wait happens with the RTNL held, we also get
cascading errors in other parts of the system. This is how I ended up
digging into what the mlx5 driver was doing while holding the RTNL,
which is where I noticed the "synchronize_net() in a loop" behaviour.

We're working on reducing the blast radius of the RTNL in general, but
the setup/teardown time seems to be driver specific, so any improvements
here would be welcome, I guess :)

-Toke