Hi,
This is V7 of a series originally submitted by Christoph.
When LRO is enabled on mlx5 devices, mlx5e_skb_from_cqe_mpwrq_nonlinear
copies part of the payload into the linear part of the skb.
This leads to suboptimal processing in GRO and, as a result, low throughput.
This patch series addresses this by using eth_get_headlen() to compute the
size of the protocol headers and copying only those bytes. This results in a
significant throughput improvement (detailed results are in the respective
patch).
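To illustrate the idea, here is a userspace sketch in C. sketch_get_headlen() is a hypothetical, heavily simplified stand-in for the kernel's eth_get_headlen(), which uses the flow dissector. This version only understands Ethernet + IPv4 + TCP and otherwise stops at the last header it fully parsed. sketch_build_linear() is also hypothetical. It copies only the header bytes, capped by the linear buffer size, and reports where the remaining payload starts. In the driver, that payload would stay in the page fragment. This is not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_HLEN     14      /* dst MAC + src MAC + ethertype */
#define SKETCH_ETH_P_IP     0x0800  /* IPv4 ethertype */
#define SKETCH_IPPROTO_TCP  6
#define SKETCH_MIN_IP_HLEN  20
#define SKETCH_MIN_TCP_HLEN 20

/* Hypothetical, simplified stand-in for eth_get_headlen(): returns the
 * length of the Ethernet + IPv4 + TCP headers. Other ethertypes, non-TCP
 * protocols or truncated headers stop at the last fully parsed header. */
static size_t sketch_get_headlen(const uint8_t *pkt, size_t len)
{
	if (len < SKETCH_ETH_HLEN)
		return len;

	uint16_t proto = (uint16_t)((pkt[12] << 8) | pkt[13]);
	if (proto != SKETCH_ETH_P_IP || len < SKETCH_ETH_HLEN + SKETCH_MIN_IP_HLEN)
		return SKETCH_ETH_HLEN;

	const uint8_t *ip = pkt + SKETCH_ETH_HLEN;
	size_t ihl = (size_t)(ip[0] & 0x0f) * 4;   /* IHL is in 32-bit words */
	if (ihl < SKETCH_MIN_IP_HLEN || len < SKETCH_ETH_HLEN + ihl)
		return SKETCH_ETH_HLEN;

	if (ip[9] != SKETCH_IPPROTO_TCP ||
	    len < SKETCH_ETH_HLEN + ihl + SKETCH_MIN_TCP_HLEN)
		return SKETCH_ETH_HLEN + ihl;

	const uint8_t *tcp = ip + ihl;
	size_t doff = (size_t)(tcp[12] >> 4) * 4;  /* TCP data offset, 32-bit words */
	if (doff < SKETCH_MIN_TCP_HLEN || len < SKETCH_ETH_HLEN + ihl + doff)
		return SKETCH_ETH_HLEN + ihl;

	return SKETCH_ETH_HLEN + ihl + doff;
}

/* Hypothetical: copy only the protocol headers into the linear area
 * (capped at linear_cap). The payload from *frag_off onward is left in
 * place, as a driver would leave it in the page fragment. */
static size_t sketch_build_linear(uint8_t *linear, size_t linear_cap,
				  const uint8_t *pkt, size_t len,
				  size_t *frag_off)
{
	assert(linear_cap > 0);  /* caller must provide some linear space */

	size_t headlen = sketch_get_headlen(pkt, len);
	size_t copy = headlen < linear_cap ? headlen : linear_cap;

	memcpy(linear, pkt, copy);
	*frag_off = copy;
	return copy;
}
```

For a plain IPv4/TCP packet without options, only 54 header bytes end up in the linear area and the payload stays out of it. That is the property the series relies on to avoid the slow GRO path.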
Regards,
Tariq
---
V7:
- Drop the cache-aligned memcpy patch, as further testing on other hosts
no longer showed a benefit.
- For XDP, pull at most ETH_HLEN bytes into the linear part.
- Fix skb pull length calculation for XDP (Amery Hung).
- Switch from min_t() to min() to avoid 16-bit truncation of
skb->data_len (David Laight).
- Improve the commit message of the last patch to make clear
that the benchmark does not use native XDP (Sashiko).
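The min_t() change can be illustrated with a small userspace sketch. The macros below are hypothetical look-alikes of the kernel's min_t() and min() from include/linux/minmax.h. They only mimic the casting behaviour, while the real macros also perform type checks. The example assumes the earlier code passed a 16-bit type to min_t(), as the mention of 16-bit truncation suggests. The function names are illustrative and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical look-alikes: min_t() casts both operands to 'type' first,
 * min() compares the values in their own (here 32-bit) type. */
#define sketch_min_t(type, a, b) \
	((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define sketch_min(a, b) ((a) < (b) ? (a) : (b))

/* Casting a 32-bit data_len to uint16_t silently drops the upper bits,
 * so any length >= 65536 wraps around before the comparison. */
static uint32_t pull_len_min_t_u16(uint32_t data_len, uint32_t cap)
{
	return sketch_min_t(uint16_t, data_len, cap);
}

/* Comparing in the native 32-bit type keeps the result correct. */
static uint32_t pull_len_min(uint32_t data_len, uint32_t cap)
{
	return sketch_min(data_len, cap);
}
```

With data_len = 65546 and a cap of 256, the 16-bit variant sees 65546 mod 65536 = 10 and returns 10 instead of 256. A large LRO skb can exceed 64 KiB, so such values are realistic.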
V6:
https://lore.kernel.org/all/20260507095330.318892-1-tariqt@nvidia.com/
Christoph Paasch (2):
net/mlx5e: DMA-sync earlier in mlx5e_skb_from_cqe_mpwrq_nonlinear
net/mlx5e: Avoid copying payload to the skb's linear part
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 33 ++++++++++++-------
1 file changed, 22 insertions(+), 11 deletions(-)
base-commit: 8415598365503ced2e3d019491b0a2756c85c494
--
2.44.0