[v6] net/mlx5: Avoid payload in skb's linear part for better GRO-processing

[PATCH net-next V6 0/3] net/mlx5: Avoid payload in skb's linear part for better GRO-processing

Posted by Tariq Toukan 1 month, 1 week ago

Hi,

This is V6 of a series originally submitted by Christoph.

When LRO is enabled on an mlx5 device, mlx5e_skb_from_cqe_mpwrq_nonlinear()
copies part of the payload into the linear part of the skb.

This leads to suboptimal GRO processing and lower throughput.

This patch series addresses this by using eth_get_headlen() to compute
the size of the protocol headers and copying only those bytes. This
results in a significant throughput improvement (detailed results are in
the corresponding patch).
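
For readers who want the idea in isolation, here is a minimal userspace
sketch of "measure the headers, copy only those". It is not the driver
code and not the kernel's eth_get_headlen(): sketch_headlen() and
sketch_copy_headers() are hypothetical helpers that only handle untagged
Ethernet + IPv4 + TCP/UDP, whereas the kernel helper relies on the flow
dissector and covers many more protocols.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for eth_get_headlen(): returns the length of the
 * header prefix it could parse (Ethernet, then IPv4, then TCP or UDP).
 * Anything it does not understand ends the header region early.
 */
static size_t sketch_headlen(const uint8_t *frame, size_t len)
{
	size_t off = 14;			/* untagged Ethernet header */
	size_t ihl, doff;
	uint16_t ethertype;
	uint8_t proto;

	if (len < off)
		return len;

	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	if (ethertype != 0x0800)		/* only IPv4 in this sketch */
		return off;
	if (len < off + 20)
		return off;

	ihl = (size_t)(frame[off] & 0x0f) * 4;	/* IPv4 header length */
	if (ihl < 20 || len < off + ihl)
		return off;

	proto = frame[off + 9];
	off += ihl;

	if (proto == 6) {			/* TCP: data offset in words */
		if (len < off + 20)
			return off;
		doff = (size_t)(frame[off + 12] >> 4) * 4;
		if (doff < 20 || len < off + doff)
			return off;
		off += doff;
	} else if (proto == 17) {		/* UDP: fixed 8-byte header */
		if (len < off + 8)
			return off;
		off += 8;
	}

	return off;
}

/*
 * Copy only the parsed headers into the "linear" buffer, leaving the
 * payload where it is. Returns the number of bytes copied.
 */
static size_t sketch_copy_headers(uint8_t *linear, size_t linear_cap,
				  const uint8_t *frame, size_t len)
{
	size_t headlen = sketch_headlen(frame, len);

	if (headlen > linear_cap)
		headlen = linear_cap;
	memcpy(linear, frame, headlen);
	return headlen;
}
```

With a plain IPv4/TCP frame and no options, the sketch copies 54 bytes
(14 + 20 + 20) and leaves the payload out of the linear buffer.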

Regards,
Tariq

---

V6:
- Rebase after Amery's changes.
- Address Amery's concern about header length after XDP pull.
- Add a small optimization to memcpy the header length aligned to cache
  line.
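
As a rough illustration of the last item, the copy length can be rounded
up to a cache-line multiple so that memcpy works on whole lines. This is
a sketch under assumptions, not the code from the patch: the 64-byte
line size, the clamp to the available bytes, and the names
sketch_align_up() and sketch_copy_len() are all hypothetical.

```c
#include <stddef.h>

#define SKETCH_CACHE_LINE 64u	/* assumed cache line size in bytes */

/* Round n up to the next multiple of align (align must be a power of 2). */
static size_t sketch_align_up(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/*
 * Length to hand to memcpy: the header length rounded up to a cache
 * line, but never more than the bytes actually available in the source.
 */
static size_t sketch_copy_len(size_t headlen, size_t avail)
{
	size_t n = sketch_align_up(headlen, SKETCH_CACHE_LINE);

	return n < avail ? n : avail;
}
```

For a 54-byte header in a 1500-byte packet this yields a 64-byte copy;
the header length itself stays 54.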

V5: https://lore.kernel.org/all/20250904-cpaasch-pf-927-netmlx5-avoid-copying-the-payload-to-the-malloced-area-v5-0-ea492f7b11ac@openai.com/


Christoph Paasch (2):
  net/mlx5e: DMA-sync earlier in mlx5e_skb_from_cqe_mpwrq_nonlinear
  net/mlx5e: Avoid copying payload to the skb's linear part

Dragos Tatulea (1):
  net/mlx5e: Align header copy to cache line for Striding RQ non-linear

 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 31 +++++++++++++------
 1 file changed, 22 insertions(+), 9 deletions(-)


base-commit: dacf281771a9aed1a723b196120a0de8637910b9
-- 
2.44.0

Re: [PATCH net-next V6 0/3] net/mlx5: Avoid payload in skb's linear part for better GRO-processing

Posted by Christoph Paasch 1 month ago

On Thu, May 7, 2026 at 2:54 AM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> Hi,
>
> This is V6 of a series originally submitted by Christoph.

Sorry for having dropped the ball on this!

Thanks for pushing it forward!


Christoph
