From nobody Sun Apr 12 10:19:29 2026
From: Larysa Zaremba
To: Tony Nguyen, intel-wired-lan@lists.osuosl.org
Cc: Larysa Zaremba, Przemek Kitszel, Andrew Lunn, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexander Lobakin,
 Simon Horman, Alexei Starovoitov, Daniel Borkmann,
 Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
 Aleksandr Loktionov, Natalia Wochtman, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: [PATCH iwl-next v3 08/10] ixgbevf: add pseudo header split
Date: Wed, 4 Mar 2026 17:03:40 +0100
Message-ID: <20260304160345.1340940-9-larysa.zaremba@intel.com>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <20260304160345.1340940-1-larysa.zaremba@intel.com>
References: <20260304160345.1340940-1-larysa.zaremba@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Natalia Wochtman

Introduce pseudo header split support in the ixgbevf driver,
specifically targeting ixgbe_mac_82599_vf.

On older hardware (e.g. ixgbe_mac_82599_vf), the RX DMA write size can
only be limited in 1K increments. This causes issues when attempting to
fit multiple packets per page, as a DMA write may overwrite the headroom
of the next packet.

To address this, introduce pseudo header split support, where the
hardware copies the full L2 header into a dedicated header buffer. This
avoids the need for HR/TR alignment and allows safe skb construction
from the header buffer without risking overwrites.

Given that once a packet is too big to fit into a single page the
behaviour is the same for all supported HW, use pseudo header split only
for smaller packets.
Signed-off-by: Natalia Wochtman
Reviewed-by: Aleksandr Loktionov
Co-developed-by: Larysa Zaremba
Signed-off-by: Larysa Zaremba
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |   8 +
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 180 +++++++++++++++---
 2 files changed, 163 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index ea86679e4f81..438328b81855 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -89,6 +89,7 @@ struct ixgbevf_ring {
 		u32 truesize;	/* Rx buffer full size */
 		u32 pending;	/* Sent-not-completed descriptors */
 	};
+	u32 hdr_truesize;	/* Rx header buffer full size */
 	u16 count;		/* amount of descriptors */
 	u16 next_to_clean;
 	u32 next_to_use;
@@ -107,6 +108,8 @@ struct ixgbevf_ring {
 		struct ixgbevf_tx_queue_stats tx_stats;
 		struct ixgbevf_rx_queue_stats rx_stats;
 	};
+	struct libeth_fqe *hdr_fqes;
+	struct page_pool *hdr_pp;
 	struct xdp_rxq_info xdp_rxq;
 	u64 hw_csum_rx_error;
 	u8 __iomem *tail;
@@ -116,6 +119,7 @@ struct ixgbevf_ring {
 	 */
 	u16 reg_idx;
 	int queue_index; /* needed for multiqueue queue management */
+	u32 hdr_buf_len;
 	u32 rx_buf_len;
 	struct libeth_xdp_buff_stash xdp_stash;
 	unsigned int dma_size;	/* length in bytes */
@@ -151,6 +155,8 @@ struct ixgbevf_ring {
 
 #define IXGBEVF_RX_PAGE_LEN(hr)	(ALIGN_DOWN(LIBETH_RX_PAGE_LEN(hr), \
 					    IXGBE_SRRCTL_BSIZEPKT_STEP))
+#define IXGBEVF_RX_SRRCTL_BUF_SIZE(mtu)	(ALIGN((mtu) + LIBETH_RX_LL_LEN, \
+					       IXGBE_SRRCTL_BSIZEPKT_STEP))
 
 #define IXGBE_TX_FLAGS_CSUM		BIT(0)
 #define IXGBE_TX_FLAGS_VLAN		BIT(1)
@@ -349,6 +355,8 @@ enum ixbgevf_state_t {
 	__IXGBEVF_QUEUE_RESET_REQUESTED,
 };
 
+#define IXGBEVF_FLAG_HSPLIT	BIT(0)
+
 enum ixgbevf_boards {
 	board_82599_vf,
 	board_82599_vf_hv,
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 2f3b4954ded8..d00d3b307a8f 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -561,6 +561,12 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 		.truesize	= rx_ring->truesize,
 		.count		= rx_ring->count,
 	};
+	const struct libeth_fq_fp hdr_fq = {
+		.pp		= rx_ring->hdr_pp,
+		.fqes		= rx_ring->hdr_fqes,
+		.truesize	= rx_ring->hdr_truesize,
+		.count		= rx_ring->count,
+	};
 	u16 ntu = rx_ring->next_to_use;
 
 	/* nothing to do or no valid netdev defined */
@@ -578,6 +584,14 @@ static void ixgbevf_alloc_rx_buffers(struct ixgbevf_ring *rx_ring,
 
 		rx_desc->read.pkt_addr = cpu_to_le64(addr);
 
+		if (hdr_fq.pp) {
+			addr = libeth_rx_alloc(&hdr_fq, ntu);
+			if (addr == DMA_MAPPING_ERROR) {
+				libeth_rx_recycle_slow(fq.fqes[ntu].netmem);
+				break;
+			}
+		}
+
 		rx_desc++;
 		ntu++;
 		if (unlikely(ntu == fq.count)) {
@@ -820,6 +834,32 @@ LIBETH_XDP_DEFINE_FINALIZE(static ixgbevf_xdp_finalize_xdp_napi,
 			   ixgbevf_xdp_flush_tx, ixgbevf_xdp_rs_and_bump);
 LIBETH_XDP_DEFINE_END();
 
+static u32 ixgbevf_rx_hsplit_wa(const struct libeth_fqe *hdr,
+				struct libeth_fqe *buf, u32 data_len)
+{
+	u32 copy = data_len <= L1_CACHE_BYTES ? data_len : ETH_HLEN;
+	struct page *hdr_page, *buf_page;
+	const void *src;
+	void *dst;
+
+	if (unlikely(netmem_is_net_iov(buf->netmem)) ||
+	    !libeth_rx_sync_for_cpu(buf, copy))
+		return 0;
+
+	hdr_page = __netmem_to_page(hdr->netmem);
+	buf_page = __netmem_to_page(buf->netmem);
+
+	dst = page_address(hdr_page) + hdr->offset +
+	      pp_page_to_nmdesc(hdr_page)->pp->p.offset;
+	src = page_address(buf_page) + buf->offset +
+	      pp_page_to_nmdesc(buf_page)->pp->p.offset;
+
+	memcpy(dst, src, LARGEST_ALIGN(copy));
+	buf->offset += copy;
+
+	return copy;
+}
+
 static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 				struct ixgbevf_ring *rx_ring,
 				int budget)
@@ -859,6 +899,23 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		rmb();
 
 		rx_buffer = &rx_ring->rx_fqes[rx_ring->next_to_clean];
+
+		if (unlikely(rx_ring->hdr_pp)) {
+			struct libeth_fqe *hdr_buff;
+			unsigned int hdr_size = 0;
+
+			hdr_buff = &rx_ring->hdr_fqes[rx_ring->next_to_clean];
+
+			if (!xdp->data) {
+				hdr_size = ixgbevf_rx_hsplit_wa(hdr_buff,
+								rx_buffer,
+								size);
+				size -= hdr_size ? : size;
+			}
+
+			libeth_xdp_process_buff(xdp, hdr_buff, hdr_size);
+		}
+
 		libeth_xdp_process_buff(xdp, rx_buffer, size);
 
 		cleaned_count++;
@@ -1598,6 +1655,90 @@ static void ixgbevf_setup_vfmrqc(struct ixgbevf_adapter *adapter)
 	IXGBE_WRITE_REG(hw, IXGBE_VFMRQC, vfmrqc);
 }
 
+static void ixgbevf_rx_destroy_pp(struct ixgbevf_ring *rx_ring)
+{
+	struct libeth_fq fq = {
+		.pp	= rx_ring->pp,
+		.fqes	= rx_ring->rx_fqes,
+	};
+
+	libeth_rx_fq_destroy(&fq);
+	rx_ring->rx_fqes = NULL;
+	rx_ring->pp = NULL;
+
+	if (!rx_ring->hdr_pp)
+		return;
+
+	fq = (struct libeth_fq) {
+		.pp	= rx_ring->hdr_pp,
+		.fqes	= rx_ring->hdr_fqes,
+	};
+
+	libeth_rx_fq_destroy(&fq);
+	rx_ring->hdr_fqes = NULL;
+	rx_ring->hdr_pp = NULL;
+}
+
+static int ixgbevf_rx_create_pp(struct ixgbevf_ring *rx_ring)
+{
+	u32 adapter_flags = rx_ring->q_vector->adapter->flags;
+	struct libeth_fq fq = {
+		.count		= rx_ring->count,
+		.nid		= NUMA_NO_NODE,
+		.type		= LIBETH_FQE_MTU,
+		.xdp		= !!rx_ring->xdp_prog,
+		.idx		= rx_ring->queue_index,
+		.buf_len	= IXGBEVF_RX_PAGE_LEN(rx_ring->xdp_prog ?
+						      LIBETH_XDP_HEADROOM :
+						      LIBETH_SKB_HEADROOM),
+	};
+	u32 frame_size;
+	int ret;
+
+	/* Some HW requires DMA write sizes to be aligned to 1K,
+	 * which warrants fake header split usage, but this is
+	 * not an issue if the frame size is at its maximum of 3K
+	 */
+	frame_size =
+		IXGBEVF_RX_SRRCTL_BUF_SIZE(READ_ONCE(rx_ring->netdev->mtu));
+	fq.hsplit = (adapter_flags & IXGBEVF_FLAG_HSPLIT) &&
+		    frame_size < fq.buf_len;
+	ret = libeth_rx_fq_create(&fq, &rx_ring->q_vector->napi);
+	if (ret)
+		return ret;
+
+	rx_ring->pp = fq.pp;
+	rx_ring->rx_fqes = fq.fqes;
+	rx_ring->truesize = fq.truesize;
+	rx_ring->rx_buf_len = fq.buf_len;
+
+	if (!fq.hsplit)
+		return 0;
+
+	fq = (struct libeth_fq) {
+		.count	= rx_ring->count,
+		.nid	= NUMA_NO_NODE,
+		.type	= LIBETH_FQE_HDR,
+		.xdp	= !!rx_ring->xdp_prog,
+		.idx	= rx_ring->queue_index,
+	};
+
+	ret = libeth_rx_fq_create(&fq, &rx_ring->q_vector->napi);
+	if (ret)
+		goto err;
+
+	rx_ring->hdr_pp = fq.pp;
+	rx_ring->hdr_fqes = fq.fqes;
+	rx_ring->hdr_truesize = fq.truesize;
+	rx_ring->hdr_buf_len = fq.buf_len;
+
+	return 0;
+
+err:
+	ixgbevf_rx_destroy_pp(rx_ring);
+	return ret;
+}
+
 static void ixgbevf_configure_rx_ring(struct ixgbevf_adapter *adapter,
 				      struct ixgbevf_ring *ring)
 {
@@ -2718,6 +2859,9 @@ static int ixgbevf_sw_init(struct ixgbevf_adapter *adapter)
 		goto out;
 	}
 
+	if (adapter->hw.mac.type == ixgbe_mac_82599_vf)
+		adapter->flags |= IXGBEVF_FLAG_HSPLIT;
+
 	/* assume legacy case in which PF would only give VF 2 queues */
 	hw->mac.max_tx_queues = 2;
 	hw->mac.max_rx_queues = 2;
@@ -3152,43 +3296,29 @@ static int ixgbevf_setup_all_tx_resources(struct ixgbevf_adapter *adapter)
 }
 
 /**
- * ixgbevf_setup_rx_resources - allocate Rx resources (Descriptors)
+ * ixgbevf_setup_rx_resources - allocate Rx resources
  * @adapter: board private structure
  * @rx_ring: Rx descriptor ring (for a specific queue) to setup
  *
- * Returns 0 on success, negative on failure
+ * Returns: 0 on success, negative on failure.
  **/
 int ixgbevf_setup_rx_resources(struct ixgbevf_adapter *adapter,
 			       struct ixgbevf_ring *rx_ring)
 {
-	struct libeth_fq fq = {
-		.count		= rx_ring->count,
-		.nid		= NUMA_NO_NODE,
-		.type		= LIBETH_FQE_MTU,
-		.xdp		= !!rx_ring->xdp_prog,
-		.idx		= rx_ring->queue_index,
-		.buf_len	= IXGBEVF_RX_PAGE_LEN(rx_ring->xdp_prog ?
-						      LIBETH_XDP_HEADROOM :
-						      LIBETH_SKB_HEADROOM),
-	};
 	int ret;
 
-	ret = libeth_rx_fq_create(&fq, &rx_ring->q_vector->napi);
+	ret = ixgbevf_rx_create_pp(rx_ring);
 	if (ret)
 		return ret;
 
-	rx_ring->pp = fq.pp;
-	rx_ring->rx_fqes = fq.fqes;
-	rx_ring->truesize = fq.truesize;
-	rx_ring->rx_buf_len = fq.buf_len;
-
 	u64_stats_init(&rx_ring->syncp);
 
 	/* Round up to nearest 4K */
 	rx_ring->dma_size = rx_ring->count * sizeof(union ixgbe_adv_rx_desc);
 	rx_ring->dma_size = ALIGN(rx_ring->dma_size, 4096);
 
-	rx_ring->desc = dma_alloc_coherent(fq.pp->p.dev, rx_ring->dma_size,
+	rx_ring->desc = dma_alloc_coherent(rx_ring->pp->p.dev,
+					   rx_ring->dma_size,
 					   &rx_ring->dma, GFP_KERNEL);
 
 	if (!rx_ring->desc) {
@@ -3202,16 +3332,15 @@ int ixgbevf_setup_rx_resources(struct ixgbevf_adapter *adapter,
 	if (ret)
 		goto err;
 
-	xdp_rxq_info_attach_page_pool(&rx_ring->xdp_rxq, fq.pp);
+	xdp_rxq_info_attach_page_pool(&rx_ring->xdp_rxq, rx_ring->pp);
 
 	rx_ring->xdp_prog = adapter->xdp_prog;
 
 	return 0;
 err:
-	libeth_rx_fq_destroy(&fq);
-	rx_ring->rx_fqes = NULL;
-	rx_ring->pp = NULL;
+	ixgbevf_rx_destroy_pp(rx_ring);
 	dev_err(rx_ring->dev, "Unable to allocate memory for the Rx descriptor ring\n");
+
 	return ret;
 }
 
@@ -4140,10 +4269,11 @@ static int ixgbevf_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
 	struct bpf_prog *old_prog;
 	bool requires_mbuf;
 
-	requires_mbuf = frame_size > IXGBEVF_RX_PAGE_LEN(LIBETH_XDP_HEADROOM);
+	requires_mbuf = frame_size > IXGBEVF_RX_PAGE_LEN(LIBETH_XDP_HEADROOM) ||
+			adapter->flags & IXGBEVF_FLAG_HSPLIT;
 	if (prog && !prog->aux->xdp_has_frags && requires_mbuf) {
 		NL_SET_ERR_MSG_MOD(extack,
-				   "Configured MTU requires non-linear frames and XDP prog does not support frags");
+				   "Configured MTU or HW limitations require non-linear frames and XDP prog does not support frags");
 		return -EOPNOTSUPP;
 	}
 
-- 
2.52.0