From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: intel-wired-lan@lists.osuosl.org
Cc: Alexander Lobakin, Michal Kubiak, Maciej Fijalkowski, Tony Nguyen,
	Przemek Kitszel, Andrew Lunn, "David S. Miller", Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Simon Horman,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH iwl-next 16/16] libeth: xdp, xsk: access adjacent u32s as u64 where applicable
Date: Tue, 15 Apr 2025 19:28:25 +0200
Message-ID: <20250415172825.3731091-17-aleksander.lobakin@intel.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250415172825.3731091-1-aleksander.lobakin@intel.com>
References: <20250415172825.3731091-1-aleksander.lobakin@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

On 64-bit systems, writing/reading one u64 is faster than writing/reading
two u32s, even when they are adjacent in a struct. The compilers won't
guarantee they will combine those accesses; I observed both successful and
unsuccessful attempts with both GCC and Clang, and it's not easy to say
what it depends on.

There are a few places in libeth_xdp that win up to several percent from
the combined access (in both performance and object code size, especially
when unrolling). Add __LIBETH_WORD_ACCESS and use it there on LE. Drivers
are free to optimize HW-specific callbacks under the same definition.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/libeth/xdp.h | 29 ++++++++++++++++++++++++++---
 include/net/libeth/xsk.h | 10 +++++-----
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/include/net/libeth/xdp.h b/include/net/libeth/xdp.h
index 85f058482fc7..6d15386aff31 100644
--- a/include/net/libeth/xdp.h
+++ b/include/net/libeth/xdp.h
@@ -464,6 +464,21 @@ struct libeth_xdp_tx_desc {
 	((const void *)(uintptr_t)(priv));				      \
 })
 
+/*
+ * On 64-bit systems, assigning one u64 is faster than two u32s. When ::len
+ * occupies lowest 32 bits (LE), whole ::opts can be assigned directly instead.
+ */
+#ifdef __LITTLE_ENDIAN
+#define __LIBETH_WORD_ACCESS		1
+#endif
+#ifdef __LIBETH_WORD_ACCESS
+#define __libeth_xdp_tx_len(flen, ...)					      \
+	.opts = ((flen) | FIELD_PREP(GENMASK_ULL(63, 32), (__VA_ARGS__ + 0)))
+#else
+#define __libeth_xdp_tx_len(flen, ...)					      \
+	.len = (flen), .flags = (__VA_ARGS__ + 0)
+#endif
+
 /**
  * libeth_xdp_tx_xmit_bulk - main XDP Tx function
  * @bulk: array of frames to send
@@ -863,8 +878,7 @@ static inline u32 libeth_xdp_xmit_queue_head(struct libeth_xdp_tx_bulk *bq,
 
 	bq->bulk[bq->count++] = (typeof(*bq->bulk)){
 		.xdpf	= xdpf,
-		.len	= xdpf->len,
-		.flags	= LIBETH_XDP_TX_FIRST,
+		__libeth_xdp_tx_len(xdpf->len, LIBETH_XDP_TX_FIRST),
 	};
 
 	if (!xdp_frame_has_frags(xdpf))
@@ -895,7 +909,7 @@ static inline bool libeth_xdp_xmit_queue_frag(struct libeth_xdp_tx_bulk *bq,
 
 	bq->bulk[bq->count++] = (typeof(*bq->bulk)){
 		.dma	= dma,
-		.len	= skb_frag_size(frag),
+		__libeth_xdp_tx_len(skb_frag_size(frag)),
 	};
 
 	return true;
@@ -1253,6 +1267,7 @@ bool libeth_xdp_buff_add_frag(struct libeth_xdp_buff *xdp,
  * Internal, use libeth_xdp_process_buff() instead. Initializes XDP buffer
  * head with the Rx buffer data: data pointer, length, headroom, and
  * truesize/tailroom. Zeroes the flags.
+ * Uses faster single u64 write instead of per-field access.
  */
 static inline void libeth_xdp_prepare_buff(struct libeth_xdp_buff *xdp,
 					   const struct libeth_fqe *fqe,
@@ -1260,7 +1275,15 @@ static inline void libeth_xdp_prepare_buff(struct libeth_xdp_buff *xdp,
 {
 	const struct page *page = __netmem_to_page(fqe->netmem);
 
+#ifdef __LIBETH_WORD_ACCESS
+	static_assert(offsetofend(typeof(xdp->base), flags) -
+		      offsetof(typeof(xdp->base), frame_sz) ==
+		      sizeof(u64));
+
+	*(u64 *)&xdp->base.frame_sz = fqe->truesize;
+#else
 	xdp_init_buff(&xdp->base, fqe->truesize, xdp->base.rxq);
+#endif
 	xdp_prepare_buff(&xdp->base, page_address(page) + fqe->offset,
 			 page->pp->p.offset, len, true);
 }
diff --git a/include/net/libeth/xsk.h b/include/net/libeth/xsk.h
index 213778a68476..481a7b28e6f2 100644
--- a/include/net/libeth/xsk.h
+++ b/include/net/libeth/xsk.h
@@ -26,8 +26,8 @@ static inline bool libeth_xsk_tx_queue_head(struct libeth_xdp_tx_bulk *bq,
 {
 	bq->bulk[bq->count++] = (typeof(*bq->bulk)){
 		.xsk	= xdp,
-		.len	= xdp->base.data_end - xdp->data,
-		.flags	= LIBETH_XDP_TX_FIRST,
+		__libeth_xdp_tx_len(xdp->base.data_end - xdp->data,
+				    LIBETH_XDP_TX_FIRST),
 	};
 
 	if (likely(!xdp_buff_has_frags(&xdp->base)))
@@ -48,7 +48,7 @@ static inline void libeth_xsk_tx_queue_frag(struct libeth_xdp_tx_bulk *bq,
 {
 	bq->bulk[bq->count++] = (typeof(*bq->bulk)){
 		.xsk	= frag,
-		.len	= frag->base.data_end - frag->data,
+		__libeth_xdp_tx_len(frag->base.data_end - frag->data),
 	};
 }
 
@@ -199,7 +199,7 @@ __libeth_xsk_xmit_fill_buf_md(const struct xdp_desc *xdesc,
 	ctx = xsk_buff_raw_get_ctx(sq->pool, xdesc->addr);
 	desc = (typeof(desc)){
 		.addr	= ctx.dma,
-		.len	= xdesc->len,
+		__libeth_xdp_tx_len(xdesc->len),
 	};
 
 	BUILD_BUG_ON(!__builtin_constant_p(tmo == libeth_xsktmo));
@@ -226,7 +226,7 @@ __libeth_xsk_xmit_fill_buf(const struct xdp_desc *xdesc,
 {
 	return (struct libeth_xdp_tx_desc){
 		.addr	= xsk_buff_raw_get_dma(sq->pool, xdesc->addr),
-		.len	= xdesc->len,
+		__libeth_xdp_tx_len(xdesc->len),
 	};
 }
 
-- 
2.49.0