From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A77B1E0DC8; Wed, 9 Oct 2024 15:28:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487721; cv=none; b=JRLdYojqSFZSXWaHeG1cykQRefdLFM2q7I8GH0uAGiNXf6aAkicj0dUDLYkLxrMRy1ij/FraWqg9tARxkjjV7zKwh+hdL36CXMuHMYrugPf9McYuT5/NdC/jLnmah02DlTCnVNWuRL7dLSuWCAv3Y7c16jUsQg0nHLuJxY48O9M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487721; c=relaxed/simple; bh=tQu14JwbxzrPa5a9MiLdxfRNOKo0ysvLK8e1crTNaM4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gEMIH1E7r66PTQV3EM4ZJZsXmvdIWTGPXcKL1L0K+OB1A6weqLu1hyUohR3DSCn1sD+0808im0qCt6V9tDZtsPctwVDjIcN/PqZJOBnK9IWK2XWkSLf6sHjXPK/XHs7c0bgYUaTV4BLK4rhOs/v84vJkIXoGfCQgMknb3tihcZc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=i++jiO4J; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="i++jiO4J" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487720; x=1760023720; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tQu14JwbxzrPa5a9MiLdxfRNOKo0ysvLK8e1crTNaM4=; b=i++jiO4J9IyQ+bJTK2X29tZI0TjGoiIMp8Ezx+aEcpkeHFc7zFZCfwk0 HRvgU15Z24ahr2S3IWuTAjH9laOyLrbqXteGpVz2PaYCPqW9mSq+Z8Mii QkcNCLV+NtpVc/zNt1owdPRqsT6o6+Yk2y0pW6oNhrx9OhDY8w6rzoRBw WPC8sN+BiBnn8YOF6qb6D8YouQsk5Z5PE//u2ZdK4Y3R8zS2Wja2vthS6 FYljohBgVsEWMZjhr8unTMEEhN/WETpdgF96Zkj6soNUNIHsQZxgnm7kd 9z7Af46r3q8ODK7XwSUQzfEGgQvZ/nyKz0nqzT/wfVCrMPgNVwp6r2Lzv w==; X-CSE-ConnectionGUID: lRi171qmRYKWwmgP99zCeQ== X-CSE-MsgGUID: 5PrW0w9ZRFKpPj8j86WKnQ== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675678" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675678" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:28:39 -0700 X-CSE-ConnectionGUID: D26v3ednTOyimH8wwB7F8Q== X-CSE-MsgGUID: DqqnZYu/SYuTNbw7Xw0amQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305765" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:28:36 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 01/18] jump_label: export static_key_slow_{inc,dec}_cpuslocked() Date: Wed, 9 Oct 2024 17:27:39 +0200 Message-ID: <20241009152756.3113697-2-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Sometimes, there's a need to modify a lot of static keys or modify the same key multiple times in a loop. In that case, it seems more optimal to lock cpu_read_lock once and then call _cpuslocked() variants. The enable/disable functions are already exported, the refcounted counterparts however are not. Fix that to allow modules to save some cycles. Signed-off-by: Alexander Lobakin --- kernel/jump_label.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/jump_label.c b/kernel/jump_label.c index 93a822d3c468..1034c0348995 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -182,6 +182,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *= key) } return true; } +EXPORT_SYMBOL_GPL(static_key_slow_inc_cpuslocked); =20 bool static_key_slow_inc(struct static_key *key) { @@ -342,6 +343,7 @@ void static_key_slow_dec_cpuslocked(struct static_key *= key) STATIC_KEY_CHECK_USE(key); __static_key_slow_dec_cpuslocked(key); } +EXPORT_SYMBOL_GPL(static_key_slow_dec_cpuslocked); =20 void __static_key_slow_dec_deferred(struct static_key *key, struct delayed_work *work, --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 054701E2608; Wed, 9 Oct 2024 15:28:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487725; cv=none; b=qmnyIczrlADfSFHrbiGsLIG8PUDemcFN/fC0t+H4DRUaEanYYOxn43fH/Br4iRMAGIGqpdGh5ixdJTO6CiAKUgpRp3pBvGsFJE6QlfIerAUub+Rt9HsjzosRbs/enribdTbNG2dDc/hhzNcAVFW7IcoMZol/75QXIIPzqiZyDLE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487725; c=relaxed/simple; bh=ZKDYFn+g+Ra1PvQZ5saw4hzUQ5ZvAp4H0Pn6ZIu1nUc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BRLjw8AgrxgRG/ePJa7DtuAm9dixqx1WKl7v56AqBGWAAbhptXJlPeQRPC2uq5ULZLf7EnC2Ksc87PFD+fUJlt3L7+H22U+sYC//oCZAH2yrW1VqNAt6tOHCRsKE7jNu2RZLa+4Jhyx5tV2EtmpOsCw+usgRrQn/JN7VE57MwaE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=HL0EAieX; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="HL0EAieX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487724; x=1760023724; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZKDYFn+g+Ra1PvQZ5saw4hzUQ5ZvAp4H0Pn6ZIu1nUc=; b=HL0EAieXL5LQXZf7BVgyq8NZ8C1bzeJkBHJ8Nhfc3BRrYrggacBoDBQS kz+O59wutKC4jaWyb+J7VXqgksE30cOS9JV8TvwfFLtBcc+9vWOr1FP9f +sA9vy43NSPOk+xZuKAxTVQJv67WgGCgiGTKpcUAvU2dlFxTWK2F9FPDw +sFIkAaaVdxRXYWKExyFDiLYmqZUGW+myGL0Rj2IHAgpn2hL/FuWXNBAh 08w8oVM/1gpmRoEta+I1rsuR3yVssVmhWgCMXFgNuX3ygnNV+LXz5hvd+ zyi2Qf7aKs1HMGGpJB3Ic3Gz5zBI79qytF07YXuw83L4YWVzYu1hE1l8Q Q==; X-CSE-ConnectionGUID: BJJbXelcSMWOHH5ol7TFXw== X-CSE-MsgGUID: 5JBiDZwkStiV3qOd1U/bew== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675691" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675691" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:28:43 -0700 X-CSE-ConnectionGUID: Sh+BYIEWRqKzNiNuBl8j9g== X-CSE-MsgGUID: DlLgyHCoQqGKK0vJi6IRHA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305802" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:28:39 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 02/18] skbuff: allow 2-4-argument skb_frag_dma_map() Date: Wed, 9 Oct 2024 17:27:40 +0200 Message-ID: <20241009152756.3113697-3-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag), DMA_TO_DEVICE) is repeated across dozens of drivers and really wants a shorthand. Add a macro which will count args and handle all possible number from 2 to 5. Semantics: skb_frag_dma_map(dev, frag) -> __skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag), DMA_TO_DEVICE) skb_frag_dma_map(dev, frag, offset) -> __skb_frag_dma_map(dev, frag, offset, skb_frag_size(frag) - offset, DMA_TO_DEVICE) skb_frag_dma_map(dev, frag, offset, size) -> __skb_frag_dma_map(dev, frag, offset, size, DMA_TO_DEVICE) skb_frag_dma_map(dev, frag, offset, size, dir) -> __skb_frag_dma_map(dev, frag, offset, size, dir) No object code size changes for the existing callers. Users passing less arguments also won't have bigger size comparing to the full equivalent call. Signed-off-by: Alexander Lobakin --- include/linux/skbuff.h | 31 ++++++++++++++++++++++++++----- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 3d297669efcf..4a07bb3bcec6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3637,7 +3637,7 @@ static inline void skb_frag_page_copy(skb_frag_t *fra= gto, bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t = prio); =20 /** - * skb_frag_dma_map - maps a paged fragment via the DMA API + * __skb_frag_dma_map - maps a paged fragment via the DMA API * @dev: the device to map the fragment to * @frag: the paged fragment to map * @offset: the offset within the fragment (starting at the @@ -3647,15 +3647,36 @@ bool skb_page_frag_refill(unsigned int sz, struct p= age_frag *pfrag, gfp_t prio); * * Maps the page associated with @frag to @device. */ -static inline dma_addr_t skb_frag_dma_map(struct device *dev, - const skb_frag_t *frag, - size_t offset, size_t size, - enum dma_data_direction dir) +static inline dma_addr_t __skb_frag_dma_map(struct device *dev, + const skb_frag_t *frag, + size_t offset, size_t size, + enum dma_data_direction dir) { return dma_map_page(dev, skb_frag_page(frag), skb_frag_off(frag) + offset, size, dir); } =20 +#define skb_frag_dma_map(dev, frag, ...) \ + CONCATENATE(_skb_frag_dma_map, \ + COUNT_ARGS(__VA_ARGS__))(dev, frag, ##__VA_ARGS__) + +#define __skb_frag_dma_map1(dev, frag, offset, uf, uo) ({ \ + const skb_frag_t *uf =3D (frag); \ + size_t uo =3D (offset); \ + \ + __skb_frag_dma_map(dev, uf, uo, skb_frag_size(uf) - uo, \ + DMA_TO_DEVICE); \ +}) +#define _skb_frag_dma_map1(dev, frag, offset) \ + __skb_frag_dma_map1(dev, frag, offset, __UNIQUE_ID(frag_), \ + __UNIQUE_ID(offset_)) +#define _skb_frag_dma_map0(dev, frag) \ + _skb_frag_dma_map1(dev, frag, 0) +#define _skb_frag_dma_map2(dev, frag, offset, size) \ + __skb_frag_dma_map(dev, frag, offset, size, DMA_TO_DEVICE) +#define _skb_frag_dma_map3(dev, frag, offset, size, dir) \ + __skb_frag_dma_map(dev, frag, offset, size, dir) + static inline struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask) { --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37F4C1E2841; Wed, 9 Oct 2024 15:28:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487729; cv=none; b=DkO+Ro1sXNcZMDtyAkcV8V7mH36kD9XyEgEpGMdSrgHkrVKK16RhaFdo4czSSto7jsvJOpLvbcCDMEHfyuW1062QUpfSYEdh8mUWhcVwrLmrA26YbsbSEFZL+nQVtlvNw3I4rle58mqUrmdfMJB3s9RatzNyZvm1rmf3Nves3RI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487729; c=relaxed/simple; bh=kbq20DXpdYwp9Xsh+Yl+6W8Tf+jQuZgByx3qLkTFaT4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HHUnvCsfpI8MsHGIv0AFQGznH6ALEINV6+Z6lizAnPswvERalFQcKxeGNTZ8UIDjeE+UsnyNCfbg4tiCzKCubpQ9GsTvZtSKL5K7puvMFFqkzFSq+lj7HkxI00kQ0/dkfYb/TNcpBPIqYQtGrUQlvAJmX30UP52nByCJVqk3q74= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=MiPnJm9A; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MiPnJm9A" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487728; x=1760023728; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=kbq20DXpdYwp9Xsh+Yl+6W8Tf+jQuZgByx3qLkTFaT4=; b=MiPnJm9AJy2wMJUq6mSoLus/Lw2jjLIdzbuUV9d1e5HYXZM7AA+n2fYf GJA0sgo24BRbHt9TELUVLX69a8fIVMVm4sTX73fh0v1izJ+ZFuxt2XZOo IubHPsU590oLnyUsKaZGqLZV6+HMmbbEJ/jHk+j7NuxXhoWWJGbNcXhEH i8zHyYvKmG6dIBipnIVSO7dh2jD5EJ7GNIEn8DmmDZGm5qrT5vxQRBLjR 3LMFp9mmSuR1pU0x7+wxjqsXJ1gE/BEmrdsz+stEXDVwDKRJitXBnAz1s ynsL0ESpjOt6svnY94yWIQipID7K8HUTXaeq1HXcCmEBc2gJ7TM2Cyjbm A==; X-CSE-ConnectionGUID: PXVnQbBvR2G+yRmwKklXBA== X-CSE-MsgGUID: 9QTnm7OjTGCg73FXl7ZT1g== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675703" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675703" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:28:47 -0700 X-CSE-ConnectionGUID: QW2j2/XQTH2RDkmfp3lFKQ== X-CSE-MsgGUID: 3KF8sTI+RG2+RU/nPz8F+Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305836" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:28:43 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, "Jose E. Marchesi" , Przemek Kitszel Subject: [PATCH net-next 03/18] unroll: add generic loop unroll helpers Date: Wed, 9 Oct 2024 17:27:41 +0200 Message-ID: <20241009152756.3113697-4-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There are cases when we need to explicitly unroll loops. For example, cache operations, filling DMA descriptors on very high speeds etc. Add compiler-specific attribute macros to give the compiler a hint that we'd like to unroll a loop. Example usage: #define UNROLL_BATCH 8 unrolled_count(UNROLL_BATCH) for (u32 i =3D 0; i < UNROLL_BATCH; i++) op(priv, i); Note that sometimes the compilers won't unroll loops if they think this would have worse optimization and perf than without unrolling, and that unroll attributes are available only starting GCC 8. For older compiler versions, no hints/attributes will be applied. For better unrolling/parallelization, don't have any variables that interfere between iterations except for the iterator itself. Co-developed-by: Jose E. Marchesi # pragmas Signed-off-by: Jose E. Marchesi Reviewed-by: Przemek Kitszel Signed-off-by: Alexander Lobakin --- include/linux/unroll.h | 43 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/include/linux/unroll.h b/include/linux/unroll.h index d42fd6366373..2870c89a93f4 100644 --- a/include/linux/unroll.h +++ b/include/linux/unroll.h @@ -9,6 +9,49 @@ =20 #include =20 +#ifdef CONFIG_CC_IS_CLANG +#define __pick_unrolled(x, y) _Pragma(#x) +#elif CONFIG_GCC_VERSION >=3D 80000 +#define __pick_unrolled(x, y) _Pragma(#y) +#else +#define __pick_unrolled(x, y) /* not supported */ +#endif + +/** + * unrolled - loop attributes to ask the compiler to unroll it + * + * Usage: + * + * #define BATCH 4 + * unrolled_count(BATCH) + * for (u32 i =3D 0; i < BATCH; i++) + * // loop body without cross-iteration dependencies + * + * This is only a hint and the compiler is free to disable unrolling if it + * thinks the count is suboptimal and may hurt performance and/or hugely + * increase object code size. + * Not having any cross-iteration dependencies (i.e. when iter x + 1 depen= ds + * on what iter x will do with variables) is not a strict requirement, but + * provides best performance and object code size. + * Available only on Clang and GCC 8.x onwards. + */ + +/* Ask the compiler to pick an optimal unroll count, Clang only */ +#define unrolled \ + __pick_unrolled(clang loop unroll(enable), /* nothing */) + +/* Unroll each @n iterations of a loop */ +#define unrolled_count(n) \ + __pick_unrolled(clang loop unroll_count(n), GCC unroll n) + +/* Unroll the whole loop */ +#define unrolled_full \ + __pick_unrolled(clang loop unroll(full), GCC unroll 65534) + +/* Never unroll a loop */ +#define unrolled_none \ + __pick_unrolled(clang loop unroll(disable), GCC unroll 1) + #define UNROLL(N, MACRO, args...) CONCATENATE(__UNROLL_, N)(MACRO, args) =20 #define __UNROLL_0(MACRO, args...) --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 83BC71E103B; Wed, 9 Oct 2024 15:28:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487734; cv=none; b=Tv1DBsXdDvvLqZcJS6EUaBtU5IkkdyaBZEWjuf+OBQRy9vu+c+hpWthg24vdPa5h+DHTsEHfGY8O+8ir3hWGBsROSJVLIUixcyMSR6XUlAUhUqlFnZe/KxW0RvGYmuS2/nK0HwvPOfgQp5lt2QN1O+LhiI1Bc81qdJIZlPnNp5U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487734; c=relaxed/simple; bh=TeeOH37WAvBpj5aOJdtqxOgr80igzzQO+brz05UxJbg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Uq57KVQgi2NgJosdobgednp/yitw6exrQDjAMxfpZtFZ0qv6qjOY+501yJFXuE61BHiPGf4QvCowRdu9YAWoj62wXiYfVBuREt8EasKMgsebzLmx4unDKvsXEPE609ROsSOF9gXxon3h0rIjoDEtlYB7wvop4ADVkoVeaGlKAlw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=kuQC71wi; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="kuQC71wi" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487732; x=1760023732; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=TeeOH37WAvBpj5aOJdtqxOgr80igzzQO+brz05UxJbg=; b=kuQC71wimA9q/echvgItF/M6vq007xU4myHisfRs9qcy1wVvBrg5v0iM uUV+5sWg/ZFyR+T6r0t1P1ZgN8jjqWT032PbbAqTcspcJLFXlXvcrkX4I lqlmLPzPUJ/N5v+xV2pm/EwbMCLs0I7RG1nKeJFNfJ3VOjslBlk7p+T+a tUsPr4D7ium88lwHJR93MmAb9fUfx5K658IEm6F5JsdeML7ae4o/Xh01d 88VtheA3qSoHjDVXXvjwKATyawKNLRlGoeG6nUxF3wYE26bkaTDjdo1q5 cyYClV8QAd0X6HGm/vkiO3sc1367z2S8xph5zXDTfHWxomHtbLvlWx5+m Q==; X-CSE-ConnectionGUID: w9adIYrfT0yZrQRvUVcZ7w== X-CSE-MsgGUID: dVzeVuhWRlS0mceN3sCMlg== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675716" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675716" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:28:52 -0700 X-CSE-ConnectionGUID: OqnpdPDkTOqeFMvyTnDCZw== X-CSE-MsgGUID: JPqELvH0QhOlaY99UFUqzA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305870" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:28:48 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 04/18] bpf, xdp: constify some bpf_prog * function arguments Date: Wed, 9 Oct 2024 17:27:42 +0200 Message-ID: <20241009152756.3113697-5-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In lots of places, bpf_prog pointer is used only for tracing or other stuff that doesn't modify the structure itself. Same for net_device. Address at least some of them and add `const` attributes there. The object code didn't change, but that may prevent unwanted data modifications and also allow more helpers to have const arguments. Signed-off-by: Alexander Lobakin --- include/linux/bpf.h | 12 ++++++------ include/linux/filter.h | 9 +++++---- include/linux/netdevice.h | 6 +++--- include/linux/skbuff.h | 2 +- kernel/bpf/devmap.c | 8 ++++---- net/core/dev.c | 10 +++++----- net/core/filter.c | 29 ++++++++++++++++------------- net/core/skbuff.c | 2 +- 8 files changed, 41 insertions(+), 37 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 19d8ca8ac960..263515478984 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2534,10 +2534,10 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, st= ruct xdp_frame *xdpf, int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_r= x, struct bpf_map *map, bool exclude_ingress); int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *= skb, - struct bpf_prog *xdp_prog); + const struct bpf_prog *xdp_prog); int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb, - struct bpf_prog *xdp_prog, struct bpf_map *map, - bool exclude_ingress); + const struct bpf_prog *xdp_prog, + struct bpf_map *map, bool exclude_ingress); =20 void __cpu_map_flush(struct list_head *flush_list); int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf, @@ -2801,15 +2801,15 @@ struct sk_buff; =20 static inline int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb, - struct bpf_prog *xdp_prog) + const struct bpf_prog *xdp_prog) { return 0; } =20 static inline int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb, - struct bpf_prog *xdp_prog, struct bpf_map *map, - bool exclude_ingress) + const struct bpf_prog *xdp_prog, + struct bpf_map *map, bool exclude_ingress) { return 0; } diff --git a/include/linux/filter.h b/include/linux/filter.h index 7d7578a8eac1..ee067ab13272 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1178,17 +1178,18 @@ static inline int xdp_ok_fwd_dev(const struct net_d= evice *fwd, * This does not appear to be a real limitation for existing software. */ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, - struct xdp_buff *xdp, struct bpf_prog *prog); + struct xdp_buff *xdp, const struct bpf_prog *prog); int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, - struct bpf_prog *prog); + const struct bpf_prog *prog); int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp, struct xdp_frame *xdpf, - struct bpf_prog *prog); + const struct bpf_prog *prog); void xdp_do_flush(void); =20 -void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *= prog, u32 act); +void bpf_warn_invalid_xdp_action(const struct net_device *dev, + const struct bpf_prog *prog, u32 act); =20 #ifdef CONFIG_INET struct sock *bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct soc= k *sk, diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 43720e3bf25b..b31c8ef4bb7b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3910,9 +3910,9 @@ static inline void dev_consume_skb_any(struct sk_buff= *skb) } =20 u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog); -void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog); -int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff **pskb); + const struct bpf_prog *xdp_prog); +void generic_xdp_tx(struct sk_buff *skb, const struct bpf_prog *xdp_prog); +int do_xdp_generic(const struct bpf_prog *xdp_prog, struct sk_buff **pskb); int netif_rx(struct sk_buff *skb); int __netif_rx(struct sk_buff *skb); =20 diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4a07bb3bcec6..0f0266183d3e 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3590,7 +3590,7 @@ static inline netmem_ref skb_frag_netmem(const skb_fr= ag_t *frag) int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb, unsigned int headroom); int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb, - struct bpf_prog *prog); + const struct bpf_prog *prog); =20 /** * skb_frag_address - gets the address of the data contained in a paged fr= agment diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 9e0e3b0a18e4..f634b87aa0fa 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -675,7 +675,7 @@ int dev_map_enqueue_multi(struct xdp_frame *xdpf, struc= t net_device *dev_rx, } =20 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *= skb, - struct bpf_prog *xdp_prog) + const struct bpf_prog *xdp_prog) { int err; =20 @@ -698,7 +698,7 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *ds= t, struct sk_buff *skb, =20 static int dev_map_redirect_clone(struct bpf_dtab_netdev *dst, struct sk_buff *skb, - struct bpf_prog *xdp_prog) + const struct bpf_prog *xdp_prog) { struct sk_buff *nskb; int err; @@ -717,8 +717,8 @@ static int dev_map_redirect_clone(struct bpf_dtab_netde= v *dst, } =20 int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb, - struct bpf_prog *xdp_prog, struct bpf_map *map, - bool exclude_ingress) + const struct bpf_prog *xdp_prog, + struct bpf_map *map, bool exclude_ingress) { struct bpf_dtab *dtab =3D container_of(map, struct bpf_dtab, map); struct bpf_dtab_netdev *dst, *last_dst =3D NULL; diff --git a/net/core/dev.c b/net/core/dev.c index 3f47e9d48636..c86cf5e235fc 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4930,7 +4930,7 @@ static struct netdev_rx_queue *netif_get_rxqueue(stru= ct sk_buff *skb) } =20 u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) + const struct bpf_prog *xdp_prog) { void *orig_data, *orig_data_end, *hard_start; struct netdev_rx_queue *rxqueue; @@ -5032,7 +5032,7 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, str= uct xdp_buff *xdp, } =20 static int -netif_skb_check_for_xdp(struct sk_buff **pskb, struct bpf_prog *prog) +netif_skb_check_for_xdp(struct sk_buff **pskb, const struct bpf_prog *prog) { struct sk_buff *skb =3D *pskb; int err, hroom, troom; @@ -5056,7 +5056,7 @@ netif_skb_check_for_xdp(struct sk_buff **pskb, struct= bpf_prog *prog) =20 static u32 netif_receive_generic_xdp(struct sk_buff **pskb, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) + const struct bpf_prog *xdp_prog) { struct sk_buff *skb =3D *pskb; u32 mac_len, act =3D XDP_DROP; @@ -5109,7 +5109,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *= *pskb, * and DDOS attacks will be more effective. In-driver-XDP use dedicated TX * queues, so they do not have this starvation issue. */ -void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog) +void generic_xdp_tx(struct sk_buff *skb, const struct bpf_prog *xdp_prog) { struct net_device *dev =3D skb->dev; struct netdev_queue *txq; @@ -5134,7 +5134,7 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_p= rog *xdp_prog) =20 static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key); =20 -int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff **pskb) +int do_xdp_generic(const struct bpf_prog *xdp_prog, struct sk_buff **pskb) { struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; =20 diff --git a/net/core/filter.c b/net/core/filter.c index bd0d08bf76bb..be78a5cd4e8c 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4349,9 +4349,9 @@ u32 xdp_master_redirect(struct xdp_buff *xdp) EXPORT_SYMBOL_GPL(xdp_master_redirect); =20 static inline int __xdp_do_redirect_xsk(struct bpf_redirect_info *ri, - struct net_device *dev, + const struct net_device *dev, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) + const struct bpf_prog *xdp_prog) { enum bpf_map_type map_type =3D ri->map_type; void *fwd =3D ri->tgt_value; @@ -4372,10 +4372,10 @@ static inline int __xdp_do_redirect_xsk(struct bpf_= redirect_info *ri, return err; } =20 -static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_inf= o *ri, - struct net_device *dev, - struct xdp_frame *xdpf, - struct bpf_prog *xdp_prog) +static __always_inline int +__xdp_do_redirect_frame(struct bpf_redirect_info *ri, struct net_device *d= ev, + struct xdp_frame *xdpf, + const struct bpf_prog *xdp_prog) { enum bpf_map_type map_type =3D ri->map_type; void *fwd =3D ri->tgt_value; @@ -4444,7 +4444,7 @@ static __always_inline int __xdp_do_redirect_frame(st= ruct bpf_redirect_info *ri, } =20 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog) + const struct bpf_prog *xdp_prog) { struct bpf_redirect_info *ri =3D bpf_net_ctx_get_ri(); enum bpf_map_type map_type =3D ri->map_type; @@ -4458,7 +4458,8 @@ int xdp_do_redirect(struct net_device *dev, struct xd= p_buff *xdp, EXPORT_SYMBOL_GPL(xdp_do_redirect); =20 int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp, - struct xdp_frame *xdpf, struct bpf_prog *xdp_prog) + struct xdp_frame *xdpf, + const struct bpf_prog *xdp_prog) { struct bpf_redirect_info *ri =3D bpf_net_ctx_get_ri(); enum bpf_map_type map_type =3D ri->map_type; @@ -4473,9 +4474,9 @@ EXPORT_SYMBOL_GPL(xdp_do_redirect_frame); static int xdp_do_generic_redirect_map(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, - struct bpf_prog *xdp_prog, void *fwd, - enum bpf_map_type map_type, u32 map_id, - u32 flags) + const struct bpf_prog *xdp_prog, + void *fwd, enum bpf_map_type map_type, + u32 map_id, u32 flags) { struct bpf_redirect_info *ri =3D bpf_net_ctx_get_ri(); struct bpf_map *map; @@ -4529,7 +4530,8 @@ static int xdp_do_generic_redirect_map(struct net_dev= ice *dev, } =20 int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, - struct xdp_buff *xdp, struct bpf_prog *xdp_prog) + struct xdp_buff *xdp, + const struct bpf_prog *xdp_prog) { struct bpf_redirect_info *ri =3D bpf_net_ctx_get_ri(); enum bpf_map_type map_type =3D ri->map_type; @@ -9079,7 +9081,8 @@ static bool xdp_is_valid_access(int off, int size, return __is_valid_xdp_access(off, size); } =20 -void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *= prog, u32 act) +void bpf_warn_invalid_xdp_action(const struct net_device *dev, + const struct bpf_prog *prog, u32 act) { const u32 act_max =3D XDP_REDIRECT; =20 diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e0968d3dc450..e7765716983f 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1071,7 +1071,7 @@ int skb_pp_cow_data(struct page_pool *pool, struct sk= _buff **pskb, EXPORT_SYMBOL(skb_pp_cow_data); =20 int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb, - struct bpf_prog *prog) + const struct bpf_prog *prog) { if (!prog->aux->xdp_has_frags) return -EINVAL; --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 207201E32D1; Wed, 9 Oct 2024 15:28:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487737; cv=none; b=dzIEsjx1Z+4GPtuNEPnLc0DwttRm8MMUlSpZtQHGPfzAi5qga49c2ugMO8y7vmtII+t9eru8bTlTjlSzQamBM/ZLNk8+Dy+umCcUcmRf3Dm8uBEQT3/BnHsr+OslpDsGg4ogkjAC7vPPY1A4ZhsIQc61oeW0RhDpDGtc7+yVjco= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487737; c=relaxed/simple; bh=dJlueMbv2H250KgUk6TCEW2A2W3BSOFKutIIN8CF/Lk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bNQppUl9BmVkg1zoSaeI+dXSu9BqySqp+1DeAn7qEx9W/OGJIIXDfguipxOjrb6kUkSO99DbxGsV3WBJMr2q8TNil383bryoLliUNUlb0qA1Y6cgYHz+EjKHD/VswVHiqLkUs4jRtvO5pSx/cMlD0bP6UK03WTN3sSzcKNc0LHY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=k37a+5dS; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="k37a+5dS" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487736; x=1760023736; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dJlueMbv2H250KgUk6TCEW2A2W3BSOFKutIIN8CF/Lk=; b=k37a+5dSOU1CrXrjN9S7ihSWipw5amJ5k/8n+gB2X8ga4KKZT5yLziad 2Pz7bfvqiDl9J72yT6fjszAS//6YpCwHo+Rjyc1Duvuz8gzd47/tbUD0z nsE7xjyshsefSWtOES9wCc61K9klQ5Y4V3/SMPupTQJIFuzU/TPbc5/JX ByvaS9Bl4vwRhgyb4ILYqLdysY7MssCm9y07TmS+RweGEyU6njQ7cPJsR YrUwKRY8XQuxM1zDQ/+IxossgJF1g/APBdjtxDU1JXAQj1NONRao/bVhH FMHMeStEDkK7JtxfAO8YYS0rVN9+dc70k+ZVVHlRd58fFSBK1gfz5q/pD w==; X-CSE-ConnectionGUID: FeLoH8ZNRsyi9sF1veA/Zw== X-CSE-MsgGUID: 2btu7Q76Sz+GVGrkUZ9yoA== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675732" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675732" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:28:55 -0700 X-CSE-ConnectionGUID: ssjHJADnRd6YzFcy3TgPSQ== X-CSE-MsgGUID: 2Vh33guST3KOnOVyGMXW9A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305894" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:28:52 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 05/18] xdp, xsk: constify read-only arguments of some static inline helpers Date: Wed, 9 Oct 2024 17:27:43 +0200 Message-ID: <20241009152756.3113697-6-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Lots of read-only helpers for &xdp_buff and &xdp_frame, such as getting the frame length, skb_shared_info etc., don't have their arguments marked with `const` for no reason. Add the missing annotations to leave less place for mistakes and more for optimization. Signed-off-by: Alexander Lobakin --- include/net/xdp.h | 29 +++++++++++++++++------------ include/net/xdp_sock_drv.h | 11 ++++++----- include/net/xsk_buff_pool.h | 2 +- 3 files changed, 24 insertions(+), 18 deletions(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index bd3363e384b2..e683e835ab82 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -88,7 +88,7 @@ struct xdp_buff { u32 flags; /* supported values defined in xdp_buff_flags */ }; =20 -static __always_inline bool xdp_buff_has_frags(struct xdp_buff *xdp) +static __always_inline bool xdp_buff_has_frags(const struct xdp_buff *xdp) { return !!(xdp->flags & XDP_FLAGS_HAS_FRAGS); } @@ -103,7 +103,8 @@ static __always_inline void xdp_buff_clear_frags_flag(s= truct xdp_buff *xdp) xdp->flags &=3D ~XDP_FLAGS_HAS_FRAGS; } =20 -static __always_inline bool xdp_buff_is_frag_pfmemalloc(struct xdp_buff *x= dp) +static __always_inline bool +xdp_buff_is_frag_pfmemalloc(const struct xdp_buff *xdp) { return !!(xdp->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC); } @@ -144,15 +145,16 @@ xdp_prepare_buff(struct xdp_buff *xdp, unsigned char = *hard_start, SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) =20 static inline struct skb_shared_info * -xdp_get_shared_info_from_buff(struct xdp_buff *xdp) +xdp_get_shared_info_from_buff(const struct xdp_buff *xdp) { return (struct skb_shared_info *)xdp_data_hard_end(xdp); } =20 -static __always_inline unsigned int xdp_get_buff_len(struct xdp_buff *xdp) +static __always_inline unsigned int +xdp_get_buff_len(const struct xdp_buff *xdp) { unsigned int len =3D xdp->data_end - xdp->data; - struct skb_shared_info *sinfo; + const struct skb_shared_info *sinfo; =20 if (likely(!xdp_buff_has_frags(xdp))) goto out; @@ -177,12 +179,13 @@ struct xdp_frame { u32 flags; /* supported values defined in xdp_buff_flags */ }; =20 -static __always_inline bool xdp_frame_has_frags(struct xdp_frame *frame) +static __always_inline bool xdp_frame_has_frags(const struct xdp_frame *fr= ame) { return !!(frame->flags & XDP_FLAGS_HAS_FRAGS); } =20 -static __always_inline bool xdp_frame_is_frag_pfmemalloc(struct xdp_frame = *frame) +static __always_inline bool +xdp_frame_is_frag_pfmemalloc(const struct xdp_frame *frame) { return !!(frame->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC); } @@ -201,7 +204,7 @@ static __always_inline void xdp_frame_bulk_init(struct = xdp_frame_bulk *bq) } =20 static inline struct skb_shared_info * -xdp_get_shared_info_from_frame(struct xdp_frame *frame) +xdp_get_shared_info_from_frame(const struct xdp_frame *frame) { void *data_hard_start =3D frame->data - frame->headroom - sizeof(*frame); =20 @@ -248,7 +251,8 @@ struct sk_buff *xdp_build_skb_from_frame(struct xdp_fra= me *xdpf, struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf); =20 static inline -void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *x= dp) +void xdp_convert_frame_to_buff(const struct xdp_frame *frame, + struct xdp_buff *xdp) { xdp->data_hard_start =3D frame->data - frame->headroom - sizeof(*frame); xdp->data =3D frame->data; @@ -259,7 +263,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame,= struct xdp_buff *xdp) } =20 static inline -int xdp_update_frame_from_buff(struct xdp_buff *xdp, +int xdp_update_frame_from_buff(const struct xdp_buff *xdp, struct xdp_frame *xdp_frame) { int metasize, headroom; @@ -316,9 +320,10 @@ void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq); void xdp_return_frame_bulk(struct xdp_frame *xdpf, struct xdp_frame_bulk *bq); =20 -static __always_inline unsigned int xdp_get_frame_len(struct xdp_frame *xd= pf) +static __always_inline unsigned int +xdp_get_frame_len(const struct xdp_frame *xdpf) { - struct skb_shared_info *sinfo; + const struct skb_shared_info *sinfo; unsigned int len =3D xdpf->len; =20 if (likely(!xdp_frame_has_frags(xdpf))) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 0a5dca2b2b3f..dcd469d25840 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -101,7 +101,7 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xs= k_buff_pool *pool) return xp_alloc(pool); } =20 -static inline bool xsk_is_eop_desc(struct xdp_desc *desc) +static inline bool xsk_is_eop_desc(const struct xdp_desc *desc) { return !xp_mb_desc(desc); } @@ -143,7 +143,7 @@ static inline void xsk_buff_add_frag(struct xdp_buff *x= dp) list_add_tail(&frag->xskb_list_node, &frag->pool->xskb_list); } =20 -static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) +static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *fi= rst) { struct xdp_buff_xsk *xskb =3D container_of(first, struct xdp_buff_xsk, xd= p); struct xdp_buff *ret =3D NULL; @@ -200,7 +200,8 @@ static inline void *xsk_buff_raw_get_data(struct xsk_bu= ff_pool *pool, u64 addr) XDP_TXMD_FLAGS_CHECKSUM | \ 0) =20 -static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta) +static inline bool +xsk_buff_valid_tx_metadata(const struct xsk_tx_metadata *meta) { return !(meta->flags & ~XDP_TXMD_FLAGS_VALID); } @@ -337,7 +338,7 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xs= k_buff_pool *pool) return NULL; } =20 -static inline bool xsk_is_eop_desc(struct xdp_desc *desc) +static inline bool xsk_is_eop_desc(const struct xdp_desc *desc) { return false; } @@ -360,7 +361,7 @@ static inline void xsk_buff_add_frag(struct xdp_buff *x= dp) { } =20 -static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) +static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *fi= rst) { return NULL; } diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index bacb33f1e3e5..0442ba8dafa4 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -185,7 +185,7 @@ static inline bool xp_desc_crosses_non_contig_pg(struct= xsk_buff_pool *pool, !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK); } =20 -static inline bool xp_mb_desc(struct xdp_desc *desc) +static inline bool xp_mb_desc(const struct xdp_desc *desc) { return desc->options & XDP_PKT_CONTD; } --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E261C1E3DD7; Wed, 9 Oct 2024 15:28:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487741; cv=none; b=PhyG/lcvmF2l4XvuYn+0FmYsyw5JBQzS/gtTZjqzPvjJl9hB86RmScYuBswNoc5jeegHLhur0qu/aLy5LVqSQbicBh1tiQczGmJL9JTaRLBUoLcZAE91SuCvBMIlZBliJJNqvUzrwC51cZO5jTCcxX8VmctJp2ALDAxgt0Fua40= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487741; c=relaxed/simple; bh=44POJs3v9/8guCtY9BiRnRKSxN9pZUlcfyC0/KKE7gk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=L8KgBptqpj/CZSpGJXim8gXFoAtX9E/pCXwxmPMalXfWhn7w71GTa+xmnbvHEq8Rzz2/6Rzcw8VcUS/L3jCahAI/fxIjVRgkCfnPgMGfGTTXFzCKxys8tJCiXQOPXhSlMmcNgCvMzvpomt+YJTulV9grYNbvuCzkirefqSBY3Cw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=htG5pqdG; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="htG5pqdG" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487740; x=1760023740; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=44POJs3v9/8guCtY9BiRnRKSxN9pZUlcfyC0/KKE7gk=; b=htG5pqdGXwmM1ln5/blPA+QKBoPXDk6HRQwVvzYFgX9xwLvqI52egz3k BiM3vek6e9Zmm1AAhoKUR+/xF5kNeyxamW4970BXvzoja4PWaafglPutF qeMSpW1+L7CXFqlF5FH0ZQgmEynag0WfkhqKtvdyWUmi8VXx5BfrwzU3z pdfOk7j5qw14HkG4Ox6S9Dol9tCXdbcyTKfwbQzqUR/5V2c1jPnAqdXNf toTImMynrbtHZnaE98SW6AkJk+kRTDQr2rvk516NNnAD3NgYt4hBXBNdW scQmqh8MeGLx5yL/mA4EOR0ykqmI/Q1VuEtRI+RQtyTxe5/1Xw2Vcs6/f w==; X-CSE-ConnectionGUID: oELuvMQGTs23/jpzt9tSWw== X-CSE-MsgGUID: Xa2EPGkAR6S6j8I9b6ddsQ== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675743" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675743" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:28:59 -0700 X-CSE-ConnectionGUID: Cr1T/hVCRqmrRMLoeWceVg== X-CSE-MsgGUID: z2rQevScTxefXxuHl4I0Gg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305915" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:28:56 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 06/18] xdp: allow attaching already registered memory model to xdp_rxq_info Date: Wed, 9 Oct 2024 17:27:44 +0200 Message-ID: <20241009152756.3113697-7-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" One may need to register memory model separately from xdp_rxq_info. One simple example may be XDP test run code, but in general, it might be useful when memory model registering is managed by one layer and then XDP RxQ info by a different one. Allow such scenarios by adding a simple helper which "attaches" an already registered memory model to the desired xdp_rxq_info. As this is mostly needed for Page Pool, add a special function to do that for a &page_pool pointer. Signed-off-by: Alexander Lobakin --- include/net/xdp.h | 32 +++++++++++++++++++++++++++ net/core/xdp.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 88 insertions(+) diff --git a/include/net/xdp.h b/include/net/xdp.h index e683e835ab82..fae6305e2123 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -355,6 +355,38 @@ void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info = *xdp_rxq); int xdp_reg_mem_model(struct xdp_mem_info *mem, enum xdp_mem_type type, void *allocator); void xdp_unreg_mem_model(struct xdp_mem_info *mem); +int xdp_reg_page_pool(struct page_pool *pool); +void xdp_unreg_page_pool(const struct page_pool *pool); +void xdp_rxq_info_attach_page_pool(struct xdp_rxq_info *xdp_rxq, + const struct page_pool *pool); + +/** + * xdp_rxq_info_attach_mem_model - attach a registered mem info to an RxQ = info + * @xdp_rxq: XDP RxQ info to attach the memory info to + * @mem: already registered memory info + * + * If a driver registers its memory providers manually, it must use this + * function instead of xdp_rxq_info_reg_mem_model(). + */ +static inline void +xdp_rxq_info_attach_mem_model(struct xdp_rxq_info *xdp_rxq, + const struct xdp_mem_info *mem) +{ + xdp_rxq->mem =3D *mem; +} + +/** + * xdp_rxq_info_detach_mem_model - detach a registered mem info from RxQ i= nfo + * @xdp_rxq: XDP RxQ info to detach the memory info from + * + * If a driver registers its memory providers manually and then attaches it + * via xdp_rxq_info_attach_mem_model(), it must call this function before + * xdp_rxq_info_unreg(). + */ +static inline void xdp_rxq_info_detach_mem_model(struct xdp_rxq_info *xdp_= rxq) +{ + xdp_rxq->mem =3D (struct xdp_mem_info){ }; +} =20 /* Drivers not supporting XDP metadata can use this helper, which * rejects any room expansion for metadata as a result. diff --git a/net/core/xdp.c b/net/core/xdp.c index 34d057089d20..72d2bd22bc40 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -365,6 +365,62 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xd= p_rxq, =20 EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); =20 +/** + * xdp_reg_page_pool - register a &page_pool as a memory provider for XDP + * @pool: &page_pool to register + * + * Can be used to register pools manually without connecting to any XDP RxQ + * info, so that the XDP layer will be aware of them. Then, they can be + * attached to an RxQ info manually via xdp_rxq_info_attach_page_pool(). + * + * Return: %0 on success, -errno on error. + */ +int xdp_reg_page_pool(struct page_pool *pool) +{ + struct xdp_mem_info mem; + + return xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pool); +} +EXPORT_SYMBOL_GPL(xdp_reg_page_pool); + +/** + * xdp_unreg_page_pool - unregister a &page_pool from the memory providers= list + * @pool: &page_pool to unregister + * + * A shorthand for manual unregistering page pools. If the pool was previo= usly + * attached to an RxQ info, it must be detached first. + */ +void xdp_unreg_page_pool(const struct page_pool *pool) +{ + struct xdp_mem_info mem =3D { + .type =3D MEM_TYPE_PAGE_POOL, + .id =3D pool->xdp_mem_id, + }; + + xdp_unreg_mem_model(&mem); +} +EXPORT_SYMBOL_GPL(xdp_unreg_page_pool); + +/** + * xdp_rxq_info_attach_page_pool - attach a registered pool to an RxQ info + * @xdp_rxq: XDP RxQ info to attach the pool to + * @pool: pool to attach + * + * If the pool was registered manually, this function must be called inste= ad + * of xdp_rxq_info_reg_mem_model() to connect it to an RxQ info. + */ +void xdp_rxq_info_attach_page_pool(struct xdp_rxq_info *xdp_rxq, + const struct page_pool *pool) +{ + struct xdp_mem_info mem =3D { + .type =3D MEM_TYPE_PAGE_POOL, + .id =3D pool->xdp_mem_id, + }; + + xdp_rxq_info_attach_mem_model(xdp_rxq, &mem); +} +EXPORT_SYMBOL_GPL(xdp_rxq_info_attach_page_pool); + /* XDP RX runs under NAPI protection, and in different delivery error * scenarios (e.g. queue full), it is possible to return the xdp_frame * while still leveraging this protection. The @napi_direct boolean --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5439F1E47AD; Wed, 9 Oct 2024 15:29:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487745; cv=none; b=LQrG0KrDUyaA5j1mXzkScUoeUZW2W8yNyAUWc/p+5HYuw3stHQj2KkPMvcDPzZJdUADyEZ87vyClzCsBghgRHjxA1+aUJQkY8DeoMGQSiW9VeSCMXqURdaYHHbOdINBZ0nd9rtFB/DOnEC6N+WnhLfE4YfceUsvQF/Y86/RN5kA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487745; c=relaxed/simple; bh=VWcOAVANAkT5BVZi5PuJvDmmLtKpescHO06WqVKkduk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=DKFzt493EY7/OZnbyS91Fy0kyeUGWXJ7JGhBLbpxEKOHkzk6gF9w62hrbdMTMAbyWKw/bFHSGtAUl5UFRpDx8+yjsj0BM3VZiLEIyDLq6Wmr4k/KcJepp/9cJrVPgypPRd+6ltmceFHxETyoavl48zg3JGbM7oCQddy3sHZU8jI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=PTeUaBtu; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="PTeUaBtu" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487744; x=1760023744; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VWcOAVANAkT5BVZi5PuJvDmmLtKpescHO06WqVKkduk=; b=PTeUaBtu0jIW1wdSzHuiBMGRwZCsPtsmeFIe/TFzWxN1+SAocAx6xbKC 9RV37sXVJmV1Lhvo6thxAYrvyn6+vKuAiqN2sj/Qb07BWhJfBq6TtMo42 L3avyj0AGe73Kx6Hy7EdeeiB9g6y/QeW65vK0Gh7miioxs7cEWNViOkHW OMJ59cLb4qNVRZ7w7uvhuDv3jD8M9iTmcrJgq24emWGyFJNCumTXenF9W 70bKo9VQad45azSYHb/5ojEEORUssbkudQg0UiCLhEPRKSUjpzZtexWC5 G41ijA9xXpZRpZiD7gKXwWNGhfeS9qZAphOxBNY/+SvVDIFPbkbHj/CTW A==; X-CSE-ConnectionGUID: I2msODszTEKadPAk1JsJqg== X-CSE-MsgGUID: IHBPWBemTPKE+zF8GLEoag== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675757" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675757" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:03 -0700 X-CSE-ConnectionGUID: KG79ojUbTzWjoNjB+gEqzw== X-CSE-MsgGUID: dwuzp8JXQsmm71avXFGVfg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305954" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:28:59 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 07/18] net: Register system page pool as an XDP memory model Date: Wed, 9 Oct 2024 17:27:45 +0200 Message-ID: <20241009152756.3113697-8-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Toke H=C3=B8iland-J=C3=B8rgensen To make the system page pool usable as a source for allocating XDP frames, we need to register it with xdp_reg_mem_model(), so that page return works correctly. This is done in preparation for using the system page pool for the XDP live frame mode in BPF_TEST_RUN; for the same reason, make the per-cpu variable non-static so we can access it from the test_run code as well. Signed-off-by: Toke H=C3=B8iland-J=C3=B8rgensen Signed-off-by: Alexander Lobakin --- include/linux/netdevice.h | 1 + net/core/dev.c | 10 +++++++++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b31c8ef4bb7b..24700d0b4cc3 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3286,6 +3286,7 @@ struct softnet_data { }; =20 DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data); +DECLARE_PER_CPU(struct page_pool *, system_page_pool); =20 #ifndef CONFIG_PREEMPT_RT static inline int dev_recursion_level(void) diff --git a/net/core/dev.c b/net/core/dev.c index c86cf5e235fc..98f676f0be3a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -460,7 +460,7 @@ EXPORT_PER_CPU_SYMBOL(softnet_data); * PP consumers must pay attention to run APIs in the appropriate context * (e.g. NAPI context). */ -static DEFINE_PER_CPU(struct page_pool *, system_page_pool); +DEFINE_PER_CPU(struct page_pool *, system_page_pool); =20 #ifdef CONFIG_LOCKDEP /* @@ -12037,11 +12037,18 @@ static int net_page_pool_create(int cpuid) .nid =3D cpu_to_mem(cpuid), }; struct page_pool *pp_ptr; + int err; =20 pp_ptr =3D page_pool_create_percpu(&page_pool_params, cpuid); if (IS_ERR(pp_ptr)) return -ENOMEM; =20 + err =3D xdp_reg_page_pool(pp_ptr); + if (err) { + page_pool_destroy(pp_ptr); + return err; + } + per_cpu(system_page_pool, cpuid) =3D pp_ptr; #endif return 0; @@ -12175,6 +12182,7 @@ static int __init net_dev_init(void) if (!pp_ptr) continue; =20 + xdp_unreg_page_pool(pp_ptr); page_pool_destroy(pp_ptr); per_cpu(system_page_pool, i) =3D NULL; } --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 912F01E47D5; Wed, 9 Oct 2024 15:29:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487749; cv=none; b=ph5Qs6Au9oH2uvG1O7P9q7KGMvhLHGLQA9BQIuVOvqr8QurtwT8TP5pBlr+OftPuzvlXgO2u3oAsHAvPsJ+NV4vddlGPFacle4pYe5lXPzxsuW+jGaJzBG3xpGeNgBnAMsG+QCj16kNCZqiKwni4G9AgTAOKjLmH8Du+Nq3f7h0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487749; c=relaxed/simple; bh=VunDVl3Xvc8ika62yCC+qV1MgWuofkp3d7WaFC9Lmxo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FJG0Vz6xNeFbvKdIJIMkJhjCyCl/0q5ycYYUVtRHYKzg4VUJ9xXvMVdOiaBFjSl6bU7UkpezrLwIeZAToh8JHQHea/kXkTNJrZxhHcO1SeeVkWDRBRMr8Ms2w6k4ODBSYCAuFKxuqmzXDTaGSkRFY+FDIqBWbL1Rd2SWEZURZtM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=XbR9+z8D; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="XbR9+z8D" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487747; x=1760023747; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VunDVl3Xvc8ika62yCC+qV1MgWuofkp3d7WaFC9Lmxo=; b=XbR9+z8D9T76gVixGVCYzTR3YDUc5/tn8XmnVws9dCMd5fZJWXTYhe42 ep8grAps/4lNqwpZra0SK2gpsw3GjyWLjeexi8QGC38t3YGKH+863MmCk nuHEYCmQr7xIYA2IpPKgjX09RujOT6GIpgoRr3YfZN7NBaQpGcuKp2UdZ xdGMO95fxsEuQRcy0nr72o5NzsvoXOZkEmJMiBz/mKgtxZeTPU1zzfUxI MHNpTPmMq9loLgoIgqT0GQBEKYgo9Y4OPD4fE8HAnC5q3lVQPlXdDA0a4 +dEaAdnUss335/gnfNiJmETFYar4IeCkW11P3iLEn86j18a+1yC2FEo8x Q==; X-CSE-ConnectionGUID: KCW3s/rfRY+rRHFBscAHcA== X-CSE-MsgGUID: gRB4Oht6SUOIFTMDIQErfg== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675769" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675769" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:07 -0700 X-CSE-ConnectionGUID: twTZ2VspRM2xMsYfNNpIKg== X-CSE-MsgGUID: mNJLdASlRLqeKSCA9IgeKA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305970" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:03 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 08/18] page_pool: make page_pool_put_page_bulk() actually handle array of pages Date: Wed, 9 Oct 2024 17:27:46 +0200 Message-ID: <20241009152756.3113697-9-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Currently, page_pool_put_page_bulk() indeed takes an array of pointers to the data, not pages, despite the name. As one side effect, when you're freeing frags from &skb_shared_info, xdp_return_frame_bulk() converts page pointers to virtual addresses and then page_pool_put_page_bulk() converts them back. Make page_pool_put_page_bulk() actually handle array of pages. Pass frags directly and use virt_to_page() when freeing xdpf->data, so that the PP core will then get the compound head and take care of the rest. Signed-off-by: Alexander Lobakin --- include/net/page_pool/types.h | 8 ++++---- include/net/xdp.h | 2 +- net/core/page_pool.c | 6 +++--- net/core/xdp.c | 4 ++-- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index c022c410abe3..6c1be99a5959 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -259,8 +259,8 @@ void page_pool_disable_direct_recycling(struct page_poo= l *pool); void page_pool_destroy(struct page_pool *pool); void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void= *), const struct xdp_mem_info *mem); -void page_pool_put_page_bulk(struct page_pool *pool, void **data, - int count); +void page_pool_put_page_bulk(struct page_pool *pool, struct page **data, + u32 count); #else static inline void page_pool_destroy(struct page_pool *pool) { @@ -272,8 +272,8 @@ static inline void page_pool_use_xdp_mem(struct page_po= ol *pool, { } =20 -static inline void page_pool_put_page_bulk(struct page_pool *pool, void **= data, - int count) +static inline void page_pool_put_page_bulk(struct page_pool *pool, + struct page **data, u32 count) { } #endif diff --git a/include/net/xdp.h b/include/net/xdp.h index fae6305e2123..f99fbe9bf5a5 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -194,7 +194,7 @@ xdp_frame_is_frag_pfmemalloc(const struct xdp_frame *fr= ame) struct xdp_frame_bulk { int count; void *xa; - void *q[XDP_BULK_QUEUE_SIZE]; + struct page *q[XDP_BULK_QUEUE_SIZE]; }; =20 static __always_inline void xdp_frame_bulk_init(struct xdp_frame_bulk *bq) diff --git a/net/core/page_pool.c b/net/core/page_pool.c index a813d30d2135..ad219206ee8d 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -854,8 +854,8 @@ EXPORT_SYMBOL(page_pool_put_unrefed_page); * Please note the caller must not use data area after running * page_pool_put_page_bulk(), as this function overwrites it. */ -void page_pool_put_page_bulk(struct page_pool *pool, void **data, - int count) +void page_pool_put_page_bulk(struct page_pool *pool, struct page **data, + u32 count) { int i, bulk_len =3D 0; bool allow_direct; @@ -864,7 +864,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, vo= id **data, allow_direct =3D page_pool_napi_local(pool); =20 for (i =3D 0; i < count; i++) { - netmem_ref netmem =3D page_to_netmem(virt_to_head_page(data[i])); + netmem_ref netmem =3D page_to_netmem(compound_head(data[i])); =20 /* It is not the last user for the page frag case */ if (!page_pool_is_last_ref(netmem)) diff --git a/net/core/xdp.c b/net/core/xdp.c index 72d2bd22bc40..4b2ff5a280bb 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -556,12 +556,12 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf, for (i =3D 0; i < sinfo->nr_frags; i++) { skb_frag_t *frag =3D &sinfo->frags[i]; =20 - bq->q[bq->count++] =3D skb_frag_address(frag); + bq->q[bq->count++] =3D skb_frag_page(frag); if (bq->count =3D=3D XDP_BULK_QUEUE_SIZE) xdp_flush_frame_bulk(bq); } } - bq->q[bq->count++] =3D xdpf->data; + bq->q[bq->count++] =3D virt_to_page(xdpf->data); } EXPORT_SYMBOL_GPL(xdp_return_frame_bulk); =20 --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 82D0F1E5713; Wed, 9 Oct 2024 15:29:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487753; cv=none; b=obXt2aM+ZFf3bAJfe6eCRct8KwCFxBXkvBzmSunZM6pkJw4ej+Ji8klxA1+XoTFyaEe28ynrKPYEsCN+4GhdlW13JCzSXc4atxvnlV6Q83xYzRnfWONLDjEHd7gByah+FGFNlIbqp0VqLRQhaokpxEUGgkSqeGbiz9gpfn3droM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487753; c=relaxed/simple; bh=dKSd1kPDw+yeIAzbWu7UjF1bVJ2WCtTpGkqUxz/8jlI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iSUo0bKGJCpMIv+n77jZ6WIJiDjMTANWnzRH6frbGPJpyURBfeXcgBX/TEkKMlPuji7WJjyNE8QR1R3unZYfd+58dAKhFxjMryHq2yKegix+KKfOGEUyYJWL7srDHiq/6wwv6rqr6TOyHJrSjtKLMZL6hgGbpCv/3ov382hd+AY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=fvNj4M9z; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="fvNj4M9z" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487751; x=1760023751; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dKSd1kPDw+yeIAzbWu7UjF1bVJ2WCtTpGkqUxz/8jlI=; b=fvNj4M9zzZzgJ1ru4WHBpdRzWPjzV9XyDYDuXdaynYTdjFW3W/nZTrv4 Npch+OeD1LUVDIIUHnW/YNHARr4rq5ckKfCr8gXsNX8Z8kTQgdOTfotwh iBsJ+FYAdmS2a8D2xBLvqri+woXfk4wUQs6Y6h9FbsYMeZnc2ondGCyOe kwP+123CIN9YkoYO5f8zY9tU2RQFFxlle7eEJKCioPR+hT4qRQkTQDeJm qaHLrHfpuDvEQfdpPDfqhfV7+P/7x+SzLbhlW2/Hoa8v2m5Opu2hbVelj VfSpjN6qACikTrpVbg0eECn663YTblhQa/klj4RForYPC8AJnVIkdQvoi w==; X-CSE-ConnectionGUID: UkUWQvptTX+rFjsq+q17dQ== X-CSE-MsgGUID: Mi0yBwAAQvuyUmp8neC6/g== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675782" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675782" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:11 -0700 X-CSE-ConnectionGUID: +Zm6UkXyRFGO8j4Q0KLQaw== X-CSE-MsgGUID: q3AE64BTR0KLs8eDFNu98Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305983" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:07 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 09/18] page_pool: allow mixing PPs within one bulk Date: Wed, 9 Oct 2024 17:27:47 +0200 Message-ID: <20241009152756.3113697-10-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The main reason for this change was to allow mixing pages from different &page_pools within one &xdp_buff/&xdp_frame. Why not? Adjust xdp_return_frame_bulk() and page_pool_put_page_bulk(), so that they won't be tied to a particular pool. Let the latter create a separate bulk of pages which's PP is different and flush it recursively. This greatly optimizes xdp_return_frame_bulk(): no more hashtable lookups. Also make xdp_flush_frame_bulk() inline, as it's just one if + function call + one u32 read, not worth extending the call ladder. Signed-off-by: Alexander Lobakin --- include/net/page_pool/types.h | 7 +++-- include/net/xdp.h | 16 ++++++++--- net/core/page_pool.c | 50 ++++++++++++++++++++++++++++++----- net/core/xdp.c | 29 +------------------- 4 files changed, 59 insertions(+), 43 deletions(-) diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index 6c1be99a5959..9a01ce864afa 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -259,8 +259,7 @@ void page_pool_disable_direct_recycling(struct page_poo= l *pool); void page_pool_destroy(struct page_pool *pool); void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void= *), const struct xdp_mem_info *mem); -void page_pool_put_page_bulk(struct page_pool *pool, struct page **data, - u32 count); +void page_pool_put_page_bulk(struct page **data, u32 count, bool rec); #else static inline void page_pool_destroy(struct page_pool *pool) { @@ -272,8 +271,8 @@ static inline void page_pool_use_xdp_mem(struct page_po= ol *pool, { } =20 -static inline void page_pool_put_page_bulk(struct page_pool *pool, - struct page **data, u32 count) +static inline void page_pool_put_page_bulk(struct page **data, u32 count, + bool rec) { } #endif diff --git a/include/net/xdp.h b/include/net/xdp.h index f99fbe9bf5a5..38f27b8b9efd 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -11,6 +11,8 @@ #include #include /* skb_shared_info */ =20 +#include + /** * DOC: XDP RX-queue information * @@ -193,14 +195,12 @@ xdp_frame_is_frag_pfmemalloc(const struct xdp_frame *= frame) #define XDP_BULK_QUEUE_SIZE 16 struct xdp_frame_bulk { int count; - void *xa; struct page *q[XDP_BULK_QUEUE_SIZE]; }; =20 static __always_inline void xdp_frame_bulk_init(struct xdp_frame_bulk *bq) { - /* bq->count will be zero'ed when bq->xa gets updated */ - bq->xa =3D NULL; + bq->count =3D 0; } =20 static inline struct skb_shared_info * @@ -316,10 +316,18 @@ void __xdp_return(void *data, struct xdp_mem_info *me= m, bool napi_direct, void xdp_return_frame(struct xdp_frame *xdpf); void xdp_return_frame_rx_napi(struct xdp_frame *xdpf); void xdp_return_buff(struct xdp_buff *xdp); -void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq); void xdp_return_frame_bulk(struct xdp_frame *xdpf, struct xdp_frame_bulk *bq); =20 +static inline void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq) +{ + if (unlikely(!bq->count)) + return; + + page_pool_put_page_bulk(bq->q, bq->count, false); + bq->count =3D 0; +} + static __always_inline unsigned int xdp_get_frame_len(const struct xdp_frame *xdpf) { diff --git a/net/core/page_pool.c b/net/core/page_pool.c index ad219206ee8d..22b44f86dfa0 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -839,11 +839,22 @@ void page_pool_put_unrefed_page(struct page_pool *poo= l, struct page *page, } EXPORT_SYMBOL(page_pool_put_unrefed_page); =20 +static void page_pool_bulk_rec_add(struct xdp_frame_bulk *bulk, + struct page *page) +{ + bulk->q[bulk->count++] =3D page; + + if (unlikely(bulk->count =3D=3D ARRAY_SIZE(bulk->q))) { + page_pool_put_page_bulk(bulk->q, ARRAY_SIZE(bulk->q), true); + bulk->count =3D 0; + } +} + /** * page_pool_put_page_bulk() - release references on multiple pages - * @pool: pool from which pages were allocated * @data: array holding page pointers * @count: number of pages in @data + * @rec: whether it's called recursively by itself * * Tries to refill a number of pages into the ptr_ring cache holding ptr_r= ing * producer lock. If the ptr_ring is full, page_pool_put_page_bulk() @@ -854,21 +865,43 @@ EXPORT_SYMBOL(page_pool_put_unrefed_page); * Please note the caller must not use data area after running * page_pool_put_page_bulk(), as this function overwrites it. */ -void page_pool_put_page_bulk(struct page_pool *pool, struct page **data, - u32 count) +void page_pool_put_page_bulk(struct page **data, u32 count, bool rec) { + struct page_pool *pool =3D NULL; + struct xdp_frame_bulk sub; int i, bulk_len =3D 0; bool allow_direct; bool in_softirq; =20 - allow_direct =3D page_pool_napi_local(pool); + xdp_frame_bulk_init(&sub); =20 for (i =3D 0; i < count; i++) { - netmem_ref netmem =3D page_to_netmem(compound_head(data[i])); + struct page *page; + netmem_ref netmem; + + if (!rec) { + page =3D compound_head(data[i]); + netmem =3D page_to_netmem(page); =20 - /* It is not the last user for the page frag case */ - if (!page_pool_is_last_ref(netmem)) + /* It is not the last user for the page frag case */ + if (!page_pool_is_last_ref(netmem)) + continue; + } else { + page =3D data[i]; + netmem =3D page_to_netmem(page); + } + + if (unlikely(!pool)) { + pool =3D page->pp; + allow_direct =3D page_pool_napi_local(pool); + } else if (page->pp !=3D pool) { + /* + * If the page belongs to a different page_pool, save + * it to handle recursively after the main loop. + */ + page_pool_bulk_rec_add(&sub, page); continue; + } =20 netmem =3D __page_pool_put_page(pool, netmem, -1, allow_direct); /* Approved for bulk recycling in ptr_ring cache */ @@ -876,6 +909,9 @@ void page_pool_put_page_bulk(struct page_pool *pool, st= ruct page **data, data[bulk_len++] =3D (__force void *)netmem; } =20 + if (sub.count) + page_pool_put_page_bulk(sub.q, sub.count, true); + if (!bulk_len) return; =20 diff --git a/net/core/xdp.c b/net/core/xdp.c index 4b2ff5a280bb..60a2d4122606 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -508,46 +508,19 @@ EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi); * xdp_frame_bulk is usually stored/allocated on the function * call-stack to avoid locking penalties. */ -void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq) -{ - struct xdp_mem_allocator *xa =3D bq->xa; - - if (unlikely(!xa || !bq->count)) - return; - - page_pool_put_page_bulk(xa->page_pool, bq->q, bq->count); - /* bq->xa is not cleared to save lookup, if mem.id same in next bulk */ - bq->count =3D 0; -} -EXPORT_SYMBOL_GPL(xdp_flush_frame_bulk); =20 /* Must be called with rcu_read_lock held */ void xdp_return_frame_bulk(struct xdp_frame *xdpf, struct xdp_frame_bulk *bq) { - struct xdp_mem_info *mem =3D &xdpf->mem; - struct xdp_mem_allocator *xa; - - if (mem->type !=3D MEM_TYPE_PAGE_POOL) { + if (xdpf->mem.type !=3D MEM_TYPE_PAGE_POOL) { xdp_return_frame(xdpf); return; } =20 - xa =3D bq->xa; - if (unlikely(!xa)) { - xa =3D rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); - bq->count =3D 0; - bq->xa =3D xa; - } - if (bq->count =3D=3D XDP_BULK_QUEUE_SIZE) xdp_flush_frame_bulk(bq); =20 - if (unlikely(mem->id !=3D xa->mem.id)) { - xdp_flush_frame_bulk(bq); - bq->xa =3D rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); - } - if (unlikely(xdp_frame_has_frags(xdpf))) { struct skb_shared_info *sinfo; int i; --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 67C671E573D; Wed, 9 Oct 2024 15:29:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487757; cv=none; b=Emb9jIpwxv/icpEfZSMRoaX08gGsR3csrhP+OeawYYNZOyqEg8pitch5XaF85rKUIj96H03l4nae+b8+Rm1gdzBizZuipwu3qc+NiO8/vBIstWh0D/og3b5TnMXE1fPUDfl5090+GFU5u7eCf+54xzi6QvFKeE1EXagQKzNA3KA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487757; c=relaxed/simple; bh=oSfvMapObShEERCQLJM1tV+z6CBiiYZbJfEtnBwTyCA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=m5UhYjMarjvKUZjX59oYOqyF0ZV9k9E/BJEwlj+snSE+rdI5Br5gik50lYZxS6frIOJme7XBQpqsKK7P7n0mYu3YO80O4rXQfastB0UW4HI5dtKbjIYqKfe/zC8FRE/2AZ31IdA9Lrq+F8Trq5prNLmgxLY5QPDG7EX6bMMrJAg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=BrZZl3Y6; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="BrZZl3Y6" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487755; x=1760023755; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=oSfvMapObShEERCQLJM1tV+z6CBiiYZbJfEtnBwTyCA=; b=BrZZl3Y6jcF/OBkfA1srtgL6rwZ5NmFYubr/u/thum+4kK5hqlod2Olo vcpyvqIwwyiqgxdLAV2EbuM30ne4n+UkVdDnC/Igo7n4s9H0d7qx1/bRb icoLRraiGn28OlQQxKPnqdTOUpzFq17NEPuZ4i7JyTAduPSbILDesILd2 mwJEtMRFfoQCdeJSyE01+6994hK42mUeDPZxpe706dFrfhPQaE3Dc32BK PGx4kBsO8Y8bAMAdiEdVq4iqdRrUb9hca+YzJGKPgRk/6dFLLAHBullNP eQAfLqBUxPt2wozIQ/W3geV+Xp5PMfhiw40h9flGNJd1vdbLmap3TG8Fn g==; X-CSE-ConnectionGUID: MaL5j5/zT/+Xqvuz6MMcAw== X-CSE-MsgGUID: V0CEhd39T8ShsHKao4S97A== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675792" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675792" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:15 -0700 X-CSE-ConnectionGUID: mYmqItAiSFCTjTrqF1Y43A== X-CSE-MsgGUID: giTjJ+bgS5WRHMc6JO2KIA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81305991" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:11 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 10/18] xdp: get rid of xdp_frame::mem.id Date: Wed, 9 Oct 2024 17:27:48 +0200 Message-ID: <20241009152756.3113697-11-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Initially, xdp_frame::mem.id was used to search for the corresponding &page_pool to return the page correctly. However, after that struct page now contains a direct pointer to its PP, further keeping of this field makes no sense. xdp_return_frame_bulk() still uses it to do a lookup, but this is rather a leftover. Remove xdp_frame::mem and replace it with ::mem_type, as only memory type still matters and we need to know it to be able to free the frame correctly. As a cute side effect, we can now make every scalar field in &xdp_frame of 4 byte width, speeding up accesses to them. Signed-off-by: Alexander Lobakin --- include/net/xdp.h | 14 +++++----- .../net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +- drivers/net/veth.c | 4 +-- kernel/bpf/cpumap.c | 2 +- net/bpf/test_run.c | 2 +- net/core/filter.c | 12 ++++---- net/core/xdp.c | 28 +++++++++---------- 7 files changed, 32 insertions(+), 32 deletions(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index 38f27b8b9efd..334814d377eb 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -169,13 +169,13 @@ xdp_get_buff_len(const struct xdp_buff *xdp) =20 struct xdp_frame { void *data; - u16 len; - u16 headroom; + u32 len; + u32 headroom; u32 metasize; /* uses lower 8-bits */ /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time, - * while mem info is valid on remote CPU. + * while mem_type is valid on remote CPU. */ - struct xdp_mem_info mem; + enum xdp_mem_type mem_type:32; struct net_device *dev_rx; /* used by cpumap */ u32 frame_sz; u32 flags; /* supported values defined in xdp_buff_flags */ @@ -305,13 +305,13 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xd= p_buff *xdp) if (unlikely(xdp_update_frame_from_buff(xdp, xdp_frame) < 0)) return NULL; =20 - /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */ - xdp_frame->mem =3D xdp->rxq->mem; + /* rxq only valid until napi_schedule ends, convert to xdp_mem_type */ + xdp_frame->mem_type =3D xdp->rxq->mem.type; =20 return xdp_frame; } =20 -void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, +void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct, struct xdp_buff *xdp); void xdp_return_frame(struct xdp_frame *xdpf); void xdp_return_frame_rx_napi(struct xdp_frame *xdpf); diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/e= thernet/freescale/dpaa/dpaa_eth.c index ac06b01fe934..da17ff573c81 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -2275,7 +2275,7 @@ static int dpaa_a050385_wa_xdpf(struct dpaa_priv *pri= v, new_xdpf->len =3D xdpf->len; new_xdpf->headroom =3D priv->tx_headroom; new_xdpf->frame_sz =3D DPAA_BP_RAW_SIZE; - new_xdpf->mem.type =3D MEM_TYPE_PAGE_ORDER0; + new_xdpf->mem_type =3D MEM_TYPE_PAGE_ORDER0; =20 /* Release the initial buffer */ xdp_return_frame_rx_napi(xdpf); diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 774f226666c8..674bba50950d 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -634,7 +634,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_r= q *rq, break; case XDP_TX: orig_frame =3D *frame; - xdp->rxq->mem =3D frame->mem; + xdp->rxq->mem.type =3D frame->mem_type; if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) { trace_xdp_exception(rq->dev, xdp_prog, act); frame =3D &orig_frame; @@ -646,7 +646,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_r= q *rq, goto xdp_xmit; case XDP_REDIRECT: orig_frame =3D *frame; - xdp->rxq->mem =3D frame->mem; + xdp->rxq->mem.type =3D frame->mem_type; if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) { frame =3D &orig_frame; stats->rx_drops++; diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 992f4e30a589..c21be7bca1e6 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -207,7 +207,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_= entry *rcpu, int err; =20 rxq.dev =3D xdpf->dev_rx; - rxq.mem =3D xdpf->mem; + rxq.mem.type =3D xdpf->mem_type; /* TODO: report queue_index to xdp_rxq_info */ =20 xdp_convert_frame_to_buff(xdpf, &xdp); diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 6d7a442ceb89..eac959b04fa9 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -153,7 +153,7 @@ static void xdp_test_run_init_page(netmem_ref netmem, v= oid *arg) new_ctx->data =3D new_ctx->data_meta + meta_len; =20 xdp_update_frame_from_buff(new_ctx, frm); - frm->mem =3D new_ctx->rxq->mem; + frm->mem_type =3D new_ctx->rxq->mem.type; =20 memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx)); } diff --git a/net/core/filter.c b/net/core/filter.c index be78a5cd4e8c..a1e6a3ac5320 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4120,13 +4120,13 @@ static int bpf_xdp_frags_increase_tail(struct xdp_b= uff *xdp, int offset) } =20 static void bpf_xdp_shrink_data_zc(struct xdp_buff *xdp, int shrink, - struct xdp_mem_info *mem_info, bool release) + enum xdp_mem_type mem_type, bool release) { struct xdp_buff *zc_frag =3D xsk_buff_get_tail(xdp); =20 if (release) { xsk_buff_del_tail(zc_frag); - __xdp_return(NULL, mem_info, false, zc_frag); + __xdp_return(NULL, mem_type, false, zc_frag); } else { zc_frag->data_end -=3D shrink; } @@ -4135,18 +4135,18 @@ static void bpf_xdp_shrink_data_zc(struct xdp_buff = *xdp, int shrink, static bool bpf_xdp_shrink_data(struct xdp_buff *xdp, skb_frag_t *frag, int shrink) { - struct xdp_mem_info *mem_info =3D &xdp->rxq->mem; + enum xdp_mem_type mem_type =3D xdp->rxq->mem.type; bool release =3D skb_frag_size(frag) =3D=3D shrink; =20 - if (mem_info->type =3D=3D MEM_TYPE_XSK_BUFF_POOL) { - bpf_xdp_shrink_data_zc(xdp, shrink, mem_info, release); + if (mem_type =3D=3D MEM_TYPE_XSK_BUFF_POOL) { + bpf_xdp_shrink_data_zc(xdp, shrink, mem_type, release); goto out; } =20 if (release) { struct page *page =3D skb_frag_page(frag); =20 - __xdp_return(page_address(page), mem_info, false, NULL); + __xdp_return(page_address(page), mem_type, false, NULL); } =20 out: diff --git a/net/core/xdp.c b/net/core/xdp.c index 60a2d4122606..63f4f418e4e1 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -427,12 +427,12 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_attach_page_pool); * is used for those calls sites. Thus, allowing for faster recycling * of xdp_frames/pages in those cases. */ -void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, +void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct, struct xdp_buff *xdp) { struct page *page; =20 - switch (mem->type) { + switch (mem_type) { case MEM_TYPE_PAGE_POOL: page =3D virt_to_head_page(data); if (napi_direct && xdp_return_frame_no_direct()) @@ -455,7 +455,7 @@ void __xdp_return(void *data, struct xdp_mem_info *mem,= bool napi_direct, break; default: /* Not possible, checked in xdp_rxq_info_reg_mem_model() */ - WARN(1, "Incorrect XDP memory type (%d) usage", mem->type); + WARN(1, "Incorrect XDP memory type (%d) usage", mem_type); break; } } @@ -472,10 +472,10 @@ void xdp_return_frame(struct xdp_frame *xdpf) for (i =3D 0; i < sinfo->nr_frags; i++) { struct page *page =3D skb_frag_page(&sinfo->frags[i]); =20 - __xdp_return(page_address(page), &xdpf->mem, false, NULL); + __xdp_return(page_address(page), xdpf->mem_type, false, NULL); } out: - __xdp_return(xdpf->data, &xdpf->mem, false, NULL); + __xdp_return(xdpf->data, xdpf->mem_type, false, NULL); } EXPORT_SYMBOL_GPL(xdp_return_frame); =20 @@ -491,10 +491,10 @@ void xdp_return_frame_rx_napi(struct xdp_frame *xdpf) for (i =3D 0; i < sinfo->nr_frags; i++) { struct page *page =3D skb_frag_page(&sinfo->frags[i]); =20 - __xdp_return(page_address(page), &xdpf->mem, true, NULL); + __xdp_return(page_address(page), xdpf->mem_type, true, NULL); } out: - __xdp_return(xdpf->data, &xdpf->mem, true, NULL); + __xdp_return(xdpf->data, xdpf->mem_type, true, NULL); } EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi); =20 @@ -513,7 +513,7 @@ EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi); void xdp_return_frame_bulk(struct xdp_frame *xdpf, struct xdp_frame_bulk *bq) { - if (xdpf->mem.type !=3D MEM_TYPE_PAGE_POOL) { + if (xdpf->mem_type !=3D MEM_TYPE_PAGE_POOL) { xdp_return_frame(xdpf); return; } @@ -550,10 +550,11 @@ void xdp_return_buff(struct xdp_buff *xdp) for (i =3D 0; i < sinfo->nr_frags; i++) { struct page *page =3D skb_frag_page(&sinfo->frags[i]); =20 - __xdp_return(page_address(page), &xdp->rxq->mem, true, xdp); + __xdp_return(page_address(page), xdp->rxq->mem.type, true, + xdp); } out: - __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp); + __xdp_return(xdp->data, xdp->rxq->mem.type, true, xdp); } EXPORT_SYMBOL_GPL(xdp_return_buff); =20 @@ -599,7 +600,7 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xd= p_buff *xdp) xdpf->headroom =3D 0; xdpf->metasize =3D metasize; xdpf->frame_sz =3D PAGE_SIZE; - xdpf->mem.type =3D MEM_TYPE_PAGE_ORDER0; + xdpf->mem_type =3D MEM_TYPE_PAGE_ORDER0; =20 xsk_buff_free(xdp); return xdpf; @@ -659,7 +660,7 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_f= rame *xdpf, * - RX ring dev queue index (skb_record_rx_queue) */ =20 - if (xdpf->mem.type =3D=3D MEM_TYPE_PAGE_POOL) + if (xdpf->mem_type =3D=3D MEM_TYPE_PAGE_POOL) skb_mark_for_recycle(skb); =20 /* Allow SKB to reuse area used by xdp_frame */ @@ -706,8 +707,7 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf) nxdpf =3D addr; nxdpf->data =3D addr + headroom; nxdpf->frame_sz =3D PAGE_SIZE; - nxdpf->mem.type =3D MEM_TYPE_PAGE_ORDER0; - nxdpf->mem.id =3D 0; + nxdpf->mem_type =3D MEM_TYPE_PAGE_ORDER0; =20 return nxdpf; } --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 446881E7C18; Wed, 9 Oct 2024 15:29:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487760; cv=none; b=cdfOAKERM0rSeFqbbFMyhJzbgoQZsWM0Ghxs3XMx1V4P3Glr+qn61sXQytr9yZm7pzAP8zgl5clapFlKUpbr/nWmrWV50Rjg4iyZ3is8CRFJq+Jre+yIb6HF8ccsvn4mrowaPWGDhfvrQvQHvLLzOGKSESq3hMTNbwbj0QFf2Rk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487760; c=relaxed/simple; bh=6+0LCjXDQi34G/Xi89YQW0OtUb8pjFLGiOBHBVjRG0k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tOHzIyhoCioBclYacBP6HJwqjM1sH5BcTH/JLgCv5J6jxTc+3kDJeAxGM6jtDwUwYhka9LB0PZBNpqpn2q+YOVaXdT9vMIIR6NY0rmZSkhPP8X1iWAUp2zfT1Anq0u7ONNhIPC3oODo2L/h7z3s0ML1uk2lotnpNWUqthZcV8rs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hPiBafNo; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hPiBafNo" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487759; x=1760023759; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6+0LCjXDQi34G/Xi89YQW0OtUb8pjFLGiOBHBVjRG0k=; b=hPiBafNoAcFlRTLwzFvLpFceUQltP6SasPW5G9E3BMs7y7y61iffXhBI l6VQZu4yHPps3dMxHjSISTWjfDxVb4GZgwRj4FCZBoHUET1KGd9Lbjx0W BbDTfERnf76L17mB6tvLy/BmB11+gNFQWz3+43w49LyRoCvSAMtw/X4C8 wQu2NJcJbJVm0LV3RER5/oD1v0jxXt5dAfIOls8k057+ksbZ7YjLZ2h8Y 66TCAyAitpXaTrrsbPZaPBdElyJKntBe6Gso9E4OxVt3rZxcPxmWzulLG 2TJI2wKMMIlCDUBvlGq5lLnCTehXZbam3l1lTqY+5NiqBykmQYElXdL95 A==; X-CSE-ConnectionGUID: IJZOqRbbSSWHHJ5Z+0AyOg== X-CSE-MsgGUID: jmi7oRcpTnOlpVGitD8Ovg== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675806" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675806" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:19 -0700 X-CSE-ConnectionGUID: iOu66YabQr2SFH093K+gug== X-CSE-MsgGUID: cieGurHJRbm2icDwAATy2g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81306000" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:15 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 11/18] xdp: add generic xdp_buff_add_frag() Date: Wed, 9 Oct 2024 17:27:49 +0200 Message-ID: <20241009152756.3113697-12-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The code piece which would attach a frag to &xdp_buff is almost identical across the drivers supporting XDP multi-buffer on Rx. Make it a generic elegant onelner. Also, I see lots of drivers calculating frags_truesize as `xdp->frame_sz * nr_frags`. I can't say this is fully correct, since frags might be backed by chunks of different sizes, especially with stuff like the header split. Even page_pool_alloc() can give you two different truesizes on two subsequent requests to allocate the same buffer size. Add a field to &skb_shared_info (unionized as there's no free slot currently on x6_64) to track the "true" truesize. It can be used later when updating an skb. Signed-off-by: Alexander Lobakin --- include/linux/skbuff.h | 16 ++++++-- include/net/xdp.h | 90 +++++++++++++++++++++++++++++++++++++++++- 2 files changed, 101 insertions(+), 5 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0f0266183d3e..696b267ce15b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -607,11 +607,19 @@ struct skb_shared_info { * Warning : all fields before dataref are cleared in __alloc_skb() */ atomic_t dataref; - unsigned int xdp_frags_size; =20 - /* Intermediate layers must ensure that destructor_arg - * remains valid until skb destructor */ - void * destructor_arg; + union { + struct { + u32 xdp_frags_size; + u32 xdp_frags_truesize; + }; + + /* + * Intermediate layers must ensure that destructor_arg + * remains valid until skb destructor. + */ + void *destructor_arg; + }; =20 /* must be last field, see pskb_expand_head() */ skb_frag_t frags[MAX_SKB_FRAGS]; diff --git a/include/net/xdp.h b/include/net/xdp.h index 334814d377eb..ae92af8b782d 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -167,6 +167,88 @@ xdp_get_buff_len(const struct xdp_buff *xdp) return len; } =20 +/** + * __xdp_buff_add_frag - attach a frag to an &xdp_buff + * @xdp: XDP buffer to attach the frag to + * @page: page containing the frag + * @offset: page offset at which the frag starts + * @size: size of the frag + * @truesize: truesize (page / page frag size) of the frag + * @try_coalesce: whether to try coalescing the frags + * + * Attach a frag to an XDP buffer. If it currently has no frags attached, + * initialize the related fields, otherwise check that the frag number + * didn't reach the limit of ``MAX_SKB_FRAGS``. If possible, try coalescing + * the frag with the previous one. + * The function doesn't check/update the pfmemalloc bit. Please use the + * non-underscored wrapper in drivers. + * + * Return: true on success, false if there's no space for the frag in + * the shared info struct. + */ +static inline bool __xdp_buff_add_frag(struct xdp_buff *xdp, struct page *= page, + u32 offset, u32 size, u32 truesize, + bool try_coalesce) +{ + struct skb_shared_info *sinfo =3D xdp_get_shared_info_from_buff(xdp); + skb_frag_t *prev; + u32 nr_frags; + + if (!xdp_buff_has_frags(xdp)) { + xdp_buff_set_frags_flag(xdp); + + nr_frags =3D 0; + sinfo->xdp_frags_size =3D 0; + sinfo->xdp_frags_truesize =3D 0; + + goto fill; + } + + nr_frags =3D sinfo->nr_frags; + if (unlikely(nr_frags =3D=3D MAX_SKB_FRAGS)) + return false; + + prev =3D &sinfo->frags[nr_frags - 1]; + if (try_coalesce && page =3D=3D skb_frag_page(prev) && + offset =3D=3D skb_frag_off(prev) + skb_frag_size(prev)) + skb_frag_size_add(prev, size); + else +fill: + __skb_fill_page_desc_noacc(sinfo, nr_frags++, page, + offset, size); + + sinfo->nr_frags =3D nr_frags; + sinfo->xdp_frags_size +=3D size; + sinfo->xdp_frags_truesize +=3D truesize; + + return true; +} + +/** + * xdp_buff_add_frag - attach a frag to an &xdp_buff + * @xdp: XDP buffer to attach the frag to + * @page: page containing the frag + * @offset: page offset at which the frag starts + * @size: size of the frag + * @truesize: truesize (page / page frag size) of the frag + * + * Version of __xdp_buff_add_frag() which takes care of the pfmemalloc bit. + * + * Return: true on success, false if there's no space for the frag in + * the shared info struct. + */ +static inline bool xdp_buff_add_frag(struct xdp_buff *xdp, struct page *pa= ge, + u32 offset, u32 size, u32 truesize) +{ + if (!__xdp_buff_add_frag(xdp, page, offset, size, truesize, true)) + return false; + + if (unlikely(page_is_pfmemalloc(page))) + xdp_buff_set_frag_pfmemalloc(xdp); + + return true; +} + struct xdp_frame { void *data; u32 len; @@ -230,7 +312,13 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_= frags, unsigned int size, unsigned int truesize, bool pfmemalloc) { - skb_shinfo(skb)->nr_frags =3D nr_frags; + struct skb_shared_info *sinfo =3D skb_shinfo(skb); + + sinfo->nr_frags =3D nr_frags; + /* ``destructor_arg`` is unionized with ``xdp_frags_{,true}size``, + * reset it after that these fields aren't used anymore. + */ + sinfo->destructor_arg =3D NULL; =20 skb->len +=3D size; skb->data_len +=3D size; --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 28DC41E8832; Wed, 9 Oct 2024 15:29:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487764; cv=none; b=YVUV0U1UBkPJ6j3G0WO7yOXclM/hJgUIz3Ge9pN03e/MJeFbmuy62pqoBoJMdtYyMnqriGiqLhoiwNVwPM57afwP4DQ9x3Ffz+1fZyAW/j4g2iaEfruGP5gz0xNt5KoRaleAj53MbGAnjZi/zTn4jFIO7NMb7U6+Wtf4FdSI2c0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487764; c=relaxed/simple; bh=LCEFH9KlY6JQ9yCP9DuaP4kDqbezbMoCnG4tID+CZ0U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lKNp3fjXdXwPjDun6ao7FderTNHfN662vkj1F2p3XiVTnkJAQ2QlvHAaEh0iAxh8A/UV1W6OQ7eRduVSFJGgHm+wqflcPyupufiUyxwDJxa8HwGROD6Ey9npKOi5CSOOaspf63HBOboUQlaYeURSi9Tt8fUbvr1qGBYA0GiAJwg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=FWV8dWJK; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="FWV8dWJK" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487763; x=1760023763; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LCEFH9KlY6JQ9yCP9DuaP4kDqbezbMoCnG4tID+CZ0U=; b=FWV8dWJKVfaXPJgjnNzz46HGS8F5FSurSqIW+1usc6hDHkT9IVaFZaBO tTV/3MNj6HleEZBaVsr/l8++ZPUZqazzdx6F+kpMHuV2YgchXWz20sqIt fbq23v+uu872PlchVDGqNUxP3ewIcsvzLAx9T1pVCwRuLxQyVwQo1FDgx QUvTqzU4Mw9tks6oVfX/TB4QeVPi3uvPBn/qeDRPb4LmQcvq0dcDHFpYj dmpiC6UZtZAaK7rT3vVP16QJugl6romJqN0ElGIUVE5X4BpFeVWXY60kp zZ33Pa2Imjf4WC3lc10Ol1Lt0bl3oRKtwYA8kTMDDydO/R6i+GU7AProE g==; X-CSE-ConnectionGUID: ++1Jp371TIWRwVqHA8QXDA== X-CSE-MsgGUID: mscUF9t6Twe6eKXc9dLYEA== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675822" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675822" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:22 -0700 X-CSE-ConnectionGUID: iq3on9AhSMSfrodyWfGVeQ== X-CSE-MsgGUID: H84xCq7rQsqxkQQqPq2Zkg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81306011" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:19 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 12/18] xdp: add generic xdp_build_skb_from_buff() Date: Wed, 9 Oct 2024 17:27:50 +0200 Message-ID: <20241009152756.3113697-13-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The code which builds an skb from an &xdp_buff keeps multiplying itself around the drivers with almost no changes. Let's try to stop that by adding a generic function. There's __xdp_build_skb_from_frame() already, so just convert it to take &xdp_buff instead, while making the original one a wrapper. The original one always took an already allocated skb, allow both variants here -- if no skb passed, which is expected when calling from a driver, pick one via napi_build_skb(). Signed-off-by: Alexander Lobakin --- include/net/xdp.h | 1 + net/core/xdp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+) diff --git a/include/net/xdp.h b/include/net/xdp.h index ae92af8b782d..fa56570b15d7 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -330,6 +330,7 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_f= rags, void xdp_warn(const char *msg, const char *func, const int line); #define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__) =20 +struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp); struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp); struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, struct sk_buff *skb, diff --git a/net/core/xdp.c b/net/core/xdp.c index 63f4f418e4e1..e5395048a925 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -614,6 +614,61 @@ void xdp_warn(const char *msg, const char *func, const= int line) }; EXPORT_SYMBOL_GPL(xdp_warn); =20 +/** + * xdp_build_skb_from_buff - create an skb from an &xdp_buff + * @xdp: &xdp_buff to convert to an skb + * + * Perform common operations to create a new skb to pass up the stack from + * an &xdp_buff: allocate an skb head from the NAPI percpu cache, initiali= ze + * skb data pointers and offsets, set the recycle bit if the buff is PP-ba= cked, + * Rx queue index, protocol and update frags info. + * + * Return: new &sk_buff on success, %NULL on error. + */ +struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp) +{ + const struct xdp_rxq_info *rxq =3D xdp->rxq; + const struct skb_shared_info *sinfo; + struct sk_buff *skb; + u32 nr_frags =3D 0; + int metalen; + + if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo =3D xdp_get_shared_info_from_buff(xdp); + nr_frags =3D sinfo->nr_frags; + } + + skb =3D napi_build_skb(xdp->data_hard_start, xdp->frame_sz); + if (unlikely(!skb)) + return NULL; + + skb_reserve(skb, xdp->data - xdp->data_hard_start); + __skb_put(skb, xdp->data_end - xdp->data); + + metalen =3D xdp->data - xdp->data_meta; + if (metalen > 0) + skb_metadata_set(skb, metalen); + + if (is_page_pool_compiled_in() && rxq->mem.type =3D=3D MEM_TYPE_PAGE_POOL) + skb_mark_for_recycle(skb); + + skb_record_rx_queue(skb, rxq->queue_index); + + if (unlikely(nr_frags)) { + u32 ts; + + ts =3D sinfo->xdp_frags_truesize ? : nr_frags * xdp->frame_sz; + xdp_update_skb_shared_info(skb, nr_frags, + sinfo->xdp_frags_size, ts, + xdp_buff_is_frag_pfmemalloc(xdp)); + } + + skb->protocol =3D eth_type_trans(skb, rxq->dev); + + return skb; +} +EXPORT_SYMBOL_GPL(xdp_build_skb_from_buff); + struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, struct sk_buff *skb, struct net_device *dev) --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B04121E9066; Wed, 9 Oct 2024 15:29:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487768; cv=none; b=ZjrtcNvjt8RjfAHNqY0UE96G2ldx7ECgz3H92J/klVI54t4xdNEMAZQ2egNzzzKXI6Ji795JX1EbPyTKcGf6JIvAg2AlmcYnlILB48ARDRRAqms3i4XPJkGaKmgR3L2qRTVaxdHb7qySBzrjNlJ5ezhid9AMCPtsUpDRAqccopY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487768; c=relaxed/simple; bh=+8ZSgV1rvwP712o96If3RIbWx7f/0uNCZVHC70PupoE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JwLnm4mWLhIl7u7hGfZPZ4aLHkPkTeZVNsgDqNF2Db+PJBNlGQS8LfYaEsRIvOZctmiOUaoq9ryeuYXpJtmpLigLHs3O38mi9Q0t2LTbKUtDMPqcR6gMG4YT9t/5bbtVirrGZUhgcMi0ed1LcldOIwqYBv9GkTKm1f1FEvpGfwA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=R/bJcfFj; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="R/bJcfFj" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487766; x=1760023766; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+8ZSgV1rvwP712o96If3RIbWx7f/0uNCZVHC70PupoE=; b=R/bJcfFjBzyk1YeBfNOlGNHiwwAJ3qLoH2X2Jl2rXBQwylkXPLrdr/h1 bQ3eLEpLTKM0nS8ZwrIoqQhkHxup4rGG3PChQAEdR8+Vq/QdCYTJDEjQc Vre/J2EZj00HD6b7tF4O2C8TADi+iJEIxypS7a33iZRCA1gy+1MEyHUcX gA+IKqX0d5qxEkGBnGgB7IPoUsgBmZhiupehKGecvpxB7Wk9EwQTifaAu jPf0YrueD9eYNdK5W/KN7GyL5PKGZn56Z5nf5YdWKSSh0gMzcjNy+3b8S rLtUS3heQXlZ2zc0o9+zpQ6uXbhcUCNiqFZ66UXad42vvA7tSiay2Gyru Q==; X-CSE-ConnectionGUID: 8aFUue/hTVCf7D9djwkc/w== X-CSE-MsgGUID: irV9IdMLTKmxnUxSCSgbQA== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675833" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675833" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:26 -0700 X-CSE-ConnectionGUID: KqNd4tgARaOvLDBKkqTTaQ== X-CSE-MsgGUID: 0jxynQ2tRy+cdxXusK/bdQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81306025" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:23 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 13/18] xsk: allow attaching XSk pool via xdp_rxq_info_reg_mem_model() Date: Wed, 9 Oct 2024 17:27:51 +0200 Message-ID: <20241009152756.3113697-14-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When you register an XSk pool as XDP Rxq info memory model, you then need to manually attach it after the registration. Let the user combine both actions into one by just passing a pointer to the pool directly to xdp_rxq_info_reg_mem_model(), which will take care of calling xsk_pool_set_rxq_info(). This looks similar to how a &page_pool gets registered and reduce repeating driver code. Signed-off-by: Alexander Lobakin --- net/core/xdp.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/core/xdp.c b/net/core/xdp.c index e5395048a925..1cccc00510ff 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -358,6 +358,9 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp= _rxq, if (IS_ERR(xdp_alloc)) return PTR_ERR(xdp_alloc); =20 + if (type =3D=3D MEM_TYPE_XSK_BUFF_POOL && allocator) + xsk_pool_set_rxq_info(allocator, xdp_rxq); + if (trace_mem_connect_enabled() && xdp_alloc) trace_mem_connect(xdp_alloc, xdp_rxq); return 0; --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C2441E882A; Wed, 9 Oct 2024 15:29:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487772; cv=none; b=U4st5w6/gtXMduoEFI9xQ88gbtPkwOqXjF3EspxOwrMWXlJh9GsP6jDr27UU5U8YMnVHrRXkYRs2B1OFMX/aj783R18vTZeMbZfAt+JLo6DMrcNQQgrReyAQGqykdOz4Wu/V+kd/wTwZWhP/jxY0+v8IsL/XLErw2eJO40I3fhk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487772; c=relaxed/simple; bh=A3IvlovBCj+ZqLG2paInQl3tTdqIc6I3Dl0C69ntKtM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=toL/X9fcHC1wYCrKl34ISXiUR2fYLfuKYuTb5BvIQ98KeamIZdSDa0+lvqwV2A/0cqDP2T2Vo1Ib1U1sDJTWN1EqirCLsICJJPwxESrymny3fMwXkcxus4/bax/JU8p3EoQ+DQYjgdikNNrKaptYwoUoevnOGm4Z115S9Obinq0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=L0SujqYZ; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="L0SujqYZ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487770; x=1760023770; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=A3IvlovBCj+ZqLG2paInQl3tTdqIc6I3Dl0C69ntKtM=; b=L0SujqYZmF0bGAaYB72LDbjuthehHiRuL6nN5ow5ZfeWyqCNwuPYpGU6 jN+LFO+RpwnzRXdILSUBbEehi0lcWjbYwc0XNgCVK1E79wBwq/MlK2igW EcbnkWrKKLdKtTE5d9o1ghjOMc5r/vTTJ5TiXDYgZLgMXSoYkAI8L0PK1 BAbxvZbDY44OmiKOZRSqSE1yvupFXo/f5p7gXpaf1hFXAMJ+87/KvQLvl YAr3P6B+7mFFxpUX5v1F0reHrYU9GKzKcj15N4RRnUUNTMAKjgDSgQXma KgC3FDHDd5j6n/BP5XJwB7FuMOCMZw6MLEi9yk12OE8QsbinUyF9IgEOe A==; X-CSE-ConnectionGUID: wa4ePh3xRxGyc005IfauIQ== X-CSE-MsgGUID: 2RhYq0BXRh68E2bL47KREw== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675847" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675847" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:30 -0700 X-CSE-ConnectionGUID: Kbht4UAWR3qoxA75h3H3gw== X-CSE-MsgGUID: 4X18SZlzSDmIGUHoS1Ictg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81306048" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:26 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 14/18] xsk: make xsk_buff_add_frag really add a frag via __xdp_buff_add_frag() Date: Wed, 9 Oct 2024 17:27:52 +0200 Message-ID: <20241009152756.3113697-15-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Currently, xsk_buff_add_frag() only adds a frag to the pool linked list, not doing anything with the &xdp_buff. The drivers do that manually and the logic is the same. Make it really add an skb frag, just like xdp_buff_add_frag() does that, and freeing frags on error if needed. This allows to remove repeating code from i40e and ice and not add the same code again and again. Signed-off-by: Alexander Lobakin --- include/net/xdp_sock_drv.h | 18 ++++++++++-- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 30 ++------------------ drivers/net/ethernet/intel/ice/ice_xsk.c | 32 ++-------------------- 3 files changed, 20 insertions(+), 60 deletions(-) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index dcd469d25840..e6c08749c3d2 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -136,11 +136,21 @@ static inline void xsk_buff_free(struct xdp_buff *xdp) xp_free(xskb); } =20 -static inline void xsk_buff_add_frag(struct xdp_buff *xdp) +static inline bool xsk_buff_add_frag(struct xdp_buff *head, + struct xdp_buff *xdp) { - struct xdp_buff_xsk *frag =3D container_of(xdp, struct xdp_buff_xsk, xdp); + const void *data =3D xdp->data; + struct xdp_buff_xsk *frag; + + if (!__xdp_buff_add_frag(head, virt_to_page(data), + offset_in_page(data), xdp->data_end - data, + xdp->frame_sz, false)) + return false; =20 + frag =3D container_of(xdp, struct xdp_buff_xsk, xdp); list_add_tail(&frag->xskb_list_node, &frag->pool->xskb_list); + + return true; } =20 static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *fi= rst) @@ -357,8 +367,10 @@ static inline void xsk_buff_free(struct xdp_buff *xdp) { } =20 -static inline void xsk_buff_add_frag(struct xdp_buff *xdp) +static inline bool xsk_buff_add_frag(struct xdp_buff *head, + struct xdp_buff *xdp) { + return false; } =20 static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *fi= rst) diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ether= net/intel/i40e/i40e_xsk.c index 4e885df789ef..e28f1905a4a0 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -395,32 +395,6 @@ static void i40e_handle_xdp_result_zc(struct i40e_ring= *rx_ring, WARN_ON_ONCE(1); } =20 -static int -i40e_add_xsk_frag(struct i40e_ring *rx_ring, struct xdp_buff *first, - struct xdp_buff *xdp, const unsigned int size) -{ - struct skb_shared_info *sinfo =3D xdp_get_shared_info_from_buff(first); - - if (!xdp_buff_has_frags(first)) { - sinfo->nr_frags =3D 0; - sinfo->xdp_frags_size =3D 0; - xdp_buff_set_frags_flag(first); - } - - if (unlikely(sinfo->nr_frags =3D=3D MAX_SKB_FRAGS)) { - xsk_buff_free(first); - return -ENOMEM; - } - - __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, - virt_to_page(xdp->data_hard_start), - XDP_PACKET_HEADROOM, size); - sinfo->xdp_frags_size +=3D size; - xsk_buff_add_frag(xdp); - - return 0; -} - /** * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring * @rx_ring: Rx ring @@ -486,8 +460,10 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, in= t budget) =20 if (!first) first =3D bi; - else if (i40e_add_xsk_frag(rx_ring, first, bi, size)) + else if (!xsk_buff_add_frag(first, bi)) { + xsk_buff_free(first); break; + } =20 if (++next_to_process =3D=3D count) next_to_process =3D 0; diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/etherne= t/intel/ice/ice_xsk.c index 334ae945d640..8975d2971bc3 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -801,35 +801,6 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp= _buff *xdp, return result; } =20 -static int -ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first, - struct xdp_buff *xdp, const unsigned int size) -{ - struct skb_shared_info *sinfo =3D xdp_get_shared_info_from_buff(first); - - if (!size) - return 0; - - if (!xdp_buff_has_frags(first)) { - sinfo->nr_frags =3D 0; - sinfo->xdp_frags_size =3D 0; - xdp_buff_set_frags_flag(first); - } - - if (unlikely(sinfo->nr_frags =3D=3D MAX_SKB_FRAGS)) { - xsk_buff_free(first); - return -ENOMEM; - } - - __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, - virt_to_page(xdp->data_hard_start), - XDP_PACKET_HEADROOM, size); - sinfo->xdp_frags_size +=3D size; - xsk_buff_add_frag(xdp); - - return 0; -} - /** * ice_clean_rx_irq_zc - consumes packets from the hardware ring * @rx_ring: AF_XDP Rx ring @@ -895,7 +866,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, =20 if (!first) { first =3D xdp; - } else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) { + } else if (likely(size) && !xsk_buff_add_frag(first, xdp)) { + xsk_buff_free(first); break; } =20 --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 756931EABB8; Wed, 9 Oct 2024 15:29:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487776; cv=none; b=sUU7j846cTMPlJhxIOTcxWyj+BrDbBPzpfbZ3XhehV1pc1F25+6HqTtF6cZI+ebRjSH7OKwXn+l5abyyft+m29xCIeyQnR1FIKQAZLBjileowDOm0jx+OzfwE7rxl6P9pvQ/Pkj0rXhIHDd4jWZBcGwzh/QkxsPFugMuRjBrcI0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487776; c=relaxed/simple; bh=c9Epu29DQfMwENkl6ShyPrYnk3vUAAY9w57CJA8U6ho=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tvG8rgrnyHfr2SoOED1M6TGTSNd7qUWtCKMzwyt2FnE4g8RCqCGbN+sAqK485sS/nI+Q8nMT1IAqSnH22iQOqoxewdoKUda9Ytl+y3YDmF7dHAVjX1AELaA/Upjpe72oeYTKEGD6QiStBq3c43OiU5/0Qy8Ss5RAK7LzZOilE10= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=fYY+2Mu5; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="fYY+2Mu5" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487774; x=1760023774; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=c9Epu29DQfMwENkl6ShyPrYnk3vUAAY9w57CJA8U6ho=; b=fYY+2Mu5O/vlaRdgWAWmEABAJeYl4x5vSNi+4IDnzPfBMtdzHz8QQgRY mjBHpYL9kOY5zToy5qNuK7FLnF5H20HgZ983DP8Xs8vHpegUmh0jt3c3+ /reNJ/lMqBfkeqSIGKHhZmU6JME4CuxLpUnKYqHMk4R/Du64lu6RWA3mV wSHJkrF2wQz1MKlNzVIDPmYQ3LmHkVwlwxitfD64I9iWGxJtq0Ri9fqMr jjfbE1krP3goWgQHIdjNSs5rXXBqWL5sM9artZgXVm/HTCBxYQDuTW1ax JlsAFlryeLjMWWtKoC+5S3xJlEB4UGFqjEheSFJMPFQPZSSkfRNLkWjL9 A==; X-CSE-ConnectionGUID: RQWfkqH9SASaEwGIqw7Qew== X-CSE-MsgGUID: c74OFU3tSMyqfjNG3tVibg== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675857" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675857" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:34 -0700 X-CSE-ConnectionGUID: 0QOVzSxqRZGXZSjgz5TSgg== X-CSE-MsgGUID: XyQw2exCT4e9mFoXfHSXng== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81306069" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:30 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 15/18] xsk: add generic XSk &xdp_buff -> skb conversion Date: Wed, 9 Oct 2024 17:27:53 +0200 Message-ID: <20241009152756.3113697-16-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Same as with converting &xdp_buff to skb on Rx, the code which allocates a new skb and copies the XSk frame there is identical across the drivers, so make it generic. This includes copying all the frags if they are present in the original buff. System percpu Page Pools help here a lot: when available, allocate pages from there instead of the MM layer. This greatly improves XDP_PASS performance on XSk: instead of page_alloc() + page_free(), the net core recycles the same pages, so the only overhead left is memcpy()s. Note that the passed buff gets freed if the conversion is done w/o any error, assuming you don't need this buffer after you convert it to an skb. Signed-off-by: Alexander Lobakin --- include/net/xdp.h | 1 + net/core/xdp.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 139 insertions(+) diff --git a/include/net/xdp.h b/include/net/xdp.h index fa56570b15d7..c8164ce87a89 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -331,6 +331,7 @@ void xdp_warn(const char *msg, const char *func, const = int line); #define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__) =20 struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp); +struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp); struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp); struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, struct sk_buff *skb, diff --git a/net/core/xdp.c b/net/core/xdp.c index 1cccc00510ff..5f1b9e824310 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -22,6 +22,8 @@ #include #include =20 +#include "dev.h" + #define REG_STATE_NEW 0x0 #define REG_STATE_REGISTERED 0x1 #define REG_STATE_UNREGISTERED 0x2 @@ -672,6 +674,142 @@ struct sk_buff *xdp_build_skb_from_buff(const struct = xdp_buff *xdp) } EXPORT_SYMBOL_GPL(xdp_build_skb_from_buff); =20 +/** + * xdp_copy_frags_from_zc - copy the frags from an XSk buff to an skb + * @skb: skb to copy frags to + * @xdp: XSk &xdp_buff from which the frags will be copied + * @pp: &page_pool backing page allocation, if available + * + * Copy all frags from an XSk &xdp_buff to an skb to pass it up the stack. + * Allocate a new page / page frag for each frag, copy it and attach to + * the skb. + * + * Return: true on success, false on page allocation fail. + */ +static noinline bool xdp_copy_frags_from_zc(struct sk_buff *skb, + const struct xdp_buff *xdp, + struct page_pool *pp) +{ + const struct skb_shared_info *xinfo; + struct skb_shared_info *sinfo; + u32 nr_frags, ts; + + xinfo =3D xdp_get_shared_info_from_buff(xdp); + nr_frags =3D xinfo->nr_frags; + sinfo =3D skb_shinfo(skb); + +#if IS_ENABLED(CONFIG_PAGE_POOL) + ts =3D 0; +#else + ts =3D xinfo->xdp_frags_truesize ? : nr_frags * xdp->frame_sz; +#endif + + for (u32 i =3D 0; i < nr_frags; i++) { + u32 len =3D skb_frag_size(&xinfo->frags[i]); + void *data; +#if IS_ENABLED(CONFIG_PAGE_POOL) + u32 truesize =3D len; + + data =3D page_pool_dev_alloc_va(pp, &truesize); + ts +=3D truesize; +#else + data =3D napi_alloc_frag(len); +#endif + if (unlikely(!data)) + return false; + + memcpy(data, skb_frag_address(&xinfo->frags[i]), + LARGEST_ALIGN(len)); + __skb_fill_page_desc(skb, sinfo->nr_frags++, + virt_to_page(data), + offset_in_page(data), len); + } + + xdp_update_skb_shared_info(skb, nr_frags, xinfo->xdp_frags_size, + ts, false); + + return true; +} + +/** + * xdp_build_skb_from_zc - create an skb from an XSk &xdp_buff + * @xdp: source XSk buff + * + * Similar to xdp_build_skb_from_buff(), but for XSk frames. Allocate an s= kb + * head, new page for the head, copy the data and initialize the skb field= s. + * If there are frags, allocate new pages for them and copy. + * If Page Pool is available, the function allocates memory from the system + * percpu pools to try recycling the pages, otherwise it uses the NAPI page + * frag caches. + * If new skb was built successfully, @xdp is returned to XSk pool's freel= ist. + * On error, it remains untouched and the caller must take care of this. + * + * Return: new &sk_buff on success, %NULL on error. + */ +struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp) +{ + const struct xdp_rxq_info *rxq =3D xdp->rxq; + u32 len =3D xdp->data_end - xdp->data_meta; + struct page_pool *pp; + struct sk_buff *skb; + int metalen; +#if IS_ENABLED(CONFIG_PAGE_POOL) + u32 truesize; + void *data; + + pp =3D this_cpu_read(system_page_pool); + truesize =3D xdp->frame_sz; + + data =3D page_pool_dev_alloc_va(pp, &truesize); + if (unlikely(!data)) + return NULL; + + skb =3D napi_build_skb(data, truesize); + if (unlikely(!skb)) { + page_pool_free_va(pp, data, true); + return NULL; + } + + skb_mark_for_recycle(skb); + skb_reserve(skb, xdp->data_meta - xdp->data_hard_start); +#else /* !CONFIG_PAGE_POOL */ + struct napi_struct *napi; + + pp =3D NULL; + napi =3D napi_by_id(rxq->napi_id); + if (likely(napi)) + skb =3D napi_alloc_skb(napi, len); + else + skb =3D __netdev_alloc_skb_ip_align(rxq->dev, len, + GFP_ATOMIC | __GFP_NOWARN); + if (unlikely(!skb)) + return NULL; +#endif /* !CONFIG_PAGE_POOL */ + + memcpy(__skb_put(skb, len), xdp->data_meta, LARGEST_ALIGN(len)); + + metalen =3D xdp->data - xdp->data_meta; + if (metalen > 0) { + skb_metadata_set(skb, metalen); + __skb_pull(skb, metalen); + } + + skb_record_rx_queue(skb, rxq->queue_index); + + if (unlikely(xdp_buff_has_frags(xdp)) && + unlikely(!xdp_copy_frags_from_zc(skb, xdp, pp))) { + napi_consume_skb(skb, true); + return NULL; + } + + xsk_buff_free(xdp); + + skb->protocol =3D eth_type_trans(skb, rxq->dev); + + return skb; +} +EXPORT_SYMBOL_GPL(xdp_build_skb_from_zc); + struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf, struct sk_buff *skb, struct net_device *dev) --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 49D001E105B; Wed, 9 Oct 2024 15:29:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487779; cv=none; b=ly4JL7dGsejnC6gXay7pIXFKld4ad19r88PCUX9EWFaazUopyrJJZmchItJnr9rM7OfJ1kzDlXKfUdumQEmfxGexGx9w91uJak3Rp7kDl26mmHVlB38HW94m/TxSQfw/qlbAnLosV6i3hutDQXFU/AVC5vATqgFlRU7EexiA7ro= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487779; c=relaxed/simple; bh=OaIP6Fbig8EboEUzMOctbHaWUtp2kGf6/6R4/oNtyIU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uDbXHuNh0wcKgxfolr5ooTpNRvFyKgXqJdlL84hcMCbWrxfXqFNkcYIXJKnHLq2qPVUYgK9onf7/ldY9CplVRY0rSTZjatMmyvce/gonSR8KKGL2aS9Tdybap1n/Ey1P4wBsgP9WBmwblantol9CYG8GSyTfol7/mZpc69b+Sl8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=XW0jZkg2; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="XW0jZkg2" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487778; x=1760023778; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OaIP6Fbig8EboEUzMOctbHaWUtp2kGf6/6R4/oNtyIU=; b=XW0jZkg2KM1xY3q7DYwESdw5ni+TkBxJHwM89hO0/JNwJQNZTFWDT8qS EYJWC64RePiH73ZH0lGf+lbP/m25B9xui8ilU1bQ8ZzwVlUeNdYjVlUrA ly/wK2AGc2MP2aEyqKShupPJ7rjgsyDl6R9cxl7bYUkAJ0qeL90A+fYl6 iPOXxfhAldxxpFsV1593hOvMevqSf++6DOHRoC5jAi7i/89QG8fNiMJyD 9gt2N056dPJ3mnL4JOco7L9z16O53prKnw9SSSP4ZqyY5aOf+3VXneoJM 5FBecY825hKe/5fozTTMP1+tHhdzS9yh7eQIiMjusnb+Ljaxr8T0lJReP Q==; X-CSE-ConnectionGUID: 2UPucyshQKatS1YTH58oNw== X-CSE-MsgGUID: WqyJH8llTjmb4l8fuSl/Qg== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675868" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675868" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:38 -0700 X-CSE-ConnectionGUID: 89Mep0jVQuupKdbRtDwFuw== X-CSE-MsgGUID: XvVbAnt2TPqzmlycjX11bA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81306076" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:34 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 16/18] xsk: add helper to get &xdp_desc's DMA and meta pointer in one go Date: Wed, 9 Oct 2024 17:27:54 +0200 Message-ID: <20241009152756.3113697-17-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Currently, when you send an XSk frame without metadata, you need to do the following: * call external xsk_buff_raw_get_dma(); * call inline xsk_buff_get_metadata(), which calls external xsk_buff_raw_get_data() and then do some inline checks. This effectively means that the following piece: addr =3D pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr; is done twice per frame, plus you have 2 external calls per frame, plus this: meta =3D pool->addrs + addr - pool->tx_metadata_len; if (unlikely(!xsk_buff_valid_tx_metadata(meta))) is always inlined, even if there's no meta or it's invalid. Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that in one go. It returns a small structure with 2 fields: DMA address, filled unconditionally, and metadata pointer, valid only if it's present. The address correction is performed only once and you also have only 1 external call per XSk frame, which does all the calculations and checks outside of your hotpath. You only need to check `if (ctx.meta)` for the metadata presence. Signed-off-by: Alexander Lobakin --- include/net/xdp_sock_drv.h | 23 +++++++++++++++++++++ include/net/xsk_buff_pool.h | 8 ++++++++ net/xdp/xsk_buff_pool.c | 40 +++++++++++++++++++++++++++++++++++++ 3 files changed, 71 insertions(+) diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index e6c08749c3d2..fbfe7bf658ba 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -205,6 +205,23 @@ static inline void *xsk_buff_raw_get_data(struct xsk_b= uff_pool *pool, u64 addr) return xp_raw_get_data(pool, addr); } =20 +/** + * xsk_buff_raw_get_ctx - get &xdp_desc context + * @pool: XSk buff pool desc address belongs to + * @addr: desc address (from userspace) + * + * Wrapper for xp_raw_get_ctx() to be used in drivers, see its kdoc for + * details. + * + * Return: new &xdp_desc_ctx struct containing desc's DMA address and meta= data + * pointer, if it is present and valid (initialized to %NULL otherwise). + */ +static inline struct xdp_desc_ctx +xsk_buff_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr) +{ + return xp_raw_get_ctx(pool, addr); +} + #define XDP_TXMD_FLAGS_VALID ( \ XDP_TXMD_FLAGS_TIMESTAMP | \ XDP_TXMD_FLAGS_CHECKSUM | \ @@ -402,6 +419,12 @@ static inline void *xsk_buff_raw_get_data(struct xsk_b= uff_pool *pool, u64 addr) return NULL; } =20 +static inline struct xdp_desc_ctx +xsk_buff_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr) +{ + return (struct xdp_desc_ctx){ }; +} + static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta) { return false; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 0442ba8dafa4..e50918b6283e 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -143,6 +143,14 @@ u32 xp_alloc_batch(struct xsk_buff_pool *pool, struct = xdp_buff **xdp, u32 max); bool xp_can_alloc(struct xsk_buff_pool *pool, u32 count); void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr); dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr); + +struct xdp_desc_ctx { + dma_addr_t dma; + struct xsk_tx_metadata *meta; +}; + +struct xdp_desc_ctx xp_raw_get_ctx(const struct xsk_buff_pool *pool, u64 a= ddr); + static inline dma_addr_t xp_get_dma(struct xdp_buff_xsk *xskb) { return xskb->dma; diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 521a2938e50a..a0bb67a92d02 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -711,3 +711,43 @@ dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, = u64 addr) (addr & ~PAGE_MASK); } EXPORT_SYMBOL(xp_raw_get_dma); + +/** + * xp_raw_get_ctx - get &xdp_desc context + * @pool: XSk buff pool desc address belongs to + * @addr: desc address (from userspace) + * + * Helper for getting desc's DMA address and metadata pointer, if present. + * Saves one call on hotpath, double calculation of the actual address, + * and inline checks for metadata presence and sanity. + * Please use xsk_buff_raw_get_ctx() in drivers instead. + * + * Return: new &xdp_desc_ctx struct containing desc's DMA address and meta= data + * pointer, if it is present and valid (initialized to %NULL otherwise). + */ +struct xdp_desc_ctx xp_raw_get_ctx(const struct xsk_buff_pool *pool, u64 a= ddr) +{ + struct xsk_tx_metadata *meta; + struct xdp_desc_ctx ret; + + addr =3D pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr; + ret =3D (typeof(ret)){ + /* Same logic as in xp_raw_get_dma() */ + .dma =3D (pool->dma_pages[addr >> PAGE_SHIFT] & + ~XSK_NEXT_PG_CONTIG_MASK) + (addr & ~PAGE_MASK), + }; + + if (!pool->tx_metadata_len) + goto out; + + /* Same logic as in xp_raw_get_data() + xsk_buff_get_metadata() */ + meta =3D pool->addrs + addr - pool->tx_metadata_len; + if (unlikely(!xsk_buff_valid_tx_metadata(meta))) + goto out; + + ret.meta =3D meta; + +out: + return ret; +} +EXPORT_SYMBOL(xp_raw_get_ctx); --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 12AE91EBA1A; Wed, 9 Oct 2024 15:29:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487783; cv=none; b=pA7ljgrURpq0i0S9zw6mW4OV+U+xFgoXKxsoDQ7OFvZVex6lo67koGSWXnqq0sB/CJqMsTChZbgYD/8j7tentHOOnt+Qx5GPGtHpMgBOvnYjB9i6gKhVbph0xdWWT7u8SfFGioEZTs917sliPg25Yq1qZOq53vgG+GLFTr+YoOQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487783; c=relaxed/simple; bh=xP1quQVTfqWZfVS34xQWuGTXpXHRrBU1xEG+XB6ACGk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mmVb/2jYtULtW8mxprD5WsVPA6gMHAiPSchRqXTAYP2Xd03fKBkEUhyLp8A1W8ICuWDZ9VHd6KsVuDa0YgpaoeZcBiqU8YezAb7wCq19ySU7qILDeWDrlQmrO6Ldfs24QZ3ihlUnXbY+RhfdkGI6hPRbAJLDeb0qzbuubEs5YNQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=N3yfcYuh; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="N3yfcYuh" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487782; x=1760023782; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=xP1quQVTfqWZfVS34xQWuGTXpXHRrBU1xEG+XB6ACGk=; b=N3yfcYuhB2WXXr8OQrpoSEeHb4hXKMoBmP5ujx8eXMSd5cGnJdTusvn9 ZxaUPNBzWAKBmdDed8Os2msSPH50GJkAKVPD5110jpcPU/3D2xjQu/UA5 WPHEazu+AsPzTfo5ykj+FWj4ZDzFf1yW4XRcF+aK4NwEyPbzdQhwLgS43 LxU7TGCWSBu7L3hiLob7qhuMXrVLQfP1JhFAfs31fl6mtueJLsozr3Hdh qL9ONjGKrYAICMRQBLmUZMuF7jqDmHTu/HSKhQIYseeURAhW9IMVg8CNt 1lzRh0UYJXhnyRgFhZTTJy7KUmgEhJmZd+96jm+bXjr8ZHVFZ6tY69Tih A==; X-CSE-ConnectionGUID: IyNzvMHJQ+a7KGUdEW+NqA== X-CSE-MsgGUID: vwYf/T/ySZaEQoPIFIzMKg== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675884" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675884" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:41 -0700 X-CSE-ConnectionGUID: ugq3Np68Q+W7mqKbwiUkow== X-CSE-MsgGUID: PMyglTsXTKqD0q7yMqJXWg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81306088" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:38 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 17/18] libeth: support native XDP and register memory model Date: Wed, 9 Oct 2024 17:27:55 +0200 Message-ID: <20241009152756.3113697-18-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Expand libeth's Page Pool functionality by adding native XDP support. This means picking the appropriate headroom and DMA direction. Also, register all the created &page_pools as XDP memory models. A driver then can call xdp_rxq_info_attach_page_pool() when registering its RxQ info. Signed-off-by: Alexander Lobakin --- include/net/libeth/rx.h | 6 +++++- drivers/net/ethernet/intel/libeth/rx.c | 20 +++++++++++++++----- 2 files changed, 20 insertions(+), 6 deletions(-) diff --git a/include/net/libeth/rx.h b/include/net/libeth/rx.h index 43574bd6612f..148be5cd822e 100644 --- a/include/net/libeth/rx.h +++ b/include/net/libeth/rx.h @@ -13,8 +13,10 @@ =20 /* Space reserved in front of each frame */ #define LIBETH_SKB_HEADROOM (NET_SKB_PAD + NET_IP_ALIGN) +#define LIBETH_XDP_HEADROOM (ALIGN(XDP_PACKET_HEADROOM, NET_SKB_PAD) + \ + NET_IP_ALIGN) /* Maximum headroom for worst-case calculations */ -#define LIBETH_MAX_HEADROOM LIBETH_SKB_HEADROOM +#define LIBETH_MAX_HEADROOM LIBETH_XDP_HEADROOM /* Link layer / L2 overhead: Ethernet, 2 VLAN tags (C + S), FCS */ #define LIBETH_RX_LL_LEN (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN) /* Maximum supported L2-L4 header length */ @@ -66,6 +68,7 @@ enum libeth_fqe_type { * @count: number of descriptors/buffers the queue has * @type: type of the buffers this queue has * @hsplit: flag whether header split is enabled + * @xdp: flag indicating whether XDP is enabled * @buf_len: HW-writeable length per each buffer * @nid: ID of the closest NUMA node with memory */ @@ -81,6 +84,7 @@ struct libeth_fq { /* Cold fields */ enum libeth_fqe_type type:2; bool hsplit:1; + bool xdp:1; =20 u32 buf_len; int nid; diff --git a/drivers/net/ethernet/intel/libeth/rx.c b/drivers/net/ethernet/= intel/libeth/rx.c index f20926669318..616426a2e363 100644 --- a/drivers/net/ethernet/intel/libeth/rx.c +++ b/drivers/net/ethernet/intel/libeth/rx.c @@ -68,7 +68,7 @@ static u32 libeth_rx_hw_len_truesize(const struct page_po= ol_params *pp, static bool libeth_rx_page_pool_params(struct libeth_fq *fq, struct page_pool_params *pp) { - pp->offset =3D LIBETH_SKB_HEADROOM; + pp->offset =3D fq->xdp ? LIBETH_XDP_HEADROOM : LIBETH_SKB_HEADROOM; /* HW-writeable / syncable length per one page */ pp->max_len =3D LIBETH_RX_PAGE_LEN(pp->offset); =20 @@ -155,11 +155,12 @@ int libeth_rx_fq_create(struct libeth_fq *fq, struct = napi_struct *napi) .dev =3D napi->dev->dev.parent, .netdev =3D napi->dev, .napi =3D napi, - .dma_dir =3D DMA_FROM_DEVICE, }; struct libeth_fqe *fqes; struct page_pool *pool; - bool ret; + int ret; + + pp.dma_dir =3D fq->xdp ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE; =20 if (!fq->hsplit) ret =3D libeth_rx_page_pool_params(fq, &pp); @@ -173,18 +174,26 @@ int libeth_rx_fq_create(struct libeth_fq *fq, struct = napi_struct *napi) return PTR_ERR(pool); =20 fqes =3D kvcalloc_node(fq->count, sizeof(*fqes), GFP_KERNEL, fq->nid); - if (!fqes) + if (!fqes) { + ret =3D -ENOMEM; goto err_buf; + } + + ret =3D xdp_reg_page_pool(pool); + if (ret) + goto err_mem; =20 fq->fqes =3D fqes; fq->pp =3D pool; =20 return 0; =20 +err_mem: + kvfree(fqes); err_buf: page_pool_destroy(pool); =20 - return -ENOMEM; + return ret; } EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_create, LIBETH); =20 @@ -194,6 +203,7 @@ EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_create, LIBETH); */ void libeth_rx_fq_destroy(struct libeth_fq *fq) { + xdp_unreg_page_pool(fq->pp); kvfree(fq->fqes); page_pool_destroy(fq->pp); } --=20 2.46.2 From nobody Wed Nov 27 14:28:30 2024 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 68BB81EBA1A; Wed, 9 Oct 2024 15:29:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487792; cv=none; b=FNjlqUGTq5H5CN6LuaO/SKFMvXT5JKBhZ0xaAB2SDdCKq/IYWKI+TykLFAZqvYkIc8xiDLNDwirVpoHFpnYV9xK65t2HTnawAUzqGMO0Pyh7/ksAG+7ZcreFSqv1equVGNhEs+jTxx3+TiPLA583hrtePkkPASjRsXfX851nWeY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728487792; c=relaxed/simple; bh=SYEqw/3zL9WwssdbJR80XlHQSz4Dc4tcqow58TB3lG8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CDOOiWFkzjfMQFq3F95alvQOsERRn+/f/gTsd93uMs98jE+izrIKt7EjZaPyYx6J36+hxfiu45PIa1COPPk3EovAkWy13gqOenjWmstW5E9Bu3E9ZGt+hvZQxJBP89ETXYh4PLwKU75T00Y5X5q8Y8tlclQQ+G1eg0bK3w+Erdc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hWCnOPe+; arc=none smtp.client-ip=192.198.163.17 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hWCnOPe+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1728487787; x=1760023787; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SYEqw/3zL9WwssdbJR80XlHQSz4Dc4tcqow58TB3lG8=; b=hWCnOPe+aGIVpWUFnEbbza6YDyzrm9hyxmJMJ8fIQk5QOUHr7JtPJwH3 cW/V9lC78mv41dmaHww0FgJmDoIH8Kb6yLrV27cEa2CFc0/ktsMO+gw1X 47w+N9jbo4f5em13DaK/DRmoetpxQAFR8aDAcfnwOPvAYo2+XRICAIOFF +RYgEBFx+gYgFHGuSzchO9vaz5lsQuOafSsN3mjvJPtX6JRgpHw2m7zJA hlt+XrSxI0Etf1dZlrSVv0Xbk8uqCd3sboBAQTkYqsQ1HNS5KEACm/zlk +PTqrRYyO2gtJtRRDoRROVDdG7KG1XlSwL3wcNbbG8xfS/MkpTGYKaHMl A==; X-CSE-ConnectionGUID: IqXtO3rzQIqDrsewwDmiYg== X-CSE-MsgGUID: rcCWZ7MxTuqgYxJsNCw8DQ== X-IronPort-AV: E=McAfee;i="6700,10204,11220"; a="27675895" X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="27675895" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2024 08:29:46 -0700 X-CSE-ConnectionGUID: im9GMnquQNKM1/cX6Jgxww== X-CSE-MsgGUID: zvHhVYfIQ3eWJ+ZVtxS93g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.11,190,1725346800"; d="scan'208";a="81306115" Received: from newjersey.igk.intel.com ([10.102.20.203]) by orviesa004.jf.intel.com with ESMTP; 09 Oct 2024 08:29:42 -0700 From: Alexander Lobakin To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: Alexander Lobakin , =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= , Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Stanislav Fomichev , Magnus Karlsson , nex.sw.ncis.osdt.itp.upstreaming@intel.com, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Maciej Fijalkowski Subject: [PATCH net-next 18/18] libeth: add a couple of XDP helpers (libeth_xdp) Date: Wed, 9 Oct 2024 17:27:56 +0200 Message-ID: <20241009152756.3113697-19-aleksander.lobakin@intel.com> X-Mailer: git-send-email 2.46.2 In-Reply-To: <20241009152756.3113697-1-aleksander.lobakin@intel.com> References: <20241009152756.3113697-1-aleksander.lobakin@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" "Couple" is a bit humbly... Add the following functionality to libeth: * XDP shared queues managing * XDP_TX bulk sending infra * .ndo_xdp_xmit() infra * adding buffers to &xdp_buff * running XDP prog and managing its verdict * completing XDP Tx buffers * ^ repeat everything for XSk Suggested-by: Maciej Fijalkowski # lots of s= tuff Signed-off-by: Alexander Lobakin --- drivers/net/ethernet/intel/libeth/Kconfig | 6 + drivers/net/ethernet/intel/libeth/Makefile | 6 + include/net/libeth/types.h | 102 +- drivers/net/ethernet/intel/libeth/priv.h | 37 + include/net/libeth/tx.h | 34 +- include/net/libeth/xdp.h | 1864 ++++++++++++++++++++ include/net/libeth/xsk.h | 684 +++++++ drivers/net/ethernet/intel/libeth/rx.c | 2 +- drivers/net/ethernet/intel/libeth/tx.c | 39 + drivers/net/ethernet/intel/libeth/xdp.c | 444 +++++ drivers/net/ethernet/intel/libeth/xsk.c | 264 +++ 11 files changed, 3478 insertions(+), 4 deletions(-) create mode 100644 drivers/net/ethernet/intel/libeth/priv.h create mode 100644 include/net/libeth/xdp.h create mode 100644 include/net/libeth/xsk.h create mode 100644 drivers/net/ethernet/intel/libeth/tx.c create mode 100644 drivers/net/ethernet/intel/libeth/xdp.c create mode 100644 drivers/net/ethernet/intel/libeth/xsk.c diff --git a/drivers/net/ethernet/intel/libeth/Kconfig b/drivers/net/ethern= et/intel/libeth/Kconfig index 480293b71dbc..80e99d044599 100644 --- a/drivers/net/ethernet/intel/libeth/Kconfig +++ b/drivers/net/ethernet/intel/libeth/Kconfig @@ -7,3 +7,9 @@ config LIBETH help libeth is a common library containing routines shared between several drivers, but not yet promoted to the generic kernel API. + +config LIBETH_XDP + tristate + select LIBETH + help + XDP and XSk helpers based on libeth hotpath management. diff --git a/drivers/net/ethernet/intel/libeth/Makefile b/drivers/net/ether= net/intel/libeth/Makefile index 52492b081132..8b01b2af54cf 100644 --- a/drivers/net/ethernet/intel/libeth/Makefile +++ b/drivers/net/ethernet/intel/libeth/Makefile @@ -4,3 +4,9 @@ obj-$(CONFIG_LIBETH) +=3D libeth.o =20 libeth-y :=3D rx.o +libeth-y +=3D tx.o + +obj-$(CONFIG_LIBETH_XDP) +=3D libeth_xdp.o + +libeth_xdp-y +=3D xdp.o +libeth_xdp-y +=3D xsk.o diff --git a/include/net/libeth/types.h b/include/net/libeth/types.h index 603825e45133..27ded729467a 100644 --- a/include/net/libeth/types.h +++ b/include/net/libeth/types.h @@ -4,7 +4,27 @@ #ifndef __LIBETH_TYPES_H #define __LIBETH_TYPES_H =20 -#include +#include + +/* Stats */ + +/** + * struct libeth_rq_napi_stats - "hot" counters to update in Rx polling lo= op + * @packets: received frames counter + * @bytes: sum of bytes of received frames above + * @fragments: sum of fragments of received S/G frames + * @raw: alias to access all the fields as an array + */ +struct libeth_rq_napi_stats { + union { + struct { + u32 packets; + u32 bytes; + u32 fragments; + }; + DECLARE_FLEX_ARRAY(u32, raw); + }; +}; =20 /** * struct libeth_sq_napi_stats - "hot" counters to update in Tx completion= loop @@ -22,4 +42,84 @@ struct libeth_sq_napi_stats { }; }; =20 +/** + * struct libeth_xdpsq_napi_stats - "hot" counters to update in XDP Tx + * completion loop + * @packets: completed frames counter + * @bytes: sum of bytes of completed frames above + * @fragments: sum of fragments of completed S/G frames + * @raw: alias to access all the fields as an array + */ +struct libeth_xdpsq_napi_stats { + union { + struct { + u32 packets; + u32 bytes; + u32 fragments; + }; + DECLARE_FLEX_ARRAY(u32, raw); + }; +}; + +/* XDP */ + +/* + * The following structures should be embedded into driver's queue structu= re + * and passed to the libeth_xdp helpers, never used directly. + */ + +/* XDPSQ sharing */ + +/** + * struct libeth_xdpsq_lock - locking primitive for sharing XDPSQs + * @lock: spinlock for locking the queue + * @share: whether this particular queue is shared + */ +struct libeth_xdpsq_lock { + spinlock_t lock; + bool share; +}; + +/* XDPSQ clean-up timers */ + +/** + * struct libeth_xdpsq_timer - timer for cleaning up XDPSQs w/o interrupts + * @xdpsq: queue this timer belongs to + * @lock: lock for the queue + * @dwork: work performing cleanups + * + * XDPSQs not using interrupts but lazy cleaning, i.e. only when there's no + * space for sending the current queued frame/bulk, must fire up timers to + * make sure there are no stale buffers to free. + */ +struct libeth_xdpsq_timer { + void *xdpsq; + struct libeth_xdpsq_lock *lock; + + struct delayed_work dwork; +}; + +/* Rx polling path */ + +/** + * struct libeth_xdp_buff_stash - struct for stashing &xdp_buff onto a que= ue + * @data: pointer to the start of the frame, xdp_buff.data + * @headroom: frame headroom, xdp_buff.data - xdp_buff.data_hard_start + * @len: frame linear space length, xdp_buff.data_end - xdp_buff.data + * @frame_sz: truesize occupied by the frame, xdp_buff.frame_sz + * @flags: xdp_buff.flags + * + * &xdp_buff is 56 bytes long on x64, &libeth_xdp_buff is 64 bytes. This + * structure carries only necessary fields to save/restore a partially bui= lt + * frame on the queue structure to finish it during the next NAPI poll. + */ +struct libeth_xdp_buff_stash { + void *data; + u16 headroom; + u16 len; + + u32 frame_sz:24; + u32 flags:8; +} __aligned_largest; + #endif /* __LIBETH_TYPES_H */ diff --git a/drivers/net/ethernet/intel/libeth/priv.h b/drivers/net/etherne= t/intel/libeth/priv.h new file mode 100644 index 000000000000..b9171461186e --- /dev/null +++ b/drivers/net/ethernet/intel/libeth/priv.h @@ -0,0 +1,37 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (C) 2024 Intel Corporation */ + +#ifndef __LIBETH_PRIV_H +#define __LIBETH_PRIV_H + +#include + +/* XDP */ + +enum xdp_action; +struct libeth_xdp_buff; +struct libeth_xdp_tx_frame; +struct skb_shared_info; +struct xdp_frame_bulk; + +extern const struct xsk_tx_metadata_ops libeth_xsktmo_slow; + +void libeth_xsk_tx_return_bulk(const struct libeth_xdp_tx_frame *bq, + u32 count); +u32 libeth_xsk_prog_exception(struct libeth_xdp_buff *xdp, enum xdp_action= act, + int ret); + +struct libeth_xdp_ops { + void (*bulk)(const struct skb_shared_info *sinfo, + struct xdp_frame_bulk *bq, bool frags); + void (*xsk)(struct libeth_xdp_buff *xdp); +}; + +void libeth_attach_xdp(const struct libeth_xdp_ops *ops); + +static inline void libeth_detach_xdp(void) +{ + libeth_attach_xdp(NULL); +} + +#endif /* __LIBETH_PRIV_H */ diff --git a/include/net/libeth/tx.h b/include/net/libeth/tx.h index 35614f9523f6..63a76ab7746d 100644 --- a/include/net/libeth/tx.h +++ b/include/net/libeth/tx.h @@ -12,11 +12,17 @@ =20 /** * enum libeth_sqe_type - type of &libeth_sqe to act on Tx completion - * @LIBETH_SQE_EMPTY: unused/empty, no action required + * @LIBETH_SQE_EMPTY: unused/empty OR XDP_TX/XSk frame, no action required * @LIBETH_SQE_CTX: context descriptor with empty SQE, no action required * @LIBETH_SQE_SLAB: kmalloc-allocated buffer, unmap and kfree() * @LIBETH_SQE_FRAG: mapped skb frag, only unmap DMA * @LIBETH_SQE_SKB: &sk_buff, unmap and napi_consume_skb(), update stats + * @__LIBETH_SQE_XDP_START: separator between skb and XDP types + * @LIBETH_SQE_XDP_TX: &skb_shared_info, libeth_xdp_return_buff_bulk(), st= ats + * @LIBETH_SQE_XDP_XMIT: &xdp_frame, unmap and xdp_return_frame_bulk(), st= ats + * @LIBETH_SQE_XDP_XMIT_FRAG: &xdp_frame frag, only unmap DMA + * @LIBETH_SQE_XSK_TX: &libeth_xdp_buff on XSk queue, xsk_buff_free(), sta= ts + * @LIBETH_SQE_XSK_TX_FRAG: &libeth_xdp_buff frag on XSk queue, xsk_buff_f= ree() */ enum libeth_sqe_type { LIBETH_SQE_EMPTY =3D 0U, @@ -24,6 +30,13 @@ enum libeth_sqe_type { LIBETH_SQE_SLAB, LIBETH_SQE_FRAG, LIBETH_SQE_SKB, + + __LIBETH_SQE_XDP_START, + LIBETH_SQE_XDP_TX =3D __LIBETH_SQE_XDP_START, + LIBETH_SQE_XDP_XMIT, + LIBETH_SQE_XDP_XMIT_FRAG, + LIBETH_SQE_XSK_TX, + LIBETH_SQE_XSK_TX_FRAG, }; =20 /** @@ -32,6 +45,9 @@ enum libeth_sqe_type { * @rs_idx: index of the last buffer from the batch this one was sent in * @raw: slab buffer to free via kfree() * @skb: &sk_buff to consume + * @sinfo: skb shared info of an XDP_TX frame + * @xdpf: XDP frame from ::ndo_xdp_xmit() + * @xsk: XSk Rx frame from XDP_TX action * @dma: DMA address to unmap * @len: length of the mapped region to unmap * @nr_frags: number of frags in the frame this buffer belongs to @@ -46,6 +62,9 @@ struct libeth_sqe { union { void *raw; struct sk_buff *skb; + struct skb_shared_info *sinfo; + struct xdp_frame *xdpf; + struct libeth_xdp_buff *xsk; }; =20 DEFINE_DMA_UNMAP_ADDR(dma); @@ -71,7 +90,10 @@ struct libeth_sqe { /** * struct libeth_cq_pp - completion queue poll params * @dev: &device to perform DMA unmapping + * @bq: XDP frame bulk to combine return operations * @ss: onstack NAPI stats to fill + * @xss: onstack XDPSQ NAPI stats to fill + * @xdp_tx: number of XDP-not-XSk frames processed * @napi: whether it's called from the NAPI context * * libeth uses this structure to access objects needed for performing full @@ -80,7 +102,13 @@ struct libeth_sqe { */ struct libeth_cq_pp { struct device *dev; - struct libeth_sq_napi_stats *ss; + struct xdp_frame_bulk *bq; + + union { + struct libeth_sq_napi_stats *ss; + struct libeth_xdpsq_napi_stats *xss; + }; + u32 xdp_tx; =20 bool napi; }; @@ -126,4 +154,6 @@ static inline void libeth_tx_complete(struct libeth_sqe= *sqe, sqe->type =3D LIBETH_SQE_EMPTY; } =20 +void libeth_tx_complete_any(struct libeth_sqe *sqe, struct libeth_cq_pp *c= p); + #endif /* __LIBETH_TX_H */ diff --git a/include/net/libeth/xdp.h b/include/net/libeth/xdp.h new file mode 100644 index 000000000000..a512af0891c2 --- /dev/null +++ b/include/net/libeth/xdp.h @@ -0,0 +1,1864 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (C) 2024 Intel Corporation */ + +#ifndef __LIBETH_XDP_H +#define __LIBETH_XDP_H + +#include +#include + +#include +#include +#include + +/* Defined as bits to be able to use them as a mask */ +enum { + LIBETH_XDP_PASS =3D 0U, + LIBETH_XDP_DROP =3D BIT(0), + LIBETH_XDP_ABORTED =3D BIT(1), + LIBETH_XDP_TX =3D BIT(2), + LIBETH_XDP_REDIRECT =3D BIT(3), +}; + +/* + * &xdp_buff_xsk is the largest structure &libeth_xdp_buff gets casted to, + * pick maximum pointer-compatible alignment. + */ +#define __LIBETH_XDP_BUFF_ALIGN \ + (IS_ALIGNED(sizeof(struct xdp_buff_xsk), 16) ? 16 : \ + IS_ALIGNED(sizeof(struct xdp_buff_xsk), 8) ? 8 : \ + sizeof(long)) + +/** + * struct libeth_xdp_buff - libeth extension over &xdp_buff + * @base: main &xdp_buff + * @data: shortcut for @base.data + * @desc: RQ descriptor containing metadata for this buffer + * @priv: driver-private scratchspace + * + * The main reason for this is to have a pointer to the descriptor to be a= ble + * to quickly get frame metadata from xdpmo and driver buff-to-xdp callbac= ks + * (as well as bigger alignment). + * Pointer/layout-compatible with &xdp_buff and &xdp_buff_xsk. + */ +struct libeth_xdp_buff { + union { + struct xdp_buff base; + void *data; + }; + + const void *desc; + unsigned long priv[] + __aligned(__LIBETH_XDP_BUFF_ALIGN); +} __aligned(__LIBETH_XDP_BUFF_ALIGN); +static_assert(offsetof(struct libeth_xdp_buff, data) =3D=3D + offsetof(struct xdp_buff_xsk, xdp.data)); +static_assert(offsetof(struct libeth_xdp_buff, desc) =3D=3D + offsetof(struct xdp_buff_xsk, cb)); +static_assert(IS_ALIGNED(sizeof(struct xdp_buff_xsk), + __alignof(struct libeth_xdp_buff))); + +/** + * __LIBETH_XDP_ONSTACK_BUFF - declare a &libeth_xdp_buff on the stack + * @name: name of the variable to declare + * @...: sizeof() of the driver-private data + */ +#define __LIBETH_XDP_ONSTACK_BUFF(name, ...) \ + ___LIBETH_XDP_ONSTACK_BUFF(name, ##__VA_ARGS__) +/** + * LIBETH_XDP_ONSTACK_BUFF - declare a &libeth_xdp_buff on the stack + * @name: name of the variable to declare + * @...: type or variable name of the driver-private data + */ +#define LIBETH_XDP_ONSTACK_BUFF(name, ...) \ + __LIBETH_XDP_ONSTACK_BUFF(name, __libeth_xdp_priv_sz(__VA_ARGS__)) + +#define ___LIBETH_XDP_ONSTACK_BUFF(name, ...) \ + _DEFINE_FLEX(struct libeth_xdp_buff, name, priv, \ + LIBETH_XDP_PRIV_SZ(__VA_ARGS__ + 0), \ + /* no init */); \ + LIBETH_XDP_ASSERT_PRIV_SZ(__VA_ARGS__ + 0) + +#define __libeth_xdp_priv_sz(...) \ + CONCATENATE(__libeth_xdp_psz, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__) + +#define __libeth_xdp_psz0(...) +#define __libeth_xdp_psz1(...) sizeof(__VA_ARGS__) + +#define LIBETH_XDP_PRIV_SZ(sz) \ + (ALIGN(sz, __alignof(struct libeth_xdp_buff)) / sizeof(long)) + +/* Performs XSK_CHECK_PRIV_TYPE() */ +#define LIBETH_XDP_ASSERT_PRIV_SZ(sz) \ + static_assert(offsetofend(struct xdp_buff_xsk, cb) >=3D \ + struct_size_t(struct libeth_xdp_buff, priv, \ + LIBETH_XDP_PRIV_SZ(sz))) + +/* XDPSQ sharing */ + +DECLARE_STATIC_KEY_FALSE(libeth_xdpsq_share); + +/** + * libeth_xdpsq_num - calculate optimal number of XDPSQs for this device += sys + * @rxq: current number of active Rx queues + * @txq: current number of active Tx queues + * @max: maximum number of Tx queues + * + * Each RQ must have its own XDPSQ for XSk pairs, each CPU must have own X= DPSQ + * for lockless sending (``XDP_TX``, .ndo_xdp_xmit()). Cap the maximum of = these + * two with the number of SQs the device can have (minus used ones). + * + * Return: number of XDP Tx queues the device needs to use. + */ +static inline u32 libeth_xdpsq_num(u32 rxq, u32 txq, u32 max) +{ + return min(max(nr_cpu_ids, rxq), max - txq); +} + +/** + * libeth_xdpsq_shared - whether XDPSQs can be shared between several CPUs + * @num: number of active XDPSQs + * + * Return: true if there's no 1:1 XDPSQ/CPU association, false otherwise. + */ +static inline bool libeth_xdpsq_shared(u32 num) +{ + return num < nr_cpu_ids; +} + +/** + * libeth_xdpsq_id - get XDPSQ index corresponding to this CPU + * @qid: number of active XDPSQs + * + * Helper for libeth_xdp routines, do not use in drivers directly. + * + * Return: XDPSQ index needs to be used on this CPU. + */ +static inline u32 libeth_xdpsq_id(u32 qid) +{ + u32 ret =3D raw_smp_processor_id(); + + if (static_branch_unlikely(&libeth_xdpsq_share) && + libeth_xdpsq_shared(qid)) + ret %=3D qid; + + return ret; +} + +void __libeth_xdpsq_get(struct libeth_xdpsq_lock *lock, + const struct net_device *dev); +void __libeth_xdpsq_put(struct libeth_xdpsq_lock *lock, + const struct net_device *dev); + +#define libeth_xdpsq_get_start cpus_read_lock +#define libeth_xdpsq_get_end cpus_read_unlock + +/** + * libeth_xdpsq_get - initialize a &libeth_xdpsq_lock + * @lock: lock to initialize + * @dev: netdev which this lock belongs to + * @share: whether XDPSQs can be shared + * + * Must be called only inside a libeth_xdpsq_get_{start,put}() block. + * Tracks the current XDPSQ association and enables the static lock + * if needed. + */ +static inline void libeth_xdpsq_get(struct libeth_xdpsq_lock *lock, + const struct net_device *dev, + bool share) +{ + if (unlikely(share)) + __libeth_xdpsq_get(lock, dev); +} + +/** + * libeth_xdpsq_put - deinitialize a &libeth_xdpsq_lock + * @lock: lock to deinitialize + * @dev: netdev which this lock belongs to + * + * Must be called only inside a libeth_xdpsq_get_{start,put}() block. + * Tracks the current XDPSQ association and disables the static lock + * if needed. + */ +static inline void libeth_xdpsq_put(struct libeth_xdpsq_lock *lock, + const struct net_device *dev) +{ + if (static_branch_unlikely(&libeth_xdpsq_share) && lock->share) + __libeth_xdpsq_put(lock, dev); +} + +void __acquires(&lock->lock) +__libeth_xdpsq_lock(struct libeth_xdpsq_lock *lock); +void __releases(&lock->lock) +__libeth_xdpsq_unlock(struct libeth_xdpsq_lock *lock); + +/** + * libeth_xdpsq_lock - grab a &libeth_xdpsq_lock if needed + * @lock: lock to take + * + * Touches the underlying spinlock only if the static key is enabled + * and the queue itself is marked as shareable. + */ +static inline void libeth_xdpsq_lock(struct libeth_xdpsq_lock *lock) +{ + if (static_branch_unlikely(&libeth_xdpsq_share) && lock->share) + __libeth_xdpsq_lock(lock); +} + +/** + * libeth_xdpsq_unlock - free a &libeth_xdpsq_lock if needed + * @lock: lock to free + * + * Touches the underlying spinlock only if the static key is enabled + * and the queue itself is marked as shareable. + */ +static inline void libeth_xdpsq_unlock(struct libeth_xdpsq_lock *lock) +{ + if (static_branch_unlikely(&libeth_xdpsq_share) && lock->share) + __libeth_xdpsq_unlock(lock); +} + +/* XDPSQ clean-up timers */ + +void libeth_xdpsq_init_timer(struct libeth_xdpsq_timer *timer, void *xdpsq, + struct libeth_xdpsq_lock *lock, + void (*poll)(struct work_struct *work)); + +/** + * libeth_xdpsq_deinit_timer - deinitialize a &libeth_xdpsq_timer + * @timer: timer to deinitialize + * + * Flush and disable the underlying workqueue. + */ +static inline void libeth_xdpsq_deinit_timer(struct libeth_xdpsq_timer *ti= mer) +{ + cancel_delayed_work_sync(&timer->dwork); +} + +/** + * libeth_xdpsq_queue_timer - run a &libeth_xdpsq_timer + * @timer: timer to queue + * + * Should be called after the queue was filled and the transmission was run + * to complete the pending buffers if no further sending will be done in a + * second (-> lazy cleaning won't happen). + * If the timer was already run, it will be requeued back to one second + * timeout again. + */ +static inline void libeth_xdpsq_queue_timer(struct libeth_xdpsq_timer *tim= er) +{ + mod_delayed_work_on(raw_smp_processor_id(), system_bh_highpri_wq, + &timer->dwork, HZ); +} + +/** + * libeth_xdpsq_run_timer - wrapper to run a queue clean-up on a timer eve= nt + * @work: workqueue belonging to the corresponding timer + * @poll: driver-specific completion queue poll function + * + * Run the polling function on the locked queue and requeue the timer if + * there's more work to do. + * Designed to be used via LIBETH_XDP_DEFINE_TIMER() below. + */ +static __always_inline void +libeth_xdpsq_run_timer(struct work_struct *work, + u32 (*poll)(void *xdpsq, u32 budget)) +{ + struct libeth_xdpsq_timer *timer =3D container_of(work, typeof(*timer), + dwork.work); + + libeth_xdpsq_lock(timer->lock); + + if (poll(timer->xdpsq, U32_MAX)) + libeth_xdpsq_queue_timer(timer); + + libeth_xdpsq_unlock(timer->lock); +} + +/* Common Tx bits */ + +/** + * enum - libeth_xdp internal Tx flags + * @LIBETH_XDP_TX_BULK: one bulk size at which it will be flushed to the q= ueue + * @LIBETH_XDP_TX_BATCH: batch size for which the queue fill loop is unrol= led + * @LIBETH_XDP_TX_DROP: indicates the send function must drop frames not s= ent + * @LIBETH_XDP_TX_NDO: whether the send function is called from .ndo_xdp_x= mit() + * @LIBETH_XDP_TX_XSK: whether the function is called for ``XDP_TX`` for X= Sk + */ +enum { + LIBETH_XDP_TX_BULK =3D DEV_MAP_BULK_SIZE, + LIBETH_XDP_TX_BATCH =3D 8, + + LIBETH_XDP_TX_DROP =3D BIT(0), + LIBETH_XDP_TX_NDO =3D BIT(1), + LIBETH_XDP_TX_XSK =3D BIT(2), +}; + +/** + * enum - &libeth_xdp_tx_frame and &libeth_xdp_tx_desc flags + * @LIBETH_XDP_TX_LEN: only for ``XDP_TX``, [15:0] of ::len_fl is actual l= ength + * @LIBETH_XDP_TX_CSUM: for XSk xmit, enable checksum offload + * @LIBETH_XDP_TX_XSKMD: for XSk xmit, mask of the metadata bits + * @LIBETH_XDP_TX_FIRST: indicates the frag is the first one of the frame + * @LIBETH_XDP_TX_LAST: whether the frag is the last one of the frame + * @LIBETH_XDP_TX_MULTI: whether the frame contains several frags + * @LIBETH_XDP_TX_FLAGS: only for ``XDP_TX``, [31:16] of ::len_fl is flags + */ +enum { + LIBETH_XDP_TX_LEN =3D GENMASK(15, 0), + + LIBETH_XDP_TX_CSUM =3D XDP_TXMD_FLAGS_CHECKSUM, + LIBETH_XDP_TX_XSKMD =3D LIBETH_XDP_TX_LEN, + + LIBETH_XDP_TX_FIRST =3D BIT(16), + LIBETH_XDP_TX_LAST =3D BIT(17), + LIBETH_XDP_TX_MULTI =3D BIT(18), + + LIBETH_XDP_TX_FLAGS =3D GENMASK(31, 16), +}; + +/** + * struct libeth_xdp_tx_frame - represents one XDP Tx element + * @data: frame start pointer for ``XDP_TX`` + * @len_fl: ``XDP_TX``, combined flags [31:16] and len [15:0] field for sp= eed + * @soff: ``XDP_TX``, offset from @data to the start of &skb_shared_info + * @frag: one (non-head) frag for ``XDP_TX`` + * @xdpf: &xdp_frame for the head frag for .ndo_xdp_xmit() + * @dma: DMA address of the non-head frag for .ndo_xdp_xmit() + * @xsk: ``XDP_TX`` for XSk, XDP buffer for any frag + * @len: frag length for XSk ``XDP_TX`` and .ndo_xdp_xmit() + * @flags: Tx flags for the above + * @opts: combined @len + @flags for the above for speed + * @desc: XSk xmit descriptor for direct casting + */ +struct libeth_xdp_tx_frame { + union { + /* ``XDP_TX`` */ + struct { + void *data; + u32 len_fl; + u32 soff; + }; + + /* ``XDP_TX`` frag */ + skb_frag_t frag; + + /* .ndo_xdp_xmit(), XSk ``XDP_TX`` */ + struct { + union { + struct xdp_frame *xdpf; + dma_addr_t dma; + + struct libeth_xdp_buff *xsk; + }; + union { + struct { + u32 len; + u32 flags; + }; + aligned_u64 opts; + }; + }; + + /* XSk xmit */ + struct xdp_desc desc; + }; +} __aligned(sizeof(struct xdp_desc)); +static_assert(offsetof(struct libeth_xdp_tx_frame, frag.len) =3D=3D + offsetof(struct libeth_xdp_tx_frame, len_fl)); +static_assert(sizeof(struct libeth_xdp_tx_frame) =3D=3D sizeof(struct xdp_= desc)); + +/** + * struct libeth_xdp_tx_bulk - XDP Tx frame bulk for bulk sending + * @prog: corresponding active XDP program, %NULL for .ndo_xdp_xmit() + * @dev: &net_device which the frames are transmitted on + * @xdpsq: shortcut to the corresponding driver-specific XDPSQ structure + * @act_mask: Rx only, mask of all the XDP prog verdicts for that NAPI ses= sion + * @count: current number of frames in @bulk + * @bulk: array of queued frames for bulk Tx + * + * All XDP Tx operations except XSk xmit queue each frame to the bulk first + * and flush it when @count reaches the array end. Bulk is always placed on + * the stack for performance. One bulk element contains all the data neces= sary + * for sending a frame and then freeing it on completion. + * For XSk xmit, Tx descriptor array from &xsk_buff_pool is casted directly + * to &libeth_xdp_tx_frame as they are compatible and the bulk structure is + * not used. + */ +struct libeth_xdp_tx_bulk { + const struct bpf_prog *prog; + struct net_device *dev; + void *xdpsq; + + u32 act_mask; + u32 count; + struct libeth_xdp_tx_frame bulk[LIBETH_XDP_TX_BULK]; +} __aligned(sizeof(struct libeth_xdp_tx_frame)); + +/** + * struct libeth_xdpsq - abstraction for an XDPSQ + * @pool: XSk buffer pool for XSk ``XDP_TX`` and xmit + * @sqes: array of Tx buffers from the actual queue struct + * @descs: opaque pointer to the HW descriptor array + * @ntu: pointer to the next free descriptor index + * @count: number of descriptors on that queue + * @pending: pointer to the number of sent-not-completed descs on that que= ue + * @xdp_tx: pointer to the above, but only for non-XSk-xmit frames + * @lock: corresponding XDPSQ lock + * + * Abstraction for driver-independent implementation of Tx. Placed on the = stack + * and filled by the driver before the transmission, so that the generic + * functions can access and modify driver-specific resources. + */ +struct libeth_xdpsq { + struct xsk_buff_pool *pool; + struct libeth_sqe *sqes; + void *descs; + + u32 *ntu; + u32 count; + + u32 *pending; + u32 *xdp_tx; + struct libeth_xdpsq_lock *lock; +}; + +/** + * struct libeth_xdp_tx_desc - abstraction for an XDP Tx descriptor + * @addr: DMA address of the frame + * @len: length of the frame + * @flags: XDP Tx flags + * @opts: combined @len + @flags for speed + * + * Filled by the generic functions and then passed to driver-specific func= tions + * to fill a HW Tx descriptor, always placed on the [function] stack. + */ +struct libeth_xdp_tx_desc { + dma_addr_t addr; + union { + struct { + u32 len; + u32 flags; + }; + aligned_u64 opts; + }; +} __aligned_largest; + +/** + * libeth_xdp_ptr_to_priv - convert a pointer to a libeth_xdp u64 priv + * @ptr: pointer to convert + * + * The main sending function passes private data as the largest scalar, u6= 4. + * Use this helper when you want to pass a pointer there. + */ +#define libeth_xdp_ptr_to_priv(ptr) ({ \ + typecheck_pointer(ptr); \ + ((u64)(uintptr_t)(ptr)); \ +}) +/** + * libeth_xdp_priv_to_ptr - convert a libeth_xdp u64 priv to a pointer + * @priv: private data to convert + * + * The main sending function passes private data as the largest scalar, u6= 4. + * Use this helper when your callback takes this u64 and you want to conve= rt + * it back to a pointer. + */ +#define libeth_xdp_priv_to_ptr(priv) ({ \ + static_assert(__same_type(priv, u64)); \ + ((const void *)(uintptr_t)(priv)); \ +}) + +/* + * On 64-bit systems, assigning one u64 is faster than two u32s. When ::len + * occupies lowest 32 bits (LE), whole ::opts can be assigned directly ins= tead. + */ +#ifdef __LITTLE_ENDIAN +#define __LIBETH_WORD_ACCESS 1 +#endif +#ifdef __LIBETH_WORD_ACCESS +#define __libeth_xdp_tx_len(flen, ...) \ + .opts =3D ((flen) | FIELD_PREP(GENMASK_ULL(63, 32), (__VA_ARGS__ + 0))) +#else +#define __libeth_xdp_tx_len(flen, ...) \ + .len =3D (flen), .flags =3D (__VA_ARGS__ + 0) +#endif + +/** + * libeth_xdp_tx_xmit_bulk - main XDP Tx function + * @bulk: array of frames to send + * @xdpsq: pointer to the driver-specific XDPSQ struct + * @n: number of frames to send + * @unroll: whether to unroll the queue filling loop for speed + * @priv: driver-specific private data + * @prep: callback for cleaning the queue and filling abstract &libeth_xdp= sq + * @fill: internal callback for filling &libeth_sqe and &libeth_xdp_tx_desc + * @xmit: callback for filling a HW descriptor with the frame info + * + * Internal abstraction for placing @n XDP Tx frames on the HW XDPSQ. Used= for + * all types of frames: ``XDP_TX``, .ndo_xdp_xmit(), XSk ``XDP_TX`` and XSk + * xmit. + * @prep must lock the queue as this function releases it at the end. @unr= oll + * greatly increases the object code size, but also greatly increases XSk = xmit + * performance; for other types of frames, it's not enabled. + * The compilers inline all those onstack abstractions to direct data acce= sses. + * + * Return: number of frames actually placed on the queue, <=3D @n. The fun= ction + * can't fail, but can send less frames if there's no enough free descript= ors + * available. The actual free space is returned by @prep from the driver. + */ +static __always_inline u32 +libeth_xdp_tx_xmit_bulk(const struct libeth_xdp_tx_frame *bulk, void *xdps= q, + u32 n, bool unroll, u64 priv, + u32 (*prep)(void *xdpsq, struct libeth_xdpsq *sq), + struct libeth_xdp_tx_desc + (*fill)(struct libeth_xdp_tx_frame frm, u32 i, + const struct libeth_xdpsq *sq, u64 priv), + void (*xmit)(struct libeth_xdp_tx_desc desc, u32 i, + const struct libeth_xdpsq *sq, u64 priv)) +__releases(sq.lock) +{ + u32 this, batched, off =3D 0; + struct libeth_xdpsq sq; + u32 ntu, i =3D 0; + + n =3D min(n, prep(xdpsq, &sq)); + if (unlikely(!n)) + goto unlock; + + ntu =3D *sq.ntu; + + this =3D sq.count - ntu; + if (likely(this > n)) + this =3D n; + +again: + if (!unroll) + goto linear; + + batched =3D ALIGN_DOWN(this, LIBETH_XDP_TX_BATCH); + + for ( ; i < off + batched; i +=3D LIBETH_XDP_TX_BATCH) { + u32 base =3D ntu + i - off; + + unrolled_count(LIBETH_XDP_TX_BATCH) + for (u32 j =3D 0; j < LIBETH_XDP_TX_BATCH; j++) + xmit(fill(bulk[i + j], base + j, &sq, priv), + base + j, &sq, priv); + } + + if (batched < this) { +linear: + for ( ; i < off + this; i++) + xmit(fill(bulk[i], ntu + i - off, &sq, priv), + ntu + i - off, &sq, priv); + } + + ntu +=3D this; + if (likely(ntu < sq.count)) + goto out; + + ntu =3D 0; + + if (i < n) { + this =3D n - i; + off =3D i; + + goto again; + } + +out: + *sq.ntu =3D ntu; + *sq.pending +=3D n; + if (sq.xdp_tx) + *sq.xdp_tx +=3D n; + +unlock: + libeth_xdpsq_unlock(sq.lock); + + return n; +} + +/* ``XDP_TX`` bulking */ + +void libeth_xdp_return_buff_slow(struct libeth_xdp_buff *xdp); + +/** + * libeth_xdp_tx_queue_head - internal helper for queueing one ``XDP_TX`` = head + * @bq: XDP Tx bulk to queue the head frag to + * @xdp: XDP buffer with the head to queue + * + * Return: false if it's the only frag of the frame, true if it's an S/G f= rame. + */ +static inline bool libeth_xdp_tx_queue_head(struct libeth_xdp_tx_bulk *bq, + const struct libeth_xdp_buff *xdp) +{ + const struct xdp_buff *base =3D &xdp->base; + + bq->bulk[bq->count++] =3D (typeof(*bq->bulk)){ + .data =3D xdp->data, + .len_fl =3D (base->data_end - xdp->data) | LIBETH_XDP_TX_FIRST, + .soff =3D xdp_data_hard_end(base) - xdp->data, + }; + + if (!xdp_buff_has_frags(base)) + return false; + + bq->bulk[bq->count - 1].len_fl |=3D LIBETH_XDP_TX_MULTI; + + return true; +} + +/** + * libeth_xdp_tx_queue_frag - internal helper for queueing one ``XDP_TX`` = frag + * @bq: XDP Tx bulk to queue the frag to + * @frag: frag to queue + */ +static inline void libeth_xdp_tx_queue_frag(struct libeth_xdp_tx_bulk *bq, + const skb_frag_t *frag) +{ + bq->bulk[bq->count++].frag =3D *frag; +} + +/** + * libeth_xdp_tx_queue_bulk - internal helper for queueing one ``XDP_TX`` = frame + * @bq: XDP Tx bulk to queue the frame to + * @xdp: XDP buffer to queue + * @flush_bulk: driver callback to flush the bulk to the HW queue + * + * Return: true on success, false on flush error. + */ +static __always_inline bool +libeth_xdp_tx_queue_bulk(struct libeth_xdp_tx_bulk *bq, + struct libeth_xdp_buff *xdp, + bool (*flush_bulk)(struct libeth_xdp_tx_bulk *bq, + u32 flags)) +{ + const struct skb_shared_info *sinfo; + bool ret =3D true; + u32 nr_frags; + + if (unlikely(bq->count =3D=3D LIBETH_XDP_TX_BULK) && + unlikely(!flush_bulk(bq, 0))) { + libeth_xdp_return_buff_slow(xdp); + return false; + } + + if (!libeth_xdp_tx_queue_head(bq, xdp)) + goto out; + + sinfo =3D xdp_get_shared_info_from_buff(&xdp->base); + nr_frags =3D sinfo->nr_frags; + + for (u32 i =3D 0; i < nr_frags; i++) { + if (unlikely(bq->count =3D=3D LIBETH_XDP_TX_BULK) && + unlikely(!flush_bulk(bq, 0))) { + ret =3D false; + break; + } + + libeth_xdp_tx_queue_frag(bq, &sinfo->frags[i]); + }; + +out: + bq->bulk[bq->count - 1].len_fl |=3D LIBETH_XDP_TX_LAST; + xdp->data =3D NULL; + + return ret; +} + +/** + * libeth_xdp_tx_fill_stats - fill a &libeth_sqe with ``XDP_TX`` frame sta= ts + * @sqe: SQ element to fill + * @desc: libeth_xdp Tx descriptor + * @sinfo: &skb_shared_info for this frame + * + * Internal helper for filling an SQE with the frame stats, do not use in + * drivers. Fills the number of frags and bytes for this frame. + */ +#define libeth_xdp_tx_fill_stats(sqe, desc, sinfo) \ + __libeth_xdp_tx_fill_stats(sqe, desc, sinfo, __UNIQUE_ID(sqe_), \ + __UNIQUE_ID(desc_), __UNIQUE_ID(sinfo_)) + +#define __libeth_xdp_tx_fill_stats(sqe, desc, sinfo, ue, ud, us) do { = \ + const struct libeth_xdp_tx_desc *ud =3D (desc); \ + const struct skb_shared_info *us; \ + struct libeth_sqe *ue =3D (sqe); \ + \ + ue->nr_frags =3D 1; \ + ue->bytes =3D ud->len; \ + \ + if (ud->flags & LIBETH_XDP_TX_MULTI) { \ + us =3D (sinfo); \ + ue->nr_frags +=3D us->nr_frags; \ + ue->bytes +=3D us->xdp_frags_size; \ + } \ +} while (0) + +/** + * libeth_xdp_tx_fill_buf - internal helper to fill one ``XDP_TX`` &libeth= _sqe + * @frm: XDP Tx frame from the bulk + * @i: index on the HW queue + * @sq: XDPSQ abstraction for the queue + * @priv: private data + * + * Return: XDP Tx descriptor with the synced DMA and other info to pass to + * the driver callback. + */ +static inline struct libeth_xdp_tx_desc +libeth_xdp_tx_fill_buf(struct libeth_xdp_tx_frame frm, u32 i, + const struct libeth_xdpsq *sq, u64 priv) +{ + struct libeth_xdp_tx_desc desc; + struct skb_shared_info *sinfo; + skb_frag_t *frag =3D &frm.frag; + struct libeth_sqe *sqe; + + if (frm.len_fl & LIBETH_XDP_TX_FIRST) { + sinfo =3D frm.data + frm.soff; + skb_frag_fill_page_desc(frag, virt_to_page(frm.data), + offset_in_page(frm.data), + frm.len_fl); + } else { + sinfo =3D NULL; + } + + desc =3D (typeof(desc)){ + .addr =3D page_pool_get_dma_addr(skb_frag_page(frag)) + + skb_frag_off(frag), + .len =3D skb_frag_size(frag) & LIBETH_XDP_TX_LEN, + .flags =3D skb_frag_size(frag) & LIBETH_XDP_TX_FLAGS, + }; + + dma_sync_single_for_device(skb_frag_page(frag)->pp->p.dev, desc.addr, + desc.len, DMA_BIDIRECTIONAL); + + if (!sinfo) + return desc; + + sqe =3D &sq->sqes[i]; + sqe->type =3D LIBETH_SQE_XDP_TX; + sqe->sinfo =3D sinfo; + libeth_xdp_tx_fill_stats(sqe, &desc, sinfo); + + return desc; +} + +void libeth_xdp_tx_exception(struct libeth_xdp_tx_bulk *bq, u32 sent, + u32 flags); + +/** + * __libeth_xdp_tx_flush_bulk - internal helper to flush one XDP Tx bulk + * @bq: bulk to flush + * @flags: XDP TX flags (.ndo_xdp_xmit(), XSk etc.) + * @prep: driver-specific callback to prepare the queue for sending + * @fill: libeth_xdp callback to fill &libeth_sqe and &libeth_xdp_tx_desc + * @xmit: driver callback to fill a HW descriptor + * + * Internal abstraction to create bulk flush functions for drivers. Used f= or + * everything except XSk xmit. + * + * Return: true if anything was sent, false otherwise. + */ +static __always_inline bool +__libeth_xdp_tx_flush_bulk(struct libeth_xdp_tx_bulk *bq, u32 flags, + u32 (*prep)(void *xdpsq, struct libeth_xdpsq *sq), + struct libeth_xdp_tx_desc + (*fill)(struct libeth_xdp_tx_frame frm, u32 i, + const struct libeth_xdpsq *sq, u64 priv), + void (*xmit)(struct libeth_xdp_tx_desc desc, u32 i, + const struct libeth_xdpsq *sq, + u64 priv)) +{ + u32 sent, drops; + int err =3D 0; + + sent =3D libeth_xdp_tx_xmit_bulk(bq->bulk, bq->xdpsq, + min(bq->count, LIBETH_XDP_TX_BULK), + false, 0, prep, fill, xmit); + drops =3D bq->count - sent; + + if (unlikely(drops)) { + libeth_xdp_tx_exception(bq, sent, flags); + err =3D -ENXIO; + } else { + bq->count =3D 0; + } + + trace_xdp_bulk_tx(bq->dev, sent, drops, err); + + return likely(sent); +} + +/** + * libeth_xdp_tx_flush_bulk - wrapper to define flush of one ``XDP_TX`` bu= lk + * @bq: bulk to flush + * @flags: Tx flags, see above + * @prep: driver callback to prepare the queue + * @xmit: driver callback to fill a HW descriptor + * + * Use via LIBETH_XDP_DEFINE_FLUSH_TX() to define an ``XDP_TX`` driver + * callback. + */ +#define libeth_xdp_tx_flush_bulk(bq, flags, prep, xmit) \ + __libeth_xdp_tx_flush_bulk(bq, flags, prep, libeth_xdp_tx_fill_buf, \ + xmit) + +/* .ndo_xdp_xmit() implementation */ + +/** + * libeth_xdp_xmit_init_bulk - internal helper to initialize bulk for XDP = xmit + * @bq: bulk to initialize + * @dev: target &net_device + * @xdpsqs: array of driver-specific XDPSQ structs + * @num: number of active XDPSQs (the above array length) + */ +#define libeth_xdp_xmit_init_bulk(bq, dev, xdpsqs, num) \ + __libeth_xdp_xmit_init_bulk(bq, dev, (xdpsqs)[libeth_xdpsq_id(num)]) + +static inline void __libeth_xdp_xmit_init_bulk(struct libeth_xdp_tx_bulk *= bq, + struct net_device *dev, + void *xdpsq) +{ + bq->dev =3D dev; + bq->xdpsq =3D xdpsq; + bq->count =3D 0; +} + +/** + * libeth_xdp_xmit_frame_dma - internal helper to access DMA of an &xdp_fr= ame + * @xf: pointer to the XDP frame + * + * There's no place in &libeth_xdp_tx_frame to store DMA address for an + * &xdp_frame head. The headroom is used then, the address is placed right + * after the frame struct, naturally aligned. + * + * Return: pointer to the DMA address to use. + */ +#define libeth_xdp_xmit_frame_dma(xf) \ + _Generic((xf), \ + const struct xdp_frame *: \ + (const dma_addr_t *)__libeth_xdp_xmit_frame_dma(xf), \ + struct xdp_frame *: \ + (dma_addr_t *)__libeth_xdp_xmit_frame_dma(xf) \ + ) + +static inline void *__libeth_xdp_xmit_frame_dma(const struct xdp_frame *xd= pf) +{ + void *addr =3D (void *)(xdpf + 1); + + if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && + __alignof(*xdpf) < sizeof(dma_addr_t)) + addr =3D PTR_ALIGN(addr, sizeof(dma_addr_t)); + + return addr; +} + +/** + * libeth_xdp_xmit_queue_head - internal helper for queueing one XDP xmit = head + * @bq: XDP Tx bulk to queue the head frag to + * @xdpf: XDP frame with the head to queue + * @dev: device to perform DMA mapping + * + * Return: ``LIBETH_XDP_DROP`` on DMA mapping error, + * ``LIBETH_XDP_PASS`` if it's the only frag in the frame, + * ``LIBETH_XDP_TX`` if it's an S/G frame. + */ +static inline u32 libeth_xdp_xmit_queue_head(struct libeth_xdp_tx_bulk *bq, + struct xdp_frame *xdpf, + struct device *dev) +{ + dma_addr_t dma; + + dma =3D dma_map_single(dev, xdpf->data, xdpf->len, DMA_TO_DEVICE); + if (dma_mapping_error(dev, dma)) + return LIBETH_XDP_DROP; + + *libeth_xdp_xmit_frame_dma(xdpf) =3D dma; + + bq->bulk[bq->count++] =3D (typeof(*bq->bulk)){ + .xdpf =3D xdpf, + __libeth_xdp_tx_len(xdpf->len, LIBETH_XDP_TX_FIRST), + }; + + if (!xdp_frame_has_frags(xdpf)) + return LIBETH_XDP_PASS; + + bq->bulk[bq->count - 1].flags |=3D LIBETH_XDP_TX_MULTI; + + return LIBETH_XDP_TX; +} + +/** + * libeth_xdp_xmit_queue_frag - internal helper for queueing one XDP xmit = frag + * @bq: XDP Tx bulk to queue the frag to + * @frag: frag to queue + * @dev: device to perform DMA mapping + * + * Return: true on success, false on DMA mapping error. + */ +static inline bool libeth_xdp_xmit_queue_frag(struct libeth_xdp_tx_bulk *b= q, + const skb_frag_t *frag, + struct device *dev) +{ + dma_addr_t dma; + + dma =3D skb_frag_dma_map(dev, frag); + if (dma_mapping_error(dev, dma)) + return false; + + bq->bulk[bq->count++] =3D (typeof(*bq->bulk)){ + .dma =3D dma, + __libeth_xdp_tx_len(skb_frag_size(frag)), + }; + + return true; +} + +/** + * libeth_xdp_xmit_queue_bulk - internal helper for queueing one XDP xmit = frame + * @bq: XDP Tx bulk to queue the frame to + * @xdpf: XDP frame to queue + * @flush_bulk: driver callback to flush the bulk to the HW queue + * + * Return: ``LIBETH_XDP_TX`` on success, + * ``LIBETH_XDP_DROP`` if the frame should be dropped by the stack, + * ``LIBETH_XDP_ABORTED`` if the frame will be dropped by libeth_xdp. + */ +static __always_inline u32 +libeth_xdp_xmit_queue_bulk(struct libeth_xdp_tx_bulk *bq, + struct xdp_frame *xdpf, + bool (*flush_bulk)(struct libeth_xdp_tx_bulk *bq, + u32 flags)) +{ + u32 head, nr_frags, i, ret =3D LIBETH_XDP_TX; + struct device *dev =3D bq->dev->dev.parent; + const struct skb_shared_info *sinfo; + + if (unlikely(bq->count =3D=3D LIBETH_XDP_TX_BULK) && + unlikely(!flush_bulk(bq, LIBETH_XDP_TX_NDO))) + return LIBETH_XDP_DROP; + + head =3D libeth_xdp_xmit_queue_head(bq, xdpf, dev); + if (head =3D=3D LIBETH_XDP_PASS) + goto out; + else if (head =3D=3D LIBETH_XDP_DROP) + return LIBETH_XDP_DROP; + + sinfo =3D xdp_get_shared_info_from_frame(xdpf); + nr_frags =3D sinfo->nr_frags; + + for (i =3D 0; i < nr_frags; i++) { + if (unlikely(bq->count =3D=3D LIBETH_XDP_TX_BULK) && + unlikely(!flush_bulk(bq, LIBETH_XDP_TX_NDO))) + break; + + if (!libeth_xdp_xmit_queue_frag(bq, &sinfo->frags[i], dev)) + break; + }; + + if (unlikely(i < nr_frags)) + ret =3D LIBETH_XDP_ABORTED; + +out: + bq->bulk[bq->count - 1].flags |=3D LIBETH_XDP_TX_LAST; + + return ret; +} + +/** + * libeth_xdp_xmit_fill_buf - internal helper to fill one XDP xmit &libeth= _sqe + * @frm: XDP Tx frame from the bulk + * @i: index on the HW queue + * @sq: XDPSQ abstraction for the queue + * @priv: private data + * + * Return: XDP Tx descriptor with the mapped DMA and other info to pass to + * the driver callback. + */ +static inline struct libeth_xdp_tx_desc +libeth_xdp_xmit_fill_buf(struct libeth_xdp_tx_frame frm, u32 i, + const struct libeth_xdpsq *sq, u64 priv) +{ + struct libeth_xdp_tx_desc desc; + struct libeth_sqe *sqe; + struct xdp_frame *xdpf; + + if (frm.flags & LIBETH_XDP_TX_FIRST) { + xdpf =3D frm.xdpf; + desc.addr =3D *libeth_xdp_xmit_frame_dma(xdpf); + } else { + xdpf =3D NULL; + desc.addr =3D frm.dma; + } + desc.opts =3D frm.opts; + + sqe =3D &sq->sqes[i]; + dma_unmap_addr_set(sqe, dma, desc.addr); + dma_unmap_len_set(sqe, len, desc.len); + + if (!xdpf) { + sqe->type =3D LIBETH_SQE_XDP_XMIT_FRAG; + return desc; + } + + sqe->type =3D LIBETH_SQE_XDP_XMIT; + sqe->xdpf =3D xdpf; + libeth_xdp_tx_fill_stats(sqe, &desc, + xdp_get_shared_info_from_frame(xdpf)); + + return desc; +} + +/** + * libeth_xdp_xmit_flush_bulk - wrapper to define flush of one XDP xmit bu= lk + * @bq: bulk to flush + * @flags: Tx flags, see __libeth_xdp_tx_flush_bulk() + * @prep: driver callback to prepare the queue + * @xmit: driver callback to fill a HW descriptor + * + * Use via LIBETH_XDP_DEFINE_FLUSH_XMIT() to define an XDP xmit driver + * callback. + */ +#define libeth_xdp_xmit_flush_bulk(bq, flags, prep, xmit) \ + __libeth_xdp_tx_flush_bulk(bq, (flags) | LIBETH_XDP_TX_NDO, prep, \ + libeth_xdp_xmit_fill_buf, xmit) + +u32 libeth_xdp_xmit_return_bulk(const struct libeth_xdp_tx_frame *bq, + u32 count, const struct net_device *dev); + +/** + * __libeth_xdp_xmit_do_bulk - internal function to implement .ndo_xdp_xmi= t() + * @bq: XDP Tx bulk to queue frames to + * @frames: XDP frames passed by the stack + * @n: number of frames + * @flags: flags passed by the stack + * @flush_bulk: driver callback to flush an XDP xmit bulk + * @finalize: driver callback to finalize sending XDP Tx frames on the que= ue + * + * Perform common checks, map the frags and queue them to the bulk, then f= lush + * the bulk to the XDPSQ. If requested by the stack, finalize the queue. + * + * Return: number of frames send or -errno on error. + */ +static __always_inline int +__libeth_xdp_xmit_do_bulk(struct libeth_xdp_tx_bulk *bq, + struct xdp_frame **frames, u32 n, u32 flags, + bool (*flush_bulk)(struct libeth_xdp_tx_bulk *bq, + u32 flags), + void (*finalize)(void *xdpsq, bool sent, bool flush)) +{ + u32 nxmit =3D 0; + + if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) + return -EINVAL; + + for (u32 i =3D 0; likely(i < n); i++) { + u32 ret; + + ret =3D libeth_xdp_xmit_queue_bulk(bq, frames[i], flush_bulk); + if (unlikely(ret !=3D LIBETH_XDP_TX)) { + nxmit +=3D ret =3D=3D LIBETH_XDP_ABORTED; + break; + } + + nxmit++; + } + + if (bq->count) { + flush_bulk(bq, LIBETH_XDP_TX_NDO); + if (unlikely(bq->count)) + nxmit -=3D libeth_xdp_xmit_return_bulk(bq->bulk, + bq->count, + bq->dev); + } + + finalize(bq->xdpsq, nxmit, flags & XDP_XMIT_FLUSH); + + return nxmit; +} + +/** + * libeth_xdp_xmit_do_bulk - implement full .ndo_xdp_xmit() in driver + * @dev: target &net_device + * @n: number of frames to send + * @fr: XDP frames to send + * @f: flags passed by the stack + * @xqs: array of XDPSQs driver structs + * @nqs: number of active XDPSQs, the above array length + * @fl: driver callback to flush an XDP xmit bulk + * @fin: driver cabback to finalize the queue + * + * If the driver has active XDPSQs, perform common checks and send the fra= mes. + * Finalize the queue, if requested. + * + * Return: number of frames sent or -errno on error. + */ +#define libeth_xdp_xmit_do_bulk(dev, n, fr, f, xqs, nqs, fl, fin) \ + _libeth_xdp_xmit_do_bulk(dev, n, fr, f, xqs, nqs, fl, fin, \ + __UNIQUE_ID(bq_), __UNIQUE_ID(ret_), \ + __UNIQUE_ID(nqs_)) + +#define _libeth_xdp_xmit_do_bulk(d, n, fr, f, xqs, nqs, fl, fin, ub, ur, u= n) \ +({ \ + u32 un =3D (nqs); \ + int ur; \ + \ + if (likely(un)) { \ + struct libeth_xdp_tx_bulk ub; \ + \ + libeth_xdp_xmit_init_bulk(&ub, d, xqs, un); \ + ur =3D __libeth_xdp_xmit_do_bulk(&ub, fr, n, f, fl, fin); \ + } else { \ + ur =3D -ENXIO; \ + } \ + \ + ur; \ +}) + +/* Rx polling path */ + +/** + * libeth_xdp_tx_init_bulk - initialize an XDP Tx bulk for Rx NAPI poll + * @bq: bulk to initialize + * @prog: RCU pointer to the XDP program (can be %NULL) + * @dev: target &net_device + * @xdpsqs: array of driver XDPSQ structs + * @num: number of active XDPSQs, the above array length + * + * Should be called on an onstack XDP Tx bulk before the NAPI polling loop. + * Initializes all the needed fields to run libeth_xdp functions. If @num = =3D=3D 0, + * assumes XDP is not enabled. + * Do not use for XSk, it has its own optimized helper. + */ +#define libeth_xdp_tx_init_bulk(bq, prog, dev, xdpsqs, num) \ + __libeth_xdp_tx_init_bulk(bq, prog, dev, xdpsqs, num, false, \ + __UNIQUE_ID(bq_), __UNIQUE_ID(nqs_)) + +#define __libeth_xdp_tx_init_bulk(bq, pr, d, xdpsqs, num, xsk, ub, un) do = { \ + typeof(bq) ub =3D (bq); \ + u32 un =3D (num); \ + \ + if (un || (xsk)) { \ + ub->prog =3D rcu_dereference(pr); \ + ub->dev =3D (d); \ + ub->xdpsq =3D (xdpsqs)[libeth_xdpsq_id(un)]; \ + } else { \ + ub->prog =3D NULL; \ + } \ + \ + ub->act_mask =3D 0; \ + ub->count =3D 0; \ +} while (0) + +void libeth_xdp_load_stash(struct libeth_xdp_buff *dst, + const struct libeth_xdp_buff_stash *src); +void libeth_xdp_save_stash(struct libeth_xdp_buff_stash *dst, + const struct libeth_xdp_buff *src); +void __libeth_xdp_return_stash(struct libeth_xdp_buff_stash *stash); + +/** + * libeth_xdp_init_buff - initialize a &libeth_xdp_buff for Rx NAPI poll + * @dst: onstack buffer to initialize + * @src: XDP buffer stash placed on the queue + * @rxq: registered &xdp_rxq_info corresponding to this queue + * + * Should be called before the main NAPI polling loop. Loads the content of + * the previously saved stash or initializes the buffer from scratch. + * Do not use for XSk. + */ +static inline void +libeth_xdp_init_buff(struct libeth_xdp_buff *dst, + const struct libeth_xdp_buff_stash *src, + struct xdp_rxq_info *rxq) +{ + if (likely(!src->data)) + dst->data =3D NULL; + else + libeth_xdp_load_stash(dst, src); + + dst->base.rxq =3D rxq; +} + +/** + * libeth_xdp_save_buff - save a partially built buffer on a queue + * @dst: XDP buffer stash placed on the queue + * @src: onstack buffer to save + * + * Should be called after the main NAPI polling loop. If the loop exited b= efore + * the buffer was finished, saves its content on the queue, so that it can= be + * completed during the next poll. Otherwise, clears the stash. + */ +static inline void libeth_xdp_save_buff(struct libeth_xdp_buff_stash *dst, + const struct libeth_xdp_buff *src) +{ + if (likely(!src->data)) + dst->data =3D NULL; + else + libeth_xdp_save_stash(dst, src); +} + +/** + * libeth_xdp_return_stash - free an XDP buffer stash from a queue + * @stash: stash to free + * + * If the queue is about to be destroyed, but it still has an incompleted + * buffer stash, this helper should be called to free it. + */ +static inline void libeth_xdp_return_stash(struct libeth_xdp_buff_stash *s= tash) +{ + if (stash->data) + __libeth_xdp_return_stash(stash); +} + +static inline void libeth_xdp_return_va(const void *data, bool napi) +{ + struct page *page =3D virt_to_page(data); + + page_pool_put_full_page(page->pp, page, napi); +} + +static inline void libeth_xdp_return_frags(const struct skb_shared_info *s= info, + bool napi) +{ + for (u32 i =3D 0; i < sinfo->nr_frags; i++) { + struct page *page =3D skb_frag_page(&sinfo->frags[i]); + + page_pool_put_full_page(page->pp, page, napi); + } +} + +/** + * libeth_xdp_return_buff - free/recycle a &libeth_xdp_buff + * @xdp: buffer to free + * + * Hotpath helper to free a &libeth_xdp_buff. Comparing to xdp_return_buff= (), + * it's faster as it gets inlined and always assumes order-0 pages and safe + * direct recycling. Zeroes @xdp->data to avoid UAFs. + */ +static inline void libeth_xdp_return_buff(struct libeth_xdp_buff *xdp) +{ + if (!xdp_buff_has_frags(&xdp->base)) + goto out; + + libeth_xdp_return_frags(xdp_get_shared_info_from_buff(&xdp->base), + true); + +out: + libeth_xdp_return_va(xdp->data, true); + xdp->data =3D NULL; +} + +bool libeth_xdp_buff_add_frag(struct libeth_xdp_buff *xdp, + const struct libeth_fqe *fqe, + u32 len); + +/** + * libeth_xdp_prepare_buff - fill a &libeth_xdp_buff with a head FQE data + * @xdp: XDP buffer to attach the head to + * @fqe: FQE containing the head buffer + * @len: buffer len passed from HW + * + * Internal, use libeth_xdp_process_buff() instead. Initializes XDP buffer + * head with the Rx buffer data: data pointer, length, headroom, and + * truesize/tailroom. Zeroes the flags. + * Uses faster single u64 write instead of per-field access. + */ +static inline void libeth_xdp_prepare_buff(struct libeth_xdp_buff *xdp, + const struct libeth_fqe *fqe, + u32 len) +{ + const struct page *page =3D fqe->page; + +#ifdef __LIBETH_WORD_ACCESS + static_assert(offsetofend(typeof(xdp->base), flags) - + offsetof(typeof(xdp->base), frame_sz) =3D=3D + sizeof(u64)); + + *(u64 *)&xdp->base.frame_sz =3D fqe->truesize; +#else + xdp_init_buff(&xdp->base, fqe->truesize, xdp->base.rxq); +#endif + xdp_prepare_buff(&xdp->base, page_address(page) + fqe->offset, + page->pp->p.offset, len, true); +} + +/** + * libeth_xdp_process_buff - attach an Rx buffer to a &libeth_xdp_buff + * @xdp: XDP buffer to attach the Rx buffer to + * @fqe: Rx buffer to process + * @len: received data length from the descriptor + * + * If the XDP buffer is empty, attaches the Rx buffer as head and initiali= zes + * the required fields. Otherwise, attaches the buffer as a frag. + * Already performs DMA sync-for-CPU and frame start prefetch + * (for head buffers only). + * + * Return: true on success, false if the descriptor must be skipped (empty= or + * no space for a new frag). + */ +static inline bool libeth_xdp_process_buff(struct libeth_xdp_buff *xdp, + const struct libeth_fqe *fqe, + u32 len) +{ + if (!libeth_rx_sync_for_cpu(fqe, len)) + return false; + + if (xdp->data) + return libeth_xdp_buff_add_frag(xdp, fqe, len); + + libeth_xdp_prepare_buff(xdp, fqe, len); + + prefetch(xdp->data); + + return true; +} + +/** + * libeth_xdp_buff_stats_frags - update onstack RQ stats with XDP frags in= fo + * @ss: onstack stats to update + * @xdp: buffer to account + * + * Internal helper used by __libeth_xdp_run_pass(), do not call directly. + * Adds buffer's frags count and total len to the onstack stats. + */ +static inline void +libeth_xdp_buff_stats_frags(struct libeth_rq_napi_stats *ss, + const struct libeth_xdp_buff *xdp) +{ + const struct skb_shared_info *sinfo; + + sinfo =3D xdp_get_shared_info_from_buff(&xdp->base); + ss->bytes +=3D sinfo->xdp_frags_size; + ss->fragments +=3D sinfo->nr_frags + 1; +} + +u32 libeth_xdp_prog_exception(const struct libeth_xdp_tx_bulk *bq, + struct libeth_xdp_buff *xdp, + enum xdp_action act, int ret); + +/** + * __libeth_xdp_run_prog - run XDP program on an XDP buffer + * @xdp: XDP buffer to run the prog on + * @bq: buffer bulk for ``XDP_TX`` queueing + * + * Internal inline abstraction to run XDP program. Handles ``XDP_DROP`` + * and ``XDP_REDIRECT`` only, the rest is processed levels up. + * Reports an XDP prog exception on errors. + * + * Return: libeth_xdp prog verdict depending on the prog's verdict. + */ +static __always_inline u32 +__libeth_xdp_run_prog(struct libeth_xdp_buff *xdp, + const struct libeth_xdp_tx_bulk *bq) +{ + enum xdp_action act; + + act =3D bpf_prog_run_xdp(bq->prog, &xdp->base); + if (unlikely(act < XDP_DROP || act > XDP_REDIRECT)) + goto out; + + switch (act) { + case XDP_PASS: + return LIBETH_XDP_PASS; + case XDP_DROP: + libeth_xdp_return_buff(xdp); + + return LIBETH_XDP_DROP; + case XDP_TX: + return LIBETH_XDP_TX; + case XDP_REDIRECT: + if (unlikely(xdp_do_redirect(bq->dev, &xdp->base, bq->prog))) + break; + + xdp->data =3D NULL; + + return LIBETH_XDP_REDIRECT; + default: + break; + } + +out: + return libeth_xdp_prog_exception(bq, xdp, act, 0); +} + +/** + * __libeth_xdp_run_flush - run XDP program and handle ``XDP_TX`` verdict + * @xdp: XDP buffer to run the prog on + * @bq: buffer bulk for ``XDP_TX`` queueing + * @run: internal callback for running XDP program + * @queue: internal callback for queuing ``XDP_TX`` frame + * @flush_bulk: driver callback for flushing a bulk + * + * Internal inline abstraction to run XDP program and additionally handle + * ``XDP_TX`` verdict. Used by both XDP and XSk, hence @run and @queue. + * Do not use directly. + * + * Return: libeth_xdp prog verdict depending on the prog's verdict. + */ +static __always_inline u32 +__libeth_xdp_run_flush(struct libeth_xdp_buff *xdp, + struct libeth_xdp_tx_bulk *bq, + u32 (*run)(struct libeth_xdp_buff *xdp, + const struct libeth_xdp_tx_bulk *bq), + bool (*queue)(struct libeth_xdp_tx_bulk *bq, + struct libeth_xdp_buff *xdp, + bool (*flush_bulk) + (struct libeth_xdp_tx_bulk *bq, + u32 flags)), + bool (*flush_bulk)(struct libeth_xdp_tx_bulk *bq, + u32 flags)) +{ + u32 act; + + act =3D run(xdp, bq); + if (act =3D=3D LIBETH_XDP_TX && unlikely(!queue(bq, xdp, flush_bulk))) + act =3D LIBETH_XDP_DROP; + + bq->act_mask |=3D act; + + return act; +} + +/** + * libeth_xdp_run_prog - run XDP program (non-XSk path) and handle all ver= dicts + * @xdp: XDP buffer to process + * @bq: XDP Tx bulk to queue ``XDP_TX`` buffers + * @fl: driver ``XDP_TX`` bulk flush callback + * + * Run the attached XDP program and handle all possible verdicts. XSk has = its + * own version. + * Prefer using it via LIBETH_XDP_DEFINE_RUN{,_PASS,_PROG}(). + * + * Return: true if the buffer should be passed up the stack, false if the = poll + * should go to the next buffer. + */ +#define libeth_xdp_run_prog(xdp, bq, fl) \ + (__libeth_xdp_run_flush(xdp, bq, __libeth_xdp_run_prog, \ + libeth_xdp_tx_queue_bulk, \ + fl) =3D=3D LIBETH_XDP_PASS) + +/** + * __libeth_xdp_run_pass - helper to run XDP program and handle the result + * @xdp: XDP buffer to process + * @bq: XDP Tx bulk to queue ``XDP_TX`` frames + * @napi: NAPI to build an skb and pass it up the stack + * @rs: onstack libeth RQ stats + * @md: metadata that should be filled to the XDP buffer + * @prep: callback for filling the metadata + * @run: driver wrapper to run XDP program + * @populate: driver callback to populate an skb with the HW descriptor da= ta + * + * Inline abstraction that does the following (non-XSk path): + * 1) adds frame size and frag number (if needed) to the onstack stats; + * 2) fills the descriptor metadata to the onstack &libeth_xdp_buff + * 3) runs XDP program if present; + * 4) handles all possible verdicts; + * 5) on ``XDP_PASS`, builds an skb from the buffer; + * 6) populates it with the descriptor metadata; + * 7) passes it up the stack. + * + * In most cases, number 2 means just writing the pointer to the HW descri= ptor + * to the XDP buffer. If so, please use LIBETH_XDP_DEFINE_RUN{,_PASS}() + * wrappers to build a driver function. + */ +static __always_inline void +__libeth_xdp_run_pass(struct libeth_xdp_buff *xdp, + struct libeth_xdp_tx_bulk *bq, struct napi_struct *napi, + struct libeth_rq_napi_stats *rs, const void *md, + void (*prep)(struct libeth_xdp_buff *xdp, + const void *md), + bool (*run)(struct libeth_xdp_buff *xdp, + struct libeth_xdp_tx_bulk *bq), + bool (*populate)(struct sk_buff *skb, + const struct libeth_xdp_buff *xdp, + struct libeth_rq_napi_stats *rs)) +{ + struct sk_buff *skb; + + rs->bytes +=3D xdp->base.data_end - xdp->data; + rs->packets++; + + if (xdp_buff_has_frags(&xdp->base)) + libeth_xdp_buff_stats_frags(rs, xdp); + + if (prep && (!__builtin_constant_p(!!md) || md)) + prep(xdp, md); + + if (!bq || !run || !bq->prog) + goto build; + + if (!run(xdp, bq)) + return; + +build: + skb =3D xdp_build_skb_from_buff(&xdp->base); + if (unlikely(!skb)) { + libeth_xdp_return_buff_slow(xdp); + return; + } + + xdp->data =3D NULL; + + if (unlikely(!populate(skb, xdp, rs))) { + napi_consume_skb(skb, true); + return; + } + + napi_gro_receive(napi, skb); +} + +static inline void libeth_xdp_prep_desc(struct libeth_xdp_buff *xdp, + const void *desc) +{ + xdp->desc =3D desc; +} + +/** + * libeth_xdp_run_pass - helper to run XDP program and handle the result + * @xdp: XDP buffer to process + * @bq: XDP Tx bulk to queue ``XDP_TX`` frames + * @napi: NAPI to build an skb and pass it up the stack + * @ss: onstack libeth RQ stats + * @desc: pointer to the HW descriptor for that frame + * @run: driver wrapper to run XDP program + * @populate: driver callback to populate an skb with the HW descriptor da= ta + * + * Wrapper around the underscored version when "fill the descriptor metada= ta" + * means just writing the pointer to the HW descriptor as @xdp->desc. + */ +#define libeth_xdp_run_pass(xdp, bq, napi, ss, desc, run, populate) \ + __libeth_xdp_run_pass(xdp, bq, napi, ss, desc, libeth_xdp_prep_desc, \ + run, populate) + +/** + * libeth_xdp_finalize_rx - finalize XDPSQ after a NAPI polling loop (non-= XSk) + * @bq: ``XDP_TX`` frame bulk + * @flush: driver callback to flush the bulk + * @finalize: driver callback to start sending the frames and run the timer + * + * Flush the bulk if there are frames left to send, kick the queue and flu= sh + * the XDP maps. + */ +#define libeth_xdp_finalize_rx(bq, flush, finalize) \ + __libeth_xdp_finalize_rx(bq, 0, flush, finalize) + +static __always_inline void +__libeth_xdp_finalize_rx(struct libeth_xdp_tx_bulk *bq, u32 flags, + bool (*flush_bulk)(struct libeth_xdp_tx_bulk *bq, + u32 flags), + void (*finalize)(void *xdpsq, bool sent, bool flush)) +{ + if (bq->act_mask & LIBETH_XDP_TX) { + if (bq->count) + flush_bulk(bq, flags | LIBETH_XDP_TX_DROP); + finalize(bq->xdpsq, true, true); + } + if (bq->act_mask & LIBETH_XDP_REDIRECT) + xdp_do_flush(); +} + +/* + * Helpers to reduce boilerplate code in drivers. + * + * Typical driver Rx flow would be (excl. bulk and buff init, frag attach): + * + * LIBETH_XDP_DEFINE_START(); + * LIBETH_XDP_DEFINE_FLUSH_TX(static driver_xdp_flush_tx, driver_xdp_tx_pr= ep, + * driver_xdp_xmit); + * LIBETH_XDP_DEFINE_RUN(static driver_xdp_run, driver_xdp_run_prog, + * driver_xdp_flush_tx, driver_populate_skb); + * LIBETH_XDP_DEFINE_FINALIZE(static driver_xdp_finalize_rx, + * driver_xdp_flush_tx, driver_xdp_finalize_sq); + * LIBETH_XDP_DEFINE_END(); + * + * This will build a set of 4 static functions. The compiler is free to de= cide + * whether to inline them. + * Then, in the NAPI polling function: + * + * while (packets < budget) { + * // ... + * driver_xdp_run(xdp, &bq, napi, &rs, desc); + * } + * driver_xdp_finalize_rx(&bq); + */ + +#define LIBETH_XDP_DEFINE_START() \ + __diag_push(); \ + __diag_ignore(GCC, 8, "-Wold-style-declaration", \ + "Allow specifying \'static\' after the return type") + +/** + * LIBETH_XDP_DEFINE_TIMER - define a driver XDPSQ cleanup timer callback + * @name: name of the function to define + * @poll: Tx polling/completion function + */ +#define LIBETH_XDP_DEFINE_TIMER(name, poll) \ +void name(struct work_struct *work) \ +{ \ + libeth_xdpsq_run_timer(work, poll); \ +} + +/** + * LIBETH_XDP_DEFINE_FLUSH_TX - define a driver ``XDP_TX`` bulk flush func= tion + * @name: name of the function to define + * @prep: driver callback to clean an XDPSQ + * @xmit: driver callback to write a HW Tx descriptor + */ +#define LIBETH_XDP_DEFINE_FLUSH_TX(name, prep, xmit) \ + __LIBETH_XDP_DEFINE_FLUSH_TX(name, prep, xmit, xdp) + +#define __LIBETH_XDP_DEFINE_FLUSH_TX(name, prep, xmit, pfx) \ +bool name(struct libeth_xdp_tx_bulk *bq, u32 flags) \ +{ \ + return libeth_##pfx##_tx_flush_bulk(bq, flags, prep, xmit); \ +} + +/** + * LIBETH_XDP_DEFINE_FLUSH_XMIT - define a driver XDP xmit bulk flush func= tion + * @name: name of the function to define + * @prep: driver callback to clean an XDPSQ + * @xmit: driver callback to write a HW Tx descriptor + */ +#define LIBETH_XDP_DEFINE_FLUSH_XMIT(name, prep, xmit) \ +bool name(struct libeth_xdp_tx_bulk *bq, u32 flags) \ +{ \ + return libeth_xdp_xmit_flush_bulk(bq, flags, prep, xmit); \ +} + +/** + * LIBETH_XDP_DEFINE_RUN_PROG - define a driver XDP program run function + * @name: name of the function to define + * @flush: driver callback to flush an ``XDP_TX`` bulk + */ +#define LIBETH_XDP_DEFINE_RUN_PROG(name, flush) \ + bool __LIBETH_XDP_DEFINE_RUN_PROG(name, flush, xdp) + +#define __LIBETH_XDP_DEFINE_RUN_PROG(name, flush, pfx) \ +name(struct libeth_xdp_buff *xdp, struct libeth_xdp_tx_bulk *bq) \ +{ \ + return libeth_##pfx##_run_prog(xdp, bq, flush); \ +} + +/** + * LIBETH_XDP_DEFINE_RUN_PASS - define a driver buffer process + pass func= tion + * @name: name of the function to define + * @run: driver callback to run XDP program (above) + * @populate: driver callback to fill an skb with HW descriptor info + */ +#define LIBETH_XDP_DEFINE_RUN_PASS(name, run, populate) \ + void __LIBETH_XDP_DEFINE_RUN_PASS(name, run, populate, xdp) + +#define __LIBETH_XDP_DEFINE_RUN_PASS(name, run, populate, pfx) \ +name(struct libeth_xdp_buff *xdp, struct libeth_xdp_tx_bulk *bq, \ + struct napi_struct *napi, struct libeth_rq_napi_stats *ss, \ + const void *desc) \ +{ \ + return libeth_##pfx##_run_pass(xdp, bq, napi, ss, desc, run, \ + populate); \ +} + +/** + * LIBETH_XDP_DEFINE_RUN - define a driver buffer process, run + pass func= tion + * @name: name of the function to define + * @run: name of the XDP prog run function to define + * @flush: driver callback to flush an ``XDP_TX`` bulk + * @populate: driver callback to fill an skb with HW descriptor info + */ +#define LIBETH_XDP_DEFINE_RUN(name, run, flush, populate) \ + __LIBETH_XDP_DEFINE_RUN(name, run, flush, populate, XDP) + +#define __LIBETH_XDP_DEFINE_RUN(name, run, flush, populate, pfx) \ + LIBETH_##pfx##_DEFINE_RUN_PROG(static run, flush); \ + LIBETH_##pfx##_DEFINE_RUN_PASS(name, run, populate) + +/** + * LIBETH_XDP_DEFINE_FINALIZE - define a driver Rx NAPI poll finalize func= tion + * @name: name of the function to define + * @flush: driver callback to flush an ``XDP_TX`` bulk + * @finalize: driver callback to finalize an XDPSQ and run the timer + */ +#define LIBETH_XDP_DEFINE_FINALIZE(name, flush, finalize) \ + __LIBETH_XDP_DEFINE_FINALIZE(name, flush, finalize, xdp) + +#define __LIBETH_XDP_DEFINE_FINALIZE(name, flush, finalize, pfx) \ +void name(struct libeth_xdp_tx_bulk *bq) \ +{ \ + libeth_##pfx##_finalize_rx(bq, flush, finalize); \ +} + +#define LIBETH_XDP_DEFINE_END() __diag_pop() + +/* XMO */ + +/** + * libeth_xdp_buff_to_rq - get RQ pointer from an XDP buffer pointer + * @xdp: &libeth_xdp_buff corresponding to the queue + * @type: typeof() of the driver Rx queue structure + * @member: name of &xdp_rxq_info inside @type + * + * Often times, pointer to the RQ is needed when reading/filling metadata = from + * HW descriptors. The helper can be used to quickly jump from an XDP buff= er + * to the queue corresponding to its &xdp_rxq_info without introducing + * additional fields (&libeth_xdp_buff is precisely 1 cacheline long on x6= 4). + */ +#define libeth_xdp_buff_to_rq(xdp, type, member) \ + container_of_const((xdp)->base.rxq, type, member) + +/** + * libeth_xdpmo_rx_hash - convert &libeth_rx_pt to an XDP RSS hash metadata + * @hash: pointer to the variable to write the hash to + * @rss_type: pointer to the variable to write the hash type to + * @val: hash value from the HW descriptor + * @pt: libeth parsed packet type + * + * Handle zeroed/non-available hash and convert libeth parsed packet type = to + * the corresponding XDP RSS hash type. To be called at the end of + * xdp_metadata_ops idpf_xdpmo::xmo_rx_hash() implementation. + * Note that if the driver doesn't use a constant packet type lookup table= but + * generates it at runtime, it must call libeth_rx_pt_gen_hash_type(pt) to + * generate XDP RSS hash type for each packet type. + * + * Return: 0 on success, -ENODATA when the hash is not available. + */ +static inline int libeth_xdpmo_rx_hash(u32 *hash, + enum xdp_rss_hash_type *rss_type, + u32 val, struct libeth_rx_pt pt) +{ + if (unlikely(!val)) + return -ENODATA; + + *hash =3D val; + *rss_type =3D pt.hash_type; + + return 0; +} + +/* Tx buffer completion */ + +void libeth_xdp_return_buff_bulk(const struct skb_shared_info *sinfo, + struct xdp_frame_bulk *bq, bool frags); +void libeth_xsk_buff_free_slow(struct libeth_xdp_buff *xdp); + +/** + * __libeth_xdp_complete_tx - complete a sent XDPSQE + * @sqe: SQ element / Tx buffer to complete + * @cp: Tx polling/completion params + * @bulk: internal callback to bulk-free ``XDP_TX`` buffers + * @xsk: internal callback to free XSk ``XDP_TX`` buffers + * + * Use the non-underscored version in drivers instead. This one is shared + * internally with libeth_tx_complete_any(). + * Complete an XDPSQE of any type of XDP frame. This includes DMA unmapping + * when needed, buffer freeing, stats update, and SQE invalidating. + */ +static __always_inline void +__libeth_xdp_complete_tx(struct libeth_sqe *sqe, struct libeth_cq_pp *cp, + typeof(libeth_xdp_return_buff_bulk) bulk, + typeof(libeth_xsk_buff_free_slow) xsk) +{ + enum libeth_sqe_type type =3D sqe->type; + + switch (type) { + case LIBETH_SQE_EMPTY: + return; + case LIBETH_SQE_XDP_XMIT: + case LIBETH_SQE_XDP_XMIT_FRAG: + dma_unmap_page(cp->dev, dma_unmap_addr(sqe, dma), + dma_unmap_len(sqe, len), DMA_TO_DEVICE); + break; + default: + break; + } + + switch (type) { + case LIBETH_SQE_XDP_TX: + bulk(sqe->sinfo, cp->bq, sqe->nr_frags !=3D 1); + break; + case LIBETH_SQE_XDP_XMIT: + xdp_return_frame_bulk(sqe->xdpf, cp->bq); + break; + case LIBETH_SQE_XSK_TX: + case LIBETH_SQE_XSK_TX_FRAG: + xsk(sqe->xsk); + break; + default: + break; + } + + switch (type) { + case LIBETH_SQE_XDP_TX: + case LIBETH_SQE_XDP_XMIT: + case LIBETH_SQE_XSK_TX: + cp->xdp_tx -=3D sqe->nr_frags; + + cp->xss->packets++; + cp->xss->bytes +=3D sqe->bytes; + break; + default: + break; + } + + sqe->type =3D LIBETH_SQE_EMPTY; +} + +static inline void libeth_xdp_complete_tx(struct libeth_sqe *sqe, + struct libeth_cq_pp *cp) +{ + __libeth_xdp_complete_tx(sqe, cp, libeth_xdp_return_buff_bulk, + libeth_xsk_buff_free_slow); +} + +/* Misc */ + +u32 libeth_xdp_queue_threshold(u32 count); + +void __libeth_xdp_set_features(struct net_device *dev, + const struct xdp_metadata_ops *xmo, + u32 zc_segs, + const struct xsk_tx_metadata_ops *tmo); +void libeth_xdp_set_redirect(struct net_device *dev, bool enable); + +/** + * libeth_xdp_set_features - set XDP features for a netdev + * @dev: &net_device to configure + * @...: optional params, see __libeth_xdp_set_features() + * + * Set all the features libeth_xdp supports, including .ndo_xdp_xmit(). Th= at + * said, it should be used only when XDPSQs are always available regardless + * of whether an XDP prog is attached to @dev. + */ +#define libeth_xdp_set_features(dev, ...) \ + CONCATENATE(__libeth_xdp_feat, \ + COUNT_ARGS(__VA_ARGS__))(dev, ##__VA_ARGS__) + +#define __libeth_xdp_feat0(dev) \ + __libeth_xdp_set_features(dev, NULL, 0, NULL) +#define __libeth_xdp_feat1(dev, xmo) \ + __libeth_xdp_set_features(dev, xmo, 0, NULL) +#define __libeth_xdp_feat2(dev, xmo, zc_segs) \ + __libeth_xdp_set_features(dev, xmo, zc_segs, NULL) +#define __libeth_xdp_feat3(dev, xmo, zc_segs, tmo) \ + __libeth_xdp_set_features(dev, xmo, zc_segs, tmo) + +/** + * libeth_xdp_set_features_noredir - enable all libeth_xdp features w/o re= dir + * @dev: target &net_device + * @...: optional params, see __libeth_xdp_set_features() + * + * Enable everything except the .ndo_xdp_xmit() feature, use when XDPSQs a= re + * not available right after netdev registration. + */ +#define libeth_xdp_set_features_noredir(dev, ...) \ + __libeth_xdp_set_features_noredir(dev, __UNIQUE_ID(dev_), \ + ##__VA_ARGS__) + +#define __libeth_xdp_set_features_noredir(dev, ud, ...) do { \ + struct net_device *ud =3D (dev); \ + \ + libeth_xdp_set_features(ud, ##__VA_ARGS__); \ + libeth_xdp_set_redirect(ud, false); \ +} while (0) + +#define libeth_xsktmo ((const void *)true) + +#endif /* __LIBETH_XDP_H */ diff --git a/include/net/libeth/xsk.h b/include/net/libeth/xsk.h new file mode 100644 index 000000000000..7c27e12b0628 --- /dev/null +++ b/include/net/libeth/xsk.h @@ -0,0 +1,684 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (C) 2024 Intel Corporation */ + +#ifndef __LIBETH_XSK_H +#define __LIBETH_XSK_H + +#include +#include + +/* ``XDP_TXMD_FLAGS_VALID`` is defined only under ``CONFIG_XDP_SOCKETS`` */ +#ifdef XDP_TXMD_FLAGS_VALID +static_assert(XDP_TXMD_FLAGS_VALID <=3D LIBETH_XDP_TX_XSKMD); +#endif + +/* ``XDP_TX`` bulking */ + +/** + * libeth_xsk_tx_queue_head - internal helper for queueing XSk ``XDP_TX`` = head + * @bq: XDP Tx bulk to queue the head frag to + * @xdp: XSk buffer with the head to queue + * + * Return: false if it's the only frag of the frame, true if it's an S/G f= rame. + */ +static inline bool libeth_xsk_tx_queue_head(struct libeth_xdp_tx_bulk *bq, + struct libeth_xdp_buff *xdp) +{ + bq->bulk[bq->count++] =3D (typeof(*bq->bulk)){ + .xsk =3D xdp, + __libeth_xdp_tx_len(xdp->base.data_end - xdp->data, + LIBETH_XDP_TX_FIRST), + }; + + if (likely(!xdp_buff_has_frags(&xdp->base))) + return false; + + bq->bulk[bq->count - 1].flags |=3D LIBETH_XDP_TX_MULTI; + + return true; +} + +/** + * libeth_xsk_tx_queue_frag - internal helper for queueing XSk ``XDP_TX`` = frag + * @bq: XDP Tx bulk to queue the frag to + * @frag: XSk frag to queue + */ +static inline void libeth_xsk_tx_queue_frag(struct libeth_xdp_tx_bulk *bq, + struct libeth_xdp_buff *frag) +{ + bq->bulk[bq->count++] =3D (typeof(*bq->bulk)){ + .xsk =3D frag, + __libeth_xdp_tx_len(frag->base.data_end - frag->data), + }; +} + +/** + * libeth_xsk_tx_queue_bulk - internal helper for queueing XSk ``XDP_TX`` = frame + * @bq: XDP Tx bulk to queue the frame to + * @xdp: XSk buffer to queue + * @flush_bulk: driver callback to flush the bulk to the HW queue + * + * Return: true on success, false on flush error. + */ +static __always_inline bool +libeth_xsk_tx_queue_bulk(struct libeth_xdp_tx_bulk *bq, + struct libeth_xdp_buff *xdp, + bool (*flush_bulk)(struct libeth_xdp_tx_bulk *bq, + u32 flags)) +{ + bool ret =3D true; + + if (unlikely(bq->count =3D=3D LIBETH_XDP_TX_BULK) && + unlikely(!flush_bulk(bq, LIBETH_XDP_TX_XSK))) { + libeth_xsk_buff_free_slow(xdp); + return false; + } + + if (!libeth_xsk_tx_queue_head(bq, xdp)) + goto out; + + for (const struct libeth_xdp_buff *head =3D xdp; ; ) { + xdp =3D container_of(xsk_buff_get_frag(&head->base), + typeof(*xdp), base); + if (!xdp) + break; + + if (unlikely(bq->count =3D=3D LIBETH_XDP_TX_BULK) && + unlikely(!flush_bulk(bq, LIBETH_XDP_TX_XSK))) { + ret =3D false; + break; + } + + libeth_xsk_tx_queue_frag(bq, xdp); + }; + +out: + bq->bulk[bq->count - 1].flags |=3D LIBETH_XDP_TX_LAST; + + return ret; +} + +/** + * libeth_xsk_tx_fill_buf - internal helper to fill XSk ``XDP_TX`` &libeth= _sqe + * @frm: XDP Tx frame from the bulk + * @i: index on the HW queue + * @sq: XDPSQ abstraction for the queue + * @priv: private data + * + * Return: XDP Tx descriptor with the synced DMA and other info to pass to + * the driver callback. + */ +static inline struct libeth_xdp_tx_desc +libeth_xsk_tx_fill_buf(struct libeth_xdp_tx_frame frm, u32 i, + const struct libeth_xdpsq *sq, u64 priv) +{ + struct libeth_xdp_buff *xdp =3D frm.xsk; + struct libeth_xdp_tx_desc desc =3D { + .addr =3D xsk_buff_xdp_get_dma(&xdp->base), + .opts =3D frm.opts, + }; + struct libeth_sqe *sqe; + + xsk_buff_raw_dma_sync_for_device(sq->pool, desc.addr, desc.len); + + sqe =3D &sq->sqes[i]; + sqe->xsk =3D xdp; + + if (!(desc.flags & LIBETH_XDP_TX_FIRST)) { + sqe->type =3D LIBETH_SQE_XSK_TX_FRAG; + return desc; + } + + sqe->type =3D LIBETH_SQE_XSK_TX; + libeth_xdp_tx_fill_stats(sqe, &desc, + xdp_get_shared_info_from_buff(&xdp->base)); + + return desc; +} + +/** + * libeth_xsk_tx_flush_bulk - wrapper to define flush of an XSk ``XDP_TX``= bulk + * @bq: bulk to flush + * @flags: Tx flags, see __libeth_xdp_tx_flush_bulk() + * @prep: driver callback to prepare the queue + * @xmit: driver callback to fill a HW descriptor + * + * Use via LIBETH_XSK_DEFINE_FLUSH_TX() to define an XSk ``XDP_TX`` driver + * callback. + */ +#define libeth_xsk_tx_flush_bulk(bq, flags, prep, xmit) \ + __libeth_xdp_tx_flush_bulk(bq, (flags) | LIBETH_XDP_TX_XSK, prep, \ + libeth_xsk_tx_fill_buf, xmit) + +/* XSk TMO */ + +/** + * libeth_xsktmo_req_csum - XSk Tx metadata op to request checksum offload + * @csum_start: unused + * @csum_offset: unused + * @priv: &libeth_xdp_tx_desc from the filling helper + * + * Generic implementation of ::tmo_request_checksum. Works only when HW do= esn't + * require filling checksum offsets and other parameters beside the checks= um + * request bit. + * Consider using within @libeth_xsktmo unless the driver requires HW-spec= ific + * callbacks. + */ +static inline void libeth_xsktmo_req_csum(u16 csum_start, u16 csum_offset, + void *priv) +{ + ((struct libeth_xdp_tx_desc *)priv)->flags |=3D LIBETH_XDP_TX_CSUM; +} + +/* Only to inline the callbacks below, use @libeth_xsktmo in drivers inste= ad */ +static const struct xsk_tx_metadata_ops __libeth_xsktmo =3D { + .tmo_request_checksum =3D libeth_xsktmo_req_csum, +}; + +/** + * __libeth_xsk_xmit_fill_buf_md - internal helper to prepare XSk xmit w/m= eta + * @xdesc: &xdp_desc from the XSk buffer pool + * @sq: XDPSQ abstraction for the queue + * @priv: XSk Tx metadata ops + * + * Same as __libeth_xsk_xmit_fill_buf(), but requests metadata pointer and + * fills additional fields in &libeth_xdp_tx_desc to ask for metadata offl= oad. + * + * Return: XDP Tx descriptor with the DMA, metadata request bits, and other + * info to pass to the driver callback. + */ +static __always_inline struct libeth_xdp_tx_desc +__libeth_xsk_xmit_fill_buf_md(const struct xdp_desc *xdesc, + const struct libeth_xdpsq *sq, + u64 priv) +{ + const struct xsk_tx_metadata_ops *tmo =3D libeth_xdp_priv_to_ptr(priv); + struct libeth_xdp_tx_desc desc; + struct xdp_desc_ctx ctx; + + ctx =3D xsk_buff_raw_get_ctx(sq->pool, xdesc->addr); + desc =3D (typeof(desc)){ + .addr =3D ctx.dma, + __libeth_xdp_tx_len(xdesc->len), + }; + + BUILD_BUG_ON(!__builtin_constant_p(tmo =3D=3D libeth_xsktmo)); + tmo =3D tmo =3D=3D libeth_xsktmo ? &__libeth_xsktmo : tmo; + + xsk_tx_metadata_request(ctx.meta, tmo, &desc); + + return desc; +} + +/* XSk xmit implementation */ + +/** + * __libeth_xsk_xmit_fill_buf - internal helper to prepare XSk xmit w/o me= ta + * @xdesc: &xdp_desc from the XSk buffer pool + * @sq: XDPSQ abstraction for the queue + * + * Return: XDP Tx descriptor with the DMA and other info to pass to + * the driver callback. + */ +static inline struct libeth_xdp_tx_desc +__libeth_xsk_xmit_fill_buf(const struct xdp_desc *xdesc, + const struct libeth_xdpsq *sq) +{ + return (struct libeth_xdp_tx_desc){ + .addr =3D xsk_buff_raw_get_dma(sq->pool, xdesc->addr), + __libeth_xdp_tx_len(xdesc->len), + }; +} + +/** + * libeth_xsk_xmit_fill_buf - internal helper to prepare an XSk xmit + * @frm: &xdp_desc from the XSk buffer pool + * @i: index on the HW queue + * @sq: XDPSQ abstraction for the queue + * @priv: XSk Tx metadata ops + * + * Depending on the metadata ops presence (determined at compile time), ca= lls + * the quickest helper to build a libeth XDP Tx descriptor. + * + * Return: XDP Tx descriptor with the synced DMA, metadata request bits, + * and other info to pass to the driver callback. + */ +static __always_inline struct libeth_xdp_tx_desc +libeth_xsk_xmit_fill_buf(struct libeth_xdp_tx_frame frm, u32 i, + const struct libeth_xdpsq *sq, u64 priv) +{ + struct libeth_xdp_tx_desc desc; + + if (priv) + desc =3D __libeth_xsk_xmit_fill_buf_md(&frm.desc, sq, priv); + else + desc =3D __libeth_xsk_xmit_fill_buf(&frm.desc, sq); + + desc.flags |=3D xsk_is_eop_desc(&frm.desc) ? LIBETH_XDP_TX_LAST : 0; + + xsk_buff_raw_dma_sync_for_device(sq->pool, desc.addr, desc.len); + + return desc; +} + +/** + * libeth_xsk_xmit_do_bulk - send XSk xmit frames + * @pool: XSk buffer pool containing the frames to send + * @xdpsq: opaque pointer to driver's XDPSQ struct + * @budget: maximum number of frames can be sent + * @tmo: optional XSk Tx metadata ops + * @prep: driver callback to build a &libeth_xdpsq + * @xmit: driver callback to put frames to a HW queue + * @finalize: driver callback to start a transmission + * + * Implements generic XSk xmit. Always turns on XSk Tx wakeup as it's assu= med + * lazy cleaning is used and interrupts are disabled for the queue. + * HW descriptor filling is unrolled by ``LIBETH_XDP_TX_BATCH`` to optimize + * writes. + * Note that unlike other XDP Tx ops, the queue must be locked and cleaned + * prior to calling this function to already know available @budget. + * @prepare must only build a &libeth_xdpsq and return ``U32_MAX``. + * + * Return: false if @budget was exhausted, true otherwise. + */ +static __always_inline bool +libeth_xsk_xmit_do_bulk(struct xsk_buff_pool *pool, void *xdpsq, u32 budge= t, + const struct xsk_tx_metadata_ops *tmo, + u32 (*prep)(void *xdpsq, struct libeth_xdpsq *sq), + void (*xmit)(struct libeth_xdp_tx_desc desc, u32 i, + const struct libeth_xdpsq *sq, u64 priv), + void (*finalize)(void *xdpsq, bool sent, bool flush)) +{ + const struct libeth_xdp_tx_frame *bulk; + bool wake; + u32 n; + + wake =3D xsk_uses_need_wakeup(pool); + if (wake) + xsk_clear_tx_need_wakeup(pool); + + n =3D xsk_tx_peek_release_desc_batch(pool, budget); + bulk =3D container_of(&pool->tx_descs[0], typeof(*bulk), desc); + + libeth_xdp_tx_xmit_bulk(bulk, xdpsq, n, true, + libeth_xdp_ptr_to_priv(tmo), prep, + libeth_xsk_xmit_fill_buf, xmit); + finalize(xdpsq, n, true); + + if (wake) + xsk_set_tx_need_wakeup(pool); + + return n < budget; +} + +/* Rx polling path */ + +/** + * libeth_xsk_tx_init_bulk - initialize an XDP Tx bulk for XSk Rx NAPI poll + * @bq: bulk to initialize + * @prog: RCU pointer to the XDP program (never %NULL) + * @dev: target &net_device + * @xdpsqs: array of driver XDPSQ structs + * @num: number of active XDPSQs, the above array length + * + * Should be called on an onstack XDP Tx bulk before the XSk NAPI polling = loop. + * Initializes all the needed fields to run libeth_xdp functions. + * Never checks if @prog is %NULL or @num =3D=3D 0 as XDP must always be e= nabled + * when hitting this path. + */ +#define libeth_xsk_tx_init_bulk(bq, prog, dev, xdpsqs, num) \ + __libeth_xdp_tx_init_bulk(bq, prog, dev, xdpsqs, num, true, \ + __UNIQUE_ID(bq_), __UNIQUE_ID(nqs_)) + +struct libeth_xdp_buff *libeth_xsk_buff_add_frag(struct libeth_xdp_buff *h= ead, + struct libeth_xdp_buff *xdp); + +/** + * libeth_xsk_process_buff - attach an XSk Rx buffer to a &libeth_xdp_buff + * @head: head XSk buffer to attach the XSk buffer to (or %NULL) + * @xdp: XSk buffer to process + * @len: received data length from the descriptor + * + * If @head =3D=3D %NULL, treats the XSk buffer as head and initializes + * the required fields. Otherwise, attaches the buffer as a frag. + * Already performs DMA sync-for-CPU and frame start prefetch + * (for head buffers only). + * + * Return: head XSk buffer on success or if the descriptor must be skipped + * (empty), %NULL if there is no space for a new frag. + */ +static inline struct libeth_xdp_buff * +libeth_xsk_process_buff(struct libeth_xdp_buff *head, + struct libeth_xdp_buff *xdp, u32 len) +{ + if (unlikely(!len)) { + libeth_xsk_buff_free_slow(xdp); + return head; + } + + xsk_buff_set_size(&xdp->base, len); + xsk_buff_dma_sync_for_cpu(&xdp->base); + + if (head) + return libeth_xsk_buff_add_frag(head, xdp); + + prefetch(xdp->data); + + return xdp; +} + +void libeth_xsk_buff_stats_frags(struct libeth_rq_napi_stats *rs, + const struct libeth_xdp_buff *xdp); + +u32 __libeth_xsk_run_prog_slow(struct libeth_xdp_buff *xdp, + const struct libeth_xdp_tx_bulk *bq, + enum xdp_action act, int ret); + +/** + * __libeth_xsk_run_prog - run XDP program on an XSk buffer + * @xdp: XSk buffer to run the prog on + * @bq: buffer bulk for ``XDP_TX`` queueing + * + * Internal inline abstraction to run XDP program on XSk Rx path. Handles + * only the most common ``XDP_REDIRECT`` inline, the rest is processed + * externally. + * Reports an XDP prog exception on errors. + * + * Return: libeth_xdp prog verdict depending on the prog's verdict. + */ +static __always_inline u32 +__libeth_xsk_run_prog(struct libeth_xdp_buff *xdp, + const struct libeth_xdp_tx_bulk *bq) +{ + enum xdp_action act; + int ret =3D 0; + + act =3D bpf_prog_run_xdp(bq->prog, &xdp->base); + if (unlikely(act !=3D XDP_REDIRECT)) +rest: + return __libeth_xsk_run_prog_slow(xdp, bq, act, ret); + + ret =3D xdp_do_redirect(bq->dev, &xdp->base, bq->prog); + if (unlikely(ret)) + goto rest; + + return LIBETH_XDP_REDIRECT; +} + +/** + * libeth_xsk_run_prog - run XDP program on XSk path and handle all verdic= ts + * @xdp: XSk buffer to process + * @bq: XDP Tx bulk to queue ``XDP_TX`` buffers + * @fl: driver ``XDP_TX`` bulk flush callback + * + * Run the attached XDP program and handle all possible verdicts. + * Prefer using it via LIBETH_XSK_DEFINE_RUN{,_PASS,_PROG}(). + * + * Return: libeth_xdp prog verdict depending on the prog's verdict. + */ +#define libeth_xsk_run_prog(xdp, bq, fl) \ + __libeth_xdp_run_flush(xdp, bq, __libeth_xsk_run_prog, \ + libeth_xsk_tx_queue_bulk, fl) + +/** + * __libeth_xsk_run_pass - helper to run XDP program and handle the result + * @xdp: XSk buffer to process + * @bq: XDP Tx bulk to queue ``XDP_TX`` frames + * @napi: NAPI to build an skb and pass it up the stack + * @rs: onstack libeth RQ stats + * @md: metadata that should be filled to the XSk buffer + * @prep: callback for filling the metadata + * @run: driver wrapper to run XDP program + * @populate: driver callback to populate an skb with the HW descriptor da= ta + * + * Inline abstraction, XSk's counterpart of __libeth_xdp_run_pass(), see i= ts + * doc for details. + * + * Return: false if the polling loop must be exited due to lack of free + * buffers, true otherwise. + */ +static __always_inline bool +__libeth_xsk_run_pass(struct libeth_xdp_buff *xdp, + struct libeth_xdp_tx_bulk *bq, struct napi_struct *napi, + struct libeth_rq_napi_stats *rs, const void *md, + void (*prep)(struct libeth_xdp_buff *xdp, + const void *md), + u32 (*run)(struct libeth_xdp_buff *xdp, + struct libeth_xdp_tx_bulk *bq), + bool (*populate)(struct sk_buff *skb, + const struct libeth_xdp_buff *xdp, + struct libeth_rq_napi_stats *rs)) +{ + struct sk_buff *skb; + u32 act; + + rs->bytes +=3D xdp->base.data_end - xdp->data; + rs->packets++; + + if (unlikely(xdp_buff_has_frags(&xdp->base))) + libeth_xsk_buff_stats_frags(rs, xdp); + + if (prep && (!__builtin_constant_p(!!md) || md)) + prep(xdp, md); + + act =3D run(xdp, bq); + if (unlikely(act =3D=3D LIBETH_XDP_ABORTED)) + return false; + else if (likely(act !=3D LIBETH_XDP_PASS)) + return true; + + skb =3D xdp_build_skb_from_zc(&xdp->base); + if (unlikely(!skb)) { + libeth_xsk_buff_free_slow(xdp); + return true; + } + + if (unlikely(!populate(skb, xdp, rs))) { + napi_consume_skb(skb, true); + return true; + } + + napi_gro_receive(napi, skb); + + return true; +} + +/** + * libeth_xsk_run_pass - helper to run XDP program and handle the result + * @xdp: XSk buffer to process + * @bq: XDP Tx bulk to queue ``XDP_TX`` frames + * @napi: NAPI to build an skb and pass it up the stack + * @rs: onstack libeth RQ stats + * @desc: pointer to the HW descriptor for that frame + * @run: driver wrapper to run XDP program + * @populate: driver callback to populate an skb with the HW descriptor da= ta + * + * Wrapper around the underscored version when "fill the descriptor metada= ta" + * means just writing the pointer to the HW descriptor as @xdp->desc. + */ +#define libeth_xsk_run_pass(xdp, bq, napi, rs, desc, run, populate) \ + __libeth_xsk_run_pass(xdp, bq, napi, rs, desc, libeth_xdp_prep_desc, \ + run, populate) + +/** + * libeth_xsk_finalize_rx - finalize XDPSQ after an XSk NAPI polling loop + * @bq: ``XDP_TX`` frame bulk + * @flush: driver callback to flush the bulk + * @finalize: driver callback to start sending the frames and run the timer + * + * Flush the bulk if there are frames left to send, kick the queue and flu= sh + * the XDP maps. + */ +#define libeth_xsk_finalize_rx(bq, flush, finalize) \ + __libeth_xdp_finalize_rx(bq, LIBETH_XDP_TX_XSK, flush, finalize) + +/* + * Helpers to reduce boilerplate code in drivers. + * + * Typical driver XSk Rx flow would be (excl. bulk and buff init, frag att= ach): + * + * LIBETH_XDP_DEFINE_START(); + * LIBETH_XSK_DEFINE_FLUSH_TX(static driver_xsk_flush_tx, driver_xsk_tx_pr= ep, + * driver_xdp_xmit); + * LIBETH_XSK_DEFINE_RUN(static driver_xsk_run, driver_xsk_run_prog, + * driver_xsk_flush_tx, driver_populate_skb); + * LIBETH_XSK_DEFINE_FINALIZE(static driver_xsk_finalize_rx, + * driver_xsk_flush_tx, driver_xdp_finalize_sq); + * LIBETH_XDP_DEFINE_END(); + * + * This will build a set of 4 static functions. The compiler is free to de= cide + * whether to inline them. + * Then, in the NAPI polling function: + * + * while (packets < budget) { + * // ... + * if (!driver_xsk_run(xdp, &bq, napi, &rs, desc)) + * break; + * } + * driver_xsk_finalize_rx(&bq); + */ + +/** + * LIBETH_XSK_DEFINE_FLUSH_TX - define a driver XSk ``XDP_TX`` flush funct= ion + * @name: name of the function to define + * @prep: driver callback to clean an XDPSQ + * @xmit: driver callback to write a HW Tx descriptor + */ +#define LIBETH_XSK_DEFINE_FLUSH_TX(name, prep, xmit) \ + __LIBETH_XDP_DEFINE_FLUSH_TX(name, prep, xmit, xsk) + +/** + * LIBETH_XSK_DEFINE_RUN_PROG - define a driver XDP program run function + * @name: name of the function to define + * @flush: driver callback to flush an XSk ``XDP_TX`` bulk + */ +#define LIBETH_XSK_DEFINE_RUN_PROG(name, flush) \ + u32 __LIBETH_XDP_DEFINE_RUN_PROG(name, flush, xsk) + +/** + * LIBETH_XSK_DEFINE_RUN_PASS - define a driver buffer process + pass func= tion + * @name: name of the function to define + * @run: driver callback to run XDP program (above) + * @populate: driver callback to fill an skb with HW descriptor info + */ +#define LIBETH_XSK_DEFINE_RUN_PASS(name, run, populate) \ + bool __LIBETH_XDP_DEFINE_RUN_PASS(name, run, populate, xsk) + +/** + * LIBETH_XSK_DEFINE_RUN - define a driver buffer process, run + pass func= tion + * @name: name of the function to define + * @run: name of the XDP prog run function to define + * @flush: driver callback to flush an XSk ``XDP_TX`` bulk + * @populate: driver callback to fill an skb with HW descriptor info + */ +#define LIBETH_XSK_DEFINE_RUN(name, run, flush, populate) \ + __LIBETH_XDP_DEFINE_RUN(name, run, flush, populate, XSK) + +/** + * LIBETH_XSK_DEFINE_FINALIZE - define a driver XSk NAPI poll finalize fun= ction + * @name: name of the function to define + * @flush: driver callback to flush an XSk ``XDP_TX`` bulk + * @finalize: driver callback to finalize an XDPSQ and run the timer + */ +#define LIBETH_XSK_DEFINE_FINALIZE(name, flush, finalize) \ + __LIBETH_XDP_DEFINE_FINALIZE(name, flush, finalize, xsk) + +/* Refilling */ + +/** + * struct libeth_xskfq - structure representing an XSk buffer (fill) queue + * @fp: hotpath part of the structure + * @pool: &xsk_buff_pool for buffer management + * @fqes: array of XSk buffer pointers + * @descs: opaque pointer to the HW descriptor array + * @ntu: index of the next buffer to poll + * @count: number of descriptors/buffers the queue has + * @pending: current number of XSkFQEs to refill + * @thresh: threshold below which the queue is refilled + * @buf_len: HW-writeable length per each buffer + * @nid: ID of the closest NUMA node with memory + */ +struct libeth_xskfq { + struct_group_tagged(libeth_xskfq_fp, fp, + struct xsk_buff_pool *pool; + struct libeth_xdp_buff **fqes; + void *descs; + + u32 ntu; + u32 count; + ); + + /* Cold fields */ + u32 pending; + u32 thresh; + + u32 buf_len; + int nid; +}; + +int libeth_xskfq_create(struct libeth_xskfq *fq); +void libeth_xskfq_destroy(struct libeth_xskfq *fq); + +/** + * libeth_xsk_buff_xdp_get_dma - get DMA address for an XSk &libeth_xdp_bu= ff + * @xdp: buffer to get the DMA addr for + */ +#define libeth_xsk_buff_xdp_get_dma(xdp) \ + xsk_buff_xdp_get_dma(&(xdp)->base) + +/** + * libeth_xskfqe_alloc - allocate @n XSk Rx buffers + * @fq: hotpath part of the XSkFQ, usually onstack + * @n: number of buffers to allocate + * @fill: driver callback to write DMA addresses to HW descriptors + * + * Note that @fq->ntu gets updated, but ::pending must be recalculated + * by the caller. + * + * Return: number of buffers refilled. + */ +static __always_inline u32 +libeth_xskfqe_alloc(struct libeth_xskfq_fp *fq, u32 n, + void (*fill)(const struct libeth_xskfq_fp *fq, u32 i)) +{ + u32 this, ret, done =3D 0; + struct xdp_buff **xskb; + + this =3D fq->count - fq->ntu; + if (likely(this > n)) + this =3D n; + +again: + xskb =3D (typeof(xskb))&fq->fqes[fq->ntu]; + ret =3D xsk_buff_alloc_batch(fq->pool, xskb, this); + + for (u32 i =3D 0, ntu =3D fq->ntu; likely(i < ret); i++) + fill(fq, ntu + i); + + done +=3D ret; + fq->ntu +=3D ret; + + if (likely(fq->ntu < fq->count) || unlikely(ret < this)) + goto out; + + fq->ntu =3D 0; + + if (this < n) { + this =3D n - this; + goto again; + } + +out: + return done; +} + +/* .ndo_xsk_wakeup */ + +void libeth_xsk_init_wakeup(call_single_data_t *csd, struct napi_struct *n= api); +void libeth_xsk_wakeup(call_single_data_t *csd, u32 qid); + +/* Pool setup */ + +int libeth_xsk_setup_pool(struct net_device *dev, u32 qid, bool enable); + +#endif /* __LIBETH_XSK_H */ diff --git a/drivers/net/ethernet/intel/libeth/rx.c b/drivers/net/ethernet/= intel/libeth/rx.c index 616426a2e363..1105fd0db4c3 100644 --- a/drivers/net/ethernet/intel/libeth/rx.c +++ b/drivers/net/ethernet/intel/libeth/rx.c @@ -215,7 +215,7 @@ EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_destroy, LIBETH); * * To be used on exceptions or rare cases not requiring fast inline recycl= ing. */ -void libeth_rx_recycle_slow(struct page *page) +void __cold libeth_rx_recycle_slow(struct page *page) { page_pool_recycle_direct(page->pp, page); } diff --git a/drivers/net/ethernet/intel/libeth/tx.c b/drivers/net/ethernet/= intel/libeth/tx.c new file mode 100644 index 000000000000..dc8df216c922 --- /dev/null +++ b/drivers/net/ethernet/intel/libeth/tx.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (C) 2024 Intel Corporation */ + +#include + +#include "priv.h" + +/* Tx buffer completion */ + +DEFINE_STATIC_CALL_NULL(bulk, libeth_xdp_return_buff_bulk); +DEFINE_STATIC_CALL_NULL(xsk, libeth_xsk_buff_free_slow); + +/** + * libeth_tx_complete_any - perform Tx completion for one SQE of any type + * @sqe: Tx buffer to complete + * @cp: polling params + * + * Can be used to complete both regular and XDP SQEs, for example when + * destroying queues. + * When libeth_xdp is not loaded, XDPSQEs won't be handled. + */ +void libeth_tx_complete_any(struct libeth_sqe *sqe, struct libeth_cq_pp *c= p) +{ + if (sqe->type >=3D __LIBETH_SQE_XDP_START) + __libeth_xdp_complete_tx(sqe, cp, static_call(bulk), + static_call(xsk)); + else + libeth_tx_complete(sqe, cp); +} +EXPORT_SYMBOL_NS_GPL(libeth_tx_complete_any, LIBETH); + +/* Module */ + +void libeth_attach_xdp(const struct libeth_xdp_ops *ops) +{ + static_call_update(bulk, ops ? ops->bulk : NULL); + static_call_update(xsk, ops ? ops->xsk : NULL); +} +EXPORT_SYMBOL_NS_GPL(libeth_attach_xdp, LIBETH); diff --git a/drivers/net/ethernet/intel/libeth/xdp.c b/drivers/net/ethernet= /intel/libeth/xdp.c new file mode 100644 index 000000000000..00b9877cc800 --- /dev/null +++ b/drivers/net/ethernet/intel/libeth/xdp.c @@ -0,0 +1,444 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (C) 2024 Intel Corporation */ + +#include + +#include "priv.h" + +/* XDPSQ sharing */ + +DEFINE_STATIC_KEY_FALSE(libeth_xdpsq_share); +EXPORT_SYMBOL_NS_GPL(libeth_xdpsq_share, LIBETH_XDP); + +void __libeth_xdpsq_get(struct libeth_xdpsq_lock *lock, + const struct net_device *dev) +{ + bool warn; + + spin_lock_init(&lock->lock); + lock->share =3D true; + + warn =3D !static_key_enabled(&libeth_xdpsq_share); + static_branch_inc_cpuslocked(&libeth_xdpsq_share); + + if (warn && net_ratelimit()) + netdev_warn(dev, "XDPSQ sharing enabled, possible XDP Tx slowdown\n"); +} +EXPORT_SYMBOL_NS_GPL(__libeth_xdpsq_get, LIBETH_XDP); + +void __libeth_xdpsq_put(struct libeth_xdpsq_lock *lock, + const struct net_device *dev) +{ + static_branch_dec_cpuslocked(&libeth_xdpsq_share); + + if (!static_key_enabled(&libeth_xdpsq_share) && net_ratelimit()) + netdev_notice(dev, "XDPSQ sharing disabled\n"); + + lock->share =3D false; +} +EXPORT_SYMBOL_NS_GPL(__libeth_xdpsq_put, LIBETH_XDP); + +void __libeth_xdpsq_lock(struct libeth_xdpsq_lock *lock) +{ + spin_lock(&lock->lock); +} +EXPORT_SYMBOL_NS_GPL(__libeth_xdpsq_lock, LIBETH_XDP); + +void __libeth_xdpsq_unlock(struct libeth_xdpsq_lock *lock) +{ + spin_unlock(&lock->lock); +} +EXPORT_SYMBOL_NS_GPL(__libeth_xdpsq_unlock, LIBETH_XDP); + +/* XDPSQ clean-up timers */ + +/** + * libeth_xdpsq_init_timer - initialize an XDPSQ clean-up timer + * @timer: timer to initialize + * @xdpsq: queue this timer belongs to + * @lock: corresponding XDPSQ lock + * @poll: queue polling/completion function + * + * XDPSQ clean-up timers must be set up before using at the queue configur= ation + * time. Set the required pointers and the cleaning callback. + */ +void libeth_xdpsq_init_timer(struct libeth_xdpsq_timer *timer, void *xdpsq, + struct libeth_xdpsq_lock *lock, + void (*poll)(struct work_struct *work)) +{ + timer->xdpsq =3D xdpsq; + timer->lock =3D lock; + + INIT_DELAYED_WORK(&timer->dwork, poll); +} +EXPORT_SYMBOL_NS_GPL(libeth_xdpsq_init_timer, LIBETH_XDP); + +/* ``XDP_TX`` bulking */ + +static void __cold +libeth_xdp_tx_return_one(const struct libeth_xdp_tx_frame *frm) +{ + if (frm->len_fl & LIBETH_XDP_TX_MULTI) + libeth_xdp_return_frags(frm->data + frm->soff, true); + + libeth_xdp_return_va(frm->data, true); +} + +static void __cold +libeth_xdp_tx_return_bulk(const struct libeth_xdp_tx_frame *bq, u32 count) +{ + for (u32 i =3D 0; i < count; i++) { + const struct libeth_xdp_tx_frame *frm =3D &bq[i]; + + if (!(frm->len_fl & LIBETH_XDP_TX_FIRST)) + continue; + + libeth_xdp_tx_return_one(frm); + } +} + +static void __cold libeth_trace_xdp_exception(const struct net_device *dev, + const struct bpf_prog *prog, + u32 act) +{ + trace_xdp_exception(dev, prog, act); +} + +/** + * libeth_xdp_tx_exception - handle Tx exceptions of XDP frames + * @bq: XDP Tx frame bulk + * @sent: number of frames sent successfully (from this bulk) + * @flags: internal libeth_xdp flags (XSk, .ndo_xdp_xmit etc.) + * + * Cold helper used by __libeth_xdp_tx_flush_bulk(), do not call directly. + * Reports XDP Tx exceptions, frees the frames that won't be sent or adjust + * the Tx bulk to try again later. + */ +void __cold libeth_xdp_tx_exception(struct libeth_xdp_tx_bulk *bq, u32 sen= t, + u32 flags) +{ + const struct libeth_xdp_tx_frame *pos =3D &bq->bulk[sent]; + u32 left =3D bq->count - sent; + + if (!(flags & LIBETH_XDP_TX_NDO)) + libeth_trace_xdp_exception(bq->dev, bq->prog, XDP_TX); + + if (!(flags & LIBETH_XDP_TX_DROP)) { + memmove(bq->bulk, pos, left * sizeof(*bq->bulk)); + bq->count =3D left; + + return; + } + + if (flags & LIBETH_XDP_TX_XSK) + libeth_xsk_tx_return_bulk(pos, left); + else if (!(flags & LIBETH_XDP_TX_NDO)) + libeth_xdp_tx_return_bulk(pos, left); + else + libeth_xdp_xmit_return_bulk(pos, left, bq->dev); + + bq->count =3D 0; +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_tx_exception, LIBETH_XDP); + +/* .ndo_xdp_xmit() implementation */ + +u32 __cold libeth_xdp_xmit_return_bulk(const struct libeth_xdp_tx_frame *b= q, + u32 count, const struct net_device *dev) +{ + u32 n =3D 0; + + for (u32 i =3D 0; i < count; i++) { + const struct libeth_xdp_tx_frame *frm =3D &bq[i]; + dma_addr_t dma; + + if (frm->flags & LIBETH_XDP_TX_FIRST) + dma =3D *libeth_xdp_xmit_frame_dma(frm->xdpf); + else + dma =3D dma_unmap_addr(frm, dma); + + dma_unmap_page(dev->dev.parent, dma, dma_unmap_len(frm, len), + DMA_TO_DEVICE); + + /* Actual xdp_frames are freed by the core */ + n +=3D !!(frm->flags & LIBETH_XDP_TX_FIRST); + } + + return n; +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_xmit_return_bulk, LIBETH_XDP); + +/* Rx polling path */ + +/** + * libeth_xdp_load_stash - recreate an &xdp_buff from a libeth_xdp buffer = stash + * @dst: target &libeth_xdp_buff to initialize + * @src: source stash + * + * External helper used by libeth_xdp_init_buff(), do not call directly. + * Recreate an onstack &libeth_xdp_buff using the stash saved earlier. + * The only field untouched (rxq) is initialized later in the + * abovementioned function. + */ +void libeth_xdp_load_stash(struct libeth_xdp_buff *dst, + const struct libeth_xdp_buff_stash *src) +{ + dst->data =3D src->data; + dst->base.data_end =3D src->data + src->len; + dst->base.data_meta =3D src->data; + dst->base.data_hard_start =3D src->data - src->headroom; + + dst->base.frame_sz =3D src->frame_sz; + dst->base.flags =3D src->flags; +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_load_stash, LIBETH_XDP); + +/** + * libeth_xdp_save_stash - convert an &xdp_buff to a libeth_xdp buffer sta= sh + * @dst: target &libeth_xdp_buff_stash to initialize + * @src: source XDP buffer + * + * External helper used by libeth_xdp_save_buff(), do not call directly. + * Use the fields from the passed XDP buffer to initialize the stash on the + * queue, so that a partially received frame can be finished later during + * the next NAPI poll. + */ +void libeth_xdp_save_stash(struct libeth_xdp_buff_stash *dst, + const struct libeth_xdp_buff *src) +{ + dst->data =3D src->data; + dst->headroom =3D src->data - src->base.data_hard_start; + dst->len =3D src->base.data_end - src->data; + + dst->frame_sz =3D src->base.frame_sz; + dst->flags =3D src->base.flags; + + WARN_ON_ONCE(dst->flags !=3D src->base.flags); +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_save_stash, LIBETH_XDP); + +void __libeth_xdp_return_stash(struct libeth_xdp_buff_stash *stash) +{ + LIBETH_XDP_ONSTACK_BUFF(xdp); + + libeth_xdp_load_stash(xdp, stash); + libeth_xdp_return_buff_slow(xdp); + + stash->data =3D NULL; +} +EXPORT_SYMBOL_NS_GPL(__libeth_xdp_return_stash, LIBETH_XDP); + +/** + * libeth_xdp_return_buff_slow - free a &libeth_xdp_buff + * @xdp: buffer to free/return + * + * Slowpath version of libeth_xdp_return_buff() to be called on exceptions, + * queue clean-ups etc., without unwanted inlining. + */ +void __cold libeth_xdp_return_buff_slow(struct libeth_xdp_buff *xdp) +{ + libeth_xdp_return_buff(xdp); +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_return_buff_slow, LIBETH_XDP); + +/** + * libeth_xdp_buff_add_frag - add a frag to an XDP buffer + * @xdp: head XDP buffer + * @fqe: Rx buffer containing the frag + * @len: frag length reported by HW + * + * External helper used by libeth_xdp_process_buff(), do not call directly. + * Frees both head and frag buffers on error. + * + * Return: true success, false on error (no space for a new frag). + */ +bool libeth_xdp_buff_add_frag(struct libeth_xdp_buff *xdp, + const struct libeth_fqe *fqe, + u32 len) +{ + struct page *page =3D fqe->page; + + if (!xdp_buff_add_frag(&xdp->base, page, + fqe->offset + page->pp->p.offset, + len, fqe->truesize)) + goto recycle; + + return true; + +recycle: + libeth_rx_recycle_slow(page); + libeth_xdp_return_buff_slow(xdp); + + return false; +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_buff_add_frag, LIBETH_XDP); + +/** + * libeth_xdp_prog_exception - handle XDP prog exceptions + * @bq: XDP Tx bulk + * @xdp: buffer to process + * @act: original XDP prog verdict + * @ret: error code if redirect failed + * + * External helper used by __libeth_xdp_run_prog() and + * __libeth_xsk_run_prog_slow(), do not call directly. + * Reports invalid @act, XDP exception trace event and frees the buffer. + * + * Return: libeth_xdp XDP prog verdict. + */ +u32 __cold libeth_xdp_prog_exception(const struct libeth_xdp_tx_bulk *bq, + struct libeth_xdp_buff *xdp, + enum xdp_action act, int ret) +{ + if (act > XDP_REDIRECT) + bpf_warn_invalid_xdp_action(bq->dev, bq->prog, act); + + libeth_trace_xdp_exception(bq->dev, bq->prog, act); + + if (xdp->base.rxq->mem.type =3D=3D MEM_TYPE_XSK_BUFF_POOL) + return libeth_xsk_prog_exception(xdp, act, ret); + + libeth_xdp_return_buff_slow(xdp); + + return LIBETH_XDP_DROP; +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_prog_exception, LIBETH_XDP); + +/* Tx buffer completion */ + +static void libeth_xdp_put_page_bulk(struct page *page, + struct xdp_frame_bulk *bq) +{ + if (unlikely(bq->count =3D=3D XDP_BULK_QUEUE_SIZE)) + xdp_flush_frame_bulk(bq); + + bq->q[bq->count++] =3D page; +} + +/** + * libeth_xdp_return_buff_bulk - free &xdp_buff as part of a bulk + * @sinfo: shared info corresponding to the buffer + * @bq: XDP frame bulk to store the buffer + * @frags: whether the buffer has frags + * + * Same as xdp_return_frame_bulk(), but for &libeth_xdp_buff, speeds up Tx + * completion of ``XDP_TX`` buffers and allows to free them in same bulks + * with &xdp_frame buffers. + */ +void libeth_xdp_return_buff_bulk(const struct skb_shared_info *sinfo, + struct xdp_frame_bulk *bq, bool frags) +{ + if (!frags) + goto head; + + for (u32 i =3D 0; i < sinfo->nr_frags; i++) + libeth_xdp_put_page_bulk(skb_frag_page(&sinfo->frags[i]), bq); + +head: + libeth_xdp_put_page_bulk(virt_to_page(sinfo), bq); +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_return_buff_bulk, LIBETH_XDP); + +/* Misc */ + +/** + * libeth_xdp_queue_threshold - calculate XDP queue clean/refill threshold + * @count: number of descriptors in the queue + * + * The threshold is the limit at which RQs start to refill (when the numbe= r of + * empty buffers exceeds it) and SQs get cleaned up (when the number of fr= ee + * descriptors goes below it). To speed up hotpath processing, threshold is + * always pow-2, closest to 1/4 of the queue length. + * Don't call it on hotpath, calculate and cache the threshold during the + * queue initialization. + * + * Return: the calculated threshold. + */ +u32 libeth_xdp_queue_threshold(u32 count) +{ + u32 quarter, low, high; + + if (likely(is_power_of_2(count))) + return count >> 2; + + quarter =3D DIV_ROUND_CLOSEST(count, 4); + low =3D rounddown_pow_of_two(quarter); + high =3D roundup_pow_of_two(quarter); + + return high - quarter <=3D quarter - low ? high : low; +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_queue_threshold, LIBETH_XDP); + +/** + * __libeth_xdp_set_features - set XDP features for a netdev + * @dev: &net_device to configure + * @xmo: XDP metadata ops (Rx hints) + * @zc_segs: maximum number of S/G frags the HW can transmit + * @tmo: XSk Tx metadata ops (Tx hints) + * + * Set all the features libeth_xdp supports. Only the first argument is + * necessary; without the third one (zero), XSk support won't be advertise= d. + * Use the non-underscored versions in drivers instead. + */ +void __libeth_xdp_set_features(struct net_device *dev, + const struct xdp_metadata_ops *xmo, + u32 zc_segs, + const struct xsk_tx_metadata_ops *tmo) +{ + xdp_set_features_flag(dev, + NETDEV_XDP_ACT_BASIC | + NETDEV_XDP_ACT_REDIRECT | + NETDEV_XDP_ACT_NDO_XMIT | + (zc_segs ? NETDEV_XDP_ACT_XSK_ZEROCOPY : 0) | + NETDEV_XDP_ACT_RX_SG | + NETDEV_XDP_ACT_NDO_XMIT_SG); + dev->xdp_metadata_ops =3D xmo; + + tmo =3D tmo =3D=3D libeth_xsktmo ? &libeth_xsktmo_slow : tmo; + + dev->xdp_zc_max_segs =3D zc_segs ? : 1; + dev->xsk_tx_metadata_ops =3D zc_segs ? tmo : NULL; +} +EXPORT_SYMBOL_NS_GPL(__libeth_xdp_set_features, LIBETH_XDP); + +/** + * libeth_xdp_set_redirect - toggle the XDP redirect feature + * @dev: &net_device to configure + * @enable: whether XDP is enabled + * + * Use this when XDPSQs are not always available to dynamically enable + * and disable redirect feature. + */ +void libeth_xdp_set_redirect(struct net_device *dev, bool enable) +{ + if (enable) + xdp_features_set_redirect_target(dev, true); + else + xdp_features_clear_redirect_target(dev); +} +EXPORT_SYMBOL_NS_GPL(libeth_xdp_set_redirect, LIBETH_XDP); + +/* Module */ + +static const struct libeth_xdp_ops xdp_ops __initconst =3D { + .bulk =3D libeth_xdp_return_buff_bulk, + .xsk =3D libeth_xsk_buff_free_slow, +}; + +static int __init libeth_xdp_module_init(void) +{ + libeth_attach_xdp(&xdp_ops); + + return 0; +} +module_init(libeth_xdp_module_init); + +static void __exit libeth_xdp_module_exit(void) +{ + libeth_detach_xdp(); +} +module_exit(libeth_xdp_module_exit); + +MODULE_DESCRIPTION("Common Ethernet library - XDP infra"); +MODULE_IMPORT_NS(LIBETH); +MODULE_LICENSE("GPL"); diff --git a/drivers/net/ethernet/intel/libeth/xsk.c b/drivers/net/ethernet= /intel/libeth/xsk.c new file mode 100644 index 000000000000..eccb66a644f0 --- /dev/null +++ b/drivers/net/ethernet/intel/libeth/xsk.c @@ -0,0 +1,264 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (C) 2024 Intel Corporation */ + +#include + +#include "priv.h" + +/* ``XDP_TX`` bulking */ + +void __cold libeth_xsk_tx_return_bulk(const struct libeth_xdp_tx_frame *bq, + u32 count) +{ + for (u32 i =3D 0; i < count; i++) + libeth_xsk_buff_free_slow(bq[i].xsk); +} + +/* XSk TMO */ + +const struct xsk_tx_metadata_ops libeth_xsktmo_slow =3D { + .tmo_request_checksum =3D libeth_xsktmo_req_csum, +}; + +/* Rx polling path */ + +/** + * libeth_xsk_buff_free_slow - free an XSk Rx buffer + * @xdp: buffer to free + * + * Slowpath version of xsk_buff_free() to be used on exceptions, cleanups = etc. + * to avoid unwanted inlining. + */ +void libeth_xsk_buff_free_slow(struct libeth_xdp_buff *xdp) +{ + xsk_buff_free(&xdp->base); +} +EXPORT_SYMBOL_NS_GPL(libeth_xsk_buff_free_slow, LIBETH_XDP); + +/** + * libeth_xsk_buff_add_frag - add a frag to an XSk Rx buffer + * @head: head buffer + * @xdp: frag buffer + * + * External helper used by libeth_xsk_process_buff(), do not call directly. + * Frees both main and frag buffers on error. + * + * Return: main buffer with attached frag on success, %NULL on error (no s= pace + * for a new frag). + */ +struct libeth_xdp_buff *libeth_xsk_buff_add_frag(struct libeth_xdp_buff *h= ead, + struct libeth_xdp_buff *xdp) +{ + if (!xsk_buff_add_frag(&head->base, &xdp->base)) + goto free; + + return head; + +free: + libeth_xsk_buff_free_slow(xdp); + libeth_xsk_buff_free_slow(head); + + return NULL; +} +EXPORT_SYMBOL_NS_GPL(libeth_xsk_buff_add_frag, LIBETH_XDP); + +/** + * libeth_xsk_buff_stats_frags - update onstack RQ stats with XSk frags in= fo + * @rs: onstack stats to update + * @xdp: buffer to account + * + * External helper used by __libeth_xsk_run_pass(), do not call directly. + * Adds buffer's frags count and total len to the onstack stats. + */ +void libeth_xsk_buff_stats_frags(struct libeth_rq_napi_stats *rs, + const struct libeth_xdp_buff *xdp) +{ + libeth_xdp_buff_stats_frags(rs, xdp); +} +EXPORT_SYMBOL_NS_GPL(libeth_xsk_buff_stats_frags, LIBETH_XDP); + +/** + * __libeth_xsk_run_prog_slow - process the non-``XDP_REDIRECT`` verdicts + * @xdp: buffer to process + * @bq: Tx bulk for queueing on ``XDP_TX`` + * @act: verdict to process + * @ret: error code if ``XDP_REDIRECT`` failed + * + * External helper used by __libeth_xsk_run_prog(), do not call directly. + * ``XDP_REDIRECT`` is the most common and hottest verdict on XSk, thus + * it is processed inline. The rest goes here for out-of-line processing, + * together with redirect errors. + * + * Return: libeth_xdp XDP prog verdict. + */ +u32 __libeth_xsk_run_prog_slow(struct libeth_xdp_buff *xdp, + const struct libeth_xdp_tx_bulk *bq, + enum xdp_action act, int ret) +{ + switch (act) { + case XDP_DROP: + xsk_buff_free(&xdp->base); + + return LIBETH_XDP_DROP; + case XDP_TX: + return LIBETH_XDP_TX; + case XDP_PASS: + return LIBETH_XDP_PASS; + default: + break; + } + + return libeth_xdp_prog_exception(bq, xdp, act, ret); +} +EXPORT_SYMBOL_NS_GPL(__libeth_xsk_run_prog_slow, LIBETH_XDP); + +/** + * libeth_xsk_prog_exception - handle XDP prog exceptions on XSk + * @xdp: buffer to process + * @act: verdict returned by the prog + * @ret: error code if ``XDP_REDIRECT`` failed + * + * Internal. Frees the buffer and, if the queue uses XSk wakeups, stop the + * current NAPI poll when there are no free buffers left. + * + * Return: libeth_xdp's XDP prog verdict. + */ +u32 __cold libeth_xsk_prog_exception(struct libeth_xdp_buff *xdp, + enum xdp_action act, int ret) +{ + const struct xdp_buff_xsk *xsk; + u32 __ret =3D LIBETH_XDP_DROP; + + if (act !=3D XDP_REDIRECT) + goto drop; + + xsk =3D container_of(&xdp->base, typeof(*xsk), xdp); + if (xsk_uses_need_wakeup(xsk->pool) && ret =3D=3D -ENOBUFS) + __ret =3D LIBETH_XDP_ABORTED; + +drop: + libeth_xsk_buff_free_slow(xdp); + + return __ret; +} + +/* Refill */ + +/** + * libeth_xskfq_create - create an XSkFQ + * @fq: fill queue to initialize + * + * Allocates the FQEs and initializes the fields used by libeth_xdp: number + * of buffers to refill, refill threshold and buffer len. + * + * Return: %0 on success, -errno otherwise. + */ +int libeth_xskfq_create(struct libeth_xskfq *fq) +{ + fq->fqes =3D kvcalloc_node(fq->count, sizeof(*fq->fqes), GFP_KERNEL, + fq->nid); + if (!fq->fqes) + return -ENOMEM; + + fq->pending =3D fq->count; + fq->thresh =3D libeth_xdp_queue_threshold(fq->count); + fq->buf_len =3D xsk_pool_get_rx_frame_size(fq->pool); + + return 0; +} +EXPORT_SYMBOL_NS_GPL(libeth_xskfq_create, LIBETH_XDP); + +/** + * libeth_xskfq_destroy - destroy an XSkFQ + * @fq: fill queue to destroy + * + * Zeroes the used fields and frees the FQEs array. + */ +void libeth_xskfq_destroy(struct libeth_xskfq *fq) +{ + fq->buf_len =3D 0; + fq->thresh =3D 0; + fq->pending =3D 0; + + kvfree(fq->fqes); +} +EXPORT_SYMBOL_NS_GPL(libeth_xskfq_destroy, LIBETH_XDP); + +/* .ndo_xsk_wakeup */ + +static void libeth_xsk_napi_sched(void *info) +{ + __napi_schedule_irqoff(info); +} + +/** + * libeth_xsk_init_wakeup - initialize libeth XSk wakeup structure + * @csd: struct to initialize + * @napi: NAPI corresponding to this queue + * + * libeth_xdp uses inter-processor interrupts to perform XSk wakeups. In o= rder + * to do that, the corresponding CSDs must be initialized when creating the + * queues. + */ +void libeth_xsk_init_wakeup(call_single_data_t *csd, struct napi_struct *n= api) +{ + INIT_CSD(csd, libeth_xsk_napi_sched, napi); +} +EXPORT_SYMBOL_NS_GPL(libeth_xsk_init_wakeup, LIBETH_XDP); + +/** + * libeth_xsk_wakeup - perform an XSk wakeup + * @csd: CSD corresponding to the queue + * @qid: the stack queue index + * + * Try to mark the NAPI as missed first, so that it could be rescheduled. + * If it's not, schedule it on the corresponding CPU using IPIs (or direct= ly + * if already running on it). + */ +void libeth_xsk_wakeup(call_single_data_t *csd, u32 qid) +{ + struct napi_struct *napi =3D csd->info; + + if (napi_if_scheduled_mark_missed(napi) || + unlikely(!napi_schedule_prep(napi))) + return; + + if (qid !=3D raw_smp_processor_id()) + smp_call_function_single_async(qid, csd); + else + __napi_schedule(napi); +} +EXPORT_SYMBOL_NS_GPL(libeth_xsk_wakeup, LIBETH_XDP); + +/* Pool setup */ + +#define LIBETH_XSK_DMA_ATTR \ + (DMA_ATTR_WEAK_ORDERING | DMA_ATTR_SKIP_CPU_SYNC) + +/** + * libeth_xsk_setup_pool - setup or destroy an XSk pool for a queue + * @dev: target &net_device + * @qid: stack queue index to configure + * @enable: whether to enable or disable the pool + * + * Check that @qid is valid and then map or unmap the pool. + * + * Return: %0 on success, -errno otherwise. + */ +int libeth_xsk_setup_pool(struct net_device *dev, u32 qid, bool enable) +{ + struct xsk_buff_pool *pool; + + pool =3D xsk_get_pool_from_qid(dev, qid); + if (!pool) + return -EINVAL; + + if (enable) + return xsk_pool_dma_map(pool, dev->dev.parent, + LIBETH_XSK_DMA_ATTR); + else + xsk_pool_dma_unmap(pool, LIBETH_XSK_DMA_ATTR); + + return 0; +} +EXPORT_SYMBOL_NS_GPL(libeth_xsk_setup_pool, LIBETH_XDP); --=20 2.46.2