Date: Fri, 27 Feb 2026 02:15:12 -0800
From: Dipayaan Roy
To: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, andrew+netdev@lunn.ch, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	leon@kernel.org, longli@microsoft.com, kotaranov@microsoft.com,
	horms@kernel.org, shradhagupta@linux.microsoft.com,
	ssengar@linux.microsoft.com, ernis@linux.microsoft.com,
	shirazsaleem@microsoft.com, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, dipayanroy@microsoft.com
Subject: [PATCH net-next] net: mana: Force full-page RX buffers for 4K page size on specific systems

On certain systems configured with 4K PAGE_SIZE, utilizing page_pool
fragments for RX buffers results in a significant throughput regression.
Profiling reveals that this regression correlates with high overhead in
the fragment allocation and reference counting paths on these specific
platforms, rendering the multi-buffer-per-page strategy counterproductive.

To mitigate this, bypass the page_pool fragment path and force a single
RX packet per page allocation when all the following conditions are met:

1. The system is configured with a 4K PAGE_SIZE.
2. A processor-specific quirk is detected via SMBIOS Type 4 data.

This approach restores expected line-rate performance by ensuring
predictable RX refill behavior on affected hardware.
There is no behavioral change for systems using larger page sizes
(16K/64K), or for platforms where this processor-specific quirk does not
apply.

Signed-off-by: Dipayaan Roy
Reviewed-by: Simon Horman
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 120 ++++++++++++++++++
 drivers/net/ethernet/microsoft/mana/mana_en.c |  23 +++-
 include/net/mana/gdma.h                       |  10 ++
 3 files changed, 151 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 0055c231acf6..26bbe736a770 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -1955,6 +1956,115 @@ static bool mana_is_pf(unsigned short dev_id)
 	return dev_id == MANA_PF_DEVICE_ID;
 }
 
+/*
+ * Table of Processor Version strings found in SMBIOS Type 4 information,
+ * for processors that need the single-RX-buffer-per-page quirk to meet
+ * line-rate performance with ARM64 + 4K pages.
+ * Note: these strings are matched exactly against the version fetched
+ * from SMBIOS.
+ */
+static const char * const mana_single_rxbuf_per_page_quirk_tbl[] = {
+	"Cobalt 200",
+};
+
+static const char *smbios_get_string(const struct dmi_header *hdr, u8 idx)
+{
+	const u8 *start, *end;
+	u8 i;
+
+	/* Indexing starts from 1. */
+	if (!idx)
+		return NULL;
+
+	start = (const u8 *)hdr + hdr->length;
+	end = start + SMBIOS_STR_AREA_MAX;
+
+	for (i = 1; i < idx; i++) {
+		while (start < end && *start)
+			start++;
+		if (start < end)
+			start++;
+		if (start + 1 < end && start[0] == 0 && start[1] == 0)
+			return NULL;
+	}
+
+	if (start >= end || *start == 0)
+		return NULL;
+
+	return (const char *)start;
+}
+
+/* On some systems with 4K PAGE_SIZE, page_pool RX fragments can
+ * trigger a throughput regression.
+ * Hence identify those processors from the extracted SMBIOS table and
+ * apply the quirk, forcing one RX buffer per page, to avoid the
+ * fragment allocation/refcounting overhead in the RX refill path on
+ * those processors only.
+ */
+static bool mana_needs_single_rxbuf_per_page(struct gdma_context *gc)
+{
+	int i = 0;
+	const char *ver = gc->processor_version;
+
+	if (!ver)
+		return false;
+
+	if (PAGE_SIZE != SZ_4K)
+		return false;
+
+	while (i < ARRAY_SIZE(mana_single_rxbuf_per_page_quirk_tbl)) {
+		if (!strcmp(ver, mana_single_rxbuf_per_page_quirk_tbl[i]))
+			return true;
+		i++;
+	}
+
+	return false;
+}
+
+static void mana_get_proc_ver_from_smbios(const struct dmi_header *hdr,
+					  void *data)
+{
+	struct gdma_context *gc = data;
+	const char *ver_str;
+	u8 idx;
+
+	/* We are only looking for Type 4: Processor Information */
+	if (hdr->type != SMBIOS_TYPE_4_PROCESSOR_INFO)
+		return;
+
+	/* Ensure the record is long enough to contain the Processor Version
+	 * field.
+	 */
+	if (hdr->length <= SMBIOS_TYPE4_PROC_VERSION_OFFSET)
+		return;
+
+	/* The 'Processor Version' string is located at the index pointed to
+	 * by SMBIOS_TYPE4_PROC_VERSION_OFFSET. If found, make a copy of it.
+	 * There could be multiple Type 4 tables, so keep the processor
+	 * version found first.
+	 */
+	idx = ((const u8 *)hdr)[SMBIOS_TYPE4_PROC_VERSION_OFFSET];
+	ver_str = smbios_get_string(hdr, idx);
+	if (ver_str && !gc->processor_version)
+		gc->processor_version = kstrdup(ver_str, GFP_KERNEL);
+}
+
+/* Check and initialize all processor optimizations/quirks here */
+static bool mana_init_processor_optimization(struct gdma_context *gc)
+{
+	bool opt_initialized = false;
+
+	gc->processor_version = NULL;
+	dmi_walk(mana_get_proc_ver_from_smbios, gc);
+	if (!gc->processor_version)
+		return false;
+
+	if (mana_needs_single_rxbuf_per_page(gc)) {
+		gc->force_full_page_rx_buffer = true;
+		opt_initialized = true;
+	}
+
+	return opt_initialized;
+}
+
 static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	struct gdma_context *gc;
@@ -2009,6 +2119,11 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	gc->mana_pci_debugfs = debugfs_create_dir(pci_slot_name(pdev->slot),
 						  mana_debugfs_root);
 
+	if (mana_init_processor_optimization(gc))
+		dev_info(&pdev->dev,
+			 "Processor specific optimization initialized on: %s\n",
+			 gc->processor_version);
+
 	err = mana_gd_setup(pdev);
 	if (err)
 		goto unmap_bar;
@@ -2051,6 +2166,8 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_iounmap(pdev, bar0_va);
 free_gc:
 	pci_set_drvdata(pdev, NULL);
+	kfree(gc->processor_version);
+	gc->processor_version = NULL;
 	vfree(gc);
 release_region:
 	pci_release_regions(pdev);
@@ -2106,6 +2223,9 @@ static void mana_gd_remove(struct pci_dev *pdev)
 
 	pci_iounmap(pdev, gc->bar0_va);
 
+	kfree(gc->processor_version);
+	gc->processor_version = NULL;
+
 	vfree(gc);
 
 	pci_release_regions(pdev);
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 91c418097284..a53a8921050b 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -748,6 +748,26 @@ static void *mana_get_rxbuf_pre(struct mana_rxq *rxq, dma_addr_t *da)
 	return va;
 }
 
+static inline bool
+mana_use_single_rxbuf_per_page(struct mana_port_context *apc, u32 mtu)
+{
+	struct gdma_context *gc = apc->ac->gdma_dev->gdma_context;
+
+	/* On some systems with 4K PAGE_SIZE, page_pool RX fragments can
+	 * trigger a throughput regression. Hence force one RX buffer per
+	 * page to avoid the fragment allocation/refcounting overhead in
+	 * the RX refill path on those processors only.
+	 */
+	if (gc->force_full_page_rx_buffer)
+		return true;
+
+	/* For XDP and jumbo frames, make sure only one packet fits per page. */
+	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc))
+		return true;
+
+	return false;
+}
+
 /* Get RX buffer's data size, alloc size, XDP headroom based on MTU */
 static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
 			       int mtu, u32 *datasize, u32 *alloc_size,
@@ -758,8 +778,7 @@ static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
 	/* Calculate datasize first (consistent across all cases) */
 	*datasize = mtu + ETH_HLEN;
 
-	/* For xdp and jumbo frames make sure only one packet fits per page */
-	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc)) {
+	if (mana_use_single_rxbuf_per_page(apc, mtu)) {
 		if (mana_xdp_get(apc)) {
 			*headroom = XDP_PACKET_HEADROOM;
 			*alloc_size = PAGE_SIZE;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index a59bd4035a99..0ef2d6ac5203 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -9,6 +9,14 @@
 
 #include "shm_channel.h"
 
+#define SMBIOS_STR_AREA_MAX 4096
+
+/* SMBIOS Type 4: Processor Information table */
+#define SMBIOS_TYPE_4_PROCESSOR_INFO 4
+
+/* Byte offset containing the Processor Version string number. */
+#define SMBIOS_TYPE4_PROC_VERSION_OFFSET 0x10
+
 #define GDMA_STATUS_MORE_ENTRIES 0x00000105
 #define GDMA_STATUS_CMD_UNSUPPORTED 0xffffffff
 
@@ -436,6 +444,8 @@ struct gdma_context {
 	struct workqueue_struct *service_wq;
 
 	unsigned long flags;
+	u8 *processor_version;
+	bool force_full_page_rx_buffer;
 };
 
 static inline bool mana_gd_is_mana(struct gdma_dev *gd)
-- 
2.43.0