Date: Fri, 6 Mar 2026 05:33:00 -0800
From: Dipayaan Roy
To: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
 decui@microsoft.com, andrew+netdev@lunn.ch, davem@davemloft.net,
 edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, leon@kernel.org,
 longli@microsoft.com, kotaranov@microsoft.com, horms@kernel.org,
 shradhagupta@linux.microsoft.com, ssengar@linux.microsoft.com,
 ernis@linux.microsoft.com, shirazsaleem@microsoft.com,
 linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
 dipayanroy@microsoft.com
Subject: [PATCH net-next, v2] net: mana: Force full-page RX buffers for 4K page size on specific systems.

On certain systems configured with a 4K PAGE_SIZE, using page_pool
fragments for RX buffers results in a significant throughput regression.
Profiling shows that this regression correlates with high overhead in
the fragment allocation and reference-counting paths on these specific
platforms, which makes the multi-buffer-per-page strategy
counterproductive.

To mitigate this, bypass the page_pool fragment path and force a single
RX packet per page allocation when all of the following conditions are
met:

1. The system is configured with a 4K PAGE_SIZE.
2. A processor-specific quirk is detected via SMBIOS Type 4 data.

This approach restores the expected line-rate performance by ensuring
predictable RX refill behavior on affected hardware.
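As an illustration only, the two conditions above, combined with the driver's existing XDP/jumbo-frame rule, can be modeled in plain user-space C. This is a sketch under stated assumptions, not the driver code: the function names are hypothetical, PAGE_SIZE becomes a page_size parameter, and rxbuf_pad is a hypothetical stand-in for MANA_RXBUF_PAD, whose value this patch does not show. The driver evaluates the quirk once at probe time and caches it in force_full_page_rx_buffer, whereas the sketch recomputes it on every call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy of the patch's quirk table (exact-match Processor Version strings). */
static const char *const quirk_tbl[] = {
	"Cobalt 200",
};

/* Conditions 1 and 2: 4K pages and an exact SMBIOS Type 4 version match. */
static bool needs_single_rxbuf_per_page(size_t page_size, const char *proc_ver)
{
	if (!proc_ver || page_size != 4096)
		return false;

	for (size_t i = 0; i < sizeof(quirk_tbl) / sizeof(quirk_tbl[0]); i++) {
		if (strcmp(proc_ver, quirk_tbl[i]) == 0)
			return true;
	}

	return false;
}

/*
 * Models mana_use_single_rxbuf_per_page(): the processor quirk, jumbo
 * frames and XDP each force one RX buffer per page. rxbuf_pad is a
 * hypothetical stand-in for MANA_RXBUF_PAD.
 */
static bool use_single_rxbuf_per_page(size_t page_size, const char *proc_ver,
				      unsigned int mtu, unsigned int rxbuf_pad,
				      bool xdp_enabled)
{
	if (needs_single_rxbuf_per_page(page_size, proc_ver))
		return true;

	return (size_t)mtu + rxbuf_pad > page_size / 2 || xdp_enabled;
}
```

With a 1500-byte MTU and no XDP, only a matching processor on 4K pages selects full-page buffers; on any other system the existing half-page threshold and XDP check decide, which is why the patch introduces no behavioral change there.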

There is no behavioral change on systems using larger page sizes
(16K/64K), or on platforms where this processor-specific quirk does not
apply.

Signed-off-by: Dipayaan Roy
---
Changes in v2:
 - Separate reading the string index from reading the string; remove inline.
---
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 133 ++++++++++++++++++
 drivers/net/ethernet/microsoft/mana/mana_en.c |  23 ++-
 include/net/mana/gdma.h                       |   9 ++
 3 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index aef8612b73cb..05fecc00a90c 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -1959,6 +1960,128 @@ static bool mana_is_pf(unsigned short dev_id)
 	return dev_id == MANA_PF_DEVICE_ID;
 }
 
+/*
+ * Processor Version strings, as reported in SMBIOS Type 4 information, of
+ * processors that need the single-RX-buffer-per-page quirk to reach line
+ * rate with ARM64 + 4K pages.
+ * Note: these strings must exactly match the version fetched from SMBIOS.
+ */
+static const char * const mana_single_rxbuf_per_page_quirk_tbl[] = {
+	"Cobalt 200",
+};
+
+/* On some systems with 4K PAGE_SIZE, page_pool RX fragments can
+ * trigger a throughput regression. Hence, identify those processors
+ * from the extracted SMBIOS data and apply a quirk that forces one
+ * RX buffer per page to avoid the fragment allocation/refcounting
+ * overhead in the RX refill path for those processors only.
+ */
+static bool mana_needs_single_rxbuf_per_page(struct gdma_context *gc)
+{
+	int i = 0;
+	const char *ver = gc->processor_version;
+
+	if (!ver)
+		return false;
+
+	if (PAGE_SIZE != SZ_4K)
+		return false;
+
+	while (i < ARRAY_SIZE(mana_single_rxbuf_per_page_quirk_tbl)) {
+		if (!strcmp(ver, mana_single_rxbuf_per_page_quirk_tbl[i]))
+			return true;
+		i++;
+	}
+
+	return false;
+}
+
+static void mana_get_proc_ver_strno(const struct dmi_header *hdr, void *data)
+{
+	struct gdma_context *gc = data;
+	const u8 *d = (const u8 *)hdr;
+
+	/* We are only looking for Type 4: Processor Information */
+	if (hdr->type != SMBIOS_TYPE_4_PROCESSOR_INFO)
+		return;
+
+	/* Ensure the record is long enough to contain the Processor Version
+	 * field
+	 */
+	if (hdr->length <= SMBIOS_TYPE4_PROC_VERSION_OFFSET)
+		return;
+
+	/* The byte at SMBIOS_TYPE4_PROC_VERSION_OFFSET holds the string
+	 * number of the 'Processor Version' string; save that number.
+	 * There could be multiple Type 4 tables, so read and store the
+	 * processor version string number found the first time.
+	 */
+	if (gc->proc_ver_strno)
+		return;
+
+	gc->proc_ver_strno = d[SMBIOS_TYPE4_PROC_VERSION_OFFSET];
+}
+
+static const char *mana_dmi_string_nosave(const struct dmi_header *hdr, u8 s)
+{
+	const char *bp = (const char *)hdr + hdr->length;
+
+	if (!s)
+		return NULL;
+
+	/* String numbers start at 1 */
+	while (--s > 0 && *bp)
+		bp += strlen(bp) + 1;
+
+	if (!*bp)
+		return NULL;
+
+	return bp;
+}
+
+static void mana_fetch_proc_ver_string(const struct dmi_header *hdr,
+				       void *data)
+{
+	struct gdma_context *gc = data;
+	const char *ver;
+
+	/* We are only looking for Type 4: Processor Information */
+	if (hdr->type != SMBIOS_TYPE_4_PROCESSOR_INFO)
+		return;
+
+	/* Extract proc version found the first time only */
+	if (!gc->proc_ver_strno || gc->processor_version)
+		return;
+
+	ver = mana_dmi_string_nosave(hdr, gc->proc_ver_strno);
+	if (ver)
+		gc->processor_version = kstrdup(ver, GFP_KERNEL);
+}
+
+/* Check and initialize all processor optimizations/quirks here */
+static bool mana_init_processor_optimization(struct gdma_context *gc)
+{
+	bool opt_initialized = false;
+
+	gc->proc_ver_strno = 0;
+	gc->processor_version = NULL;
+
+	dmi_walk(mana_get_proc_ver_strno, gc);
+	if (!gc->proc_ver_strno)
+		return false;
+
+	dmi_walk(mana_fetch_proc_ver_string, gc);
+	if (!gc->processor_version)
+		return false;
+
+	if (mana_needs_single_rxbuf_per_page(gc)) {
+		gc->force_full_page_rx_buffer = true;
+		opt_initialized = true;
+	}
+
+	return opt_initialized;
+}
+
 static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	struct gdma_context *gc;
@@ -2013,6 +2136,11 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	gc->mana_pci_debugfs = debugfs_create_dir(pci_slot_name(pdev->slot),
 						  mana_debugfs_root);
 
+	if (mana_init_processor_optimization(gc))
+		dev_info(&pdev->dev,
+			 "Processor specific optimization initialized on: %s\n",
+			 gc->processor_version);
+
 	err = mana_gd_setup(pdev);
 	if (err)
 		goto unmap_bar;
@@ -2055,6 +2183,8 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_iounmap(pdev, bar0_va);
 free_gc:
 	pci_set_drvdata(pdev, NULL);
+	kfree(gc->processor_version);
+	gc->processor_version = NULL;
 	vfree(gc);
 release_region:
 	pci_release_regions(pdev);
@@ -2110,6 +2240,9 @@ static void mana_gd_remove(struct pci_dev *pdev)
 
 	pci_iounmap(pdev, gc->bar0_va);
 
+	kfree(gc->processor_version);
+	gc->processor_version = NULL;
+
 	vfree(gc);
 
 	pci_release_regions(pdev);
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a868c28c8280..38f94f7619ad 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -744,6 +744,26 @@ static void *mana_get_rxbuf_pre(struct mana_rxq *rxq, dma_addr_t *da)
 	return va;
 }
 
+static bool
+mana_use_single_rxbuf_per_page(struct mana_port_context *apc, u32 mtu)
+{
+	struct gdma_context *gc = apc->ac->gdma_dev->gdma_context;
+
+	/* On some systems with 4K PAGE_SIZE, page_pool RX fragments can
+	 * trigger a throughput regression. Hence, force one RX buffer per
+	 * page to avoid the fragment allocation/refcounting overhead in the
+	 * RX refill path, for the affected processors only.
+	 */
+	if (gc->force_full_page_rx_buffer)
+		return true;
+
+	/* For XDP and jumbo frames, make sure only one packet fits per page. */
+	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc))
+		return true;
+
+	return false;
+}
+
 /* Get RX buffer's data size, alloc size, XDP headroom based on MTU */
 static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
 			       int mtu, u32 *datasize, u32 *alloc_size,
@@ -754,8 +774,7 @@ static void mana_get_rxbuf_cfg(struct mana_port_context *apc,
 	/* Calculate datasize first (consistent across all cases) */
 	*datasize = mtu + ETH_HLEN;
 
-	/* For xdp and jumbo frames make sure only one packet fits per page */
-	if (mtu + MANA_RXBUF_PAD > PAGE_SIZE / 2 || mana_xdp_get(apc)) {
+	if (mana_use_single_rxbuf_per_page(apc, mtu)) {
 		if (mana_xdp_get(apc)) {
 			*headroom = XDP_PACKET_HEADROOM;
 			*alloc_size = PAGE_SIZE;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index ec17004b10c0..be56b347f3f6 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -9,6 +9,12 @@
 
 #include "shm_channel.h"
 
+/* SMBIOS Type 4: Processor Information table */
+#define SMBIOS_TYPE_4_PROCESSOR_INFO 4
+
+/* Byte offset containing the Processor Version string number. */
+#define SMBIOS_TYPE4_PROC_VERSION_OFFSET 0x10
+
 #define GDMA_STATUS_MORE_ENTRIES 0x00000105
 #define GDMA_STATUS_CMD_UNSUPPORTED 0xffffffff
 
@@ -444,6 +450,9 @@ struct gdma_context {
 	struct workqueue_struct *service_wq;
 
 	unsigned long flags;
+	u8 *processor_version;
+	u8 proc_ver_strno;
+	bool force_full_page_rx_buffer;
 };
 
 static inline bool mana_gd_is_mana(struct gdma_dev *gd)
-- 
2.34.1
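For readers unfamiliar with the SMBIOS structure layout, the string lookup performed by the two dmi_walk() passes above can be sketched in user space against a hand-built Type 4 record. This is an illustrative sketch, not kernel code: smbios_string() and smbios_proc_version() are hypothetical names, the record is passed as raw bytes rather than as a struct dmi_header (byte 0 is Type, byte 1 is Length), and the sample record used to exercise it is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TYPE4_PROCESSOR_INFO 4		/* mirrors SMBIOS_TYPE_4_PROCESSOR_INFO */
#define TYPE4_PROC_VERSION_OFFSET 0x10	/* mirrors SMBIOS_TYPE4_PROC_VERSION_OFFSET */

/*
 * Same walk as mana_dmi_string_nosave(): the string set starts right after
 * the formatted area (rec[1] is the Length byte), strings are numbered from
 * 1, and string number 0 means "no string". Like the kernel helper, this
 * trusts the record to be terminated by a double NUL.
 */
static const char *smbios_string(const uint8_t *rec, uint8_t s)
{
	const char *bp = (const char *)rec + rec[1];

	if (!s)
		return NULL;

	while (--s > 0 && *bp)
		bp += strlen(bp) + 1;

	if (!*bp)
		return NULL;

	return bp;
}

/* Both passes on one record: read the string number, then the string. */
static const char *smbios_proc_version(const uint8_t *rec)
{
	if (rec[0] != TYPE4_PROCESSOR_INFO || rec[1] <= TYPE4_PROC_VERSION_OFFSET)
		return NULL;

	return smbios_string(rec, rec[TYPE4_PROC_VERSION_OFFSET]);
}
```

For a record whose byte 0x10 is 2 and whose string set is "Socket0", "Cobalt 200", the lookup returns "Cobalt 200". In the driver, dmi_walk() hands each structure to the callback in turn; the patch stores the string number from the first Type 4 record in one pass and copies the string with kstrdup() in the second.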