From nobody Fri Jun 12 12:47:03 2026
From: Lukasz Raczylo
To: netdev@vger.kernel.org
Cc: Theo Lebrun, Andrea della Porta, Nicolas Ferre, Claudiu Beznea,
 Andrew Lunn, "David S . Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org,
 linux-rpi-kernel@lists.infradead.org
Subject: [PATCH net-next v2 1/3] net: macb: flush PCIe posted write after
 TSTART doorbell (PCIe-only)
Date: Thu, 14 May 2026 22:54:57 +0100
Message-id: <20260514215459.36109-2-lukasz@raczylo.com>
In-reply-to: <20260514215459.36109-1-lukasz@raczylo.com>
References: <20260514215459.36109-1-lukasz@raczylo.com>
MIME-version: 1.0
Content-Type: text/plain; charset="utf-8"

macb_start_xmit() and macb_tx_restart() kick transmission by OR-ing
MACB_BIT(TSTART) into NCR. On PCIe-attached macb instances (BCM2712 +
RP1 PCIe south bridge on Raspberry Pi 5 is the case I have in front of
me), writes to NCR are posted PCIe writes: they are not guaranteed to
reach the device before the issuing CPU returns. If the TSTART doorbell
does not reach the MAC, no TX begins, no TCOMP completion arrives, and
the ring remains quiescent without any kernel-visible indication.

Add a read-back of NCR after each TSTART write. The read is an
architected PCIe read barrier for earlier posted writes on the same
path; it ensures the doorbell has reached the MAC before the function
returns.
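For illustration only, and not part of the patch: a minimal userspace
sketch of why the read-back helps, assuming the fabric behaves like a
FIFO of posted writes that a read on the same path must drain before
it can complete. struct fabric_dev, fabric_writel(), fabric_readl(),
fabric_kick_tx() and the TOY_TSTART bit position are all hypothetical
names for this toy model, not driver or hardware definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FABRIC_DEPTH 8
#define TOY_TSTART   (1u << 9)	/* illustrative bit position only */

/* Toy model: a device register sitting behind a posted-write FIFO. */
struct fabric_dev {
	uint32_t ncr;			/* value the device has actually seen */
	uint32_t fifo[FABRIC_DEPTH];	/* writes still in flight in the fabric */
	size_t fifo_len;
};

/* Posted write: returns at once, the value only enters the FIFO. */
static void fabric_writel(struct fabric_dev *d, uint32_t val)
{
	assert(d->fifo_len < FABRIC_DEPTH);
	d->fifo[d->fifo_len++] = val;
}

/*
 * Non-posted read: it may not pass earlier posted writes on the same
 * path, so every queued write is delivered, in order, before the
 * register value is returned.
 */
static uint32_t fabric_readl(struct fabric_dev *d)
{
	for (size_t i = 0; i < d->fifo_len; i++)
		d->ncr = d->fifo[i];
	d->fifo_len = 0;
	return d->ncr;
}

/*
 * Read-modify-write kick, optionally followed by the read-back.
 * Returns true if the device has seen TSTART by the time we return.
 */
static bool fabric_kick_tx(struct fabric_dev *d, bool flush)
{
	fabric_writel(d, fabric_readl(d) | TOY_TSTART);
	if (flush)
		(void)fabric_readl(d);
	return (d->ncr & TOY_TSTART) != 0;
}
```

In this model fabric_kick_tx(&dev, false) can return with the doorbell
still queued, while fabric_kick_tx(&dev, true) cannot. Real fabrics
drain posted writes eventually on their own; the sketch only shows the
ordering guarantee the read-back relies on.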
As a side effect on macb_start_xmit(), the read-back also flushes the
preceding macb_tx_lpi_wake() NCR write -- not just TSTART -- since the
barrier applies to all prior posted writes by the same requester.

The cost is one non-posted PCIe read per TSTART. To avoid imposing
this on SoC-integrated macb variants (Atmel, Microchip, SiFive,
Xilinx), where NCR is on-chip MMIO and no fabric posted-write concern
exists, gate the readback behind a new MACB_CAPS_PCIE_POSTED_WRITES
capability set only on raspberrypi_rp1_config.

Note that the raspberrypi/linux vendor fork carries a local patch
around the TSTART site (a queue->tx_pending breadcrumb that is
promoted to queue->txubr_pending by the next TCOMP interrupt,
triggering macb_tx_restart()). That workaround makes the loss
recoverable under traffic, but it cannot help if TCOMP itself is not
raised because no TX started -- which is exactly the case targeted
here. The handshake is not present in mainline.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo
---
 drivers/net/ethernet/cadence/macb.h      |  4 ++++
 drivers/net/ethernet/cadence/macb_main.c | 15 +++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 2de56017e..ce9037f9e 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -791,6 +791,10 @@
 #define MACB_CAPS_USRIO_HAS_MII			BIT(26)
 #define MACB_CAPS_USRIO_HAS_REFCLK_SOURCE	BIT(27)
 #define MACB_CAPS_USRIO_HAS_TSUCLK_SOURCE	BIT(28)
+/* Register writes are posted on the parent fabric and need a non-posted
+ * read-back to guarantee delivery. Currently set only on RP1.
+ */
+#define MACB_CAPS_PCIE_POSTED_WRITES		BIT(29)
 
 /* LSO settings */
 #define MACB_LSO_UFO_ENABLE			0x01
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index a12aa2124..6879f3458 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1922,6 +1922,14 @@ static void macb_tx_restart(struct macb_queue *queue)
 
 	spin_lock(&bp->lock);
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	/*
+	 * On PCIe-attached parts, flush the posted-write queue so the
+	 * TSTART doorbell reliably reaches the MAC. Without this the
+	 * write can sit in the fabric and the MAC never advances,
+	 * causing a silent TX stall.
+	 */
+	if (bp->caps & MACB_CAPS_PCIE_POSTED_WRITES)
+		(void)macb_readl(bp, NCR);
 	spin_unlock(&bp->lock);
 
 out_tx_ptr_unlock:
@@ -2560,6 +2568,12 @@ static netdev_tx_t macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	spin_lock(&bp->lock);
 	macb_tx_lpi_wake(bp);
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	/*
+	 * Flush PCIe posted-write queue; see comment in macb_tx_restart().
+	 * Also flushes the preceding macb_tx_lpi_wake() NCR write.
+	 */
+	if (bp->caps & MACB_CAPS_PCIE_POSTED_WRITES)
+		(void)macb_readl(bp, NCR);
 	spin_unlock(&bp->lock);
 
 	if (CIRC_SPACE(queue->tx_head, queue->tx_tail, bp->tx_ring_size) < 1)
@@ -5674,6 +5688,7 @@ static const struct macb_config raspberrypi_rp1_config = {
 	.caps = MACB_CAPS_GIGABIT_MODE_AVAILABLE | MACB_CAPS_CLK_HW_CHG |
 		MACB_CAPS_JUMBO |
 		MACB_CAPS_GEM_HAS_PTP |
+		MACB_CAPS_PCIE_POSTED_WRITES |
 		MACB_CAPS_EEE |
 		MACB_CAPS_USRIO_HAS_MII,
 	.dma_burst_length = 16,
-- 
2.54.0

From nobody Fri Jun 12 12:47:03 2026
From: Lukasz Raczylo
Subject: [PATCH net-next v2 2/3] net: macb: insert PCIe read barrier before
 TX completion descriptor check
Date: Thu, 14 May 2026 22:54:58 +0100
Message-id: <20260514215459.36109-3-lukasz@raczylo.com>
In-reply-to: <20260514215459.36109-1-lukasz@raczylo.com>
References: <20260514215459.36109-1-lukasz@raczylo.com>

macb_tx_poll() runs with TCOMP masked, drains the TX ring, then calls
napi_complete_done() and re-enables TCOMP via IER. An existing comment
in the function notes that completions raised while TCOMP is masked do
not re-fire on IER re-enable, and mitigates this by calling
macb_tx_complete_pending(), which inspects driver-visible ring state
(descriptor->ctrl, after rmb()) and reschedules NAPI if a completion is
observable in memory.
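For illustration only, and not part of the patch: a minimal userspace
sketch of the unmask-then-re-check pattern described above, assuming a
completion raised while the interrupt is masked is recorded in the
ring but never delivered later. struct poll_queue, poll_hw_complete(),
poll_drain() and poll_finish() are hypothetical names for this toy
model.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one TX queue as the poll loop sees it. */
struct poll_queue {
	bool irq_masked;	/* TCOMP is masked while polling */
	bool irq_latched;	/* an interrupt that will really be delivered */
	unsigned int done;	/* completions visible in the ring, not reaped */
};

/*
 * Hardware finishes a frame. While masked, the completion lands in the
 * ring but no interrupt is latched for later delivery.
 */
static void poll_hw_complete(struct poll_queue *q)
{
	q->done++;
	if (!q->irq_masked)
		q->irq_latched = true;
}

/* Reap everything currently visible in the ring. */
static unsigned int poll_drain(struct poll_queue *q)
{
	unsigned int n = q->done;

	q->done = 0;
	return n;
}

/*
 * End of poll: unmask, then optionally look at the ring once more.
 * Returns true if the caller must poll again because a completion
 * slipped in while the interrupt was masked.
 */
static bool poll_finish(struct poll_queue *q, bool recheck)
{
	q->irq_masked = false;
	if (recheck && q->done > 0) {
		q->irq_masked = true;	/* mask again and reschedule */
		return true;
	}
	return false;
}
```

Without the re-check, a completion that arrives between the drain and
the unmask stays stranded in the ring with no interrupt pending. The
re-check is only as good as the ring state it reads, which is why the
visibility of the descriptor in memory matters.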
On PCIe-attached parts (BCM2712 + RP1 PCIe south bridge on Raspberry
Pi 5 is the case I have in front of me), the descriptor DMA write that
sets TX_USED may not have retired to system memory at the point
macb_tx_complete_pending() runs. The rmb() only orders the CPU's own
loads; it cannot retire a peripheral DMA write that is still in
flight. Under that ordering the in-memory descriptor can still read
TX_USED=0 when the hardware has in fact completed the frame; the check
returns false; NAPI exits; because of the quirk above, re-enabling
TCOMP via IER does not re-fire the interrupt; the ring goes quiescent.

Add a side-effect-free MMIO read between the IER write and the
macb_tx_complete_pending() check. The read functions as an architected
PCIe read barrier for earlier peripheral-originated DMA writes on the
same path, so any in-flight TX_USED update retires to system memory
before the descriptor read.

The register chosen is IMR (the read-only interrupt mask mirror);
reading it has no side effects on either read-clear or W1C ISR silicon
(it is not the ISR), and the read still flushes prior DMA writes via
the PCIe completion-ordering guarantee.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo
---
 drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 6879f3458..f7fa9e7ad 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1984,6 +1984,14 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
 		 * actions if an interrupt is raised just after enabling them,
 		 * but this should be harmless.
 		 */
+		/*
+		 * PCIe read barrier: flush any in-flight peripheral DMA
+		 * writes (descriptor TX_USED updates) so the subsequent
+		 * macb_tx_complete_pending() check observes them. IMR is
+		 * the read-only interrupt mask mirror; the read has no
+		 * side effects on either read-clear or W1C ISR silicon.
+		 */
+		(void)queue_readl(queue, IMR);
 		if (macb_tx_complete_pending(queue)) {
 			queue_writel(queue, IDR, MACB_BIT(TCOMP));
 			macb_queue_isr_clear(bp, queue, MACB_BIT(TCOMP));
-- 
2.54.0

From nobody Fri Jun 12 12:47:03 2026
From: Lukasz Raczylo
Subject: [PATCH net-next v2 3/3] net: macb: add TX stall watchdog to recover
 from lost TCOMP interrupts
Date: Thu, 14 May 2026 22:54:59 +0100
Message-id: <20260514215459.36109-4-lukasz@raczylo.com>
In-reply-to: <20260514215459.36109-1-lukasz@raczylo.com>
References: <20260514215459.36109-1-lukasz@raczylo.com>

On PCIe-attached macb instances (BCM2712 + RP1 PCIe south bridge on
Raspberry Pi 5 is the case I have in front of me), a TCOMP interrupt
can fail to arrive: the TSTART doorbell can be lost in the posted-write
fabric (addressed by patch 1/3), or the descriptor TX_USED DMA write
can be observed late by the driver (addressed by patch 2/3). When that
happens the TX ring stalls silently until something else kicks TSTART.

Add a per-queue delayed_work that runs once per second. It detects
forward progress on the TX completion path via a per-queue bool
tx_stall_tail_moved that macb_tx_complete() sets when tx_tail advances
and the watchdog clears on each tick. If the ring is non-empty
(queue->tx_head != queue->tx_tail) and the flag is unset when the tick
runs, the watchdog calls the existing macb_tx_restart() to re-assert
TSTART.

The bool form (rather than a tx_tail snapshot) sidesteps any concern
about ring-index aliasing between ticks and is the form suggested by
Phil Elwell when reviewing the same series anchored against rpi-6.18.y
at raspberrypi/linux#7340 (merged 2026-05-08).

No new recovery logic is introduced. macb_tx_restart() already exists
in this file, is correctly locked (tx_ptr_lock, bp->lock), and
verifies that the hardware's TBQP is behind the driver's head index
before re-asserting TSTART. On a healthy ring it is a no-op at the
hardware level.

Cost on a healthy queue: one spin_lock_irqsave() /
spin_unlock_irqrestore() pair, two index reads and one flag write per
tick. The delayed_work is only scheduled between macb_open() and
macb_close() and is cancelled synchronously on close.

A netif_carrier_ok() gate at the top of the tick skips the stall check
when there is no carrier (no completion is possible without a link),
eliminating a boot-time false positive where queue->tx_head can
advance from kernel-queued packets between macb_open() and link
autoneg completion, while tx_tail stays unchanged because no TCOMPs
have arrived yet.

netdev_warn_ratelimited() is used rather than netdev_warn_once() so
operators can count occurrences across the lifetime of the netdev.
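For illustration only, and not part of the patch: a minimal userspace
sketch of the stall check described above, with locking, the work
queue and the restart call left out. struct wd_queue, wd_complete()
and wd_tick() are hypothetical names; only the head/tail/flag logic is
meant to mirror the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the per-queue state the watchdog looks at. */
struct wd_queue {
	unsigned int tx_head;	/* next slot the driver will fill */
	unsigned int tx_tail;	/* next slot waiting for completion */
	bool tail_moved;	/* set on completion, cleared on each tick */
};

/* Completion path: note forward progress when at least one packet is reaped. */
static void wd_complete(struct wd_queue *q, unsigned int packets)
{
	if (packets) {
		q->tx_tail += packets;
		q->tail_moved = true;
	}
}

/*
 * One watchdog tick. Returns true when the ring holds work but no
 * completion has been seen since the previous tick. Without carrier
 * the check is skipped and the flag is left untouched.
 */
static bool wd_tick(struct wd_queue *q, bool carrier_ok)
{
	bool stalled;

	if (!carrier_ok)
		return false;

	stalled = q->tx_head != q->tx_tail && !q->tail_moved;
	q->tail_moved = false;
	return stalled;
}
```

Starting with tail_moved set, as the open path does, gives the ring one
full interval before the first tick can report a stall. A stall is
reported only after a whole interval passes with queued work and no
completion, so a busy but healthy ring never trips it.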

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Link: https://github.com/raspberrypi/linux/pull/7340
Signed-off-by: Lukasz Raczylo
---
 drivers/net/ethernet/cadence/macb.h      | 10 ++++
 drivers/net/ethernet/cadence/macb_main.c | 72 ++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index ce9037f9e..75df0f75b 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1282,6 +1282,16 @@ struct macb_queue {
 	dma_addr_t		tx_ring_dma;
 	struct work_struct	tx_error_task;
 	bool			txubr_pending;
+
+	/* TX stall watchdog -- see macb_tx_stall_watchdog() in macb_main.c.
+	 * tx_stall_tail_moved is set by macb_tx_complete() when tx_tail
+	 * advances and cleared by the watchdog tick on each pass (both
+	 * under tx_ptr_lock). Using a bool sidesteps any ring-index
+	 * aliasing concern between ticks.
+	 */
+	struct delayed_work	tx_stall_watchdog_work;
+	bool			tx_stall_tail_moved;
+
 	struct napi_struct	napi_tx;
 
 	dma_addr_t		rx_ring_dma;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index f7fa9e7ad..8245c69e1 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1473,6 +1473,8 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 				  packets, bytes);
 
 	queue->tx_tail = tail;
+	if (packets)
+		queue->tx_stall_tail_moved = true;
 	if (__netif_subqueue_stopped(bp->dev, queue_index) &&
 	    CIRC_CNT(queue->tx_head, queue->tx_tail,
 		     bp->tx_ring_size) <= MACB_TX_WAKEUP_THRESH(bp))
@@ -2003,6 +2005,70 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
 	return work_done;
 }
 
+#define MACB_TX_STALL_INTERVAL_MS	1000
+
+/*
+ * TX stall watchdog.
+ *
+ * Recovers from lost TCOMP interrupts on PCIe-attached macb
+ * instances. macb already has a recovery chain
+ * (txubr_pending -> macb_tx_restart()) that fires on TCOMP; if
+ * TCOMP itself is lost the TX ring stalls silently until something
+ * else kicks TSTART. This watchdog runs once per second per queue
+ * and calls macb_tx_restart() if the ring is non-empty and
+ * tx_tail has not advanced since the previous tick.
+ *
+ * Movement is tracked via the tx_stall_tail_moved boolean rather
+ * than a tx_tail snapshot, sidestepping any ring-index aliasing
+ * concern. The bool is set by macb_tx_complete() when tx_tail
+ * advances and cleared here on each tick; both writes are under
+ * tx_ptr_lock so no atomic is required.
+ *
+ * macb_tx_restart() already checks the hardware's TBQP against
+ * the driver's head index before re-asserting TSTART, so on a
+ * healthy ring this is a no-op at the hardware level. The
+ * watchdog only supplies the missing trigger.
+ */
+static void macb_tx_stall_watchdog(struct work_struct *work)
+{
+	struct macb_queue *queue = container_of(to_delayed_work(work),
+						struct macb_queue,
+						tx_stall_watchdog_work);
+	struct macb *bp = queue->bp;
+	unsigned int cur_tail, cur_head;
+	bool stalled = false;
+	unsigned long flags;
+
+	if (!netif_running(bp->dev))
+		return;
+
+	/* No carrier => no completion is possible. Skip the check
+	 * but keep the watchdog ticking for when carrier comes up.
+	 */
+	if (!netif_carrier_ok(bp->dev))
+		goto reschedule;
+
+	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
+	cur_tail = queue->tx_tail;
+	cur_head = queue->tx_head;
+	if (cur_head != cur_tail && !queue->tx_stall_tail_moved)
+		stalled = true;
+	queue->tx_stall_tail_moved = false;
+	spin_unlock_irqrestore(&queue->tx_ptr_lock, flags);
+
+	if (stalled) {
+		netdev_warn_ratelimited(bp->dev,
+					"TX stall detected on queue %u (tail=%u head=%u); re-kicking TSTART\n",
+					(unsigned int)(queue - bp->queues),
+					cur_tail, cur_head);
+		macb_tx_restart(queue);
+	}
+
+reschedule:
+	schedule_delayed_work(&queue->tx_stall_watchdog_work,
+			      msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
+}
+
 static void macb_hresp_error_task(struct work_struct *work)
 {
 	struct macb *bp = from_work(bp, work, hresp_err_bh_work);
@@ -3192,6 +3258,9 @@ static int macb_open(struct net_device *dev)
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
 		napi_enable(&queue->napi_rx);
 		napi_enable(&queue->napi_tx);
+		queue->tx_stall_tail_moved = true;
+		schedule_delayed_work(&queue->tx_stall_watchdog_work,
+				      msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS));
 	}
 
 	macb_init_hw(bp);
@@ -3242,6 +3311,7 @@ static int macb_close(struct net_device *dev)
 	for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
 		napi_disable(&queue->napi_rx);
 		napi_disable(&queue->napi_tx);
+		cancel_delayed_work_sync(&queue->tx_stall_watchdog_work);
 		netdev_tx_reset_queue(netdev_get_tx_queue(dev, q));
 	}
 
@@ -4804,6 +4874,8 @@ static int macb_init_dflt(struct platform_device *pdev)
 		}
 
 		INIT_WORK(&queue->tx_error_task, macb_tx_error_task);
+		INIT_DELAYED_WORK(&queue->tx_stall_watchdog_work,
+				  macb_tx_stall_watchdog);
 		q++;
 	}
 
-- 
2.54.0