From nobody Wed Jun 17 05:11:19 2026
From: Benjamin Berman
To: Andreas Noever, Mika Westerberg, Yehezkel Bernat
Cc: Andrew Lunn, "David S . Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, linux-usb@vger.kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] thunderbolt: drop start_poll guard in tb_ring_poll_complete()
Date: Mon, 27 Apr 2026 18:55:20 -0700
Message-ID: <20260428015521.3454006-2-benjamin.s.berman@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260428015521.3454006-1-benjamin.s.berman@gmail.com>
References: <20260428015521.3454006-1-benjamin.s.berman@gmail.com>

Under concurrent load on a single NHI with several rings simultaneously
in NAPI poll (e.g. a Maple Ridge TB4 transit forwarding tbnet traffic
between two peers), one ring's interrupt enable bit in
REG_RING_INTERRUPT_BASE can stay cleared. MSI-X stops for that ring,
NAPI is never rescheduled, but carrier is reported up and no driver
event fires. The ring stays masked until thunderbolt_net is reloaded.

tb_ring_poll_complete() gated the unmask on @start_poll:

	if (ring->start_poll)
		__ring_interrupt_mask(ring, false);

while the ISR path masks unconditionally via __ring_interrupt(). In a
window where @start_poll is observed as NULL by the unmask path while
the paired mask persists, the ring is left permanently masked. Gate on
@running instead and add an ioread32() barrier so the posted enable
reaches the device before the spinlock is dropped.

On NHIs without QUIRK_AUTO_CLEAR_INT a second issue compounds the
first: stale pending status in REG_RING_NOTIFY_BASE can prevent the
hardware from re-arming its MSI-X generator when the ring is
re-enabled. Clear the ring's bit in REG_RING_INT_CLEAR before setting
the enable bit, mirroring what ring_msix() already does at ISR entry.

Verified on a Maple Ridge 4C transit and two TB3 Titan Ridge endpoints
running NCCL all-reduce over tb-lo: pre-patch the chain wedges in under
1 GB; post-patch a 192 GB run (3000 iterations of a 64 MiB all-reduce)
completes with mask/unmask counters balanced.

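The guard change described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct model_ring and the model_*() helpers are hypothetical names, not kernel code, and the ISR side is modelled as this message describes it (an unconditional mask).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; not the kernel's struct tb_ring. */
struct model_ring {
	uint32_t enable_reg;		/* stands in for REG_RING_INTERRUPT_BASE */
	unsigned int bit;		/* this ring's enable bit (0..31) */
	bool running;
	void (*start_poll)(void *);	/* NULL models the racy observation */
};

/* ISR side as the commit message describes it: always clear the enable bit. */
static void model_isr_mask(struct model_ring *r)
{
	r->enable_reg &= ~(UINT32_C(1) << r->bit);
}

/* Old completion path: unmask only if start_poll is seen as non-NULL. */
static void model_poll_complete_old(struct model_ring *r)
{
	if (r->start_poll)
		r->enable_reg |= UINT32_C(1) << r->bit;
}

/* Patched completion path: unmask whenever the ring is running. */
static void model_poll_complete_new(struct model_ring *r)
{
	if (r->running)
		r->enable_reg |= UINT32_C(1) << r->bit;
}

/* True if the ring's interrupt is currently enabled in the model. */
static bool model_irq_enabled(const struct model_ring *r)
{
	return (r->enable_reg >> r->bit) & 1u;
}
```

With running set and start_poll observed as NULL, model_isr_mask() followed by model_poll_complete_old() leaves the enable bit clear, while model_poll_complete_new() sets it again. The model does not cover the locking or the posted-write flush.
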
Generated-by: Claude Opus 4.7
Tested-by: Benjamin Berman
Signed-off-by: Benjamin Berman
---
 drivers/thunderbolt/nhi.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 2bb2e79ca..bba45ec36 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -389,10 +389,24 @@ static void __ring_interrupt_mask(struct tb_ring *ring, bool mask)
 	u32 val;
 
 	val = ioread32(ring->nhi->iobase + reg);
-	if (mask)
+	if (mask) {
 		val &= ~BIT(bit);
-	else
+	} else {
+		if (!(ring->nhi->quirks & QUIRK_AUTO_CLEAR_INT)) {
+			int cbit = ring_interrupt_index(ring) & 31;
+
+			if (ring->is_tx)
+				iowrite32(BIT(cbit),
+					  ring->nhi->iobase +
+					  REG_RING_INT_CLEAR);
+			else
+				iowrite32(BIT(cbit),
+					  ring->nhi->iobase +
+					  REG_RING_INT_CLEAR +
+					  4 * (ring->nhi->hop_count / 32));
+		}
 		val |= BIT(bit);
+	}
 	iowrite32(val, ring->nhi->iobase + reg);
 }
 
@@ -423,8 +437,10 @@ void tb_ring_poll_complete(struct tb_ring *ring)
 
 	spin_lock_irqsave(&ring->nhi->lock, flags);
 	spin_lock(&ring->lock);
-	if (ring->start_poll)
+	if (ring->running) {
 		__ring_interrupt_mask(ring, false);
+		(void)ioread32(ring->nhi->iobase + REG_RING_INTERRUPT_BASE);
+	}
 	spin_unlock(&ring->lock);
 	spin_unlock_irqrestore(&ring->nhi->lock, flags);
 }
-- 
2.43.0

From nobody Wed Jun 17 05:11:19 2026
From: Benjamin Berman
To: Andreas Noever, Mika Westerberg, Yehezkel Bernat
Cc: Andrew Lunn, "David S . Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, linux-usb@vger.kernel.org, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] net: thunderbolt: enlarge RX/TX ring and set NAPI weight for sustained load
Date: Mon, 27 Apr 2026 18:55:21 -0700
Message-ID: <20260428015521.3454006-3-benjamin.s.berman@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260428015521.3454006-1-benjamin.s.berman@gmail.com>
References: <20260428015521.3454006-1-benjamin.s.berman@gmail.com>

The default TBNET_RING_SIZE of 256 and the NAPI_POLL_WEIGHT of 64
implicit in netif_napi_add() are too small for host-to-host Thunderbolt
networking under sustained bulk traffic. Running NCCL all-reduce over
tb-lo on a three-node chain (two TB3 endpoints plus a TB4 Maple Ridge
transit) produces rx_missed_errors at ~1 % of rx_packets on the transit
and ~0.6 % on the endpoints, with rx_packets stalling against a peer's
continuing tx_packets.

Raise TBNET_RING_SIZE to 2048 (8x) and use netif_napi_add_weight() with
a per-NAPI weight of 256 so tbnet_poll() drains more frames per softirq
invocation. With matching sysctls (net.core.netdev_budget=1024,
net.core.netdev_budget_usecs=8000) rx_missed_errors stays below 0.005 %
over a 192 GB all-reduce workload on the same hardware.

Generated-by: Claude Opus 4.7
Tested-by: Benjamin Berman
Signed-off-by: Benjamin Berman
Acked-by: Mika Westerberg
---
 drivers/net/thunderbolt/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/thunderbolt/main.c b/drivers/net/thunderbolt/main.c
index 7aae5d915..3a096f7c5 100644
--- a/drivers/net/thunderbolt/main.c
+++ b/drivers/net/thunderbolt/main.c
@@ -31,7 +31,7 @@
 #define TBNET_LOGIN_TIMEOUT	500
 #define TBNET_LOGOUT_TIMEOUT	1000
 
-#define TBNET_RING_SIZE		256
+#define TBNET_RING_SIZE		2048
 #define TBNET_LOGIN_RETRIES	60
 #define TBNET_LOGOUT_RETRIES	10
 #define TBNET_E2E		BIT(0)
@@ -1383,7 +1383,7 @@ static int tbnet_probe(struct tb_service *svc, const struct tb_service_id *id)
 	dev->features = dev->hw_features | NETIF_F_HIGHDMA;
 	dev->hard_header_len += sizeof(struct thunderbolt_ip_frame_header);
 
-	netif_napi_add(dev, &net->napi, tbnet_poll);
+	netif_napi_add_weight(dev, &net->napi, tbnet_poll, 256);
 
 	/* MTU range: 68 - 65522 */
 	dev->min_mtu = ETH_MIN_MTU;
-- 
2.43.0
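
For readers weighing the two constants in patch 2/2, here is a back-of-the-envelope sketch of how ring size and NAPI weight interact. It assumes each tbnet_poll() call processes at most `weight` frames (the usual NAPI contract) and ignores frames that arrive while the ring is being drained. polls_to_drain() is a hypothetical helper for illustration, not part of the driver.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of poll calls needed to drain `frames`
 * queued frames when each call handles at most `weight` of them.
 * Assumes weight > 0.
 */
static unsigned int polls_to_drain(unsigned int frames, unsigned int weight)
{
	return (frames + weight - 1) / weight;	/* ceiling division */
}
```

Under these assumptions a full 256-entry ring takes polls_to_drain(256, 64) = 4 calls at the default weight. A full 2048-entry ring would take 32 calls at weight 64 but 8 calls at weight 256, which is one way to read why the weight is raised together with the ring size.
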