From nobody Thu Apr 9 15:03:41 2026 Received: from m16.mail.163.com (m16.mail.163.com [117.135.210.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 14D7029A2; Tue, 3 Mar 2026 05:52:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=117.135.210.5 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772517149; cv=none; b=GtkVnYwxdgvunD9G2qziF4wJR8GGgG5hm/b+bZsZOvrbpl970bKjmRAIQKg+1wZlQPA359Youm7JvGbY7rg1zeZc4JMXcUDYerUE3gsOANkspLlx/Au7uUYk/b5xvcOGCH/Y5URthdRlfTmZVWesR3B7x7RDo+I7Ilkl2IqgaVA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772517149; c=relaxed/simple; bh=srsGL27ODkb32v+jJnRQHGZJU3/SbDJAk4aL/3CdEAQ=; h=From:To:Cc:Subject:Date:Message-Id:MIME-Version; b=QILFVf2A55Us50IsBKfbPhoMA6sZXYiO7TZXJwDKY5OKsV/jsfwGYr7VaQmfR5fAXQEKnqIaxa0IMBMd3+rpaVdXvrPX7fogegwNR6bjpKFwZ2rgQs4mZVyAU+kPs3tAx0XOlFk5e3jc3JmyT4J9yb16YN6xWds6nado4ZFb/jI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=163.com; spf=pass smtp.mailfrom=163.com; dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b=lIMuLjJ+; arc=none smtp.client-ip=117.135.210.5 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=163.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=163.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=163.com header.i=@163.com header.b="lIMuLjJ+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:To:Subject:Date:Message-Id:MIME-Version; bh=9j neQjI3DzzG0MEdCPF569r0V3d6StOOexg2c0rY+3M=; b=lIMuLjJ+FfznC1uk5y ++eDeP0uTPAmap6S9FO4Aqmyay1nA4H18oID3yG9Cr0ZVraBHKp0DL+BPNplPNmZ MV4N50kjTeJzLaR20ZLq0P0UtpLV73BFJK211xYR+AM0ZunMXgil0ZjB9ulB+M2k f5/vftsmJK3KWIs4h9qTUjgWM= Received: from pek-lpg-core5.wrs.com (unknown []) by gzsmtp3 (Coremail) with SMTP id PigvCgA3CNnXdqZpLe3SRA--.197S2; Tue, 03 Mar 2026 13:51:23 +0800 (CST) From: Robert Garcia To: stable@vger.kernel.org, Vladimir Oltean Cc: Paolo Abeni , Ma Ke , Robert Garcia , Jonas Gorski , "David S . Miller" , Eric Dumazet , Jakub Kicinski , Simon Horman , Russell King , Florian Fainelli , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 6.12.y] net: dsa: properly keep track of conduit reference Date: Tue, 3 Mar 2026 13:51:20 +0800 Message-Id: <20260303055120.2111614-1-rob_garcia@163.com> X-Mailer: git-send-email 2.34.1 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: PigvCgA3CNnXdqZpLe3SRA--.197S2 X-Coremail-Antispam: 1Uf129KBjvJXoWfGF4rtr15Aw4xCF4furWkCrg_yoWkAw4xpF 43Ga43t3ykG3y7Grs8Wa1UuFyrAw4Fyay3G34UK34furn8Gry0yry8KFZa9345CrZ3GF9x XFZ8Za4rCFZ8ZFJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07UxnYwUUUUU= X-CM-SenderInfo: 5uresw5dufxti6rwjhhfrp/xtbDAR3YRGmmdt3QvgAA3K Content-Type: text/plain; charset="utf-8" From: Vladimir Oltean [ Upstream commit 06e219f6a706c367c93051f408ac61417643d2f9 ] Problem description Reported-by: Ma Ke Reviewed-by: Jonas Gorski ------------------- DSA has a mumbo-jumbo of reference handling of the conduit net device and its kobject which, sadly, is just wrong and doesn't make sense. There are two distinct problems. 1. The OF path, which uses of_find_net_device_by_node(), never releases the elevated refcount on the conduit's kobject. Nominally, the OF and non-OF paths should result in objects having identical reference counts taken, and it is already suspicious that dsa_dev_to_net_device() has a put_device() call which is missing in dsa_port_parse_of(), but we can actually even verify that an issue exists. With CONFIG_DEBUG_KOBJECT_RELEASE=3Dy, if we run this command "before" and "after" applying this patch: (unbind the conduit driver for net device eno2) echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind we see these lines in the output diff which appear only with the patch applied: kobject: 'eno2' (ffff002009a3a6b8): kobject_release, parent 000000000000000= 0 (delayed 1000) kobject: '109' (ffff0020099d59a0): kobject_release, parent 0000000000000000= (delayed 1000) 2. After we find the conduit interface one way (OF) or another (non-OF), it can get unregistered at any time, and DSA remains with a long-lived, but in this case stale, cpu_dp->conduit pointer. Holding the net device's underlying kobject isn't actually of much help, it just prevents it from being freed (but we never need that kobject directly). What helps us to prevent the net device from being unregistered is the parallel netdev reference mechanism (dev_hold() and dev_put()). Actually we actually use that netdev tracker mechanism implicitly on user ports since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings"), via netdev_upper_dev_link(= ). But time still passes at DSA switch probe time between the initial of_find_net_device_by_node() code and the user port creation time, time during which the conduit could unregister itself and DSA wouldn't know about it. So we have to run of_find_net_device_by_node() under rtnl_lock() to prevent that from happening, and release the lock only with the netdev tracker having acquired the reference. Do we need to keep the reference until dsa_unregister_switch() / dsa_switch_shutdown()? 1: Maybe yes. A switch device will still be registered even if all user ports failed to probe, see commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal"), and the cpu_dp->conduit pointers remain valid. I haven't audited all call paths to see whether they will actually use the conduit in lack of any user port, but if they do, it seems safer to not rely on user ports for that reference. 2. Definitely yes. We support changing the conduit which a user port is associated to, and we can get into a situation where we've moved all user ports away from a conduit, thus no longer hold any reference to it via the net device tracker. But we shouldn't let it go nonetheless - see the next change in relation to dsa_tree_find_first_conduit() and LAG conduits which disappear. We have to be prepared to return to the physical conduit, so the CPU port must explicitly keep another reference to it. This is also to say: the user ports and their CPU ports may not always keep a reference to the same conduit net device, and both are needed. As for the conduit's kobject for the /sys/class/net/ entry, we don't care about it, we can release it as soon as we hold the net device object itself. History and blame attribution ----------------------------- The code has been refactored so many times, it is very difficult to follow and properly attribute a blame, but I'll try to make a short history which I hope to be correct. We have two distinct probing paths: - one for OF, introduced in 2016 in commit 83c0afaec7b7 ("net: dsa: Add new binding implementation") - one for non-OF, introduced in 2017 in commit 71e0bbde0d88 ("net: dsa: Add support for platform data") These are both complete rewrites of the original probing paths (which used struct dsa_switch_driver and other weird stuff, instead of regular devices on their respective buses for register access, like MDIO, SPI, I2C etc): - one for OF, introduced in 2013 in commit 5e95329b701c ("dsa: add device tree bindings to register DSA switches") - one for non-OF, introduced in 2008 in commit 91da11f870f0 ("net: Distributed Switch Architecture protocol support") except for tiny bits and pieces like dsa_dev_to_net_device() which were seemingly carried over since the original commit, and used to this day. The point is that the original probing paths received a fix in 2015 in the form of commit 679fb46c5785 ("net: dsa: Add missing master netdev dev_put() calls"), but the fix never made it into the "new" (dsa2) probing paths that can still be traced to today, and the fixed probing path was later deleted in 2019 in commit 93e86b3bc842 ("net: dsa: Remove legacy probing support"). That is to say, the new probing paths were never quite correct in this area. The existence of the legacy probing support which was deleted in 2019 explains why dsa_dev_to_net_device() returns a conduit with elevated refcount (because it was supposed to be released during dsa_remove_dst()). After the removal of the legacy code, the only user of dsa_dev_to_net_device() calls dev_put(conduit) immediately after this function returns. This pattern makes no sense today, and can only be interpreted historically to understand why dev_hold() was there in the first place. Change details -------------- Today we have a better netdev tracking infrastructure which we should use. Logically netdev_hold() belongs in common code (dsa_port_parse_cpu(), where dp->conduit is assigned), but there is a tradeoff to be made with the rtnl_lock() section which would become a bit too long if we did that - dsa_port_parse_cpu() also calls request_module(). So we duplicate a bit of logic in order for the callers of dsa_port_parse_cpu() to be the ones responsible of holding the conduit reference and releasing it on error. This shortens the rtnl_lock() section significantly. In the dsa_switch_probe() error path, dsa_switch_release_ports() will be called in a number of situations, one being where dsa_port_parse_cpu() maybe didn't get the chance to run at all (a different port failed earlier, etc). So we have to test for the conduit being NULL prior to calling netdev_put(). There have still been so many transformations to the code since the blamed commits (rename master -> conduit, commit 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown")), that it only makes sense to fix the code using the best methods available today and see how it can be backported to stable later. I suspect the fix cannot even be backported to kernels which lack dsa_switch_shutdown(), and I suspect this is also maybe why the long-lived conduit reference didn't make it into the new DSA probing paths at the time (problems during shutdown). Because dsa_dev_to_net_device() has a single call site and has to be changed anyway, the logic was just absorbed into the non-OF dsa_port_parse(). Tested on the ocelot/felix switch and on dsa_loop, both on the NXP LS1028A with CONFIG_DEBUG_KOBJECT_RELEASE=3Dy. Reported-by: Ma Ke Closes: https://lore.kernel.org/netdev/20251214131204.4684-1-make24@iscas.a= c.cn/ Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Fixes: 71e0bbde0d88 ("net: dsa: Add support for platform data") Reviewed-by: Jonas Gorski Signed-off-by: Vladimir Oltean Link: https://patch.msgid.link/20251215150236.3931670-1-vladimir.oltean@nxp= .com Signed-off-by: Paolo Abeni Signed-off-by: Robert Garcia --- include/net/dsa.h | 1 + net/dsa/dsa.c | 59 +++++++++++++++++++++++++++-------------------- 2 files changed, 35 insertions(+), 25 deletions(-) diff --git a/include/net/dsa.h b/include/net/dsa.h index 877f9b270cf6..a1f730d9f01b 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -296,6 +296,7 @@ struct dsa_port { struct devlink_port devlink_port; struct phylink *pl; struct phylink_config pl_config; + netdevice_tracker conduit_tracker; struct dsa_lag *lag; struct net_device *hsr_dev; =20 diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 76a086e846c4..95f87e1f90af 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -1246,14 +1246,25 @@ static int dsa_port_parse_of(struct dsa_port *dp, s= truct device_node *dn) if (ethernet) { struct net_device *conduit; const char *user_protocol; + int err; =20 + rtnl_lock(); conduit =3D of_find_net_device_by_node(ethernet); of_node_put(ethernet); - if (!conduit) + if (!conduit) { + rtnl_unlock(); return -EPROBE_DEFER; + } + + netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL); + put_device(&conduit->dev); + rtnl_unlock(); =20 user_protocol =3D of_get_property(dn, "dsa-tag-protocol", NULL); - return dsa_port_parse_cpu(dp, conduit, user_protocol); + err =3D dsa_port_parse_cpu(dp, conduit, user_protocol); + if (err) + netdev_put(conduit, &dp->conduit_tracker); + return err; } =20 if (link) @@ -1386,37 +1397,30 @@ static struct device *dev_find_class(struct device = *parent, char *class) return device_find_child(parent, class, dev_is_class); } =20 -static struct net_device *dsa_dev_to_net_device(struct device *dev) -{ - struct device *d; - - d =3D dev_find_class(dev, "net"); - if (d !=3D NULL) { - struct net_device *nd; - - nd =3D to_net_dev(d); - dev_hold(nd); - put_device(d); - - return nd; - } - - return NULL; -} - static int dsa_port_parse(struct dsa_port *dp, const char *name, struct device *dev) { if (!strcmp(name, "cpu")) { struct net_device *conduit; + struct device *d; + int err; =20 - conduit =3D dsa_dev_to_net_device(dev); - if (!conduit) + rtnl_lock(); + d =3D dev_find_class(dev, "net"); + if (!d) { + rtnl_unlock(); return -EPROBE_DEFER; + } =20 - dev_put(conduit); + conduit =3D to_net_dev(d); + netdev_hold(conduit, &dp->conduit_tracker, GFP_KERNEL); + put_device(d); + rtnl_unlock(); =20 - return dsa_port_parse_cpu(dp, conduit, NULL); + err =3D dsa_port_parse_cpu(dp, conduit, NULL); + if (err) + netdev_put(conduit, &dp->conduit_tracker); + return err; } =20 if (!strcmp(name, "dsa")) @@ -1484,6 +1488,9 @@ static void dsa_switch_release_ports(struct dsa_switc= h *ds) struct dsa_vlan *v, *n; =20 dsa_switch_for_each_port_safe(dp, next, ds) { + if (dsa_port_is_cpu(dp) && dp->conduit) + netdev_put(dp->conduit, &dp->conduit_tracker); + /* These are either entries that upper layers lost track of * (probably due to bugs), or installed through interfaces * where one does not necessarily have to remove them, like @@ -1636,8 +1643,10 @@ void dsa_switch_shutdown(struct dsa_switch *ds) /* Disconnect from further netdevice notifiers on the conduit, * since netdev_uses_dsa() will now return false. */ - dsa_switch_for_each_cpu_port(dp, ds) + dsa_switch_for_each_cpu_port(dp, ds) { dp->conduit->dsa_ptr =3D NULL; + netdev_put(dp->conduit, &dp->conduit_tracker); + } =20 rtnl_unlock(); out: --=20 2.34.1