The phy_detach function can be called with or without the rtnl lock held.
When the rtnl lock is not held, using rtnl_dereference() triggers a
warning due to the lack of lock context.
Add an rcu_read_lock() to ensure the lock is acquired and to maintain
synchronization.
Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Reported-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Closes: https://lore.kernel.org/netdev/4c6419d8-c06b-495c-b987-d66c2e1ff848@tuxon.dev/
Fixes: 35f7cad1743e ("net: Add the possibility to support a selected hwtstamp in netdevice")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
---
Changes in v2:
- Add a missing ;
---
drivers/net/phy/phy_device.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 5b34d39d1d52..3eeee7cba923 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2001,12 +2001,14 @@ void phy_detach(struct phy_device *phydev)
 	if (dev) {
 		struct hwtstamp_provider *hwprov;
 
-		hwprov = rtnl_dereference(dev->hwprov);
+		rcu_read_lock();
+		hwprov = rcu_dereference(dev->hwprov);
 		/* Disable timestamp if it is the one selected */
 		if (hwprov && hwprov->phydev == phydev) {
 			rcu_assign_pointer(dev->hwprov, NULL);
 			kfree_rcu(hwprov, rcu_head);
 		}
+		rcu_read_unlock();
 
 		phydev->attached_dev->phydev = NULL;
 		phydev->attached_dev = NULL;
--
2.34.1
On Fri, Jan 17, 2025 at 6:36 PM Kory Maincent <kory.maincent@bootlin.com> wrote:
>
> The phy_detach function can be called with or without the rtnl lock held.
> When the rtnl lock is not held, using rtnl_dereference() triggers a
> warning due to the lack of lock context.
>
> Add an rcu_read_lock() to ensure the lock is acquired and to maintain
> synchronization.
>
> Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> Reported-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> Closes: https://lore.kernel.org/netdev/4c6419d8-c06b-495c-b987-d66c2e1ff848@tuxon.dev/
> Fixes: 35f7cad1743e ("net: Add the possibility to support a selected hwtstamp in netdevice")
> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
> ---
>
> Changes in v2:
> - Add a missing ;
> ---
> drivers/net/phy/phy_device.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 5b34d39d1d52..3eeee7cba923 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2001,12 +2001,14 @@ void phy_detach(struct phy_device *phydev)
>  	if (dev) {
>  		struct hwtstamp_provider *hwprov;
>
> -		hwprov = rtnl_dereference(dev->hwprov);
> +		rcu_read_lock();
> +		hwprov = rcu_dereference(dev->hwprov);
>  		/* Disable timestamp if it is the one selected */
>  		if (hwprov && hwprov->phydev == phydev) {
>  			rcu_assign_pointer(dev->hwprov, NULL);
>  			kfree_rcu(hwprov, rcu_head);
>  		}
> +		rcu_read_unlock();
>
>  		phydev->attached_dev->phydev = NULL;
>  		phydev->attached_dev = NULL;
> --
> 2.34.1
>
If not protected by RTNL, what prevents two threads from calling this
function at the same time,
thus attempting to kfree_rcu() the same pointer twice ?
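The concern above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the patch's approach and not kernel code: it assumes hypothetical stand-in types (`hwprov_model`, `netdev_model`) and uses a C11 atomic exchange so that reading the pointer and clearing it happen in one step. An RCU read-side section does not serialize writers, so without RTNL or a similar claim step, two callers could both observe the same non-NULL pointer. For brevity the sketch omits the `hwprov->phydev == phydev` check from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the kernel definitions. */
struct hwprov_model {
	int phy_id;
};

struct netdev_model {
	_Atomic(struct hwprov_model *) hwprov;
};

/* Counts how many times a provider was released. */
static atomic_int frees_done;

/*
 * Claim and release the provider. atomic_exchange() returns the old pointer
 * and stores NULL in one indivisible step, so of any number of racing
 * callers at most one receives the non-NULL pointer and frees it.
 * Returns 1 if this caller released the provider, 0 otherwise.
 */
static int detach_model(struct netdev_model *dev)
{
	struct hwprov_model *old;

	old = atomic_exchange(&dev->hwprov, (struct hwprov_model *)NULL);
	if (!old)
		return 0;	/* already claimed by another caller */

	free(old);		/* the kernel code defers this via kfree_rcu() */
	atomic_fetch_add(&frees_done, 1);
	return 1;
}

/* Two back-to-back detach calls: only the first one may free. */
static void detach_demo(void)
{
	struct netdev_model dev;
	struct hwprov_model *p = malloc(sizeof(*p));

	assert(p != NULL);
	p->phy_id = 1;
	atomic_init(&dev.hwprov, p);

	assert(detach_model(&dev) == 1);
	assert(detach_model(&dev) == 0);
}
```

Whether phy_detach() needs anything like this depends on whether its callers can actually race, which is the open question in this thread.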
On Fri, 17 Jan 2025 20:06:28 +0100
Eric Dumazet <edumazet@google.com> wrote:
> On Fri, Jan 17, 2025 at 6:36 PM Kory Maincent <kory.maincent@bootlin.com>
> wrote:
> >
> > The phy_detach function can be called with or without the rtnl lock held.
> > When the rtnl lock is not held, using rtnl_dereference() triggers a
> > warning due to the lack of lock context.
> >
> > Add an rcu_read_lock() to ensure the lock is acquired and to maintain
> > synchronization.
> >
> > Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> > Reported-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> > Closes: https://lore.kernel.org/netdev/4c6419d8-c06b-495c-b987-d66c2e1ff848@tuxon.dev/
> > Fixes: 35f7cad1743e ("net: Add the possibility to support a selected hwtstamp in netdevice")
> > Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
> > ---
> >
> > Changes in v2:
> > - Add a missing ;
> > ---
> > drivers/net/phy/phy_device.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index 5b34d39d1d52..3eeee7cba923 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -2001,12 +2001,14 @@ void phy_detach(struct phy_device *phydev)
> >  	if (dev) {
> >  		struct hwtstamp_provider *hwprov;
> >
> > -		hwprov = rtnl_dereference(dev->hwprov);
> > +		rcu_read_lock();
> > +		hwprov = rcu_dereference(dev->hwprov);
> >  		/* Disable timestamp if it is the one selected */
> >  		if (hwprov && hwprov->phydev == phydev) {
> >  			rcu_assign_pointer(dev->hwprov, NULL);
> >  			kfree_rcu(hwprov, rcu_head);
> >  		}
> > +		rcu_read_unlock();
> >
> >  		phydev->attached_dev->phydev = NULL;
> >  		phydev->attached_dev = NULL;
> > --
> > 2.34.1
> >
>
> If not protected by RTNL, what prevents two threads from calling this
> function at the same time,
> thus attempting to kfree_rcu() the same pointer twice ?
I don't think this function can be called simultaneously from two threads,
if this were the case we would have already seen several issues with the phydev
pointer. But maybe I am wrong.
The rcu_lock here is to prevent concurrent dev->hwprov pointer modification done
under rtnl_lock in net/ethtool/tsconfig.c.
Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > If not protected by RTNL, what prevents two threads from calling this
> > function at the same time,
> > thus attempting to kfree_rcu() the same pointer twice ?
>
> I don't think this function can be called simultaneously from two threads,
> if this were the case we would have already seen several issues with the phydev
> pointer. But maybe I am wrong.
>
> The rcu_lock here is to prevent concurrent dev->hwprov pointer modification done
> under rtnl_lock in net/ethtool/tsconfig.c.

I could also be wrong, but I don't recall being told that suspend path
can't race with anything else. So I think ravb should probably take
rtnl_lock or some such when it's shutting itself down.. ?

If I'm wrong I think we should mention this is from suspend and
add Claudiu's stack trace to the commit msg.
--
pw-bot: cr
On Fri, 17 Jan 2025 19:07:20 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > > If not protected by RTNL, what prevents two threads from calling this
> > > function at the same time,
> > > thus attempting to kfree_rcu() the same pointer twice ?
> >
> > I don't think this function can be called simultaneously from two threads,
> > if this were the case we would have already seen several issues with the
> > phydev pointer. But maybe I am wrong.
> >
> > The rcu_lock here is to prevent concurrent dev->hwprov pointer modification
> > done under rtnl_lock in net/ethtool/tsconfig.c.
>
> I could also be wrong, but I don't recall being told that suspend path
> can't race with anything else. So I think ravb should probably take
> rtnl_lock or some such when it's shutting itself down.. ?
>
> If I'm wrong I think we should mention this is from suspend and
> add Claudiu's stack trace to the commit msg.

Is it ok if I send the v3 fix in net-next even if it is closed?

Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
On Mon, Jan 20, 2025 at 10:37:22AM +0100, Kory Maincent wrote:
> On Fri, 17 Jan 2025 19:07:20 -0800
> Jakub Kicinski <kuba@kernel.org> wrote:
>
> > On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > > > If not protected by RTNL, what prevents two threads from calling this
> > > > function at the same time,
> > > > thus attempting to kfree_rcu() the same pointer twice ?
> > >
> > > I don't think this function can be called simultaneously from two threads,
> > > if this were the case we would have already seen several issues with the
> > > phydev pointer. But maybe I am wrong.
> > >
> > > The rcu_lock here is to prevent concurrent dev->hwprov pointer modification
> > > done under rtnl_lock in net/ethtool/tsconfig.c.
> >
> > I could also be wrong, but I don't recall being told that suspend path
> > can't race with anything else. So I think ravb should probably take
> > rtnl_lock or some such when it's shutting itself down.. ?
> >
> > If I'm wrong I think we should mention this is from suspend and
> > add Claudiu's stack trace to the commit msg.
>
> Is it ok if I send the v3 fix in net-next even if it is closed?

In general, fixes are still accepted into net-next if the pull request
hasn't been sent and the code that is being fixed is only in net-next.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Fri, 17 Jan 2025 19:07:20 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > > If not protected by RTNL, what prevents two threads from calling this
> > > function at the same time,
> > > thus attempting to kfree_rcu() the same pointer twice ?
> >
> > I don't think this function can be called simultaneously from two threads,
> > if this were the case we would have already seen several issues with the
> > phydev pointer. But maybe I am wrong.
> >
> > The rcu_lock here is to prevent concurrent dev->hwprov pointer modification
> > done under rtnl_lock in net/ethtool/tsconfig.c.
>
> I could also be wrong, but I don't recall being told that suspend path
> can't race with anything else. So I think ravb should probably take
> rtnl_lock or some such when it's shutting itself down.. ?

Should we add an ASSERT_RTNL call in the phy_detach function? (Maybe
also in phy_attach to be consistent)
Though, I think it may raise lots of warnings from other NIC drivers.

> If I'm wrong I think we should mention this is from suspend and
> add Claudiu's stack trace to the commit msg.

Ack.

Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com
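To make the ASSERT_RTNL suggestion concrete, here is a minimal single-threaded userspace sketch of a "lock must be held" assertion. It is an illustration under assumptions, not kernel code: `cfg_lock()`, `cfg_unlock()`, `ASSERT_CFG_LOCK()` and `detach_checked()` are hypothetical names. Like the kernel's ASSERT_RTNL(), the check only reports the violation and lets execution continue, which is why adding it to a widely used function can surface warnings in every caller that does not take the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical single-threaded model of a global lock such as RTNL. */
static bool cfg_lock_held;
static int cfg_lock_warnings;	/* number of failed lock assertions */

static void cfg_lock(void)
{
	assert(!cfg_lock_held);	/* the model does not support nesting */
	cfg_lock_held = true;
}

static void cfg_unlock(void)
{
	assert(cfg_lock_held);
	cfg_lock_held = false;
}

/* Report a missing lock but keep running, in the spirit of a warning. */
#define ASSERT_CFG_LOCK()						\
	do {								\
		if (!cfg_lock_held) {					\
			cfg_lock_warnings++;				\
			fprintf(stderr,					\
				"lock assertion failed at %s (%d)\n",	\
				__FILE__, __LINE__);			\
		}							\
	} while (0)

/* Stand-in for a teardown function that expects the lock to be held. */
static void detach_checked(void)
{
	ASSERT_CFG_LOCK();
	/* ... teardown that relies on the lock would go here ... */
}
```

Calling `detach_checked()` without `cfg_lock()` increments the warning counter once; calling it between `cfg_lock()` and `cfg_unlock()` does not.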
On Sun, Jan 19, 2025 at 01:45:18PM +0100, Kory Maincent wrote:
> On Fri, 17 Jan 2025 19:07:20 -0800
> Jakub Kicinski <kuba@kernel.org> wrote:
>
> > On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> > > > If not protected by RTNL, what prevents two threads from calling this
> > > > function at the same time,
> > > > thus attempting to kfree_rcu() the same pointer twice ?
> > >
> > > I don't think this function can be called simultaneously from two threads,
> > > if this were the case we would have already seen several issues with the
> > > phydev pointer. But maybe I am wrong.
> > >
> > > The rcu_lock here is to prevent concurrent dev->hwprov pointer modification
> > > done under rtnl_lock in net/ethtool/tsconfig.c.
> >
> > I could also be wrong, but I don't recall being told that suspend path
> > can't race with anything else. So I think ravb should probably take
> > rtnl_lock or some such when it's shutting itself down.. ?
>
> Should we add an ASSERT_RTNL call in the phy_detach function? (Maybe
> also in phy_attach to be consistent)
> Though, I think it may raise lots of warnings from other NIC drivers.

How many drivers use phy_detach() ?

The answer is... phylink, bcm genet and xgbe.

Of the phylink ones:

1. phylink_connect_phy() - for use by drivers. This had better be
   called _before_ the netdev is registered (without rtnl) or
   from .ndo_open that holds the RTNL.

2. phylink_fwnode_phy_connect() - same as above.

3. phylink_sfp_config_phy(), called from the SFP code, and its state
   machines. It will be holding RTNL, because it is only safe to
   attach and detach PHYs from a registered netdev while holding RTNL.

I haven't looked any further.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Sun, 19 Jan 2025 14:27:53 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Sun, Jan 19, 2025 at 01:45:18PM +0100, Kory Maincent wrote:
> > On Fri, 17 Jan 2025 19:07:20 -0800
> > Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > > On Fri, 17 Jan 2025 23:16:59 +0100 Kory Maincent wrote:
> [...]
> [...]
> > >
> > > I could also be wrong, but I don't recall being told that suspend path
> > > can't race with anything else. So I think ravb should probably take
> > > rtnl_lock or some such when it's shutting itself down.. ?
> >
> > Should we add an ASSERT_RTNL call in the phy_detach function? (Maybe
> > also in phy_attach to be consistent)
> > Though, I think it may raise lots of warnings from other NIC drivers.
>
> How many drivers use phy_detach() ?
>
> The answer is... phylink, bcm genet and xgbe.

phy_detach() is also called by phy_disconnect(), which is much more widely
used by the net drivers.

> Of the phylink ones:
>
> 1. phylink_connect_phy() - for use by drivers. This had better be
>    called _before_ the netdev is registered (without rtnl) or
>    from .ndo_open that holds the RTNL.
>
> 2. phylink_fwnode_phy_connect() - same as above.
>
> 3. phylink_sfp_config_phy(), called from the SFP code, and its state
>    machines. It will be holding RTNL, because it is only safe to
>    attach and detach PHYs from a registered netdev while holding RTNL.
>
> I haven't looked any further.
>

--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com