We must set the clk_phy pointer to NULL to indicate that it isn't available
if the optional phy clock couldn't be obtained. Otherwise the error code
returned by of_clk_get() could be wrongly taken as an address, causing
an invalid pointer dereference when clk_phy is later passed to
clk_prepare_enable().
Fixes: da114122b831 ("net: ethernet: stmmac: dwmac-rk: Make the clk_phy could be used for external phy")
Signed-off-by: Yao Zi <ziyao@disroot.org>
---
drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
On next-20250903, the commit being fixed causes an invalid pointer dereference
on Radxa E20C during probe of dwmac-rk. A typical dmesg looks like:
[ 0.273324] rk_gmac-dwmac ffbe0000.ethernet: IRQ eth_lpi not found
[ 0.273888] rk_gmac-dwmac ffbe0000.ethernet: IRQ sfty not found
[ 0.274520] rk_gmac-dwmac ffbe0000.ethernet: PTP uses main clock
[ 0.275226] rk_gmac-dwmac ffbe0000.ethernet: clock input or output? (output).
[ 0.275867] rk_gmac-dwmac ffbe0000.ethernet: Can not read property: tx_delay.
[ 0.276491] rk_gmac-dwmac ffbe0000.ethernet: set tx_delay to 0x30
[ 0.277026] rk_gmac-dwmac ffbe0000.ethernet: Can not read property: rx_delay.
[ 0.278086] rk_gmac-dwmac ffbe0000.ethernet: set rx_delay to 0x10
[ 0.278658] rk_gmac-dwmac ffbe0000.ethernet: integrated PHY? (no).
[ 0.279249] Unable to handle kernel paging request at virtual address fffffffffffffffe
[ 0.279948] Mem abort info:
[ 0.280195] ESR = 0x000000096000006
[ 0.280523] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.280989] SET = 0, FnV = 0
[ 0.281287] EA = 0, S1PTW = 0
[ 0.281574] FSC = 0x06: level 2 translation fault
where the invalid address is just -ENOENT (-2).
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index cf619a428664..26ec8ae662a6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -1414,11 +1414,17 @@ static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat)
 	if (plat->phy_node) {
 		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);
 		ret = PTR_ERR_OR_ZERO(bsp_priv->clk_phy);
-		/* If it is not integrated_phy, clk_phy is optional */
+		/*
+		 * If it is not integrated_phy, clk_phy is optional. But we must
+		 * set bsp_priv->clk_phy to NULL if clk_phy isn't provided, or
+		 * the error code could be wrongly taken as an invalid pointer.
+		 */
 		if (bsp_priv->integrated_phy) {
 			if (ret)
 				return dev_err_probe(dev, ret, "Cannot get PHY clock\n");
 			clk_set_rate(bsp_priv->clk_phy, 50000000);
+		} else if (ret) {
+			bsp_priv->clk_phy = NULL;
 		}
 	}
--
2.50.1
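
For readers less familiar with the kernel's error-pointer convention, the
fault address in the dmesg above can be reproduced in user space. This is a
minimal sketch, not kernel code: ERR_PTR(), PTR_ERR(), IS_ERR() and
PTR_ERR_OR_ZERO() are re-implemented here as stand-ins that mirror
include/linux/err.h, and clk_ptr_is_usable() and show_enoent_as_address() are
hypothetical helpers introduced only for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* User-space stand-ins that mirror the helpers in include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/*
 * Hypothetical helper (not in the driver): a stored clock pointer is only
 * safe to hand to the clk API if it is NULL or a real pointer.
 */
static inline int clk_ptr_is_usable(const void *clk)
{
	return !IS_ERR(clk);
}

/* Print how -ENOENT looks when it is mistaken for an address. */
static inline void show_enoent_as_address(void)
{
	void *clk = ERR_PTR(-ENOENT);

	printf("ERR_PTR(-ENOENT) = %p, PTR_ERR_OR_ZERO = %d\n",
	       clk, PTR_ERR_OR_ZERO(clk));
}
```

On a 64-bit build the encoded -ENOENT is the second-highest address, which is
consistent with the fffffffffffffffe reported in the paging request.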
On Thu, Sep 04, 2025 at 03:12:24AM +0000, Yao Zi wrote:
> 	if (plat->phy_node) {
> 		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);
> 		ret = PTR_ERR_OR_ZERO(bsp_priv->clk_phy);
> -		/* If it is not integrated_phy, clk_phy is optional */
> +		/*
> +		 * If it is not integrated_phy, clk_phy is optional. But we must
> +		 * set bsp_priv->clk_phy to NULL if clk_phy isn't provided, or
> +		 * the error code could be wrongly taken as an invalid pointer.
> +		 */

I'm concerned by this. This code is getting the first clock from the DT
description of the PHY. We don't know what type of PHY it is, or what
the DT description of that PHY might suggest that the first clock would
be.

However, we're getting it and setting it to 50MHz. What if the clock is
not what we think it is?

I'm not sure we should be delving into some other device's DT
properties to then get resources that it _uses_ to then effectively
take control of those resources.

I think we need way more detail on what's going on. Commit da114122b83
merely stated:

    For external phy, clk_phy should be optional, and some external phy
    need the clock input from clk_phy. This patch adds support for setting
    clk_phy for external phy.

If the external PHY requires a clock supplied to it, shouldn't the PHY
driver itself be getting that clock and setting it appropriately?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

On 9/4/2025 6:58 PM, Russell King (Oracle) wrote:
> However, we're getting it and setting it to 50MHz. What if the clock is
> not what we think it is?

We only set integrated_phy to 50M, which are all known targets. For
external PHYs, we do not perform frequency settings.

On Thu, Sep 04, 2025 at 07:03:10PM +0800, Chaoyi Chen wrote:
> We only set integrated_phy to 50M, which are all known targets. For
> external PHYs, we do not perform frequency settings.

Same question concerning enabling and disabling another device's clock
that the other device should be handling.

On Thu, Sep 04, 2025 at 12:05:19PM +0100, Russell King (Oracle) wrote:
> Same question concerning enabling and disabling another device's clock
> that the other device should be handling.

Let me be absolutely clear: I consider *everything* that is going on
with clk_phy here to be a dirty hack.

Resources used by a device that has its own driver should be managed
by _that_ driver alone, not by some other random driver.

On Thu, Sep 04, 2025 at 12:07:26PM +0100, Russell King (Oracle) wrote:
> Let me be absolutely clear: I consider *everything* that is going on
> with clk_phy here to be a dirty hack.
>
> Resources used by a device that has its own driver should be managed
> by _that_ driver alone, not by some other random driver.

Agree on this. Should we drop the patch, or fix it up for now to at
least prevent the oops? Chaoyi, I guess there's no user of the feature
for now, is there?

Best regards,
Yao Zi

On 9/6/2025 1:36 PM, Yao Zi wrote:
> Agree on this. Should we drop the patch, or fix it up for now to at
> least prevent the oops? Chaoyi, I guess there's no user of the feature
> for now, is there?

This at least needs fixing. Sorry, I have no idea how to implement
this in the PHY.

Hi,

On Sat, Sep 06, 2025 at 02:26:31PM +0800, Chaoyi Chen wrote:
> This at least needs fixing. Sorry, I have no idea how to implement
> this in the PHY.

I think the proper fix is to revert da114122b8314 ("net: ethernet:
stmmac: dwmac-rk: Make the clk_phy could be used for external phy"),
which has only recently been merged. External PHYs should reference
their clocks themselves instead of the MAC doing it.

Chaoyi Chen: Have a look at the ROCK 4D devicetree:

&mdio0 {
	rgmii_phy0: ethernet-phy@1 {
		compatible = "ethernet-phy-id001c.c916";
		reg = <0x1>;
		clocks = <&cru REFCLKO25M_GMAC0_OUT>;
		assigned-clocks = <&cru REFCLKO25M_GMAC0_OUT>;
		assigned-clock-rates = <25000000>;
		...
	};
};

The clock is enabled by the RTL8211F PHY driver (check for
devm_clk_get_optional_enabled in drivers/net/phy/realtek/realtek_main.c),
as the PHY is the one needing the clock and not the Rockchip MAC. For
this to work it is important to set the right compatible string, so
that the kernel can probe the right driver without needing to read the
identification registers (as that would require the clock to be already
configured before the driver is being probed).

Greetings,

-- Sebastian

Hi Sebastian,

On 9/7/2025 4:25 AM, Sebastian Reichel wrote:
> The clock is enabled by the RTL8211F PHY driver (check for
> devm_clk_get_optional_enabled in drivers/net/phy/realtek/realtek_main.c),
> as the PHY is the one needing the clock and not the Rockchip MAC. For
> this to work it is important to set the right compatible string, so
> that the kernel can probe the right driver without needing to read the
> identification registers (as that would require the clock to be already
> configured before the driver is being probed).

Yes, what you said is correct. This is also the issue we encountered
earlier on RK3576 board :)

On Thu, Sep 04, 2025 at 03:12:24AM +0000, Yao Zi wrote:
> 	if (plat->phy_node) {
> 		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);
> 		ret = PTR_ERR_OR_ZERO(bsp_priv->clk_phy);
> -		/* If it is not integrated_phy, clk_phy is optional */
> +		/*
> +		 * If it is not integrated_phy, clk_phy is optional. But we must
> +		 * set bsp_priv->clk_phy to NULL if clk_phy isn't provided, or
> +		 * the error code could be wrongly taken as an invalid pointer.
> +		 */
> 		if (bsp_priv->integrated_phy) {
> 			if (ret)
> 				return dev_err_probe(dev, ret, "Cannot get PHY clock\n");
> 			clk_set_rate(bsp_priv->clk_phy, 50000000);
> +		} else if (ret) {
> +			bsp_priv->clk_phy = NULL;
> 		}
> 	}

Thanks, and sorry for my early confusion about applying this patch.

I agree that the bug you point out is addressed by this patch.
Although I wonder if it is cleaner not to set bsp_priv->clk_phy
unless there is no error, rather than setting it then resetting
it if there is an error.

More importantly, I wonder if there is another bug: does clk_set_rate need
to be called in the case where there is no error and bsp_priv->integrated_phy
is false?

So I am wondering if it makes sense to go with something like this.
(Compile tested only!)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 266c53379236..a25816af2c37 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -1411,12 +1411,16 @@ static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat)
 	}
 
 	if (plat->phy_node) {
-		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);
-		ret = PTR_ERR_OR_ZERO(bsp_priv->clk_phy);
-		/* If it is not integrated_phy, clk_phy is optional */
-		if (bsp_priv->integrated_phy) {
-			if (ret)
+		struct clk *clk_phy;
+
+		clk_phy = of_clk_get(plat->phy_node, 0);
+		ret = PTR_ERR_OR_ZERO(clk_phy);
+		if (ret) {
+			/* If it is not integrated_phy, clk_phy is optional */
+			if (bsp_priv->integrated_phy)
 				return dev_err_probe(dev, ret, "Cannot get PHY clock\n");
+		} else {
+			bsp_priv->clk_phy = clk_phy;
 			clk_set_rate(bsp_priv->clk_phy, 50000000);
 		}
 	}

Please note: if you send an updated patch (against net) please
make sure you wait 24h after the original post.

See: https://docs.kernel.org/process/maintainer-netdev.html

On Thu, Sep 04, 2025 at 11:34:43AM +0100, Simon Horman wrote:
> Thanks, and sorry for my early confusion about applying this patch.
>
> I agree that the bug you point out is addressed by this patch.
> Although I wonder if it is cleaner not to set bsp_priv->clk_phy
> unless there is no error, rather than setting it then resetting
> it if there is an error.

Yes, it sounds more natural to have a temporary variable storing the result
of of_clk_get() and only assign it to clk_phy when the result is valid.

> More importantly, I wonder if there is another bug: does clk_set_rate need
> to be called in the case where there is no error and bsp_priv->integrated_phy
> is false?

In my understanding this may be intended: bsp_priv->integrated_phy is
only false when an external phy is used, and an external phy might
require arbitrary clock rates, so it doesn't seem a good idea to me to
hardcode the clock rate in the driver.

I guess the rate of clk_phy could also be set up with assigned-clock-rates
in devicetree. If so it may be reasonable to enable the clock only.

> Please note: if you send an updated patch (against net) please
> make sure you wait 24h before the original post.
>
> See: https://docs.kernel.org/process/maintainer-netdev.html

Thanks for the tip.

While digging through the problematic commit for the clk_phy's rate
problem, I found others have discovered the problem[1] and proposed
some fixes (though there hasn't been a formal patch). I should have read
the original thread before sending this patch!

Will wait for some time and see whether the netdev maintainer prefers
waiting for the original author's fix or taking mine.

Best regards,
Yao Zi

[1]: https://lore.kernel.org/netdev/a30a8c97-6b96-45ba-bad7-8a40401babc2@samsung.com/
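
The temporary-variable idea discussed in this exchange can be modelled in
user space as follows. This is only a sketch under stated assumptions, not the
driver code: struct clk, struct rk_priv_model and model_clk_phy_init() are
hypothetical stand-ins, the 'found' argument plays the role of the value
returned by of_clk_get(), and applying the fixed 50 MHz rate to the integrated
PHY only reflects the view expressed in this reply rather than any merged
change.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types and helpers used by the driver. */
struct clk {
	unsigned long rate;
};

struct rk_priv_model {
	struct clk *clk_phy;	/* stays NULL unless a valid clock was found */
	bool integrated_phy;
};

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO ? (int)(intptr_t)ptr : 0;
}

/*
 * Model of the clk_phy lookup. 'found' stands in for the return value of
 * of_clk_get(): either a valid pointer or an ERR_PTR()-encoded error.
 */
static inline int model_clk_phy_init(struct rk_priv_model *priv,
				     struct clk *found)
{
	struct clk *clk_phy = found;
	int ret = PTR_ERR_OR_ZERO(clk_phy);

	if (ret) {
		/* Only the integrated PHY strictly needs the clock. */
		return priv->integrated_phy ? ret : 0;
	}

	/* The private structure only ever sees a valid pointer. */
	priv->clk_phy = clk_phy;

	/* Assumption: the fixed 50 MHz rate applies to the integrated PHY only. */
	if (priv->integrated_phy)
		clk_phy->rate = 50000000;

	return 0;
}
```

With this shape an error code never reaches priv->clk_phy, so later code can
treat NULL as "no clock" without a separate reset step.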

On Thu, Sep 04, 2025 at 11:34:43AM +0100, Simon Horman wrote:
> Thanks, and sorry for my early confusion about applying this patch.
>
> I agree that the bug you point out is addressed by this patch.
> Although I wonder if it is cleaner not to set bsp_priv->clk_phy
> unless there is no error, rather than setting it then resetting
> it if there is an error.

+1 !

> More importantly, I wonder if there is another bug: does clk_set_rate need
> to be called in the case where there is no error and bsp_priv->integrated_phy
> is false?

I think there's another issue:

static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat)
{
...
	if (plat->phy_node) {
		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);
...

static void rk_gmac_remove(struct platform_device *pdev)
{
...
	if (priv->plat->phy_node && bsp_priv->integrated_phy)
		clk_put(bsp_priv->clk_phy);

So if bsp_priv->integrated_phy is false, then we get the clock but
don't put it.

On 9/4/2025 6:49 PM, Russell King (Oracle) wrote:
> 	if (priv->plat->phy_node && bsp_priv->integrated_phy)
> 		clk_put(bsp_priv->clk_phy);
>
> So if bsp_priv->integrated_phy is false, then we get the clock but
> don't put it.

Yes! Just remove "bsp_priv->integrated_phy"

On Thu, Sep 04, 2025 at 06:58:12PM +0800, Chaoyi Chen wrote:
> On 9/4/2025 6:49 PM, Russell King (Oracle) wrote:
> > I think there's another issue:
> >
> > static int rk_gmac_clk_init(struct plat_stmmacenet_data *plat)
> > {
> > ...
> > 	if (plat->phy_node) {
> > 		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);

This is the only invocation of of_clk_get() in the driver. Should we
convert it to the devres-managed variant? Then rk_gmac_remove could be
simplified further.

> > ...
> >
> > static void rk_gmac_remove(struct platform_device *pdev)
> > {
> > ...
> > 	if (priv->plat->phy_node && bsp_priv->integrated_phy)
> > 		clk_put(bsp_priv->clk_phy);
> >
> > So if bsp_priv->integrated_phy is false, then we get the clock but
> > don't put it.
>
> Yes! Just remove "bsp_priv->integrated_phy"

Cheers,
Yao Zi
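
The get/put imbalance raised in this subthread can be illustrated with a toy
reference count. This is a hedged user-space sketch, not driver code: struct
clk, struct rk_priv_model and the model_* functions are hypothetical, the
phy_node check is left out for brevity, and the NULL check in model_clk_put()
is an assumption about how clk_put() behaves in the common clock framework.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy clock with a reference count so that leaks become visible. */
struct clk {
	int refs;
};

struct rk_priv_model {
	struct clk *clk_phy;
	bool integrated_phy;
};

/* Stand-in for of_clk_get(): hands out one reference. */
static inline struct clk *model_clk_get(struct clk *clk)
{
	clk->refs++;
	return clk;
}

/* Stand-in for clk_put(); assumed here to ignore NULL. */
static inline void model_clk_put(struct clk *clk)
{
	if (clk)
		clk->refs--;
}

/* Remove path as quoted: the reference is dropped for integrated PHYs only. */
static inline void model_remove_current(struct rk_priv_model *priv)
{
	if (priv->integrated_phy)
		model_clk_put(priv->clk_phy);
}

/* Remove path with the integrated_phy condition dropped, as suggested. */
static inline void model_remove_balanced(struct rk_priv_model *priv)
{
	model_clk_put(priv->clk_phy);
}
```

For an external PHY that did obtain a clock, model_remove_current() leaves the
count at one while model_remove_balanced() returns it to zero. The balanced
variant is only safe when clk_phy is NULL or a valid pointer, never an encoded
error.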
On Thu, Sep 04, 2025 at 03:12:24AM +0000, Yao Zi wrote:
> We must set the clk_phy pointer to NULL to indicating it isn't available
> if the optional phy clock couldn't be obtained. Otherwise the error code
> returned by of_clk_get() could be wrongly taken as an address, causing
> invalid pointer dereference when later clk_phy is passed to
> clk_prepare_enable().
>
> Fixes: da114122b831 ("net: ethernet: stmmac: dwmac-rk: Make the clk_phy could be used for external phy")
> Signed-off-by: Yao Zi <ziyao@disroot.org>

...

Hi,

This patch doesn't seem to match upstream code.

Looking over the upstream code, it seems to me that
going into the code in question .clk_phy should
be NULL, as bsp_priv is allocated using devm_kzalloc()
over in rk_gmac_setup().

The upstream version of the code your patch modifies
is as follows, and it doesn't touch .clk_phy if integrated_phy is not set.

	if (plat->phy_node && bsp_priv->integrated_phy) {
		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);
		ret = PTR_ERR_OR_ZERO(bsp_priv->clk_phy);
		if (ret)
			return dev_err_probe(dev, ret, "Cannot get PHY clock\n");
		clk_set_rate(bsp_priv->clk_phy, 50000000);
	}

Am I missing something?

On Thu, Sep 04, 2025 at 10:54:57AM +0100, Simon Horman wrote:
> On Thu, Sep 04, 2025 at 03:12:24AM +0000, Yao Zi wrote:
> > We must set the clk_phy pointer to NULL to indicating it isn't available
> > if the optional phy clock couldn't be obtained. Otherwise the error code
> > returned by of_clk_get() could be wrongly taken as an address, causing
> > invalid pointer dereference when later clk_phy is passed to
> > clk_prepare_enable().
> >
> > Fixes: da114122b831 ("net: ethernet: stmmac: dwmac-rk: Make the clk_phy could be used for external phy")
> > Signed-off-by: Yao Zi <ziyao@disroot.org>
>
> ...
>
> Hi,
>
> This patch doesn't seem to match upstream code.
>
> Looking over the upstream code, it seems to me that
> going into the code in question .clk_phy should
> be NULL, as bsp_priv is allocated using devm_kzalloc()
> over in rk_gmac_setup().
>
> The upstream version of the code your patch modifies
> is as follows, and it doesn't touch .clk_phy if integrated_phy is not set.
>
> 	if (plat->phy_node && bsp_priv->integrated_phy) {
> 		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);
> 		ret = PTR_ERR_OR_ZERO(bsp_priv->clk_phy);
> 		if (ret)
> 			return dev_err_probe(dev, ret, "Cannot get PHY clock\n");
> 		clk_set_rate(bsp_priv->clk_phy, 50000000);
> 	}
>
> Am I missing something?

Oops, I missed that da114122b831 is present in net-next (but not net).
Let me look over this a second time.

On Thu, Sep 04, 2025 at 10:56:57AM +0100, Simon Horman wrote:
> On Thu, Sep 04, 2025 at 10:54:57AM +0100, Simon Horman wrote:
> > On Thu, Sep 04, 2025 at 03:12:24AM +0000, Yao Zi wrote:
> > > We must set the clk_phy pointer to NULL to indicating it isn't available
> > > if the optional phy clock couldn't be obtained. Otherwise the error code
> > > returned by of_clk_get() could be wrongly taken as an address, causing
> > > invalid pointer dereference when later clk_phy is passed to
> > > clk_prepare_enable().
> > >
> > > Fixes: da114122b831 ("net: ethernet: stmmac: dwmac-rk: Make the clk_phy could be used for external phy")
> > > Signed-off-by: Yao Zi <ziyao@disroot.org>
> >
> > ...
> >
> > Hi,
> >
> > This patch doesn't seem to match upstream code.
> >
> > Looking over the upstream code, it seems to me that
> > going into the code in question .clk_phy should
> > be NULL, as bsp_priv is allocated using devm_kzalloc()
> > over in rk_gmac_setup().
> >
> > The upstream version of the code your patch modifies
> > is as follows, and it doesn't touch .clk_phy if integrated_phy is not set.
> >
> > 	if (plat->phy_node && bsp_priv->integrated_phy) {
> > 		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);
> > 		ret = PTR_ERR_OR_ZERO(bsp_priv->clk_phy);
> > 		if (ret)
> > 			return dev_err_probe(dev, ret, "Cannot get PHY clock\n");
> > 		clk_set_rate(bsp_priv->clk_phy, 50000000);
> > 	}
> >
> > Am I missing something?
>
> Oops, I missed that da114122b831 is present in net-next (but not net).
> Let me look over this a second time.

Oops, yes. Though this is a fix patch, it should target net-next instead
of net since the commit causing problems hasn't landed in net. Sorry for
the confusion.

Best regards,
Yao Zi

On Thu, Sep 04, 2025 at 10:10:01AM +0000, Yao Zi wrote:
> On Thu, Sep 04, 2025 at 10:56:57AM +0100, Simon Horman wrote:
> > On Thu, Sep 04, 2025 at 10:54:57AM +0100, Simon Horman wrote:
> > > On Thu, Sep 04, 2025 at 03:12:24AM +0000, Yao Zi wrote:
> > > > We must set the clk_phy pointer to NULL to indicating it isn't available
> > > > if the optional phy clock couldn't be obtained. Otherwise the error code
> > > > returned by of_clk_get() could be wrongly taken as an address, causing
> > > > invalid pointer dereference when later clk_phy is passed to
> > > > clk_prepare_enable().
> > > >
> > > > Fixes: da114122b831 ("net: ethernet: stmmac: dwmac-rk: Make the clk_phy could be used for external phy")
> > > > Signed-off-by: Yao Zi <ziyao@disroot.org>
> > >
> > > ...
> > >
> > > Hi,
> > >
> > > This patch doesn't seem to match upstream code.
> > >
> > > Looking over the upstream code, it seems to me that
> > > going into the code in question .clk_phy should
> > > be NULL, as bsp_priv is allocated using devm_kzalloc()
> > > over in rk_gmac_setup().
> > >
> > > The upstream version of the code your patch modifies
> > > is as follows, and it doesn't touch .clk_phy if integrated_phy is not set.
> > >
> > > 	if (plat->phy_node && bsp_priv->integrated_phy) {
> > > 		bsp_priv->clk_phy = of_clk_get(plat->phy_node, 0);
> > > 		ret = PTR_ERR_OR_ZERO(bsp_priv->clk_phy);
> > > 		if (ret)
> > > 			return dev_err_probe(dev, ret, "Cannot get PHY clock\n");
> > > 		clk_set_rate(bsp_priv->clk_phy, 50000000);
> > > 	}
> > >
> > > Am I missing something?
> >
> > Oops, I missed that da114122b831 is present in net-next (but not net).
> > Let me look over this a second time.
>
> Oops, yes. Though this is a fix patch, it should target net-next instead
> of net since the commit causing problems hasn't landed in net. Sorry for
> the confusion.

It's ok. I'm pretty good at confusing myself without any assistance.

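For readers following the thread, the failure mode described in the quoted commit message (an error code from of_clk_get() later being treated as an address) can be illustrated in user space. The sketch below is not kernel code: `ERR_PTR()` and `IS_ERR()` are simplified re-implementations of the kernel helpers of the same names, and `would_dereference()` is a hypothetical helper that only models the assumption that the clk API skips NULL clocks and dereferences anything else.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	/* A negative errno becomes an address at the very top of the address space. */
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical helper: models a consumer that treats NULL as "no clock"
 * and dereferences every other value. An error pointer is not NULL,
 * so it would be dereferenced.
 */
static inline bool would_dereference(const void *clk)
{
	return clk != NULL;
}
```

Here `ERR_PTR(-ENOENT)` is non-NULL, so a consumer that only checks for NULL would go on to dereference it, whereas resetting the field to NULL on failure keeps it on the no-op path.
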