PCI: qcom: Implement shutdown() callback

[PATCH] PCI: qcom: Implement shutdown() callback

Posted by Krishna Chaitanya Chundru 10 months, 1 week ago

From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

PCIe host controller drivers are supposed to properly remove the
endpoint drivers and release the resources during host shutdown/reboot
to avoid issues like smmu errors, NOC errors, etc.

So, stop and remove the root bus and its associated devices and release
its resources during system shutdown to ensure a clean shutdown/reboot.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
---
 drivers/pci/controller/dwc/pcie-qcom.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index e4d3366ead1f9198693e6f9da4ae1dc40a3a0519..926811a0e63eb3663c1f41dc598659993546d832 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -1754,6 +1754,16 @@ static int qcom_pcie_probe(struct platform_device *pdev)
 	return ret;
 }
 
+static void qcom_pcie_shutdown(struct platform_device *pdev)
+{
+	struct qcom_pcie *pcie = platform_get_drvdata(pdev);
+
+	dw_pcie_host_deinit(&pcie->pci->pp);
+	phy_exit(pcie->phy);
+	pm_runtime_put(&pdev->dev);
+	pm_runtime_disable(&pdev->dev);
+}
+
 static int qcom_pcie_suspend_noirq(struct device *dev)
 {
 	struct qcom_pcie *pcie = dev_get_drvdata(dev);
@@ -1890,5 +1900,6 @@ static struct platform_driver qcom_pcie_driver = {
 		.pm = &qcom_pcie_pm_ops,
 		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
 	},
+	.shutdown = qcom_pcie_shutdown,
 };
 builtin_platform_driver(qcom_pcie_driver);

---
base-commit: b3ee1e4609512dfff642a96b34d7e5dfcdc92d05
change-id: 20250321-shutdown-9237fad7374c

Best regards,
-- 
Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>

Re: [PATCH] PCI: qcom: Implement shutdown() callback

Posted by Bjorn Helgaas 9 months, 2 weeks ago

[+cc Greg, Rafael, Danilo for driver model .shutdown() question]
[+cc Heiner et al for related conversation at
https://lore.kernel.org/r/20250415095335.506266-2-cassel@kernel.org]

On Tue, Apr 01, 2025 at 04:51:37PM +0530, Krishna Chaitanya Chundru wrote:
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> 
> PCIe host controller drivers are supposed to properly remove the
> endpoint drivers and release the resources during host shutdown/reboot
> to avoid issues like smmu errors, NOC errors, etc.

The effect of this patch is:

    .shutdown()
  +   qcom_pcie_shutdown
  +     dw_pcie_host_deinit
  +       pci_stop_root_bus     # release all drivers of downstream pci_devs
  +       pci_remove_root_bus   # remove all downstream pci_devs

I'm not sure about removing all these drivers in the .shutdown() path.
The generic .shutdown() doc is "quiesce the device" [1], and my
current interpretation for PCI is that it should disable DMA and
interrupts from the device [2].

If PCI host controller drivers are supposed to remove all downstream
drivers and devices in .shutdown(), they're all broken because that's
currently only done in .remove() (and not even all of those).

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/device/driver.h?id=v6.14#n73
[2] https://lore.kernel.org/all/61f70fd6-52fd-da07-ce73-303f95132131@codeaurora.org/

> So, stop and remove the root bus and its associated devices and release
> its resources during system shutdown to ensure a clean shutdown/reboot.
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
> ---
>  drivers/pci/controller/dwc/pcie-qcom.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> index e4d3366ead1f9198693e6f9da4ae1dc40a3a0519..926811a0e63eb3663c1f41dc598659993546d832 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> @@ -1754,6 +1754,16 @@ static int qcom_pcie_probe(struct platform_device *pdev)
>  	return ret;
>  }
>  
> +static void qcom_pcie_shutdown(struct platform_device *pdev)
> +{
> +	struct qcom_pcie *pcie = platform_get_drvdata(pdev);
> +
> +	dw_pcie_host_deinit(&pcie->pci->pp);
> +	phy_exit(pcie->phy);
> +	pm_runtime_put(&pdev->dev);
> +	pm_runtime_disable(&pdev->dev);
> +}
> +
>  static int qcom_pcie_suspend_noirq(struct device *dev)
>  {
>  	struct qcom_pcie *pcie = dev_get_drvdata(dev);
> @@ -1890,5 +1900,6 @@ static struct platform_driver qcom_pcie_driver = {
>  		.pm = &qcom_pcie_pm_ops,
>  		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
>  	},
> +	.shutdown = qcom_pcie_shutdown,
>  };
>  builtin_platform_driver(qcom_pcie_driver);

Re: [PATCH] PCI: qcom: Implement shutdown() callback

Posted by Manivannan Sadhasivam 9 months, 4 weeks ago

On Tue, Apr 01, 2025 at 04:51:37PM +0530, Krishna Chaitanya Chundru wrote:
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> 
> PCIe host controller drivers are supposed to properly remove the
> endpoint drivers and release the resources during host shutdown/reboot
> to avoid issues like smmu errors, NOC errors, etc.
> 
> So, stop and remove the root bus and its associated devices and release
> its resources during system shutdown to ensure a clean shutdown/reboot.
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
> ---

While reposting a patch, you should include what has changed in-between in the
changelog. Even if there are no changes, you should mention that and express
your intent to get this patch merged.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

Re: [PATCH] PCI: qcom: Implement shutdown() callback

Posted by Niklas Cassel 10 months, 1 week ago

Hello Krishna,

On Tue, Apr 01, 2025 at 04:51:37PM +0530, Krishna Chaitanya Chundru wrote:
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>
> PCIe host controller drivers are supposed to properly remove the
> endpoint drivers and release the resources during host shutdown/reboot
> to avoid issues like smmu errors, NOC errors, etc.
>
> So, stop and remove the root bus and its associated devices and release
> its resources during system shutdown to ensure a clean shutdown/reboot.
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
> ---
>  drivers/pci/controller/dwc/pcie-qcom.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> index e4d3366ead1f9198693e6f9da4ae1dc40a3a0519..926811a0e63eb3663c1f41dc598659993546d832 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> @@ -1754,6 +1754,16 @@ static int qcom_pcie_probe(struct platform_device *pdev)
>	return ret;
>  }
>
> +static void qcom_pcie_shutdown(struct platform_device *pdev)
> +{
> +	struct qcom_pcie *pcie = platform_get_drvdata(pdev);
> +
> +	dw_pcie_host_deinit(&pcie->pci->pp);
> +	phy_exit(pcie->phy);
> +	pm_runtime_put(&pdev->dev);
> +	pm_runtime_disable(&pdev->dev);
> +}
> +
>  static int qcom_pcie_suspend_noirq(struct device *dev)
>  {
>	struct qcom_pcie *pcie = dev_get_drvdata(dev);
> @@ -1890,5 +1900,6 @@ static struct platform_driver qcom_pcie_driver = {
>		.pm = &qcom_pcie_pm_ops,
>		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
>	},
> +	.shutdown = qcom_pcie_shutdown,
>  };
>  builtin_platform_driver(qcom_pcie_driver);
>
> ---

Out of curiosity, I tried something similar to on pcie-dw-rockchip.c

Simply having a ->shutdown() callback that only calls dw_pcie_host_deinit()
was enough for me to produce:

[   40.209887] r8169 0004:41:00.0 eth0: Link is Down
[   40.216572] ------------[ cut here ]------------
[   40.216986] called from state HALTED
[   40.217317] WARNING: CPU: 7 PID: 265 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
[   40.218024] Modules linked in: rk805_pwrkey hantro_vpu v4l2_jpeg v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_v4l2 videobuf2_dma_contig videobuf2_memops videobuf2_common vidf
[   40.220267] CPU: 7 UID: 0 PID: 265 Comm: init Not tainted 6.14.0+ #134 PREEMPT
[   40.220908] Hardware name: Radxa ROCK 5B (DT)
[   40.221289] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   40.221899] pc : phy_stop+0x134/0x1a0
[   40.222222] lr : phy_stop+0x134/0x1a0
[   40.222546] sp : ffff800082213820
[   40.222836] x29: ffff800082213820 x28: ffff45ec84b30000 x27: 0000000000000000
[   40.223463] x26: 0000000000000000 x25: 0000000000000000 x24: ffffbe8df7fde030
[   40.224088] x23: ffff800082213990 x22: 0000000000000001 x21: ffff45ec80e10000
[   40.224714] x20: ffff45ec82cb40c8 x19: ffff45ec82ccc000 x18: 0000000000000006
[   40.225340] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0720072007200720
[   40.225966] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
[   40.226592] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffbe8df556469c
[   40.227217] x8 : 0000000000000268 x7 : ffffbe8df7a48648 x6 : ffffbe8df7a48648
[   40.227842] x5 : 0000000000017fe8 x4 : 0000000000000000 x3 : 0000000000000000
[   40.228468] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff45ec84b30000
[   40.229093] Call trace:
[   40.229308]  phy_stop+0x134/0x1a0 (P)
[   40.229634]  rtl8169_down+0x34/0x280
[   40.229952]  rtl8169_close+0x64/0x100
[   40.230275]  __dev_close_many+0xbc/0x1f0
[   40.230621]  dev_close_many+0x94/0x160
[   40.230951]  unregister_netdevice_many_notify+0x14c/0x9c0
[   40.231426]  unregister_netdevice_queue+0xe4/0x100
[   40.231848]  unregister_netdev+0x2c/0x60
[   40.232193]  rtl_remove_one+0xa0/0xe0
[   40.232517]  pci_device_remove+0x4c/0xf8
[   40.232864]  device_remove+0x54/0x90
[   40.233182]  device_release_driver_internal+0x1d4/0x238
[   40.233643]  device_release_driver+0x20/0x38
[   40.234019]  pci_stop_bus_device+0x84/0xe0
[   40.234381]  pci_stop_bus_device+0x40/0xe0
[   40.234741]  pci_stop_root_bus+0x48/0x80
[   40.235087]  dw_pcie_host_deinit+0x34/0xe0
[   40.235452]  rockchip_pcie_shutdown+0x24/0x48
[   40.235839]  platform_shutdown+0x2c/0x48
[   40.236187]  device_shutdown+0x150/0x278
[   40.236533]  kernel_restart+0x4c/0xb8
[   40.236859]  __do_sys_reboot+0x178/0x280
[   40.237206]  __arm64_sys_reboot+0x2c/0x40
[   40.237561]  invoke_syscall+0x50/0x120
[   40.237891]  el0_svc_common.constprop.0+0x48/0xf0
[   40.238305]  do_el0_svc+0x24/0x38
[   40.238597]  el0_svc+0x30/0xd0
[   40.238868]  el0t_64_sync_handler+0x10c/0x138
[   40.239251]  el0t_64_sync+0x198/0x1a0
[   40.239575] ---[ end trace 0000000000000000 ]---

Did you try your change with a simple network card connected to the PCI slot?
(And not just another qcom board running in EP mode.)

I don't see why you wouldn't see the same thing as me.


Kind regards,
Niklas

Re: [PATCH] PCI: qcom: Implement shutdown() callback

Posted by Krishna Chaitanya Chundru 10 months, 1 week ago


On 4/2/2025 6:14 PM, Niklas Cassel wrote:
> Hello Krishna,
> 
> On Tue, Apr 01, 2025 at 04:51:37PM +0530, Krishna Chaitanya Chundru wrote:
>> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>>
>> PCIe host controller drivers are supposed to properly remove the
>> endpoint drivers and release the resources during host shutdown/reboot
>> to avoid issues like smmu errors, NOC errors, etc.
>>
>> So, stop and remove the root bus and its associated devices and release
>> its resources during system shutdown to ensure a clean shutdown/reboot.
>>
>> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>> Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
>> ---
>>   drivers/pci/controller/dwc/pcie-qcom.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
>> index e4d3366ead1f9198693e6f9da4ae1dc40a3a0519..926811a0e63eb3663c1f41dc598659993546d832 100644
>> --- a/drivers/pci/controller/dwc/pcie-qcom.c
>> +++ b/drivers/pci/controller/dwc/pcie-qcom.c
>> @@ -1754,6 +1754,16 @@ static int qcom_pcie_probe(struct platform_device *pdev)
>> 	return ret;
>>   }
>>
>> +static void qcom_pcie_shutdown(struct platform_device *pdev)
>> +{
>> +	struct qcom_pcie *pcie = platform_get_drvdata(pdev);
>> +
>> +	dw_pcie_host_deinit(&pcie->pci->pp);
>> +	phy_exit(pcie->phy);
>> +	pm_runtime_put(&pdev->dev);
>> +	pm_runtime_disable(&pdev->dev);
>> +}
>> +
>>   static int qcom_pcie_suspend_noirq(struct device *dev)
>>   {
>> 	struct qcom_pcie *pcie = dev_get_drvdata(dev);
>> @@ -1890,5 +1900,6 @@ static struct platform_driver qcom_pcie_driver = {
>> 		.pm = &qcom_pcie_pm_ops,
>> 		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
>> 	},
>> +	.shutdown = qcom_pcie_shutdown,
>>   };
>>   builtin_platform_driver(qcom_pcie_driver);
>>
>> ---
> 
> Out of curiosity, I tried something similar to on pcie-dw-rockchip.c
> 
> Simply having a ->shutdown() callback that only calls dw_pcie_host_deinit()
> was enough for me to produce:
> 
> [   40.209887] r8169 0004:41:00.0 eth0: Link is Down
> [   40.216572] ------------[ cut here ]------------
> [   40.216986] called from state HALTED
> [   40.217317] WARNING: CPU: 7 PID: 265 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
> [   40.218024] Modules linked in: rk805_pwrkey hantro_vpu v4l2_jpeg v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_v4l2 videobuf2_dma_contig videobuf2_memops videobuf2_common vidf
> [   40.220267] CPU: 7 UID: 0 PID: 265 Comm: init Not tainted 6.14.0+ #134 PREEMPT
> [   40.220908] Hardware name: Radxa ROCK 5B (DT)
> [   40.221289] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   40.221899] pc : phy_stop+0x134/0x1a0
> [   40.222222] lr : phy_stop+0x134/0x1a0
> [   40.222546] sp : ffff800082213820
> [   40.222836] x29: ffff800082213820 x28: ffff45ec84b30000 x27: 0000000000000000
> [   40.223463] x26: 0000000000000000 x25: 0000000000000000 x24: ffffbe8df7fde030
> [   40.224088] x23: ffff800082213990 x22: 0000000000000001 x21: ffff45ec80e10000
> [   40.224714] x20: ffff45ec82cb40c8 x19: ffff45ec82ccc000 x18: 0000000000000006
> [   40.225340] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0720072007200720
> [   40.225966] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
> [   40.226592] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffbe8df556469c
> [   40.227217] x8 : 0000000000000268 x7 : ffffbe8df7a48648 x6 : ffffbe8df7a48648
> [   40.227842] x5 : 0000000000017fe8 x4 : 0000000000000000 x3 : 0000000000000000
> [   40.228468] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff45ec84b30000
> [   40.229093] Call trace:
> [   40.229308]  phy_stop+0x134/0x1a0 (P)
> [   40.229634]  rtl8169_down+0x34/0x280
> [   40.229952]  rtl8169_close+0x64/0x100
> [   40.230275]  __dev_close_many+0xbc/0x1f0
> [   40.230621]  dev_close_many+0x94/0x160
> [   40.230951]  unregister_netdevice_many_notify+0x14c/0x9c0
> [   40.231426]  unregister_netdevice_queue+0xe4/0x100
> [   40.231848]  unregister_netdev+0x2c/0x60
> [   40.232193]  rtl_remove_one+0xa0/0xe0
> [   40.232517]  pci_device_remove+0x4c/0xf8
> [   40.232864]  device_remove+0x54/0x90
> [   40.233182]  device_release_driver_internal+0x1d4/0x238
> [   40.233643]  device_release_driver+0x20/0x38
> [   40.234019]  pci_stop_bus_device+0x84/0xe0
> [   40.234381]  pci_stop_bus_device+0x40/0xe0
> [   40.234741]  pci_stop_root_bus+0x48/0x80
> [   40.235087]  dw_pcie_host_deinit+0x34/0xe0
> [   40.235452]  rockchip_pcie_shutdown+0x24/0x48
> [   40.235839]  platform_shutdown+0x2c/0x48
> [   40.236187]  device_shutdown+0x150/0x278
> [   40.236533]  kernel_restart+0x4c/0xb8
> [   40.236859]  __do_sys_reboot+0x178/0x280
> [   40.237206]  __arm64_sys_reboot+0x2c/0x40
> [   40.237561]  invoke_syscall+0x50/0x120
> [   40.237891]  el0_svc_common.constprop.0+0x48/0xf0
> [   40.238305]  do_el0_svc+0x24/0x38
> [   40.238597]  el0_svc+0x30/0xd0
> [   40.238868]  el0t_64_sync_handler+0x10c/0x138
> [   40.239251]  el0t_64_sync+0x198/0x1a0
> [   40.239575] ---[ end trace 0000000000000000 ]---
> 
Hi Niklas,

The issue which you are seeing with specific to the RealTek ethernet
driver and should be fixed by RealTek driver nothing to do with the host
controller.
> Did you try your change with a simple network card connected to the PCI slot?
> (And not just another qcom board running in EP mode.)
> 
we don't have ethernet card's to try we tried with qcom PCIe switch
which usb hub connected as a endpoint.

- Krishna Chaitanya.
> I don't see why you wouldn't see the same thing as me.
> 
> 
> Kind regards,
> Niklas

Re: [PATCH] PCI: qcom: Implement shutdown() callback

Posted by Niklas Cassel 10 months, 1 week ago

Hello Krishna,

On Thu, Apr 03, 2025 at 09:26:08AM +0530, Krishna Chaitanya Chundru wrote:
> On 4/2/2025 6:14 PM, Niklas Cassel wrote:
> > 
> > Out of curiosity, I tried something similar to on pcie-dw-rockchip.c
> > 
> > Simply having a ->shutdown() callback that only calls dw_pcie_host_deinit()
> > was enough for me to produce:
> > 
> > [   40.209887] r8169 0004:41:00.0 eth0: Link is Down
> > [   40.216572] ------------[ cut here ]------------
> > [   40.216986] called from state HALTED
> > [   40.217317] WARNING: CPU: 7 PID: 265 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
> > [   40.218024] Modules linked in: rk805_pwrkey hantro_vpu v4l2_jpeg v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_v4l2 videobuf2_dma_contig videobuf2_memops videobuf2_common vidf
> > [   40.220267] CPU: 7 UID: 0 PID: 265 Comm: init Not tainted 6.14.0+ #134 PREEMPT
> > [   40.220908] Hardware name: Radxa ROCK 5B (DT)
> > [   40.221289] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [   40.221899] pc : phy_stop+0x134/0x1a0
> > [   40.222222] lr : phy_stop+0x134/0x1a0
> > [   40.222546] sp : ffff800082213820
> > [   40.222836] x29: ffff800082213820 x28: ffff45ec84b30000 x27: 0000000000000000
> > [   40.223463] x26: 0000000000000000 x25: 0000000000000000 x24: ffffbe8df7fde030
> > [   40.224088] x23: ffff800082213990 x22: 0000000000000001 x21: ffff45ec80e10000
> > [   40.224714] x20: ffff45ec82cb40c8 x19: ffff45ec82ccc000 x18: 0000000000000006
> > [   40.225340] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0720072007200720
> > [   40.225966] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
> > [   40.226592] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffbe8df556469c
> > [   40.227217] x8 : 0000000000000268 x7 : ffffbe8df7a48648 x6 : ffffbe8df7a48648
> > [   40.227842] x5 : 0000000000017fe8 x4 : 0000000000000000 x3 : 0000000000000000
> > [   40.228468] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff45ec84b30000
> > [   40.229093] Call trace:
> > [   40.229308]  phy_stop+0x134/0x1a0 (P)
> > [   40.229634]  rtl8169_down+0x34/0x280
> > [   40.229952]  rtl8169_close+0x64/0x100
> > [   40.230275]  __dev_close_many+0xbc/0x1f0
> > [   40.230621]  dev_close_many+0x94/0x160
> > [   40.230951]  unregister_netdevice_many_notify+0x14c/0x9c0
> > [   40.231426]  unregister_netdevice_queue+0xe4/0x100
> > [   40.231848]  unregister_netdev+0x2c/0x60
> > [   40.232193]  rtl_remove_one+0xa0/0xe0
> > [   40.232517]  pci_device_remove+0x4c/0xf8
> > [   40.232864]  device_remove+0x54/0x90
> > [   40.233182]  device_release_driver_internal+0x1d4/0x238
> > [   40.233643]  device_release_driver+0x20/0x38
> > [   40.234019]  pci_stop_bus_device+0x84/0xe0
> > [   40.234381]  pci_stop_bus_device+0x40/0xe0
> > [   40.234741]  pci_stop_root_bus+0x48/0x80
> > [   40.235087]  dw_pcie_host_deinit+0x34/0xe0
> > [   40.235452]  rockchip_pcie_shutdown+0x24/0x48
> > [   40.235839]  platform_shutdown+0x2c/0x48
> > [   40.236187]  device_shutdown+0x150/0x278
> > [   40.236533]  kernel_restart+0x4c/0xb8
> > [   40.236859]  __do_sys_reboot+0x178/0x280
> > [   40.237206]  __arm64_sys_reboot+0x2c/0x40
> > [   40.237561]  invoke_syscall+0x50/0x120
> > [   40.237891]  el0_svc_common.constprop.0+0x48/0xf0
> > [   40.238305]  do_el0_svc+0x24/0x38
> > [   40.238597]  el0_svc+0x30/0xd0
> > [   40.238868]  el0t_64_sync_handler+0x10c/0x138
> > [   40.239251]  el0t_64_sync+0x198/0x1a0
> > [   40.239575] ---[ end trace 0000000000000000 ]---
> > 
> Hi Niklas,
> 
> The issue which you are seeing with specific to the RealTek ethernet
> driver and should be fixed by RealTek driver nothing to do with the host
> controller.

The warning comes from:
drivers/net/phy/phy.c:phy_stop()
so from the networking phylib.

Doing a simple:
$ git grep -p phy_stop

shows that practially all Ethernet drivers call phy_stop() from the
.ndo_stop() callback.

So after your suggested patch, you should see this warning appear with
any NIC, if connected to your PCIe controller.

You might not care, which is fine, I simply wanted to inform you about it,
since it is not only realtek, it is any NIC.


Kind regards,
Niklas

Re: [PATCH] PCI: qcom: Implement shutdown() callback

Posted by Manivannan Sadhasivam 9 months, 4 weeks ago

On Thu, Apr 03, 2025 at 09:51:31AM +0200, Niklas Cassel wrote:
> Hello Krishna,
> 
> On Thu, Apr 03, 2025 at 09:26:08AM +0530, Krishna Chaitanya Chundru wrote:
> > On 4/2/2025 6:14 PM, Niklas Cassel wrote:
> > > 
> > > Out of curiosity, I tried something similar to on pcie-dw-rockchip.c
> > > 
> > > Simply having a ->shutdown() callback that only calls dw_pcie_host_deinit()
> > > was enough for me to produce:
> > > 
> > > [   40.209887] r8169 0004:41:00.0 eth0: Link is Down
> > > [   40.216572] ------------[ cut here ]------------
> > > [   40.216986] called from state HALTED
> > > [   40.217317] WARNING: CPU: 7 PID: 265 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
> > > [   40.218024] Modules linked in: rk805_pwrkey hantro_vpu v4l2_jpeg v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_v4l2 videobuf2_dma_contig videobuf2_memops videobuf2_common vidf
> > > [   40.220267] CPU: 7 UID: 0 PID: 265 Comm: init Not tainted 6.14.0+ #134 PREEMPT
> > > [   40.220908] Hardware name: Radxa ROCK 5B (DT)
> > > [   40.221289] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [   40.221899] pc : phy_stop+0x134/0x1a0
> > > [   40.222222] lr : phy_stop+0x134/0x1a0
> > > [   40.222546] sp : ffff800082213820
> > > [   40.222836] x29: ffff800082213820 x28: ffff45ec84b30000 x27: 0000000000000000
> > > [   40.223463] x26: 0000000000000000 x25: 0000000000000000 x24: ffffbe8df7fde030
> > > [   40.224088] x23: ffff800082213990 x22: 0000000000000001 x21: ffff45ec80e10000
> > > [   40.224714] x20: ffff45ec82cb40c8 x19: ffff45ec82ccc000 x18: 0000000000000006
> > > [   40.225340] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0720072007200720
> > > [   40.225966] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
> > > [   40.226592] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffbe8df556469c
> > > [   40.227217] x8 : 0000000000000268 x7 : ffffbe8df7a48648 x6 : ffffbe8df7a48648
> > > [   40.227842] x5 : 0000000000017fe8 x4 : 0000000000000000 x3 : 0000000000000000
> > > [   40.228468] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff45ec84b30000
> > > [   40.229093] Call trace:
> > > [   40.229308]  phy_stop+0x134/0x1a0 (P)
> > > [   40.229634]  rtl8169_down+0x34/0x280
> > > [   40.229952]  rtl8169_close+0x64/0x100
> > > [   40.230275]  __dev_close_many+0xbc/0x1f0
> > > [   40.230621]  dev_close_many+0x94/0x160
> > > [   40.230951]  unregister_netdevice_many_notify+0x14c/0x9c0
> > > [   40.231426]  unregister_netdevice_queue+0xe4/0x100
> > > [   40.231848]  unregister_netdev+0x2c/0x60
> > > [   40.232193]  rtl_remove_one+0xa0/0xe0
> > > [   40.232517]  pci_device_remove+0x4c/0xf8
> > > [   40.232864]  device_remove+0x54/0x90
> > > [   40.233182]  device_release_driver_internal+0x1d4/0x238
> > > [   40.233643]  device_release_driver+0x20/0x38
> > > [   40.234019]  pci_stop_bus_device+0x84/0xe0
> > > [   40.234381]  pci_stop_bus_device+0x40/0xe0
> > > [   40.234741]  pci_stop_root_bus+0x48/0x80
> > > [   40.235087]  dw_pcie_host_deinit+0x34/0xe0
> > > [   40.235452]  rockchip_pcie_shutdown+0x24/0x48
> > > [   40.235839]  platform_shutdown+0x2c/0x48
> > > [   40.236187]  device_shutdown+0x150/0x278
> > > [   40.236533]  kernel_restart+0x4c/0xb8
> > > [   40.236859]  __do_sys_reboot+0x178/0x280
> > > [   40.237206]  __arm64_sys_reboot+0x2c/0x40
> > > [   40.237561]  invoke_syscall+0x50/0x120
> > > [   40.237891]  el0_svc_common.constprop.0+0x48/0xf0
> > > [   40.238305]  do_el0_svc+0x24/0x38
> > > [   40.238597]  el0_svc+0x30/0xd0
> > > [   40.238868]  el0t_64_sync_handler+0x10c/0x138
> > > [   40.239251]  el0t_64_sync+0x198/0x1a0
> > > [   40.239575] ---[ end trace 0000000000000000 ]---
> > > 
> > Hi Niklas,
> > 
> > The issue which you are seeing with specific to the RealTek ethernet
> > driver and should be fixed by RealTek driver nothing to do with the host
> > controller.
> 
> The warning comes from:
> drivers/net/phy/phy.c:phy_stop()
> so from the networking phylib.
> 
> Doing a simple:
> $ git grep -p phy_stop
> 
> shows that practially all Ethernet drivers call phy_stop() from the
> .ndo_stop() callback.
> 
> So after your suggested patch, you should see this warning appear with
> any NIC, if connected to your PCIe controller.
> 

I think the issue here is that phy_stop() is called without calling
corresponding phy_start(). This means that either rtl8169_up() is never called
or rtl8169_down() is called twice.

I don't think this issue is applicable to all drivers. It'd be worth
investigating what is going wrong with this r8169 driver.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

Re: [PATCH] PCI: qcom: Implement shutdown() callback

Posted by Niklas Cassel 9 months, 4 weeks ago

On Tue, Apr 15, 2025 at 01:07:23PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Apr 03, 2025 at 09:51:31AM +0200, Niklas Cassel wrote:
> > Hello Krishna,
> >
> > On Thu, Apr 03, 2025 at 09:26:08AM +0530, Krishna Chaitanya Chundru wrote:
> > > On 4/2/2025 6:14 PM, Niklas Cassel wrote:
> > > >
> > > > Out of curiosity, I tried something similar to on pcie-dw-rockchip.c
> > > >
> > > > Simply having a ->shutdown() callback that only calls dw_pcie_host_deinit()
> > > > was enough for me to produce:
> > > >
> > > > [   40.209887] r8169 0004:41:00.0 eth0: Link is Down
> > > > [   40.216572] ------------[ cut here ]------------
> > > > [   40.216986] called from state HALTED
> > > > [   40.217317] WARNING: CPU: 7 PID: 265 at drivers/net/phy/phy.c:1630 phy_stop+0x134/0x1a0
> > > > [   40.218024] Modules linked in: rk805_pwrkey hantro_vpu v4l2_jpeg v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_v4l2 videobuf2_dma_contig videobuf2_memops videobuf2_common vidf
> > > > [   40.220267] CPU: 7 UID: 0 PID: 265 Comm: init Not tainted 6.14.0+ #134 PREEMPT
> > > > [   40.220908] Hardware name: Radxa ROCK 5B (DT)
> > > > [   40.221289] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > [   40.221899] pc : phy_stop+0x134/0x1a0
> > > > [   40.222222] lr : phy_stop+0x134/0x1a0
> > > > [   40.222546] sp : ffff800082213820
> > > > [   40.222836] x29: ffff800082213820 x28: ffff45ec84b30000 x27: 0000000000000000
> > > > [   40.223463] x26: 0000000000000000 x25: 0000000000000000 x24: ffffbe8df7fde030
> > > > [   40.224088] x23: ffff800082213990 x22: 0000000000000001 x21: ffff45ec80e10000
> > > > [   40.224714] x20: ffff45ec82cb40c8 x19: ffff45ec82ccc000 x18: 0000000000000006
> > > > [   40.225340] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0720072007200720
> > > > [   40.225966] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
> > > > [   40.226592] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffbe8df556469c
> > > > [   40.227217] x8 : 0000000000000268 x7 : ffffbe8df7a48648 x6 : ffffbe8df7a48648
> > > > [   40.227842] x5 : 0000000000017fe8 x4 : 0000000000000000 x3 : 0000000000000000
> > > > [   40.228468] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff45ec84b30000
> > > > [   40.229093] Call trace:
> > > > [   40.229308]  phy_stop+0x134/0x1a0 (P)
> > > > [   40.229634]  rtl8169_down+0x34/0x280
> > > > [   40.229952]  rtl8169_close+0x64/0x100
> > > > [   40.230275]  __dev_close_many+0xbc/0x1f0
> > > > [   40.230621]  dev_close_many+0x94/0x160
> > > > [   40.230951]  unregister_netdevice_many_notify+0x14c/0x9c0
> > > > [   40.231426]  unregister_netdevice_queue+0xe4/0x100
> > > > [   40.231848]  unregister_netdev+0x2c/0x60
> > > > [   40.232193]  rtl_remove_one+0xa0/0xe0
> > > > [   40.232517]  pci_device_remove+0x4c/0xf8
> > > > [   40.232864]  device_remove+0x54/0x90
> > > > [   40.233182]  device_release_driver_internal+0x1d4/0x238
> > > > [   40.233643]  device_release_driver+0x20/0x38
> > > > [   40.234019]  pci_stop_bus_device+0x84/0xe0
> > > > [   40.234381]  pci_stop_bus_device+0x40/0xe0
> > > > [   40.234741]  pci_stop_root_bus+0x48/0x80
> > > > [   40.235087]  dw_pcie_host_deinit+0x34/0xe0
> > > > [   40.235452]  rockchip_pcie_shutdown+0x24/0x48
> > > > [   40.235839]  platform_shutdown+0x2c/0x48
> > > > [   40.236187]  device_shutdown+0x150/0x278
> > > > [   40.236533]  kernel_restart+0x4c/0xb8
> > > > [   40.236859]  __do_sys_reboot+0x178/0x280
> > > > [   40.237206]  __arm64_sys_reboot+0x2c/0x40
> > > > [   40.237561]  invoke_syscall+0x50/0x120
> > > > [   40.237891]  el0_svc_common.constprop.0+0x48/0xf0
> > > > [   40.238305]  do_el0_svc+0x24/0x38
> > > > [   40.238597]  el0_svc+0x30/0xd0
> > > > [   40.238868]  el0t_64_sync_handler+0x10c/0x138
> > > > [   40.239251]  el0t_64_sync+0x198/0x1a0
> > > > [   40.239575] ---[ end trace 0000000000000000 ]---
> > > >
> > > Hi Niklas,
> > >
> > > The issue which you are seeing with specific to the RealTek ethernet
> > > driver and should be fixed by RealTek driver nothing to do with the host
> > > controller.
> >
> > The warning comes from:
> > drivers/net/phy/phy.c:phy_stop()
> > so from the networking phylib.
> >
> > Doing a simple:
> > $ git grep -p phy_stop
> >
> > shows that practially all Ethernet drivers call phy_stop() from the
> > .ndo_stop() callback.
> >
> > So after your suggested patch, you should see this warning appear with
> > any NIC, if connected to your PCIe controller.
> >
>
> I think the issue here is that phy_stop() is called without calling
> corresponding phy_start(). This means that either rtl8169_up() is never called
> or rtl8169_down() is called twice.

phy_start() is called.


>
> I don't think this issue is applicable to all drivers. It'd be worth
> investigating what is going wrong with this r8169 driver.

In case you are curious, I added a dump_stack() before phy_stop() is called:

First phy_stop() call:

[   21.733574] platform fc400000.usb: deferred probe pending: dwc3: failed to initialize core
[   23.827753] CPU: 6 UID: 0 PID: 238 Comm: init Not tainted 6.15.0-rc2+ #149 PREEMPT
[   23.827762] Hardware name: Radxa ROCK 5B (DT)
[   23.827764] Call trace:
[   23.827765]  show_stack+0x20/0x40 (C)
[   23.827774]  dump_stack_lvl+0x60/0x80
[   23.827778]  dump_stack+0x18/0x24
[   23.827782]  rtl8169_down+0x30/0x2a0
[   23.827788]  rtl_shutdown+0xb0/0xc0
[   23.827792]  pci_device_shutdown+0x3c/0x88
[   23.827797]  device_shutdown+0x150/0x278
[   23.827802]  kernel_restart+0x4c/0xb8
[   23.827807]  __do_sys_reboot+0x178/0x280
[   23.827811]  __arm64_sys_reboot+0x2c/0x40
[   23.827816]  invoke_syscall+0x50/0x120
[   23.827822]  el0_svc_common.constprop.0+0x48/0xf0
[   23.827826]  do_el0_svc+0x24/0x38
[   23.827828]  el0_svc+0x30/0xd0
[   23.827834]  el0t_64_sync_handler+0x10c/0x138
[   23.827837]  el0t_64_sync+0x198/0x1a0
[   23.827841] calling phy_stop() rtl8169_down:4844
[   23.834789] r8169 0004:41:00.0 eth0: Link is Down

Second phy_stop() call:

[   23.841458] CPU: 6 UID: 0 PID: 238 Comm: init Not tainted 6.15.0-rc2+ #149 PREEMPT
[   23.841467] Hardware name: Radxa ROCK 5B (DT)
[   23.841468] Call trace:
[   23.841470]  show_stack+0x20/0x40 (C)
[   23.841478]  dump_stack_lvl+0x60/0x80
[   23.841483]  dump_stack+0x18/0x24
[   23.841486]  rtl8169_down+0x30/0x2a0
[   23.841492]  rtl8169_close+0x64/0x100
[   23.841496]  __dev_close_many+0xbc/0x1f0
[   23.841502]  dev_close_many+0x94/0x160
[   23.841505]  unregister_netdevice_many_notify+0x160/0x9d0
[   23.841510]  unregister_netdevice_queue+0xf0/0x100
[   23.841515]  unregister_netdev+0x2c/0x58
[   23.841519]  rtl_remove_one+0xa0/0xe0
[   23.841524]  pci_device_remove+0x4c/0xf8
[   23.841528]  device_remove+0x54/0x90
[   23.841534]  device_release_driver_internal+0x1d4/0x238
[   23.841539]  device_release_driver+0x20/0x38
[   23.841544]  pci_stop_bus_device+0x84/0xe0
[   23.841548]  pci_stop_bus_device+0x40/0xe0
[   23.841552]  pci_stop_root_bus+0x48/0x80
[   23.841555]  dw_pcie_host_deinit+0x34/0xe0
[   23.841559]  rockchip_pcie_shutdown+0x20/0x38
[   23.841565]  platform_shutdown+0x2c/0x48
[   23.841571]  device_shutdown+0x150/0x278
[   23.841575]  kernel_restart+0x4c/0xb8
[   23.841580]  __do_sys_reboot+0x178/0x280
[   23.841584]  __arm64_sys_reboot+0x2c/0x40
[   23.841588]  invoke_syscall+0x50/0x120
[   23.841595]  el0_svc_common.constprop.0+0x48/0xf0
[   23.841598]  do_el0_svc+0x24/0x38
[   23.841601]  el0_svc+0x30/0xd0
[   23.841605]  el0t_64_sync_handler+0x10c/0x138
[   23.841609]  el0t_64_sync+0x198/0x1a0
[   23.841613] calling phy_stop() rtl8169_down:4844


So it seems that the driver's rtl_shutdown() calls rtl8169_net_suspend(),
which calls rtl8169_down() which calls phy_stop().


If I compare rtl8169_close() with e.g. e1000e_close(),
e1000e_close() does have a guard:

if (netif_device_present(netdev)) {

around the call to e1000e_down(),
while rtl8169_close() does not have a similar guard around the call to
rtl8169_down().

So, I think you are right, this is probably a r8169 driver specific
problem after all.


Kind regards,
Niklas