drivers/pci/probe.c | 4 ++++ drivers/pci/remove.c | 18 ------------------ 2 files changed, 4 insertions(+), 18 deletions(-)
From: Brian Norris <briannorris@google.com>
We have asymmetry w.r.t. pwrctrl device creation and destruction.
pwrctrl devices are created by the host bridge, as part of scanning for
child devices, but they are destroyed by the child device. This causes
confusing behavior in cases like the following:
1. PCI device is removed (e.g., via /sys/bus/pci/devices/*/remove);
pwrctrl device is also destroyed
2. pwrctrl driver is removed (e.g., rmmod)
3. pwrctrl driver is reloaded
One could expect #3 to re-bind to the pwrctrl device and re-power the
device; but there's no device to bind to, so it remains off. Instead, we
require a forced rescan (/sys/bus/pci/devices/*/rescan) to recreate the
pwrctrl device(s) and restore power.
This asymmetry isn't required though; it makes more logical sense to
retain the pwrctrl devices even without the PCI device, since pwrctrl is
more of a logical ancestor than a child.
Additionally, Documentation/PCI/sysfs-pci.rst documents the behavior of
step #1 (remove):
The 'remove' file is used to remove the PCI device, by writing a
non-zero integer to the file. This does not involve any kind of
hot-plug functionality, e.g. powering off the device.
Instead, let's destroy a pwrctrl device only when its parent (the host
bridge) is destroyed.
We use of_platform_device_destroy(), since it's the logical inverse of
pwrctrl creation (of_platform_device_create()). It performs more or less
the same things pci_pwrctrl_unregister() did, with some added bonus of
ensuring these are OF_POPULATED devices.
Signed-off-by: Brian Norris <briannorris@google.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
drivers/pci/probe.c | 4 ++++
drivers/pci/remove.c | 18 ------------------
2 files changed, 4 insertions(+), 18 deletions(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4b8693ec9e4c..ad6e7d05b9bc 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -5,6 +5,7 @@
#include <linux/kernel.h>
#include <linux/delay.h>
+#include <linux/device.h>
#include <linux/init.h>
#include <linux/pci.h>
#include <linux/msi.h>
@@ -627,6 +628,9 @@ static void pci_release_host_bridge_dev(struct device *dev)
{
struct pci_host_bridge *bridge = to_pci_host_bridge(dev);
+ /* Clean up any pwrctrl children. */
+ device_for_each_child(dev, NULL, of_platform_device_destroy);
+
if (bridge->release_fn)
bridge->release_fn(bridge);
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 445afdfa6498..58dbb41c4730 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -17,23 +17,6 @@ static void pci_free_resources(struct pci_dev *dev)
}
}
-static void pci_pwrctrl_unregister(struct device *dev)
-{
- struct device_node *np;
- struct platform_device *pdev;
-
- np = dev_of_node(dev);
- if (!np)
- return;
-
- pdev = of_find_device_by_node(np);
- if (!pdev)
- return;
-
- of_device_unregister(pdev);
- of_node_clear_flag(np, OF_POPULATED);
-}
-
static void pci_stop_dev(struct pci_dev *dev)
{
pci_pme_active(dev, false);
@@ -64,7 +47,6 @@ static void pci_destroy_dev(struct pci_dev *dev)
pci_doe_destroy(dev);
pcie_aspm_exit_link_state(dev);
pci_bridge_d3_update(dev);
- pci_pwrctrl_unregister(&dev->dev);
pci_free_resources(dev);
put_device(&dev->dev);
}
--
2.50.0.727.gbf7dc18ff4-goog
On Fri, Jul 11, 2025 at 05:43:33PM GMT, Brian Norris wrote: > From: Brian Norris <briannorris@google.com> > > We have asymmetry w.r.t. pwrctrl device creation and destruction. > pwrctrl devices are created by the host bridge, as part of scanning for > child devices, but they are destroyed by the child device. This causes > confusing behavior in cases like the following: > > 1. PCI device is removed (e.g., via /sys/bus/pci/devices/*/remove); > pwrctrl device is also destroyed > 2. pwrctrl driver is removed (e.g., rmmod) > 3. pwrctrl driver is reloaded > > One could expect #3 to re-bind to the pwrctrl device and re-power the > device; but there's no device to bind to, so it remains off. Instead, we > require a forced rescan (/sys/bus/pci/devices/*/rescan) to recreate the > pwrctrl device(s) and restore power. > > This asymmetry isn't required though; it makes more logical sense to > retain the pwrctrl devices even without the PCI device, since pwrctrl is > more of a logical ancestor than a child. > > Additionally, Documentation/PCI/sysfs-pci.rst documents the behavior of > step #1 (remove): > > The 'remove' file is used to remove the PCI device, by writing a > non-zero integer to the file. This does not involve any kind of > hot-plug functionality, e.g. powering off the device. > > Instead, let's destroy a pwrctrl device only when its parent (the host > bridge) is destroyed. > Sounds good to me! > We use of_platform_device_destroy(), since it's the logical inverse of > pwrctrl creation (of_platform_device_create()). It performs more or less > the same things pci_pwrctrl_unregister() did, with some added bonus of > ensuring these are OF_POPULATED devices. > If you take a look at commit f1536585588b ("PCI: Don't rely on of_platform_depopulate() for reused OF-nodes"), you can realize that the PCI core clears OF_POPULATED flag while removing the PCI device. So of_platform_device_destroy() will do nothing. - Mani -- மணிவண்ணன் சதாசிவம்
Hi Manivannan, Thanks for reviewing. On Sat, Jul 12, 2025 at 10:56:38PM +0530, Manivannan Sadhasivam wrote: > If you take a look at commit f1536585588b ("PCI: Don't rely on > of_platform_depopulate() for reused OF-nodes"), you can realize that the PCI > core clears OF_POPULATED flag while removing the PCI device. So > of_platform_device_destroy() will do nothing. I've looked through that commit several times, and while I think I understand its claim, I really haven't been able to validate it. I've inspected the code for anything like of_node_clear_flag(nc, OF_POPULATED), and the closest I see for any PCI-relevant code is in drivers/of/platform.c -- mostly in error paths (undoing device creation) or of_platform_device_destroy() or of_platform_depopulate(). I've also tried quite a bit of tracing / printk'ing, and I can't find the OF_POPULATED getting cleared either. Is there any chance there's a mistake in the claims in commit f1536585588b? e.g., maybe Bartosz was looking at OF_POPULATED_BUS (which is different, but also relevant to his change)? Or am I missing something obvious in here? OTOH, I also see that part of my change is not really doing quite what I thought it was -- so far, I think there may be some kind of resource leak (kobj ref), since I'm not seeing pci_release_host_bridge_dev() called when I think it should be. If I perform cleanup in pci_free_host_bridge() instead, then I do indeed see of_platform_device_destroy() tear things down the way I expect. Brian
On Tue, Jul 15, 2025 at 02:21:47PM GMT, Brian Norris wrote: > Hi Manivannan, > > Thanks for reviewing. > > On Sat, Jul 12, 2025 at 10:56:38PM +0530, Manivannan Sadhasivam wrote: > > If you take a look at commit f1536585588b ("PCI: Don't rely on > > of_platform_depopulate() for reused OF-nodes"), you can realize that the PCI > > core clears OF_POPULATED flag while removing the PCI device. So > > of_platform_device_destroy() will do nothing. > > I've looked through that commit several times, and while I think I > understand its claim, I really haven't been able to validate it. I've > inspected the code for anything like of_node_clear_flag(nc, > OF_POPULATED), and the closest I see for any PCI-relevant code is in > drivers/of/platform.c -- mostly in error paths (undoing device creation) > or of_platform_device_destroy() or of_platform_depopulate(). > > I've also tried quite a bit of tracing / printk'ing, and I can't find > the OF_POPULATED getting cleared either. > > Is there any chance there's a mistake in the claims in commit > f1536585588b? e.g., maybe Bartosz was looking at OF_POPULATED_BUS (which > is different, but also relevant to his change)? Or am I missing > something obvious in here? > Now, I did look into the OF code and I also couldn't see where exactly the OF_POPULATED flag is getting cleared by the PCI core :/ So I'll defer the question to Bartosz. > OTOH, I also see that part of my change is not really doing quite what I > thought it was -- so far, I think there may be some kind of resource > leak (kobj ref), since I'm not seeing pci_release_host_bridge_dev() > called when I think it should be. If I perform cleanup in > pci_free_host_bridge() instead, then I do indeed see > of_platform_device_destroy() tear things down the way I expect. > Oh, that's bad! Which controller it is? I played with making the pcie-qcom driver modular and I unloaded/loaded multiple times, but never saw any refcount warning (I really hope if there was any leak, it would've tripped over during insmod). - Mani -- மணிவண்ணன் சதாசிவம்
On Wed, Jul 16, 2025 at 09:27:55PM +0530, Manivannan Sadhasivam wrote: > On Tue, Jul 15, 2025 at 02:21:47PM GMT, Brian Norris wrote: > > OTOH, I also see that part of my change is not really doing quite what I > > thought it was -- so far, I think there may be some kind of resource > > leak (kobj ref), since I'm not seeing pci_release_host_bridge_dev() > > called when I think it should be. If I perform cleanup in > > pci_free_host_bridge() instead, then I do indeed see > > of_platform_device_destroy() tear things down the way I expect. > > > > Oh, that's bad! Which controller it is? I played with making the pcie-qcom > driver modular and I unloaded/loaded multiple times, but never saw any > refcount warning (I really hope if there was any leak, it would've tripped over > during insmod). I'm still trying to tease this apart, and I'm not sure when I'll have plenty of time to get further on this. I'm also primarily using a non-upstream DWC-based driver, which isn't really ready to be published. I also have some systems that use drivers/pci/controller/pcie-rockchip-host.c and are fully upstream-supported, so I'll see if I can replicate my observations there. But I think there are at least two problems: (1) I'm adding code to bridge->dev.release(). release() is only called when the device's refcount drops to zero. And child devices hold a refcount on their parent (the bridge). So, I have a circular refcount, if there were any pwrctrl children present. I think this is easily solved by moving the child destruction to pci_free_host_bridge() instead. (2) Even after resolving 1, I'm seeing pci_free_host_bridge() exit with a bridge->dev.kboj.kref refcount of 1 in some cases. I don't yet have an explanation of that one. IIUC, this kind of error would be considered a leak, but crucially, I also don't think it would produce any kind of refcount warning or other error. It's "just" a device that has been removed (a la, device_del()), but still has some client holding a reference count (i.e., not enough put_device()). Brian
On Wed, Jul 16, 2025 at 09:47:41AM -0700, Brian Norris wrote: > (2) Even after resolving 1, I'm seeing pci_free_host_bridge() exit with > a bridge->dev.kboj.kref refcount of 1 in some cases. I don't yet > have an explanation of that one. Ah, well now I have an explanation: One should always be skeptical of out-of-tree drivers. In this case, one of my endpoint drivers was mismanaging a pci_dev_put() reference count, and that cascades to all its children and links, including the host bridge. Once I fix that (and the aforementioned problem (1)), it seems my problems go away. I'll let a v2 soak in my local environment, and unless I hear some news from Bartosz about OF_POPULATED to change my mind, I'll send it out eventually. Brian
On Thu, Jul 17, 2025 at 12:25 AM Brian Norris <briannorris@chromium.org> wrote: > > On Wed, Jul 16, 2025 at 09:47:41AM -0700, Brian Norris wrote: > > (2) Even after resolving 1, I'm seeing pci_free_host_bridge() exit with > > a bridge->dev.kboj.kref refcount of 1 in some cases. I don't yet > > have an explanation of that one. > > Ah, well now I have an explanation: > One should always be skeptical of out-of-tree drivers. > > In this case, one of my endpoint drivers was mismanaging a pci_dev_put() > reference count, and that cascades to all its children and links, > including the host bridge. > > Once I fix that (and the aforementioned problem (1)), it seems my > problems go away. > > I'll let a v2 soak in my local environment, and unless I hear some news > from Bartosz about OF_POPULATED to change my mind, I'll send it out > eventually. > > Brian Hi! Sorry for the late reply, I would really like to be able to assist with these changes (although Mani is doing a great job!) I'm currently really busy with other stuff. :( FWIW I just spent 30 minutes looking at the tree as of commit f1536585588b~1 and I am no longer sure what exactly did I refer to when I said that the PCI core clears the OF_POPULATED flag but I'm 100% sure I was facing this issue and seeing OF nodes associated with a device that's registered without this flag. Looking at it again now, it's no longer obvious, I wish I had been more verbose in the commit message. Feel free to try and revert this change, maybe over a year later it's no longer needed (or never was). If it is, we should quickly see some issues triggered by it. Bartosz
© 2016 - 2025 Red Hat, Inc.