[PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized

[PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Brian Norris 3 months, 2 weeks ago
Today, it's possible for a PCI device to be created and
runtime-suspended before it is fully initialized. When that happens, the
device will remain in D0, but the suspend process may save an
intermediate version of that device's state -- for example, without
appropriate BAR configuration. When the device later resumes, we'll
restore invalid PCI state and the device may not function.

Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
until we've fully initialized the device.

More details on how exactly this may occur:

1. PCI device is created by pci_scan_slot() or similar
2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
   device starts "active" and we initially prevent (pm_runtime_forbid())
   suspend -- but see [*] footnote
3. Underlying 'struct device' is added to the system (device_add());
   runtime PM can now be configured by user space
4. PCI device receives BAR configuration
   (pci_assign_unassigned_bus_resources(), etc.)
5. PCI device is added to the system in pci_bus_add_device()

The device may potentially suspend between #3 and #4.

[*] By default, pm_runtime_forbid() prevents suspending a device; but by
design [**], this can be overridden by user space policy via

  echo auto > /sys/bus/pci/devices/.../power/control

Thus, the above #3/#4 sequence is racy with user space (udev or
similar).

Notably, many PCI devices are enumerated at subsys_initcall time and so
will not race with user space. However, there are several scenarios
where PCI devices are created later on, such as with hotplug or when
drivers (pwrctrl or controller drivers) are built as modules.

  ---

[**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
/sys/.../power/control, and the runtime PM usage counter can be subtle.
It appears that the intention of pm_runtime_forbid() /
pm_runtime_allow() is twofold:

1. Allow the user to disable runtime_pm (force device to always be
   powered on) through sysfs.
2. Allow the driver to start with runtime_pm disabled (device forced
   on) and user space could later enable runtime_pm.

This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
specifically the section starting "The user space can effectively
disallow".

This means that while pm_runtime_forbid() does technically increase the
runtime PM usage counter, this usage counter is not a guarantee of
functional correctness, because sysfs can decrease that count again.
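
To make the counter behavior above concrete, here is a rough user-space
toy model in C. It is not kernel code: the toy_* names and the struct are
made up for illustration, and it only tracks the three pieces of state
discussed here (usage counter, disable depth, the forbid/"auto" flag).
It assumes that suspend is possible only when runtime PM is enabled and
the usage counter is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a device's runtime PM state (hypothetical, simplified). */
struct toy_pm {
	int usage_count;	/* raised by forbid(), dropped by "auto" */
	int disable_depth;	/* > 0 means runtime PM is disabled */
	bool runtime_auto;	/* false after forbid(), true after "auto" */
};

static void toy_init(struct toy_pm *pm)
{
	pm->usage_count = 0;
	pm->disable_depth = 1;	/* runtime PM starts out disabled */
	pm->runtime_auto = true;
}

/* Models pm_runtime_forbid(): takes a usage reference once. */
static void toy_forbid(struct toy_pm *pm)
{
	if (pm->runtime_auto) {
		pm->runtime_auto = false;
		pm->usage_count++;
	}
}

/* Models "echo auto > .../power/control": drops that reference again. */
static void toy_allow(struct toy_pm *pm)
{
	if (!pm->runtime_auto) {
		pm->runtime_auto = true;
		pm->usage_count--;
	}
}

/* Models pm_runtime_enable(). */
static void toy_enable(struct toy_pm *pm)
{
	assert(pm->disable_depth > 0);
	pm->disable_depth--;
}

static bool toy_may_suspend(const struct toy_pm *pm)
{
	return pm->disable_depth == 0 && pm->usage_count == 0;
}

/*
 * Replays steps 2-5 with user space writing "auto" right after
 * device_add(). Returns true if the device could suspend before its
 * BARs are assigned.
 */
bool toy_window_is_racy(bool enable_in_pm_init)
{
	struct toy_pm pm;
	bool racy;

	toy_init(&pm);
	toy_forbid(&pm);		/* step 2: pci_pm_init() */
	if (enable_in_pm_init)
		toy_enable(&pm);	/* old ordering */
	toy_allow(&pm);			/* step 3: user space writes "auto" */
	racy = toy_may_suspend(&pm);	/* step 4 has not happened yet */
	if (!enable_in_pm_init)
		toy_enable(&pm);	/* new ordering: pci_bus_add_device() */
	assert(toy_may_suspend(&pm));	/* "auto" is honored once init is done */
	return racy;
}
```

In this model, toy_window_is_racy(true) reports the window as open and
toy_window_is_racy(false) reports it closed, which is the effect this
patch is after. Either way the user's "auto" policy still takes effect
once the device is fully added.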

  ---

Note that we also move pm_runtime_set_active(), but leave
pm_runtime_forbid() in place earlier in the initialization sequence, to
avoid confusing user space. From Documentation/power/runtime_pm.rst:

  "It should be noted, however, that if the user space has already
  intentionally changed the value of /sys/devices/.../power/control to
  "auto" to allow the driver to power manage the device at run time, the
  driver may confuse it by using pm_runtime_forbid() this way."

Thus, we should ensure pm_runtime_forbid() is called before the device
is available to user space.

Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
Signed-off-by: Brian Norris <briannorris@chromium.org>
Cc: <stable@vger.kernel.org>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
---

Changes in v4:
 * Move pm_runtime_set_active() too

Changes in v3:
 * Add Link to initial discussion
 * Add Rafael's Reviewed-by
 * Add lengthier footnotes about forbid vs allow vs sysfs

Changes in v2:
 * Update CC list
 * Rework problem description
 * Update solution: defer pm_runtime_enable(), instead of trying to
   get()/put()

 drivers/pci/bus.c | 4 ++++
 drivers/pci/pci.c | 2 --
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index f26aec6ff588..40ff954f416f 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -14,6 +14,7 @@
 #include <linux/of.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
 #include <linux/proc_fs.h>
 #include <linux/slab.h>
 
@@ -375,6 +376,9 @@ void pci_bus_add_device(struct pci_dev *dev)
 		put_device(&pdev->dev);
 	}
 
+	pm_runtime_set_active(&dev->dev);
+	pm_runtime_enable(&dev->dev);
+
 	if (!dn || of_device_is_available(dn))
 		pci_dev_allow_binding(dev);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b14dd064006c..234bf3608569 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3225,8 +3225,6 @@ void pci_pm_init(struct pci_dev *dev)
 poweron:
 	pci_pm_power_up_and_verify_state(dev);
 	pm_runtime_forbid(&dev->dev);
-	pm_runtime_set_active(&dev->dev);
-	pm_runtime_enable(&dev->dev);
 }
 
 static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
-- 
2.51.1.821.gb6fe4d2222-goog
Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Bjorn Helgaas 1 month ago
On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
> Today, it's possible for a PCI device to be created and
> runtime-suspended before it is fully initialized. When that happens, the
> device will remain in D0, but the suspend process may save an
> intermediate version of that device's state -- for example, without
> appropriate BAR configuration. When the device later resumes, we'll
> restore invalid PCI state and the device may not function.
> 
> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> until we've fully initialized the device.
> 
> More details on how exactly this may occur:
> 
> 1. PCI device is created by pci_scan_slot() or similar
> 2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
>    device starts "active" and we initially prevent (pm_runtime_forbid())
>    suspend -- but see [*] footnote
> 3. Underlying 'struct device' is added to the system (device_add());
>    runtime PM can now be configured by user space
> 4. PCI device receives BAR configuration
>    (pci_assign_unassigned_bus_resources(), etc.)
> 5. PCI device is added to the system in pci_bus_add_device()
> 
> The device may potentially suspend between #3 and #4.
> 
> [*] By default, pm_runtime_forbid() prevents suspending a device; but by
> design [**], this can be overridden by user space policy via
> 
>   echo auto > /sys/bus/pci/devices/.../power/control
> 
> Thus, the above #3/#4 sequence is racy with user space (udev or
> similar).
> 
> Notably, many PCI devices are enumerated at subsys_initcall time and so
> will not race with user space. However, there are several scenarios
> where PCI devices are created later on, such as with hotplug or when
> drivers (pwrctrl or controller drivers) are built as modules.
> 
>   ---
> 
> [**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
> /sys/.../power/control, and the runtime PM usage counter can be subtle.
> It appears that the intention of pm_runtime_forbid() /
> pm_runtime_allow() is twofold:
> 
> 1. Allow the user to disable runtime_pm (force device to always be
>    powered on) through sysfs.
> 2. Allow the driver to start with runtime_pm disabled (device forced
>    on) and user space could later enable runtime_pm.
> 
> This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
> specifically the section starting "The user space can effectively
> disallow".
> 
> This means that while pm_runtime_forbid() does technically increase the
> runtime PM usage counter, this usage counter is not a guarantee of
> functional correctness, because sysfs can decrease that count again.
> 
>   ---
> 
> Note that we also move pm_runtime_set_active(), but leave
> pm_runtime_forbid() in place earlier in the initialization sequence, to
> avoid confusing user space. From Documentation/power/runtime_pm.rst:
> 
>   "It should be noted, however, that if the user space has already
>   intentionally changed the value of /sys/devices/.../power/control to
>   "auto" to allow the driver to power manage the device at run time, the
>   driver may confuse it by using pm_runtime_forbid() this way."
> 
> Thus, we should ensure pm_runtime_forbid() is called before the device
> is available to user space.
> 
> Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> Cc: <stable@vger.kernel.org>
> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>

Applied to pci/pm for v6.20, thanks!  I tried to simplify the commit
log so the issue isn't hidden by details.  Happy to restore things if
I trimmed too much:

    PCI/PM: Prevent runtime suspend until devices are fully initialized

    Previously, it was possible for a PCI device to be runtime-suspended before
    it was fully initialized. When that happened, the suspend process could
    save invalid device state, for example, before BAR assignment. Restoring
    the invalid state during resume may leave the device non-functional.

    Prevent runtime suspend for PCI devices until they are fully initialized by
    deferring pm_runtime_enable().

    More details on how exactly this may occur:

      1. PCI device is created by pci_scan_slot() or similar

      2. As part of pci_scan_slot(), pci_pm_init() puts the device in D0 and
         prevents runtime suspend via pm_runtime_forbid()

      3. pci_device_add() adds the underlying 'struct device' via device_add(),
         which means user space can allow runtime suspend, e.g.,

           echo auto > /sys/bus/pci/devices/.../power/control

      4. PCI device receives BAR configuration
         (pci_assign_unassigned_bus_resources(), etc.)

      5. pci_bus_add_device() applies final fixups, saves device state, and
         tries to attach a driver

    The device may potentially be suspended between #3 and #5, so this is racy
    with user space (udev or similar).

    Many PCI devices are enumerated at subsys_initcall time and so will not
    race with user space, but devices created later by hotplug or modular
    pwrctrl or host controller drivers are susceptible to this race.

    More runtime PM details at the first Link: below.

    Signed-off-by: Brian Norris <briannorris@chromium.org>
    [bhelgaas: simplify commit log]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
    Link: https://patch.msgid.link/20251023140901.v4.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid


> ---
> 
> Changes in v4:
>  * Move pm_runtime_set_active() too
> 
> Changes in v3:
>  * Add Link to initial discussion
>  * Add Rafael's Reviewed-by
>  * Add lengthier footnotes about forbid vs allow vs sysfs
> 
> Changes in v2:
>  * Update CC list
>  * Rework problem description
>  * Update solution: defer pm_runtime_enable(), instead of trying to
>    get()/put()
> 
>  drivers/pci/bus.c | 4 ++++
>  drivers/pci/pci.c | 2 --
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index f26aec6ff588..40ff954f416f 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -14,6 +14,7 @@
>  #include <linux/of.h>
>  #include <linux/of_platform.h>
>  #include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/proc_fs.h>
>  #include <linux/slab.h>
>  
> @@ -375,6 +376,9 @@ void pci_bus_add_device(struct pci_dev *dev)
>  		put_device(&pdev->dev);
>  	}
>  
> +	pm_runtime_set_active(&dev->dev);
> +	pm_runtime_enable(&dev->dev);
> +
>  	if (!dn || of_device_is_available(dn))
>  		pci_dev_allow_binding(dev);
>  
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b14dd064006c..234bf3608569 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3225,8 +3225,6 @@ void pci_pm_init(struct pci_dev *dev)
>  poweron:
>  	pci_pm_power_up_and_verify_state(dev);
>  	pm_runtime_forbid(&dev->dev);
> -	pm_runtime_set_active(&dev->dev);
> -	pm_runtime_enable(&dev->dev);
>  }
>  
>  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> -- 
> 2.51.1.821.gb6fe4d2222-goog
>
Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Marek Szyprowski 3 weeks, 4 days ago
On 06.01.2026 23:27, Bjorn Helgaas wrote:
> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
>> Today, it's possible for a PCI device to be created and
>> runtime-suspended before it is fully initialized. When that happens, the
>> device will remain in D0, but the suspend process may save an
>> intermediate version of that device's state -- for example, without
>> appropriate BAR configuration. When the device later resumes, we'll
>> restore invalid PCI state and the device may not function.
>>
>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
>> until we've fully initialized the device.
>>
>> More details on how exactly this may occur:
>>
>> 1. PCI device is created by pci_scan_slot() or similar
>> 2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
>>     device starts "active" and we initially prevent (pm_runtime_forbid())
>>     suspend -- but see [*] footnote
>> 3. Underlying 'struct device' is added to the system (device_add());
>>     runtime PM can now be configured by user space
>> 4. PCI device receives BAR configuration
>>     (pci_assign_unassigned_bus_resources(), etc.)
>> 5. PCI device is added to the system in pci_bus_add_device()
>>
>> The device may potentially suspend between #3 and #4.
>>
>> [*] By default, pm_runtime_forbid() prevents suspending a device; but by
>> design [**], this can be overridden by user space policy via
>>
>>    echo auto > /sys/bus/pci/devices/.../power/control
>>
>> Thus, the above #3/#4 sequence is racy with user space (udev or
>> similar).
>>
>> Notably, many PCI devices are enumerated at subsys_initcall time and so
>> will not race with user space. However, there are several scenarios
>> where PCI devices are created later on, such as with hotplug or when
>> drivers (pwrctrl or controller drivers) are built as modules.
>>
>>    ---
>>
>> [**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
>> /sys/.../power/control, and the runtime PM usage counter can be subtle.
>> It appears that the intention of pm_runtime_forbid() /
>> pm_runtime_allow() is twofold:
>>
>> 1. Allow the user to disable runtime_pm (force device to always be
>>     powered on) through sysfs.
>> 2. Allow the driver to start with runtime_pm disabled (device forced
>>     on) and user space could later enable runtime_pm.
>>
>> This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
>> specifically the section starting "The user space can effectively
>> disallow".
>>
>> This means that while pm_runtime_forbid() does technically increase the
>> runtime PM usage counter, this usage counter is not a guarantee of
>> functional correctness, because sysfs can decrease that count again.
>>
>>    ---
>>
>> Note that we also move pm_runtime_set_active(), but leave
>> pm_runtime_forbid() in place earlier in the initialization sequence, to
>> avoid confusing user space. From Documentation/power/runtime_pm.rst:
>>
>>    "It should be noted, however, that if the user space has already
>>    intentionally changed the value of /sys/devices/.../power/control to
>>    "auto" to allow the driver to power manage the device at run time, the
>>    driver may confuse it by using pm_runtime_forbid() this way."
>>
>> Thus, we should ensure pm_runtime_forbid() is called before the device
>> is available to user space.
>>
>> Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
>> Signed-off-by: Brian Norris <briannorris@chromium.org>
>> Cc: <stable@vger.kernel.org>
>> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> Applied to pci/pm for v6.20, thanks!  I tried to simplify the commit
> log so the issue isn't hidden by details.  Happy to restore things if
> I trimmed too much:
>
>      PCI/PM: Prevent runtime suspend until devices are fully initialized
>
>      Previously, it was possible for a PCI device to be runtime-suspended before
>      it was fully initialized. When that happened, the suspend process could
>      save invalid device state, for example, before BAR assignment. Restoring
>      the invalid state during resume may leave the device non-functional.
>
>      Prevent runtime suspend for PCI devices until they are fully initialized by
>      deferring pm_runtime_enable().
>
>      More details on how exactly this may occur:
>
>        1. PCI device is created by pci_scan_slot() or similar
>
>        2. As part of pci_scan_slot(), pci_pm_init() puts the device in D0 and
>           prevents runtime suspend prevented via pm_runtime_forbid()
>
>        3. pci_device_add() adds the underlying 'struct device' via device_add(),
>           which means user space can allow runtime suspend, e.g.,
>
>             echo auto > /sys/bus/pci/devices/.../power/control
>
>        4. PCI device receives BAR configuration
>           (pci_assign_unassigned_bus_resources(), etc.)
>
>        5. pci_bus_add_device() applies final fixups, saves device state, and
>           tries to attach a driver
>
>      The device may potentially be suspended between #3 and #5, so this is racy
>      with user space (udev or similar).
>
>      Many PCI devices are enumerated at subsys_initcall time and so will not
>      race with user space, but devices created later by hotplug or modular
>      pwrctrl or host controller drivers are susceptible to this race.
>
>      More runtime PM details at the first Link: below.
>
>      Signed-off-by: Brian Norris <briannorris@chromium.org>
>      [bhelgaas: simplify commit log]
>      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>      Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
>      Cc: stable@vger.kernel.org
>      Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
>      Link: https://patch.msgid.link/20251023140901.v4.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid

This patch landed recently in linux-next as commit c796513dc54e 
("PCI/PM: Prevent runtime suspend until devices are fully initialized"). 
In my tests I found that it sometimes causes the "pci 0000:01:00.0: 
runtime PM trying to activate child device 0000:01:00.0 but parent 
(0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board 
(arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a 
lockdep warning about console lock, but this is just a consequence of 
the runtime pm warning. Reverting $subject patch on top of current 
linux-next hides this warning.
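
For readers not familiar with where that message comes from: as far as I
understand it, __pm_runtime_set_status() refuses to mark a child active
while its parent has runtime PM enabled, does not ignore children, and
is not itself active. Below is a rough user-space sketch of just that
check, in C. The toy_* names are made up, locking and child counting are
left out, and the -1 return stands in for the kernel's error code; it
says nothing about why the parent is inactive on this board.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_status { TOY_SUSPENDED, TOY_ACTIVE };

/* Hypothetical, heavily simplified device with runtime PM state. */
struct toy_dev {
	struct toy_dev *parent;
	enum toy_status status;
	int disable_depth;	/* 0 means runtime PM is enabled */
	bool ignore_children;
};

/*
 * Try to mark 'dev' active. Returns 0 on success, or -1 if the parent
 * is under runtime PM control, cares about its children, and is not
 * active (the case that produces the warning in the log).
 */
int toy_set_active(struct toy_dev *dev)
{
	struct toy_dev *p = dev->parent;

	if (p && p->disable_depth == 0 && !p->ignore_children &&
	    p->status != TOY_ACTIVE)
		return -1;	/* "parent ... is not active" */

	dev->status = TOY_ACTIVE;
	return 0;
}
```

With a suspended bridge that has runtime PM enabled, toy_set_active() on
the endpoint fails; once the bridge is active (or has runtime PM
disabled), it succeeds.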

Here is a kernel log:

pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
pci 0000:01:00.0: Adding to iommu group 13
pci 0000:01:00.0: ASPM: default states L0s L1
pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
pci 0000:01:00.0: runtime PM trying to activate child device 0000:01:00.0 but parent (0000:00:00.0) is not active

======================================================
WARNING: possible circular locking dependency detected
6.19.0-rc1+ #16398 Not tainted
------------------------------------------------------
kworker/3:0/33 is trying to acquire lock:
ffffcd182ff1ae98 (console_owner){..-.}-{0:0}, at: console_lock_spinning_enable+0x44/0x78

but task is already holding lock:
ffff0000835c5238 (&dev->power.lock/1){....}-{3:3}, at: __pm_runtime_set_status+0x240/0x384

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&dev->power.lock/1){....}-{3:3}:
        _raw_spin_lock_nested+0x44/0x5c
        __pm_runtime_set_status+0x240/0x384
        arm_smmu_device_probe+0xbe0/0xe5c
        platform_probe+0x5c/0xac
        really_probe+0xbc/0x298
        __driver_probe_device+0x78/0x12c
        driver_probe_device+0x40/0x164
        __driver_attach+0x9c/0x1ac
        bus_for_each_dev+0x74/0xd0
        driver_attach+0x24/0x30
        bus_add_driver+0xe4/0x208
        driver_register+0x60/0x128
        __platform_driver_register+0x24/0x30
        arm_smmu_driver_init+0x20/0x2c
        do_one_initcall+0x64/0x308
        kernel_init_freeable+0x284/0x500
        kernel_init+0x20/0x1d8
        ret_from_fork+0x10/0x20

-> #2 (&dev->power.lock){-...}-{3:3}:
        _raw_spin_lock_irqsave+0x60/0x88
        __pm_runtime_resume+0x4c/0xbc
        __uart_start+0x4c/0x114
        uart_write+0x98/0x278
        n_tty_write+0x1dc/0x4f0
        file_tty_write.constprop.0+0x12c/0x2bc
        redirected_tty_write+0xc0/0x108
        do_iter_readv_writev+0xdc/0x1c0
        vfs_writev+0xf0/0x280
        do_writev+0x74/0x13c
        __arm64_sys_writev+0x20/0x2c
        invoke_syscall+0x48/0x10c
        el0_svc_common.constprop.0+0x40/0xe8
        do_el0_svc+0x20/0x2c
        el0_svc+0x50/0x2e8
        el0t_64_sync_handler+0xa0/0xe4
        el0t_64_sync+0x198/0x19c

-> #1 (&port_lock_key){-.-.}-{3:3}:
        _raw_spin_lock_irqsave+0x60/0x88
        qcom_geni_serial_console_write+0x50/0x344
        console_flush_one_record+0x33c/0x474
        console_unlock+0x80/0x14c
        vprintk_emit+0x258/0x3d0
        vprintk_default+0x38/0x44
        vprintk+0x28/0x34
        _printk+0x5c/0x84
        register_console+0x278/0x510
        serial_core_register_port+0x6cc/0x79c
        serial_ctrl_register_port+0x10/0x1c
        uart_add_one_port+0x10/0x1c
        qcom_geni_serial_probe+0x2c0/0x448
        platform_probe+0x5c/0xac
        really_probe+0xbc/0x298
        __driver_probe_device+0x78/0x12c
        driver_probe_device+0x40/0x164
        __device_attach_driver+0xb8/0x138
        bus_for_each_drv+0x80/0xdc
        __device_attach+0xa8/0x1b0
        device_initial_probe+0x50/0x54
        bus_probe_device+0x38/0xa8
        device_add+0x540/0x720
        of_device_add+0x44/0x60
        of_platform_device_create_pdata+0x90/0x11c
        of_platform_bus_create+0x17c/0x394
        of_platform_populate+0x58/0xf8
        devm_of_platform_populate+0x58/0xbc
        geni_se_probe+0xdc/0x164
        platform_probe+0x5c/0xac
        really_probe+0xbc/0x298
        __driver_probe_device+0x78/0x12c
        driver_probe_device+0x40/0x164
        __device_attach_driver+0xb8/0x138
        bus_for_each_drv+0x80/0xdc
        __device_attach+0xa8/0x1b0
        device_initial_probe+0x50/0x54
        bus_probe_device+0x38/0xa8
        deferred_probe_work_func+0x8c/0xc8
        process_one_work+0x208/0x604
        worker_thread+0x244/0x388
        kthread+0x150/0x228
        ret_from_fork+0x10/0x20

-> #0 (console_owner){..-.}-{0:0}:
        __lock_acquire+0x1408/0x2254
        lock_acquire+0x1c8/0x354
        console_lock_spinning_enable+0x68/0x78
        console_flush_one_record+0x300/0x474
        console_unlock+0x80/0x14c
        vprintk_emit+0x258/0x3d0
        dev_vprintk_emit+0xd8/0x1a0
        dev_printk_emit+0x58/0x80
        __dev_printk+0x3c/0x88
        _dev_err+0x60/0x88
        __pm_runtime_set_status+0x28c/0x384
        pci_bus_add_device+0xa4/0x18c
        pci_bus_add_devices+0x3c/0x88
        pci_bus_add_devices+0x68/0x88
        pci_rescan_bus+0x30/0x44
        rescan_work_func+0x28/0x3c
        process_one_work+0x208/0x604
        worker_thread+0x244/0x388
        kthread+0x150/0x228
        ret_from_fork+0x10/0x20

other info that might help us debug this:

Chain exists of:
   console_owner --> &dev->power.lock --> &dev->power.lock/1

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&dev->power.lock/1);
                                lock(&dev->power.lock);
                                lock(&dev->power.lock/1);
   lock(console_owner);

  *** DEADLOCK ***

7 locks held by kworker/3:0/33:
  #0: ffff00008000d948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x18c/0x604
  #1: ffff8000802a3de0 ((work_completion)(&pwrctrl->work)){+.+.}-{0:0}, at: process_one_work+0x1b4/0x604
  #2: ffffcd18301138e8 (pci_rescan_remove_lock){+.+.}-{4:4}, at: pci_lock_rescan_remove+0x1c/0x28
  #3: ffff00008ac8a238 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_set_status+0x1d4/0x384
  #4: ffff0000835c5238 (&dev->power.lock/1){....}-{3:3}, at: __pm_runtime_set_status+0x240/0x384
  #5: ffffcd182ff1ac78 (console_lock){+.+.}-{0:0}, at: dev_vprintk_emit+0xd8/0x1a0
  #6: ffffcd182ff1acd8 (console_srcu){....}-{0:0}, at: console_flush_one_record+0x0/0x474

stack backtrace:
CPU: 3 UID: 0 PID: 33 Comm: kworker/3:0 Not tainted 6.19.0-rc1+ #16398 PREEMPT
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
Workqueue: events rescan_work_func
Call trace:
  show_stack+0x18/0x24 (C)
  dump_stack_lvl+0x90/0xd0
  dump_stack+0x18/0x24
  print_circular_bug+0x298/0x37c
  check_noncircular+0x15c/0x170
  __lock_acquire+0x1408/0x2254
  lock_acquire+0x1c8/0x354
  console_lock_spinning_enable+0x68/0x78
  console_flush_one_record+0x300/0x474
  console_unlock+0x80/0x14c
  vprintk_emit+0x258/0x3d0
  dev_vprintk_emit+0xd8/0x1a0
  dev_printk_emit+0x58/0x80
  __dev_printk+0x3c/0x88
  _dev_err+0x60/0x88
  __pm_runtime_set_status+0x28c/0x384
  pci_bus_add_device+0xa4/0x18c
  pci_bus_add_devices+0x3c/0x88
  pci_bus_add_devices+0x68/0x88
  pci_rescan_bus+0x30/0x44
  rescan_work_func+0x28/0x3c
  process_one_work+0x208/0x604
  worker_thread+0x244/0x388
  kthread+0x150/0x228
  ret_from_fork+0x10/0x20

This looks a bit similar to the issue reported some time ago on a 
different board:

https://lore.kernel.org/all/6d438995-4d6d-4a21-9ad2-8a0352482d44@samsung.com/

Let me know if I can somehow help debugging this.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Brian Norris 3 weeks, 4 days ago
Hi Marek,

On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
> On 06.01.2026 23:27, Bjorn Helgaas wrote:
> > On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
> >> Today, it's possible for a PCI device to be created and
> >> runtime-suspended before it is fully initialized. When that happens, the
> >> device will remain in D0, but the suspend process may save an
> >> intermediate version of that device's state -- for example, without
> >> appropriate BAR configuration. When the device later resumes, we'll
> >> restore invalid PCI state and the device may not function.
> >>
> >> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> >> until we've fully initialized the device.
...

> This patch landed recently in linux-next as commit c796513dc54e 
> ("PCI/PM: Prevent runtime suspend until devices are fully initialized"). 
> In my tests I found that it sometimes causes the "pci 0000:01:00.0: 
> runtime PM trying to activate child device 0000:01:00.0 but parent 
> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board 
> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a 
> lockdep warning about console lock, but this is just a consequence of 
> the runtime pm warning. Reverting $subject patch on top of current 
> linux-next hides this warning.
> 
> Here is a kernel log:
> 
> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 
> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s 
> PCIe x1 link)
> pci 0000:01:00.0: Adding to iommu group 13
> pci 0000:01:00.0: ASPM: default states L0s L1
> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
> pci 0000:01:00.0: runtime PM trying to activate child device 
> 0000:01:00.0 but parent (0000:00:00.0) is not active

Thanks for the report. I'll try to look at reproducing this, or at least
getting a better mental model of exactly why this might fail (or,
"warn") this way. But if you have the time and desire to try things out
for me, can you give v1 a try?

https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/

I'm pretty sure it would not invoke the same problem. I also suspect v3
might not, but I'm less sure:

https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/

> ======================================================
> WARNING: possible circular locking dependency detected
> 6.19.0-rc1+ #16398 Not tainted
> ------------------------------------------------------
> kworker/3:0/33 is trying to acquire lock:
> ffffcd182ff1ae98 (console_owner){..-.}-{0:0}, at: 
> console_lock_spinning_enable+0x44/0x78
> 
> but task is already holding lock:
> ffff0000835c5238 (&dev->power.lock/1){....}-{3:3}, at: 
> __pm_runtime_set_status+0x240/0x384
> 
> which lock already depends on the new lock.

The lockdep warning is a bit messier, and I'd also have to take some
more time to be sure, but in principle, this sounds like a totally
orthogonal problem. It seems like simply performing printk() to a qcom
UART in the "wrong" context is enough to cause this. If so, that's
definitely a console/UART bug (or maybe a lockdep false positive) and
not a PCI/runtime-PM bug.
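
For reference, here is a rough sketch in C of why the chain in the
report reads as a cycle. It only records "lock B was taken while
holding lock A" edges and checks whether a new acquisition would close
a loop. The graph type and helper names are made up for illustration,
and this is far simpler than what lockdep actually does; in particular
it says nothing about whether the deadlock can really happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The four lock classes named in the report. */
enum {
	CONSOLE_OWNER,
	PORT_LOCK,		/* &port_lock_key */
	POWER_LOCK,		/* &dev->power.lock */
	POWER_LOCK_NESTED,	/* &dev->power.lock/1 */
	NR_LOCKS
};

/* edge[a][b] is true if b has been taken while a was held. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

void graph_add(struct lock_graph *g, int held, int taken)
{
	g->edge[held][taken] = true;
}

/* Depth-first search: can we get from 'from' to 'to' along recorded edges? */
static bool reaches(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	return false;
}

/*
 * Taking 'taken' while holding 'held' closes a cycle if 'held' is
 * already reachable from 'taken'.
 */
bool graph_would_cycle(const struct lock_graph *g, int held, int taken)
{
	bool seen[NR_LOCKS] = { false };

	return reaches(g, taken, held, seen);
}
```

Feeding it the three existing dependencies from the report (#1, #2, #3)
and then asking about the printk under power.lock/1 (#0) reports a
cycle, which matches what lockdep prints. Whether that ordering is a
real problem for the qcom UART is a separate question.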

> the existing dependency chain (in reverse order) is:
> 
> -> #3 (&dev->power.lock/1){....}-{3:3}:
>         _raw_spin_lock_nested+0x44/0x5c
>         __pm_runtime_set_status+0x240/0x384
>         arm_smmu_device_probe+0xbe0/0xe5c
>         platform_probe+0x5c/0xac
>         really_probe+0xbc/0x298
>         __driver_probe_device+0x78/0x12c
>         driver_probe_device+0x40/0x164
>         __driver_attach+0x9c/0x1ac
>         bus_for_each_dev+0x74/0xd0
>         driver_attach+0x24/0x30
>         bus_add_driver+0xe4/0x208
>         driver_register+0x60/0x128
>         __platform_driver_register+0x24/0x30
>         arm_smmu_driver_init+0x20/0x2c
>         do_one_initcall+0x64/0x308
>         kernel_init_freeable+0x284/0x500
>         kernel_init+0x20/0x1d8
>         ret_from_fork+0x10/0x20
> 
> -> #2 (&dev->power.lock){-...}-{3:3}:
>         _raw_spin_lock_irqsave+0x60/0x88
>         __pm_runtime_resume+0x4c/0xbc
>         __uart_start+0x4c/0x114
>         uart_write+0x98/0x278
>         n_tty_write+0x1dc/0x4f0
>         file_tty_write.constprop.0+0x12c/0x2bc
>         redirected_tty_write+0xc0/0x108
>         do_iter_readv_writev+0xdc/0x1c0
>         vfs_writev+0xf0/0x280
>         do_writev+0x74/0x13c
>         __arm64_sys_writev+0x20/0x2c
>         invoke_syscall+0x48/0x10c
>         el0_svc_common.constprop.0+0x40/0xe8
>         do_el0_svc+0x20/0x2c
>         el0_svc+0x50/0x2e8
>         el0t_64_sync_handler+0xa0/0xe4
>         el0t_64_sync+0x198/0x19c
> 
> -> #1 (&port_lock_key){-.-.}-{3:3}:
>         _raw_spin_lock_irqsave+0x60/0x88
>         qcom_geni_serial_console_write+0x50/0x344
>         console_flush_one_record+0x33c/0x474
>         console_unlock+0x80/0x14c
>         vprintk_emit+0x258/0x3d0
>         vprintk_default+0x38/0x44
>         vprintk+0x28/0x34
>         _printk+0x5c/0x84
>         register_console+0x278/0x510
>         serial_core_register_port+0x6cc/0x79c
>         serial_ctrl_register_port+0x10/0x1c
>         uart_add_one_port+0x10/0x1c
>         qcom_geni_serial_probe+0x2c0/0x448
>         platform_probe+0x5c/0xac
>         really_probe+0xbc/0x298
>         __driver_probe_device+0x78/0x12c
>         driver_probe_device+0x40/0x164
>         __device_attach_driver+0xb8/0x138
>         bus_for_each_drv+0x80/0xdc
>         __device_attach+0xa8/0x1b0
>         device_initial_probe+0x50/0x54
>         bus_probe_device+0x38/0xa8
>         device_add+0x540/0x720
>         of_device_add+0x44/0x60
>         of_platform_device_create_pdata+0x90/0x11c
>         of_platform_bus_create+0x17c/0x394
>         of_platform_populate+0x58/0xf8
>         devm_of_platform_populate+0x58/0xbc
>         geni_se_probe+0xdc/0x164
>         platform_probe+0x5c/0xac
>         really_probe+0xbc/0x298
>         __driver_probe_device+0x78/0x12c
>         driver_probe_device+0x40/0x164
>         __device_attach_driver+0xb8/0x138
>         bus_for_each_drv+0x80/0xdc
>         __device_attach+0xa8/0x1b0
>         device_initial_probe+0x50/0x54
>         bus_probe_device+0x38/0xa8
>         deferred_probe_work_func+0x8c/0xc8
>         process_one_work+0x208/0x604
>         worker_thread+0x244/0x388
>         kthread+0x150/0x228
>         ret_from_fork+0x10/0x20
> 
> -> #0 (console_owner){..-.}-{0:0}:
>         __lock_acquire+0x1408/0x2254
>         lock_acquire+0x1c8/0x354
>         console_lock_spinning_enable+0x68/0x78
>         console_flush_one_record+0x300/0x474
>         console_unlock+0x80/0x14c
>         vprintk_emit+0x258/0x3d0
>         dev_vprintk_emit+0xd8/0x1a0
>         dev_printk_emit+0x58/0x80
>         __dev_printk+0x3c/0x88
>         _dev_err+0x60/0x88
>         __pm_runtime_set_status+0x28c/0x384
>         pci_bus_add_device+0xa4/0x18c
>         pci_bus_add_devices+0x3c/0x88
>         pci_bus_add_devices+0x68/0x88
>         pci_rescan_bus+0x30/0x44
>         rescan_work_func+0x28/0x3c
>         process_one_work+0x208/0x604
>         worker_thread+0x244/0x388
>         kthread+0x150/0x228
>         ret_from_fork+0x10/0x20
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    console_owner --> &dev->power.lock --> &dev->power.lock/1
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&dev->power.lock/1);
>                                 lock(&dev->power.lock);
>                                 lock(&dev->power.lock/1);
>    lock(console_owner);
> 
>   *** DEADLOCK ***
> 
> 7 locks held by kworker/3:0/33:
>   #0: ffff00008000d948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x18c/0x604
>   #1: ffff8000802a3de0 ((work_completion)(&pwrctrl->work)){+.+.}-{0:0}, at: process_one_work+0x1b4/0x604
>   #2: ffffcd18301138e8 (pci_rescan_remove_lock){+.+.}-{4:4}, at: pci_lock_rescan_remove+0x1c/0x28
>   #3: ffff00008ac8a238 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_set_status+0x1d4/0x384
>   #4: ffff0000835c5238 (&dev->power.lock/1){....}-{3:3}, at: __pm_runtime_set_status+0x240/0x384
>   #5: ffffcd182ff1ac78 (console_lock){+.+.}-{0:0}, at: dev_vprintk_emit+0xd8/0x1a0
>   #6: ffffcd182ff1acd8 (console_srcu){....}-{0:0}, at: console_flush_one_record+0x0/0x474
> 
> stack backtrace:
> CPU: 3 UID: 0 PID: 33 Comm: kworker/3:0 Not tainted 6.19.0-rc1+ #16398 PREEMPT
> Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
> Workqueue: events rescan_work_func
> Call trace:
>   show_stack+0x18/0x24 (C)
>   dump_stack_lvl+0x90/0xd0
>   dump_stack+0x18/0x24
>   print_circular_bug+0x298/0x37c
>   check_noncircular+0x15c/0x170
>   __lock_acquire+0x1408/0x2254
>   lock_acquire+0x1c8/0x354
>   console_lock_spinning_enable+0x68/0x78
>   console_flush_one_record+0x300/0x474
>   console_unlock+0x80/0x14c
>   vprintk_emit+0x258/0x3d0
>   dev_vprintk_emit+0xd8/0x1a0
>   dev_printk_emit+0x58/0x80
>   __dev_printk+0x3c/0x88
>   _dev_err+0x60/0x88
>   __pm_runtime_set_status+0x28c/0x384
>   pci_bus_add_device+0xa4/0x18c
>   pci_bus_add_devices+0x3c/0x88
>   pci_bus_add_devices+0x68/0x88
>   pci_rescan_bus+0x30/0x44
>   rescan_work_func+0x28/0x3c
>   process_one_work+0x208/0x604
>   worker_thread+0x244/0x388
>   kthread+0x150/0x228
>   ret_from_fork+0x10/0x20
> 
> This looks a bit similar to the issue reported some time ago on a 
> different board:
> 
> https://lore.kernel.org/all/6d438995-4d6d-4a21-9ad2-8a0352482d44@samsung.com/

Huh, yeah, the lockdep warning is rather similar looking. So that bug
(whether real or false positive) may have been around a while.

And the "Enabling runtime PM for inactive device with active children"
log is similar, but it involves a different set of devices -- now we're
dealing with the PCIe port and child device, whereas that report was
about the host bridge/controller device.

Brian

> Let me know if I can somehow help debugging this.
> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 
Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Marek Szyprowski 3 weeks, 3 days ago
Hi Brian,

On 14.01.2026 21:10, Brian Norris wrote:
> On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
>> On 06.01.2026 23:27, Bjorn Helgaas wrote:
>>> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
>>>> Today, it's possible for a PCI device to be created and
>>>> runtime-suspended before it is fully initialized. When that happens, the
>>>> device will remain in D0, but the suspend process may save an
>>>> intermediate version of that device's state -- for example, without
>>>> appropriate BAR configuration. When the device later resumes, we'll
>>>> restore invalid PCI state and the device may not function.
>>>>
>>>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
>>>> until we've fully initialized the device.
> ...
>> This patch landed recently in linux-next as commit c796513dc54e
>> ("PCI/PM: Prevent runtime suspend until devices are fully initialized").
>> In my tests I found that it sometimes causes the "pci 0000:01:00.0:
>> runtime PM trying to activate child device 0000:01:00.0 but parent
>> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board
>> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a
>> lockdep warning about console lock, but this is just a consequence of
>> the runtime pm warning. Reverting $subject patch on top of current
>> linux-next hides this warning.
>>
>> Here is a kernel log:
>>
>> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
>> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
>> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
>> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0
>> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s
>> PCIe x1 link)
>> pci 0000:01:00.0: Adding to iommu group 13
>> pci 0000:01:00.0: ASPM: default states L0s L1
>> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
>> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
>> pci 0000:01:00.0: runtime PM trying to activate child device
>> 0000:01:00.0 but parent (0000:00:00.0) is not active
> Thanks for the report. I'll try to look at reproducing this, or at least
> getting a better mental model of exactly why this might fail (or,
> "warn") this way. But if you have the time and desire to try things out
> for me, can you give v1 a try?
>
> https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
>
> I'm pretty sure it would not invoke the same problem.

Right, this one works fine.

> I also suspect v3
> might not, but I'm less sure:
>
> https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
This one too, at least I was not able to reproduce any fail.

>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.19.0-rc1+ #16398 Not tainted
>> ------------------------------------------------------
>> kworker/3:0/33 is trying to acquire lock:
>> ffffcd182ff1ae98 (console_owner){..-.}-{0:0}, at:
>> console_lock_spinning_enable+0x44/0x78
>>
>> but task is already holding lock:
>> ffff0000835c5238 (&dev->power.lock/1){....}-{3:3}, at:
>> __pm_runtime_set_status+0x240/0x384
>>
>> which lock already depends on the new lock.
> The lockdep warning is a bit messier, and I'd also have to take some
> more time to be sure, but in principle, this sounds like a totally
> orthogonal problem. It seems like simply performing printk() to a qcom
> UART in the "wrong" context is enough to cause this. If so, that's
> definitely a console/UART bug (or maybe a lockdep false positive) and
> not a PCI/runtime-PM bug.

Yes, the lockdep warning is not really a problem; it is just a
consequence of printing the "runtime PM trying to activate child
device 0000:01:00.0 but parent (0000:00:00.0) is not active" message.
However, that message is itself a problem, imho.

>> (...)
>>
>> This looks a bit similar to the issue reported some time ago on a
>> different board:
>>
>> https://lore.kernel.org/all/6d438995-4d6d-4a21-9ad2-8a0352482d44@samsung.com/
> Huh, yeah, the lockdep warning is rather similar looking. So that bug
> (whether real or false positive) may have been around a while.
>
> And the "Enabling runtime PM for inactive device with active children"
> log is similar, but it involves a different set of devices -- now we're
> dealing with the PCIe port and child device, whereas that report was
> about the host bridge/controller device.

Okay, so a bit different case. At least it confirms that the lockdep 
issue is not really a problem.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Brian Norris 3 weeks, 2 days ago
Hi Marek,

On Thu, Jan 15, 2026 at 12:14:49PM +0100, Marek Szyprowski wrote:
> On 14.01.2026 21:10, Brian Norris wrote:
> > On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
> >> On 06.01.2026 23:27, Bjorn Helgaas wrote:
> >>> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
> >>>> Today, it's possible for a PCI device to be created and
> >>>> runtime-suspended before it is fully initialized. When that happens, the
> >>>> device will remain in D0, but the suspend process may save an
> >>>> intermediate version of that device's state -- for example, without
> >>>> appropriate BAR configuration. When the device later resumes, we'll
> >>>> restore invalid PCI state and the device may not function.
> >>>>
> >>>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> >>>> until we've fully initialized the device.
> > ...
> >> This patch landed recently in linux-next as commit c796513dc54e
> >> ("PCI/PM: Prevent runtime suspend until devices are fully initialized").
> >> In my tests I found that it sometimes causes the "pci 0000:01:00.0:
> >> runtime PM trying to activate child device 0000:01:00.0 but parent
> >> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board
> >> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a
> >> lockdep warning about console lock, but this is just a consequence of
> >> the runtime pm warning. Reverting $subject patch on top of current
> >> linux-next hides this warning.
> >>
> >> Here is a kernel log:
> >>
> >> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
> >> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
> >> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> >> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0
> >> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s
> >> PCIe x1 link)
> >> pci 0000:01:00.0: Adding to iommu group 13
> >> pci 0000:01:00.0: ASPM: default states L0s L1
> >> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
> >> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
> >> pci 0000:01:00.0: runtime PM trying to activate child device
> >> 0000:01:00.0 but parent (0000:00:00.0) is not active
> > Thanks for the report. I'll try to look at reproducing this, or at least
> > getting a better mental model of exactly why this might fail (or,
> > "warn") this way. But if you have the time and desire to try things out
> > for me, can you give v1 a try?
> >
> > https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> >
> > I'm pretty sure it would not invoke the same problem.
> 
> Right, this one works fine.
> 
> > I also suspect v3
> > might not, but I'm less sure:
> >
> > https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> This one too, at least I was not able to reproduce any fail.

Thanks for testing. I'm still not sure exactly how to reproduce your
failure, but it seems as if the root port is being allowed to suspend
before the endpoint is added to the system, and it remains so while the
endpoint is about to probe. device_initial_probe() will be OK with
respect to PM, since it will wake up the port if needed. But this
particular code is not OK, since it doesn't ensure the parent device is
active while preparing the endpoint power state.

I suppose one way to "solve" that is (untested):

--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -380,8 +380,12 @@ void pci_bus_add_device(struct pci_dev *dev)
 		put_device(&pdev->dev);
 	}
 
+	if (dev->dev.parent)
+		pm_runtime_get_sync(dev->dev.parent);
 	pm_runtime_set_active(&dev->dev);
 	pm_runtime_enable(&dev->dev);
+	if (dev->dev.parent)
+		pm_runtime_put(dev->dev.parent);
 
 	if (!dn || of_device_is_available(dn))
 		pci_dev_allow_binding(dev);
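
To make the failure mode concrete, here is a rough user-space sketch (not
kernel code). It assumes that pm_runtime_set_active() refuses to activate a
child whose parent has runtime PM enabled but is not active, which is what
the "parent ... is not active" message suggests. Every toy_* name is
hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy user-space model; NOT the kernel's struct device or runtime PM core. */
enum toy_status { TOY_SUSPENDED, TOY_ACTIVE };

struct toy_dev {
	struct toy_dev *parent;
	enum toy_status status;
	int enabled;		/* 1 once "runtime PM" is enabled for the device */
	int usage_count;	/* references that keep the device active */
};

/* Rough stand-in for pm_runtime_get_sync(): take a reference and resume. */
static void toy_get_sync(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->usage_count++;
	dev->status = TOY_ACTIVE;
}

/* Rough stand-in for pm_runtime_put(): drop the reference again. */
static void toy_put(struct toy_dev *dev)
{
	assert(dev != NULL && dev->usage_count > 0);
	dev->usage_count--;
}

/*
 * Rough stand-in for pm_runtime_set_active(): assumed to fail when the
 * parent has runtime PM enabled but is not active.
 */
static int toy_set_active(struct toy_dev *dev)
{
	struct toy_dev *parent = dev->parent;

	if (parent && parent->enabled && parent->status != TOY_ACTIVE)
		return -EBUSY;	/* the "parent ... is not active" case */

	dev->status = TOY_ACTIVE;
	return 0;
}

/* Same ordering as the untested diff: hold the parent active meanwhile. */
static int toy_add_device(struct toy_dev *dev)
{
	int ret;

	if (dev->parent)
		toy_get_sync(dev->parent);
	ret = toy_set_active(dev);
	dev->enabled = 1;
	if (dev->parent)
		toy_put(dev->parent);
	return ret;
}
```

With a suspended, enabled port and a fresh endpoint, calling
toy_set_active() directly returns -EBUSY in this model, while
toy_add_device() succeeds because the port is resumed first.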

Personally, I'm more inclined to go back to v1, since it prepares the
runtime PM status when the device is first discovered. That way, its
ancestors are still active, avoiding these sorts of problems. I'm
frankly not sure of all the reasons Rafael recommended I make the
v1->v3->v4 changes, and now that they cause problems, I'm inclined to
question them again.

Rafael, do you have any thoughts?

Brian
Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Rafael J. Wysocki 3 weeks ago
On Sat, Jan 17, 2026 at 2:19 AM Brian Norris <briannorris@chromium.org> wrote:
>
> Hi Marek,
>
> On Thu, Jan 15, 2026 at 12:14:49PM +0100, Marek Szyprowski wrote:
> > On 14.01.2026 21:10, Brian Norris wrote:
> > > On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
> > >> On 06.01.2026 23:27, Bjorn Helgaas wrote:
> > >>> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
> > >>>> Today, it's possible for a PCI device to be created and
> > >>>> runtime-suspended before it is fully initialized. When that happens, the
> > >>>> device will remain in D0, but the suspend process may save an
> > >>>> intermediate version of that device's state -- for example, without
> > >>>> appropriate BAR configuration. When the device later resumes, we'll
> > >>>> restore invalid PCI state and the device may not function.
> > >>>>
> > >>>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> > >>>> until we've fully initialized the device.
> > > ...
> > >> This patch landed recently in linux-next as commit c796513dc54e
> > >> ("PCI/PM: Prevent runtime suspend until devices are fully initialized").
> > >> In my tests I found that it sometimes causes the "pci 0000:01:00.0:
> > >> runtime PM trying to activate child device 0000:01:00.0 but parent
> > >> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board
> > >> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a
> > >> lockdep warning about console lock, but this is just a consequence of
> > >> the runtime pm warning. Reverting $subject patch on top of current
> > >> linux-next hides this warning.
> > >>
> > >> Here is a kernel log:
> > >>
> > >> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
> > >> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
> > >> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> > >> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0
> > >> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s
> > >> PCIe x1 link)
> > >> pci 0000:01:00.0: Adding to iommu group 13
> > >> pci 0000:01:00.0: ASPM: default states L0s L1
> > >> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
> > >> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
> > >> pci 0000:01:00.0: runtime PM trying to activate child device
> > >> 0000:01:00.0 but parent (0000:00:00.0) is not active
> > > Thanks for the report. I'll try to look at reproducing this, or at least
> > > getting a better mental model of exactly why this might fail (or,
> > > "warn") this way. But if you have the time and desire to try things out
> > > for me, can you give v1 a try?
> > >
> > > https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > >
> > > I'm pretty sure it would not invoke the same problem.
> >
> > Right, this one works fine.
> >
> > > I also suspect v3
> > > might not, but I'm less sure:
> > >
> > > https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > This one too, at least I was not able to reproduce any fail.
>
> Thanks for testing. I'm still not sure exactly how to reproduce your
> failure, but it seems as if the root port is being allowed to suspend
> before the endpoint is added to the system, and it remains so while the
> endpoint is about to probe. device_initial_probe() will be OK with
> respect to PM, since it will wake up the port if needed. But this
> particular code is not OK, since it doesn't ensure the parent device is
> active while preparing the endpoint power state.
>
> I suppose one way to "solve" that is (untested):
>
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -380,8 +380,12 @@ void pci_bus_add_device(struct pci_dev *dev)
>                 put_device(&pdev->dev);
>         }
>
> +       if (dev->dev.parent)
> +               pm_runtime_get_sync(dev->dev.parent);
>         pm_runtime_set_active(&dev->dev);
>         pm_runtime_enable(&dev->dev);
> +       if (dev->dev.parent)
> +               pm_runtime_put(dev->dev.parent);
>
>         if (!dn || of_device_is_available(dn))
>                 pci_dev_allow_binding(dev);
>
> Personally, I'm more inclined to go back to v1, since it prepares the
> runtime PM status when the device is first discovered. That way, its
> ancestors are still active, avoiding these sorts of problems. I'm
> frankly not sure of all the reasons Rafael recommended I make the
> v1->v3->v4 changes, and now that they cause problems, I'm inclined to
> question them again.
>
> Rafael, do you have any thoughts?

Yeah.

Move pm_runtime_set_active(&dev->dev) back to pm_runtime_init()
because that would prevent the parent from suspending and keep
pm_runtime_enable() here because that would prevent the device itself
from suspending between pm_runtime_init() and this place.

And I would add comments in both places.
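
As a rough illustration of the second half of that suggestion, the
user-space sketch below (not kernel code) models a device that cannot
suspend while runtime PM is still disabled, so no intermediate state gets
saved before the BAR is assigned. It assumes a suspend attempt on a
disabled device is rejected with -EACCES. All toy_* names are hypothetical,
and the BAR value is simply the one from the log quoted earlier in the
thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy user-space model; NOT the kernel's struct pci_dev. */
struct toy_dev {
	int disable_depth;	/* > 0: runtime PM disabled */
	int suspended;
	unsigned long bar;	/* live BAR value */
	unsigned long saved_bar;	/* value captured at suspend time */
};

/* Device discovery: runtime PM starts out disabled, BAR not yet assigned. */
static void toy_init(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->disable_depth = 1;
	dev->suspended = 0;
	dev->bar = 0;
	dev->saved_bar = 0;
}

/* Rough stand-in for pm_runtime_enable(). */
static void toy_enable(struct toy_dev *dev)
{
	if (dev->disable_depth > 0)
		dev->disable_depth--;
}

/* Suspend attempt: assumed to be rejected while runtime PM is disabled. */
static int toy_suspend(struct toy_dev *dev)
{
	if (dev->disable_depth > 0)
		return -EACCES;
	dev->saved_bar = dev->bar;	/* save state on the way down */
	dev->suspended = 1;
	return 0;
}

/* Resume: restore whatever state was saved at suspend time. */
static void toy_resume(struct toy_dev *dev)
{
	dev->bar = dev->saved_bar;
	dev->suspended = 0;
}
```

In this model, a suspend attempted between toy_init() and toy_enable()
fails, so the only state ever saved is the one captured after the BAR was
assigned.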
Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Brian Norris 2 weeks, 3 days ago
Hi Rafael,

Thanks for your thoughts!

On Sun, Jan 18, 2026 at 12:53:21PM +0100, Rafael J. Wysocki wrote:
> On Sat, Jan 17, 2026 at 2:19 AM Brian Norris <briannorris@chromium.org> wrote:
> > I suppose one way to "solve" that is (untested):
> >
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -380,8 +380,12 @@ void pci_bus_add_device(struct pci_dev *dev)
> >                 put_device(&pdev->dev);
> >         }
> >
> > +       if (dev->dev.parent)
> > +               pm_runtime_get_sync(dev->dev.parent);
> >         pm_runtime_set_active(&dev->dev);
> >         pm_runtime_enable(&dev->dev);
> > +       if (dev->dev.parent)
> > +               pm_runtime_put(dev->dev.parent);
> >
> >         if (!dn || of_device_is_available(dn))
> >                 pci_dev_allow_binding(dev);
> >
> > Personally, I'm more inclined to go back to v1, since it prepares the
> > runtime PM status when the device is first discovered. That way, its
> > ancestors are still active, avoiding these sorts of problems. I'm
> > frankly not sure of all the reasons Rafael recommended I make the
> > v1->v3->v4 changes, and now that they cause problems, I'm inclined to
> > question them again.
> >
> > Rafael, do you have any thoughts?
> 
> Yeah.
> 
> Move pm_runtime_set_active(&dev->dev) back to pm_runtime_init()
> because that would prevent the parent from suspending and keep
> pm_runtime_enable() here because that would prevent the device itself
> from suspending between pm_runtime_init() and this place.

I'll admit, I was a little fuzzy on the details of the first part of the
sentence here -- specifically, that an "active" (but still "disabled")
device will prevent suspend of its parent. I suppose I'm more familiar
with the typical "disabled and suspended" device, which essentially has
no effect on its parent.
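
For what it's worth, here is how I picture it, as a user-space sketch (not
kernel code). It assumes that marking a disabled child "active" bumps a
child counter on the parent, and that a parent with a nonzero counter
refuses to suspend. The toy_* names and the exact error codes are
assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy user-space model; NOT the kernel's struct device. */
enum toy_status { TOY_SUSPENDED, TOY_ACTIVE };

struct toy_dev {
	struct toy_dev *parent;
	enum toy_status status;
	int disable_depth;	/* > 0: runtime PM disabled */
	int child_count;	/* children currently marked active */
};

/*
 * Rough stand-in for pm_runtime_set_active(): only allowed while runtime
 * PM is disabled, and it accounts the child against its parent. The
 * parent-state check is left out to keep the sketch short.
 */
static int toy_set_active(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->disable_depth == 0)
		return -EAGAIN;
	if (dev->status == TOY_ACTIVE)
		return 0;
	if (dev->parent)
		dev->parent->child_count++;
	dev->status = TOY_ACTIVE;
	return 0;
}

/* Suspend attempt: blocked while disabled or while any child is active. */
static int toy_suspend(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->disable_depth > 0)
		return -EACCES;
	if (dev->child_count > 0)
		return -EBUSY;
	dev->status = TOY_SUSPENDED;
	return 0;
}
```

So a "disabled and suspended" child leaves child_count at zero and the
parent is free to suspend, whereas a "disabled but active" child pins the
parent in this model.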

Anyway, that's basically v3, so I rerolled a v5 that looks similar.

> And I would add comments in both places.

I tried to add a short comment to each. It's an art form to write
exactly the right size of comment to make everyone happy (people
complain about too much commenting, and then others complain about
non-obvious behaviors that could have used more comments), especially
when it comes to something as tricky as runtime PM. At least I tried...

Brian
Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Rafael J. Wysocki 3 weeks ago
On Sun, Jan 18, 2026 at 12:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Sat, Jan 17, 2026 at 2:19 AM Brian Norris <briannorris@chromium.org> wrote:
> >
> > Hi Marek,
> >
> > On Thu, Jan 15, 2026 at 12:14:49PM +0100, Marek Szyprowski wrote:
> > > On 14.01.2026 21:10, Brian Norris wrote:
> > > > On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
> > > >> On 06.01.2026 23:27, Bjorn Helgaas wrote:
> > > >>> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
> > > >>>> Today, it's possible for a PCI device to be created and
> > > >>>> runtime-suspended before it is fully initialized. When that happens, the
> > > >>>> device will remain in D0, but the suspend process may save an
> > > >>>> intermediate version of that device's state -- for example, without
> > > >>>> appropriate BAR configuration. When the device later resumes, we'll
> > > >>>> restore invalid PCI state and the device may not function.
> > > >>>>
> > > >>>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> > > >>>> until we've fully initialized the device.
> > > > ...
> > > >> This patch landed recently in linux-next as commit c796513dc54e
> > > >> ("PCI/PM: Prevent runtime suspend until devices are fully initialized").
> > > >> In my tests I found that it sometimes causes the "pci 0000:01:00.0:
> > > >> runtime PM trying to activate child device 0000:01:00.0 but parent
> > > >> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board
> > > >> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a
> > > >> lockdep warning about console lock, but this is just a consequence of
> > > >> the runtime pm warning. Reverting $subject patch on top of current
> > > >> linux-next hides this warning.
> > > >>
> > > >> Here is a kernel log:
> > > >>
> > > >> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
> > > >> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
> > > >> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> > > >> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0
> > > >> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s
> > > >> PCIe x1 link)
> > > >> pci 0000:01:00.0: Adding to iommu group 13
> > > >> pci 0000:01:00.0: ASPM: default states L0s L1
> > > >> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
> > > >> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
> > > >> pci 0000:01:00.0: runtime PM trying to activate child device
> > > >> 0000:01:00.0 but parent (0000:00:00.0) is not active
> > > > Thanks for the report. I'll try to look at reproducing this, or at least
> > > > getting a better mental model of exactly why this might fail (or,
> > > > "warn") this way. But if you have the time and desire to try things out
> > > > for me, can you give v1 a try?
> > > >
> > > > https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > > >
> > > > I'm pretty sure it would not invoke the same problem.
> > >
> > > Right, this one works fine.
> > >
> > > > I also suspect v3
> > > > might not, but I'm less sure:
> > > >
> > > > https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > > This one too, at least I was not able to reproduce any fail.
> >
> > Thanks for testing. I'm still not sure exactly how to reproduce your
> > failure, but it seems as if the root port is being allowed to suspend
> > before the endpoint is added to the system, and it remains so while the
> > endpoint is about to probe. device_initial_probe() will be OK with
> > respect to PM, since it will wake up the port if needed. But this
> > particular code is not OK, since it doesn't ensure the parent device is
> > active while preparing the endpoint power state.
> >
> > I suppose one way to "solve" that is (untested):
> >
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -380,8 +380,12 @@ void pci_bus_add_device(struct pci_dev *dev)
> >                 put_device(&pdev->dev);
> >         }
> >
> > +       if (dev->dev.parent)
> > +               pm_runtime_get_sync(dev->dev.parent);
> >         pm_runtime_set_active(&dev->dev);
> >         pm_runtime_enable(&dev->dev);
> > +       if (dev->dev.parent)
> > +               pm_runtime_put(dev->dev.parent);
> >
> >         if (!dn || of_device_is_available(dn))
> >                 pci_dev_allow_binding(dev);
> >
> > Personally, I'm more inclined to go back to v1, since it prepares the
> > runtime PM status when the device is first discovered. That way, its
> > ancestors are still active, avoiding these sorts of problems. I'm
> > frankly not sure of all the reasons Rafael recommended I make the
> > v1->v3->v4 changes, and now that they cause problems, I'm inclined to
> > question them again.
> >
> > Rafael, do you have any thoughts?
>
> Yeah.
>
> > Move pm_runtime_set_active(&dev->dev) back to pm_runtime_init()

Or rather leave it there to be precise, but I think you know what I mean. :-)

> because that would prevent the parent from suspending and keep
> pm_runtime_enable() here because that would prevent the device itself
> from suspending between pm_runtime_init() and this place.
>
> And I would add comments in both places.
Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Marek Szyprowski 2 weeks, 6 days ago
On 18.01.2026 12:59, Rafael J. Wysocki wrote:
> On Sun, Jan 18, 2026 at 12:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Sat, Jan 17, 2026 at 2:19 AM Brian Norris <briannorris@chromium.org> wrote:
>>> On Thu, Jan 15, 2026 at 12:14:49PM +0100, Marek Szyprowski wrote:
>>>> On 14.01.2026 21:10, Brian Norris wrote:
>>>>> On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
>>>>>> On 06.01.2026 23:27, Bjorn Helgaas wrote:
>>>>>>> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
>>>>>>>> Today, it's possible for a PCI device to be created and
>>>>>>>> runtime-suspended before it is fully initialized. When that happens, the
>>>>>>>> device will remain in D0, but the suspend process may save an
>>>>>>>> intermediate version of that device's state -- for example, without
>>>>>>>> appropriate BAR configuration. When the device later resumes, we'll
>>>>>>>> restore invalid PCI state and the device may not function.
>>>>>>>>
>>>>>>>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
>>>>>>>> until we've fully initialized the device.
>>>>> ...
>>>>>> This patch landed recently in linux-next as commit c796513dc54e
>>>>>> ("PCI/PM: Prevent runtime suspend until devices are fully initialized").
>>>>>> In my tests I found that it sometimes causes the "pci 0000:01:00.0:
>>>>>> runtime PM trying to activate child device 0000:01:00.0 but parent
>>>>>> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board
>>>>>> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a
>>>>>> lockdep warning about console lock, but this is just a consequence of
>>>>>> the runtime pm warning. Reverting $subject patch on top of current
>>>>>> linux-next hides this warning.
>>>>>>
>>>>>> Here is a kernel log:
>>>>>>
>>>>>> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
>>>>>> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
>>>>>> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
>>>>>> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0
>>>>>> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s
>>>>>> PCIe x1 link)
>>>>>> pci 0000:01:00.0: Adding to iommu group 13
>>>>>> pci 0000:01:00.0: ASPM: default states L0s L1
>>>>>> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
>>>>>> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
>>>>>> pci 0000:01:00.0: runtime PM trying to activate child device
>>>>>> 0000:01:00.0 but parent (0000:00:00.0) is not active
>>>>> Thanks for the report. I'll try to look at reproducing this, or at least
>>>>> getting a better mental model of exactly why this might fail (or,
>>>>> "warn") this way. But if you have the time and desire to try things out
>>>>> for me, can you give v1 a try?
>>>>>
>>>>> https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
>>>>>
>>>>> I'm pretty sure it would not invoke the same problem.
>>>> Right, this one works fine.
>>>>
>>>>> I also suspect v3
>>>>> might not, but I'm less sure:
>>>>>
>>>>> https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
>>>> This one too, at least I was not able to reproduce any fail.
>>> Thanks for testing. I'm still not sure exactly how to reproduce your
>>> failure, but it seems as if the root port is being allowed to suspend
>>> before the endpoint is added to the system, and it remains so while the
>>> endpoint is about to probe. device_initial_probe() will be OK with
>>> respect to PM, since it will wake up the port if needed. But this
>>> particular code is not OK, since it doesn't ensure the parent device is
>>> active while preparing the endpoint power state.
>>>
>>> I suppose one way to "solve" that is (untested):
>>>
>>> --- a/drivers/pci/bus.c
>>> +++ b/drivers/pci/bus.c
>>> @@ -380,8 +380,12 @@ void pci_bus_add_device(struct pci_dev *dev)
>>>                  put_device(&pdev->dev);
>>>          }
>>>
>>> +       if (dev->dev.parent)
>>> +               pm_runtime_get_sync(dev->dev.parent);
>>>          pm_runtime_set_active(&dev->dev);
>>>          pm_runtime_enable(&dev->dev);
>>> +       if (dev->dev.parent)
>>> +               pm_runtime_put(dev->dev.parent);
>>>
>>>          if (!dn || of_device_is_available(dn))
>>>                  pci_dev_allow_binding(dev);
>>>
>>> Personally, I'm more inclined to go back to v1, since it prepares the
>>> runtime PM status when the device is first discovered. That way, its
>>> ancestors are still active, avoiding these sorts of problems. I'm
>>> frankly not sure of all the reasons Rafael recommended I make the
>>> v1->v3->v4 changes, and now that they cause problems, I'm inclined to
>>> question them again.
>>>
>>> Rafael, do you have any thoughts?
>> Yeah.
>>
>> Move pm_runtime_set_active(&dev->dev) back to pm_runtime_init()
> Or rather leave it there to be precise, but I think you know what I mean. :-)
>
>> because that would prevent the parent from suspending and keep
>> pm_runtime_enable() here because that would prevent the device itself
>> from suspending between pm_runtime_init() and this place.
>>
>> And I would add comments in both places.

Confirmed, the following change (compared to $subject patch) fixed my issue:

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 3ef60c2fbd89..7e2b7e452d51 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -381,7 +381,6 @@ void pci_bus_add_device(struct pci_dev *dev)
         }

         pm_runtime_set_active(&dev->dev);
-       pm_runtime_enable(&dev->dev);

         if (!dn || of_device_is_available(dn))
                 pci_dev_allow_binding(dev);
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index fae5a683cf87..22b897416025 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3201,6 +3201,7 @@ void pci_pm_init(struct pci_dev *dev)
  poweron:
         pci_pm_power_up_and_verify_state(dev);
         pm_runtime_forbid(&dev->dev);
+       pm_runtime_enable(&dev->dev);
  }

  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)


Feel free to add:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Rafael J. Wysocki 2 weeks, 6 days ago
On Mon, Jan 19, 2026 at 11:01 AM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> On 18.01.2026 12:59, Rafael J. Wysocki wrote:
> > On Sun, Jan 18, 2026 at 12:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >> On Sat, Jan 17, 2026 at 2:19 AM Brian Norris <briannorris@chromium.org> wrote:
> >>> On Thu, Jan 15, 2026 at 12:14:49PM +0100, Marek Szyprowski wrote:
> >>>> On 14.01.2026 21:10, Brian Norris wrote:
> >>>>> On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
> >>>>>> On 06.01.2026 23:27, Bjorn Helgaas wrote:
> >>>>>>> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
> >>>>>>>> Today, it's possible for a PCI device to be created and
> >>>>>>>> runtime-suspended before it is fully initialized. When that happens, the
> >>>>>>>> device will remain in D0, but the suspend process may save an
> >>>>>>>> intermediate version of that device's state -- for example, without
> >>>>>>>> appropriate BAR configuration. When the device later resumes, we'll
> >>>>>>>> restore invalid PCI state and the device may not function.
> >>>>>>>>
> >>>>>>>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> >>>>>>>> until we've fully initialized the device.
> >>>>> ...
> >>>>>> This patch landed recently in linux-next as commit c796513dc54e
> >>>>>> ("PCI/PM: Prevent runtime suspend until devices are fully initialized").
> >>>>>> In my tests I found that it sometimes causes the "pci 0000:01:00.0:
> >>>>>> runtime PM trying to activate child device 0000:01:00.0 but parent
> >>>>>> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board
> >>>>>> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a
> >>>>>> lockdep warning about console lock, but this is just a consequence of
> >>>>>> the runtime pm warning. Reverting $subject patch on top of current
> >>>>>> linux-next hides this warning.
> >>>>>>
> >>>>>> Here is a kernel log:
> >>>>>>
> >>>>>> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
> >>>>>> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
> >>>>>> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> >>>>>> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0
> >>>>>> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s
> >>>>>> PCIe x1 link)
> >>>>>> pci 0000:01:00.0: Adding to iommu group 13
> >>>>>> pci 0000:01:00.0: ASPM: default states L0s L1
> >>>>>> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
> >>>>>> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
> >>>>>> pci 0000:01:00.0: runtime PM trying to activate child device
> >>>>>> 0000:01:00.0 but parent (0000:00:00.0) is not active
> >>>>> Thanks for the report. I'll try to look at reproducing this, or at least
> >>>>> getting a better mental model of exactly why this might fail (or,
> >>>>> "warn") this way. But if you have the time and desire to try things out
> >>>>> for me, can you give v1 a try?
> >>>>>
> >>>>> https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> >>>>>
> >>>>> I'm pretty sure it would not invoke the same problem.
> >>>> Right, this one works fine.
> >>>>
> >>>>> I also suspect v3
> >>>>> might not, but I'm less sure:
> >>>>>
> >>>>> https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> >>>> This one too, at least I was not able to reproduce any failure.
> >>> Thanks for testing. I'm still not sure exactly how to reproduce your
> >>> failure, but it seems as if the root port is being allowed to suspend
> >>> before the endpoint is added to the system, and it remains so while the
> >>> endpoint is about to probe. device_initial_probe() will be OK with
> >>> respect to PM, since it will wake up the port if needed. But this
> >>> particular code is not OK, since it doesn't ensure the parent device is
> >>> active while preparing the endpoint power state.
> >>>
> >>> I suppose one way to "solve" that is (untested):
> >>>
> >>> --- a/drivers/pci/bus.c
> >>> +++ b/drivers/pci/bus.c
> >>> @@ -380,8 +380,12 @@ void pci_bus_add_device(struct pci_dev *dev)
> >>>                  put_device(&pdev->dev);
> >>>          }
> >>>
> >>> +       if (dev->dev.parent)
> >>> +               pm_runtime_get_sync(dev->dev.parent);
> >>>          pm_runtime_set_active(&dev->dev);
> >>>          pm_runtime_enable(&dev->dev);
> >>> +       if (dev->dev.parent)
> >>> +               pm_runtime_put(dev->dev.parent);
> >>>
> >>>          if (!dn || of_device_is_available(dn))
> >>>                  pci_dev_allow_binding(dev);
> >>>
> >>> Personally, I'm more inclined to go back to v1, since it prepares the
> >>> runtime PM status when the device is first discovered. That way, its
> >>> ancestors are still active, avoiding these sorts of problems. I'm
> >>> frankly not sure of all the reasons Rafael recommended I make the
> >>> v1->v3->v4 changes, and now that they cause problems, I'm inclined to
> >>> question them again.
> >>>
> >>> Rafael, do you have any thoughts?
> >> Yeah.
> >>
> >> Move pm_runtime_set_active(&dev->dev) back to pm_runtime_init()
> > Or rather leave it there to be precise, but I think you know what I mean. :-)
> >
> >> because that would prevent the parent from suspending and keep
> >> pm_runtime_enable() here because that would prevent the device itself
> >> from suspending between pm_runtime_init() and this place.
> >>
> >> And I would add comments in both places.
>
> Confirmed, the following change (compared to $subject patch) fixed my issue:
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 3ef60c2fbd89..7e2b7e452d51 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -381,7 +381,6 @@ void pci_bus_add_device(struct pci_dev *dev)
>          }
>
>          pm_runtime_set_active(&dev->dev);
> -       pm_runtime_enable(&dev->dev);

That works too, but it would defeat the purpose of the original
change, so I mean the other way around.

That is, leave the pm_runtime_enable() here and move the
pm_runtime_set_active() back to the other place.
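
To illustrate why the placement of the two calls matters, here is a minimal userspace sketch. It is not kernel code: the toy_* types and helpers are hypothetical stand-ins for struct device, pm_runtime_set_active(), pm_runtime_enable() and a runtime-suspend attempt, and the error codes are only illustrative. It assumes a single root port with one endpoint below it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of this is the kernel's runtime PM core. */
enum toy_rpm_status { TOY_RPM_SUSPENDED, TOY_RPM_ACTIVE };

struct toy_dev {
	struct toy_dev *parent;
	enum toy_rpm_status status;	/* zero-initialized: suspended */
	bool enabled;			/* stand-in for pm_runtime_enable() */
	int active_children;		/* stand-in for the parent's child count */
};

/* Stand-in for pm_runtime_set_active(): fails under a non-active parent. */
static int toy_set_active(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->enabled)
		return -EAGAIN;	/* status is only set while runtime PM is disabled */
	if (dev->status == TOY_RPM_ACTIVE)
		return 0;
	if (dev->parent) {
		if (dev->parent->enabled &&
		    dev->parent->status != TOY_RPM_ACTIVE)
			return -EBUSY;	/* the "parent is not active" case */
		dev->parent->active_children++;
	}
	dev->status = TOY_RPM_ACTIVE;
	return 0;
}

static void toy_enable(struct toy_dev *dev)
{
	dev->enabled = true;
}

/* Stand-in for a runtime-suspend attempt on an idle device. */
static int toy_try_suspend(struct toy_dev *dev)
{
	if (!dev->enabled)
		return -EACCES;	/* runtime PM not enabled: cannot suspend */
	if (dev->active_children > 0)
		return -EBUSY;	/* an active child pins the parent */
	if (dev->status == TOY_RPM_ACTIVE) {
		dev->status = TOY_RPM_SUSPENDED;
		if (dev->parent)
			dev->parent->active_children--;
	}
	return 0;
}

/* v4-like ordering: the endpoint's status is set only at add time. */
static int toy_scenario_late_set_active(void)
{
	struct toy_dev port = { .status = TOY_RPM_ACTIVE, .enabled = true };
	struct toy_dev ep = { .parent = &port };

	toy_try_suspend(&port);		/* nothing pins the port, so it suspends */
	return toy_set_active(&ep);	/* parent is no longer active */
}

/* Suggested ordering: set active at discovery, enable at add time. */
static int toy_scenario_early_set_active(void)
{
	struct toy_dev port = { .status = TOY_RPM_ACTIVE, .enabled = true };
	struct toy_dev ep = { .parent = &port };
	int ret = toy_set_active(&ep);	/* port is still active here */

	if (ret)
		return ret;
	if (toy_try_suspend(&port) != -EBUSY)
		return 1;	/* port should be pinned by the active endpoint */
	if (toy_try_suspend(&ep) != -EACCES)
		return 2;	/* endpoint should not suspend before enable */
	toy_enable(&ep);	/* done once the device is fully initialized */
	return 0;
}
```

In this model, toy_scenario_late_set_active() returns -EBUSY, which corresponds to the warning reported on the RB5 board, while toy_scenario_early_set_active() returns 0: the active endpoint keeps the port from suspending, and the endpoint itself cannot suspend until it is enabled.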

>
>          if (!dn || of_device_is_available(dn))
>                  pci_dev_allow_binding(dev);
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index fae5a683cf87..22b897416025 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3201,6 +3201,7 @@ void pci_pm_init(struct pci_dev *dev)
>   poweron:
>          pci_pm_power_up_and_verify_state(dev);
>          pm_runtime_forbid(&dev->dev);
> +       pm_runtime_enable(&dev->dev);
>   }
>
>   static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)

Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Marek Szyprowski 2 weeks, 6 days ago
On 19.01.2026 13:26, Rafael J. Wysocki wrote:
> On Mon, Jan 19, 2026 at 11:01 AM Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> On 18.01.2026 12:59, Rafael J. Wysocki wrote:
>>> On Sun, Jan 18, 2026 at 12:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>> On Sat, Jan 17, 2026 at 2:19 AM Brian Norris <briannorris@chromium.org> wrote:
>>>>> On Thu, Jan 15, 2026 at 12:14:49PM +0100, Marek Szyprowski wrote:
>>>>>> On 14.01.2026 21:10, Brian Norris wrote:
>>>>>>> On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
>>>>>>>> On 06.01.2026 23:27, Bjorn Helgaas wrote:
>>>>>>>>> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
>>>>>>>>>> Today, it's possible for a PCI device to be created and
>>>>>>>>>> runtime-suspended before it is fully initialized. When that happens, the
>>>>>>>>>> device will remain in D0, but the suspend process may save an
>>>>>>>>>> intermediate version of that device's state -- for example, without
>>>>>>>>>> appropriate BAR configuration. When the device later resumes, we'll
>>>>>>>>>> restore invalid PCI state and the device may not function.
>>>>>>>>>>
>>>>>>>>>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
>>>>>>>>>> until we've fully initialized the device.
>>>>>>> ...
>>>>>>>> This patch landed recently in linux-next as commit c796513dc54e
>>>>>>>> ("PCI/PM: Prevent runtime suspend until devices are fully initialized").
>>>>>>>> In my tests I found that it sometimes causes the "pci 0000:01:00.0:
>>>>>>>> runtime PM trying to activate child device 0000:01:00.0 but parent
>>>>>>>> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board
>>>>>>>> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a
>>>>>>>> lockdep warning about console lock, but this is just a consequence of
>>>>>>>> the runtime pm warning. Reverting $subject patch on top of current
>>>>>>>> linux-next hides this warning.
>>>>>>>>
>>>>>>>> Here is a kernel log:
>>>>>>>>
>>>>>>>> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
>>>>>>>> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
>>>>>>>> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
>>>>>>>> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0
>>>>>>>> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s
>>>>>>>> PCIe x1 link)
>>>>>>>> pci 0000:01:00.0: Adding to iommu group 13
>>>>>>>> pci 0000:01:00.0: ASPM: default states L0s L1
>>>>>>>> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
>>>>>>>> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
>>>>>>>> pci 0000:01:00.0: runtime PM trying to activate child device
>>>>>>>> 0000:01:00.0 but parent (0000:00:00.0) is not active
>>>>>>> Thanks for the report. I'll try to look at reproducing this, or at least
>>>>>>> getting a better mental model of exactly why this might fail (or,
>>>>>>> "warn") this way. But if you have the time and desire to try things out
>>>>>>> for me, can you give v1 a try?
>>>>>>>
>>>>>>> https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
>>>>>>>
>>>>>>> I'm pretty sure it would not invoke the same problem.
>>>>>> Right, this one works fine.
>>>>>>
>>>>>>> I also suspect v3
>>>>>>> might not, but I'm less sure:
>>>>>>>
>>>>>>> https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
>>>>>> This one too, at least I was not able to reproduce any failure.
>>>>> Thanks for testing. I'm still not sure exactly how to reproduce your
>>>>> failure, but it seems as if the root port is being allowed to suspend
>>>>> before the endpoint is added to the system, and it remains so while the
>>>>> endpoint is about to probe. device_initial_probe() will be OK with
>>>>> respect to PM, since it will wake up the port if needed. But this
>>>>> particular code is not OK, since it doesn't ensure the parent device is
>>>>> active while preparing the endpoint power state.
>>>>>
>>>>> I suppose one way to "solve" that is (untested):
>>>>>
>>>>> --- a/drivers/pci/bus.c
>>>>> +++ b/drivers/pci/bus.c
>>>>> @@ -380,8 +380,12 @@ void pci_bus_add_device(struct pci_dev *dev)
>>>>>                   put_device(&pdev->dev);
>>>>>           }
>>>>>
>>>>> +       if (dev->dev.parent)
>>>>> +               pm_runtime_get_sync(dev->dev.parent);
>>>>>           pm_runtime_set_active(&dev->dev);
>>>>>           pm_runtime_enable(&dev->dev);
>>>>> +       if (dev->dev.parent)
>>>>> +               pm_runtime_put(dev->dev.parent);
>>>>>
>>>>>           if (!dn || of_device_is_available(dn))
>>>>>                   pci_dev_allow_binding(dev);
>>>>>
>>>>> Personally, I'm more inclined to go back to v1, since it prepares the
>>>>> runtime PM status when the device is first discovered. That way, its
>>>>> ancestors are still active, avoiding these sorts of problems. I'm
>>>>> frankly not sure of all the reasons Rafael recommended I make the
>>>>> v1->v3->v4 changes, and now that they cause problems, I'm inclined to
>>>>> question them again.
>>>>>
>>>>> Rafael, do you have any thoughts?
>>>> Yeah.
>>>>
>>>> Move pm_runtime_set_active(&dev->dev) back to pm_runtime_init()
>>> Or rather leave it there to be precise, but I think you know what I mean. :-)
>>>
>>>> because that would prevent the parent from suspending and keep
>>>> pm_runtime_enable() here because that would prevent the device itself
>>>> from suspending between pm_runtime_init() and this place.
>>>>
>>>> And I would add comments in both places.
>> Confirmed, the following change (compared to $subject patch) fixed my issue:
>>
>> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
>> index 3ef60c2fbd89..7e2b7e452d51 100644
>> --- a/drivers/pci/bus.c
>> +++ b/drivers/pci/bus.c
>> @@ -381,7 +381,6 @@ void pci_bus_add_device(struct pci_dev *dev)
>>           }
>>
>>           pm_runtime_set_active(&dev->dev);
>> -       pm_runtime_enable(&dev->dev);
> That works too, but it would defeat the purpose of the original
> change, so I mean the other way around.
>
> That is, leave the pm_runtime_enable() here and move the
> pm_runtime_set_active() back to the other place.

Okay, I mixed that up. This way it works too and fixes the observed issue.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Rafael J. Wysocki 2 weeks, 6 days ago
On Mon, Jan 19, 2026 at 2:13 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> On 19.01.2026 13:26, Rafael J. Wysocki wrote:
> > On Mon, Jan 19, 2026 at 11:01 AM Marek Szyprowski
> > <m.szyprowski@samsung.com> wrote:
> >> On 18.01.2026 12:59, Rafael J. Wysocki wrote:
> >>> On Sun, Jan 18, 2026 at 12:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>>> On Sat, Jan 17, 2026 at 2:19 AM Brian Norris <briannorris@chromium.org> wrote:
> >>>>> On Thu, Jan 15, 2026 at 12:14:49PM +0100, Marek Szyprowski wrote:
> >>>>>> On 14.01.2026 21:10, Brian Norris wrote:
> >>>>>>> On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
> >>>>>>>> On 06.01.2026 23:27, Bjorn Helgaas wrote:
> >>>>>>>>> On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
> >>>>>>>>>> Today, it's possible for a PCI device to be created and
> >>>>>>>>>> runtime-suspended before it is fully initialized. When that happens, the
> >>>>>>>>>> device will remain in D0, but the suspend process may save an
> >>>>>>>>>> intermediate version of that device's state -- for example, without
> >>>>>>>>>> appropriate BAR configuration. When the device later resumes, we'll
> >>>>>>>>>> restore invalid PCI state and the device may not function.
> >>>>>>>>>>
> >>>>>>>>>> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> >>>>>>>>>> until we've fully initialized the device.
> >>>>>>> ...
> >>>>>>>> This patch landed recently in linux-next as commit c796513dc54e
> >>>>>>>> ("PCI/PM: Prevent runtime suspend until devices are fully initialized").
> >>>>>>>> In my tests I found that it sometimes causes the "pci 0000:01:00.0:
> >>>>>>>> runtime PM trying to activate child device 0000:01:00.0 but parent
> >>>>>>>> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board
> >>>>>>>> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a
> >>>>>>>> lockdep warning about console lock, but this is just a consequence of
> >>>>>>>> the runtime pm warning. Reverting $subject patch on top of current
> >>>>>>>> linux-next hides this warning.
> >>>>>>>>
> >>>>>>>> Here is a kernel log:
> >>>>>>>>
> >>>>>>>> pci 0000:01:00.0: [17cb:1101] type 00 class 0xff0000 PCIe Endpoint
> >>>>>>>> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x000fffff 64bit]
> >>>>>>>> pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> >>>>>>>> pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0
> >>>>>>>> GT/s PCIe x1 link at 0000:00:00.0 (capable of 7.876 Gb/s with 8.0 GT/s
> >>>>>>>> PCIe x1 link)
> >>>>>>>> pci 0000:01:00.0: Adding to iommu group 13
> >>>>>>>> pci 0000:01:00.0: ASPM: default states L0s L1
> >>>>>>>> pcieport 0000:00:00.0: bridge window [mem 0x60400000-0x604fffff]: assigned
> >>>>>>>> pci 0000:01:00.0: BAR 0 [mem 0x60400000-0x604fffff 64bit]: assigned
> >>>>>>>> pci 0000:01:00.0: runtime PM trying to activate child device
> >>>>>>>> 0000:01:00.0 but parent (0000:00:00.0) is not active
> >>>>>>> Thanks for the report. I'll try to look at reproducing this, or at least
> >>>>>>> getting a better mental model of exactly why this might fail (or,
> >>>>>>> "warn") this way. But if you have the time and desire to try things out
> >>>>>>> for me, can you give v1 a try?
> >>>>>>>
> >>>>>>> https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> >>>>>>>
> >>>>>>> I'm pretty sure it would not invoke the same problem.
> >>>>>> Right, this one works fine.
> >>>>>>
> >>>>>>> I also suspect v3
> >>>>>>> might not, but I'm less sure:
> >>>>>>>
> >>>>>>> https://lore.kernel.org/all/20251022141434.v3.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> >>>>>> This one too, at least I was not able to reproduce any failure.
> >>>>> Thanks for testing. I'm still not sure exactly how to reproduce your
> >>>>> failure, but it seems as if the root port is being allowed to suspend
> >>>>> before the endpoint is added to the system, and it remains so while the
> >>>>> endpoint is about to probe. device_initial_probe() will be OK with
> >>>>> respect to PM, since it will wake up the port if needed. But this
> >>>>> particular code is not OK, since it doesn't ensure the parent device is
> >>>>> active while preparing the endpoint power state.
> >>>>>
> >>>>> I suppose one way to "solve" that is (untested):
> >>>>>
> >>>>> --- a/drivers/pci/bus.c
> >>>>> +++ b/drivers/pci/bus.c
> >>>>> @@ -380,8 +380,12 @@ void pci_bus_add_device(struct pci_dev *dev)
> >>>>>                   put_device(&pdev->dev);
> >>>>>           }
> >>>>>
> >>>>> +       if (dev->dev.parent)
> >>>>> +               pm_runtime_get_sync(dev->dev.parent);
> >>>>>           pm_runtime_set_active(&dev->dev);
> >>>>>           pm_runtime_enable(&dev->dev);
> >>>>> +       if (dev->dev.parent)
> >>>>> +               pm_runtime_put(dev->dev.parent);
> >>>>>
> >>>>>           if (!dn || of_device_is_available(dn))
> >>>>>                   pci_dev_allow_binding(dev);
> >>>>>
> >>>>> Personally, I'm more inclined to go back to v1, since it prepares the
> >>>>> runtime PM status when the device is first discovered. That way, its
> >>>>> ancestors are still active, avoiding these sorts of problems. I'm
> >>>>> frankly not sure of all the reasons Rafael recommended I make the
> >>>>> v1->v3->v4 changes, and now that they cause problems, I'm inclined to
> >>>>> question them again.
> >>>>>
> >>>>> Rafael, do you have any thoughts?
> >>>> Yeah.
> >>>>
> >>>> Move pm_runtime_set_active(&dev->dev) back to pm_runtime_init()
> >>> Or rather leave it there to be precise, but I think you know what I mean. :-)
> >>>
> >>>> because that would prevent the parent from suspending and keep
> >>>> pm_runtime_enable() here because that would prevent the device itself
> >>>> from suspending between pm_runtime_init() and this place.
> >>>>
> >>>> And I would add comments in both places.
> >> Confirmed, the following change (compared to $subject patch) fixed my issue:
> >>
> >> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> >> index 3ef60c2fbd89..7e2b7e452d51 100644
> >> --- a/drivers/pci/bus.c
> >> +++ b/drivers/pci/bus.c
> >> @@ -381,7 +381,6 @@ void pci_bus_add_device(struct pci_dev *dev)
> >>           }
> >>
> >>           pm_runtime_set_active(&dev->dev);
> >> -       pm_runtime_enable(&dev->dev);
> > That works too, but it would defeat the purpose of the original
> > change, so I mean the other way around.
> >
> > That is, leave the pm_runtime_enable() here and move the
> > pm_runtime_set_active() back to the other place.
>
> Okay, I mixed that up. This way it works too and fixes the observed issue.
>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Cool, thanks for verifying!

Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Bjorn Helgaas 3 weeks, 4 days ago
On Wed, Jan 14, 2026 at 10:46:41AM +0100, Marek Szyprowski wrote:
> On 06.01.2026 23:27, Bjorn Helgaas wrote:
> > On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
> >> Today, it's possible for a PCI device to be created and
> >> runtime-suspended before it is fully initialized. When that happens, the
> >> device will remain in D0, but the suspend process may save an
> >> intermediate version of that device's state -- for example, without
> >> appropriate BAR configuration. When the device later resumes, we'll
> >> restore invalid PCI state and the device may not function.
> ...

> >      Link: https://patch.msgid.link/20251023140901.v4.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid
> 
> This patch landed recently in linux-next as commit c796513dc54e 
> ("PCI/PM: Prevent runtime suspend until devices are fully initialized"). 
> In my tests I found that it sometimes causes the "pci 0000:01:00.0: 
> runtime PM trying to activate child device 0000:01:00.0 but parent 
> (0000:00:00.0) is not active" warning on Qualcomm Robotics RB5 board 
> (arch/arm64/boot/dts/qcom/qrb5165-rb5.dts). This in turn causes a 
> lockdep warning about console lock, but this is just a consequence of 
> the runtime pm warning. Reverting $subject patch on top of current 
> linux-next hides this warning.

I moved this patch from pci/pm to pci/pend to remove it from
linux-next while we figure this out.  Thanks for the report and
debugging!

Bjorn

Re: [PATCH v4] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Brian Norris 2 months, 2 weeks ago
Hi Bjorn,

On Thu, Oct 23, 2025 at 02:09:01PM -0700, Brian Norris wrote:
> Today, it's possible for a PCI device to be created and
> runtime-suspended before it is fully initialized. When that happens, the
> device will remain in D0, but the suspend process may save an
> intermediate version of that device's state -- for example, without
> appropriate BAR configuration. When the device later resumes, we'll
> restore invalid PCI state and the device may not function.
> 
> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> until we've fully initialized the device.
[...] 
> Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> Cc: <stable@vger.kernel.org>
> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> ---
> 
> Changes in v4:
>  * Move pm_runtime_set_active() too
> 
> Changes in v3:
>  * Add Link to initial discussion
>  * Add Rafael's Reviewed-by
>  * Add lengthier footnotes about forbid vs allow vs sysfs
> 
> Changes in v2:
>  * Update CC list
>  * Rework problem description
>  * Update solution: defer pm_runtime_enable(), instead of trying to
>    get()/put()
> 
>  drivers/pci/bus.c | 4 ++++
>  drivers/pci/pci.c | 2 --
>  2 files changed, 4 insertions(+), 2 deletions(-)

I'm wondering what the status of this patch is, as the next merge window
is approaching. It fixes a critical bug for me, and it has had plenty of
review.

Thanks,
Brian