[PATCH v3] PCI/PM: Prevent runtime suspend before devices are fully initialized

Brian Norris posted 1 patch 3 months, 2 weeks ago
There is a newer version of this series
drivers/pci/bus.c | 3 +++
drivers/pci/pci.c | 1 -
2 files changed, 3 insertions(+), 1 deletion(-)
[PATCH v3] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Brian Norris 3 months, 2 weeks ago
Today, it's possible for a PCI device to be created and
runtime-suspended before it is fully initialized. When that happens, the
device will remain in D0, but the suspend process may save an
intermediate version of that device's state -- for example, without
appropriate BAR configuration. When the device later resumes, we'll
restore invalid PCI state and the device may not function.

Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
until we've fully initialized the device.

More details on how exactly this may occur:

1. PCI device is created by pci_scan_slot() or similar
2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
   device starts "active" and we initially prevent (pm_runtime_forbid())
   suspend -- but see [*] footnote
3. Underlying 'struct device' is added to the system (device_add());
   runtime PM can now be configured by user space
4. PCI device receives BAR configuration
   (pci_assign_unassigned_bus_resources(), etc.)
5. PCI device is added to the system in pci_bus_add_device()

The device may potentially suspend between #3 and #4.

[*] By default, pm_runtime_forbid() prevents suspending a device; but by
design [**], this can be overridden by user space policy via

  echo auto > /sys/bus/pci/devices/.../power/control

Thus, the above #3/#4 sequence is racy with user space (udev or
similar).

Notably, many PCI devices are enumerated at subsys_initcall time and so
will not race with user space. However, there are several scenarios
where PCI devices are created later on, such as with hotplug or when
drivers (pwrctrl or controller drivers) are built as modules.

[**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
/sys/.../power/control, and the runtime PM usage counter can be subtle.
It appears that the intention of pm_runtime_forbid() /
pm_runtime_allow() is twofold:

1. Allow the user to disable runtime_pm (force device to always be
   powered on) through sysfs.
2. Allow the driver to start with runtime_pm disabled (device forced
   on) and user space could later enable runtime_pm.

This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
specifically the section starting "The user space can effectively
disallow".

This means that while pm_runtime_forbid() does technically increase the
runtime PM usage counter, this usage counter is not a guarantee of
functional correctness, because sysfs can decrease that count again.

Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
Signed-off-by: Brian Norris <briannorris@chromium.org>
Cc: <stable@vger.kernel.org>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
---

Changes in v3:
 * Add Link to initial discussion
 * Add Rafael's Reviewed-by
 * Add lengthier footnotes about forbid vs allow vs sysfs

Changes in v2:
 * Update CC list
 * Rework problem description
 * Update solution: defer pm_runtime_enable(), instead of trying to
   get()/put()

 drivers/pci/bus.c | 3 +++
 drivers/pci/pci.c | 1 -
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index f26aec6ff588..fc66b6cb3a54 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -14,6 +14,7 @@
 #include <linux/of.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
 #include <linux/proc_fs.h>
 #include <linux/slab.h>
 
@@ -375,6 +376,8 @@ void pci_bus_add_device(struct pci_dev *dev)
 		put_device(&pdev->dev);
 	}
 
+	pm_runtime_enable(&dev->dev);
+
 	if (!dn || of_device_is_available(dn))
 		pci_dev_allow_binding(dev);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b14dd064006c..f792164fa297 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3226,7 +3226,6 @@ void pci_pm_init(struct pci_dev *dev)
 	pci_pm_power_up_and_verify_state(dev);
 	pm_runtime_forbid(&dev->dev);
 	pm_runtime_set_active(&dev->dev);
-	pm_runtime_enable(&dev->dev);
 }
 
 static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
-- 
2.51.1.814.gb8fa24458f-goog
Re: [PATCH v3] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Rafael J. Wysocki 3 months, 2 weeks ago
On Wed, Oct 22, 2025 at 11:15 PM Brian Norris <briannorris@chromium.org> wrote:
>
> Today, it's possible for a PCI device to be created and
> runtime-suspended before it is fully initialized. When that happens, the
> device will remain in D0, but the suspend process may save an
> intermediate version of that device's state -- for example, without
> appropriate BAR configuration. When the device later resumes, we'll
> restore invalid PCI state and the device may not function.
>
> Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> until we've fully initialized the device.
>
> More details on how exactly this may occur:
>
> 1. PCI device is created by pci_scan_slot() or similar
> 2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
>    device starts "active" and we initially prevent (pm_runtime_forbid())
>    suspend -- but see [*] footnote
> 3. Underlying 'struct device' is added to the system (device_add());
>    runtime PM can now be configured by user space
> 4. PCI device receives BAR configuration
>    (pci_assign_unassigned_bus_resources(), etc.)
> 5. PCI device is added to the system in pci_bus_add_device()
>
> The device may potentially suspend between #3 and #4.
>
> [*] By default, pm_runtime_forbid() prevents suspending a device; but by
> design [**], this can be overridden by user space policy via
>
>   echo auto > /sys/bus/pci/devices/.../power/control
>
> Thus, the above #3/#4 sequence is racy with user space (udev or
> similar).
>
> Notably, many PCI devices are enumerated at subsys_initcall time and so
> will not race with user space. However, there are several scenarios
> where PCI devices are created later on, such as with hotplug or when
> drivers (pwrctrl or controller drivers) are built as modules.
>
> [**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
> /sys/.../power/control, and the runtime PM usage counter can be subtle.
> It appears that the intention of pm_runtime_forbid() /
> pm_runtime_allow() is twofold:
>
> 1. Allow the user to disable runtime_pm (force device to always be
>    powered on) through sysfs.
> 2. Allow the driver to start with runtime_pm disabled (device forced
>    on) and user space could later enable runtime_pm.
>
> This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
> specifically the section starting "The user space can effectively
> disallow".
>
> This means that while pm_runtime_forbid() does technically increase the
> runtime PM usage counter, this usage counter is not a guarantee of
> functional correctness, because sysfs can decrease that count again.
>
> Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> Cc: <stable@vger.kernel.org>
> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> ---
>
> Changes in v3:
>  * Add Link to initial discussion
>  * Add Rafael's Reviewed-by
>  * Add lengthier footnotes about forbid vs allow vs sysfs
>
> Changes in v2:
>  * Update CC list
>  * Rework problem description
>  * Update solution: defer pm_runtime_enable(), instead of trying to
>    get()/put()
>
>  drivers/pci/bus.c | 3 +++
>  drivers/pci/pci.c | 1 -
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index f26aec6ff588..fc66b6cb3a54 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -14,6 +14,7 @@
>  #include <linux/of.h>
>  #include <linux/of_platform.h>
>  #include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/proc_fs.h>
>  #include <linux/slab.h>
>
> @@ -375,6 +376,8 @@ void pci_bus_add_device(struct pci_dev *dev)
>                 put_device(&pdev->dev);
>         }
>
> +       pm_runtime_enable(&dev->dev);
> +
>         if (!dn || of_device_is_available(dn))
>                 pci_dev_allow_binding(dev);
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b14dd064006c..f792164fa297 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3226,7 +3226,6 @@ void pci_pm_init(struct pci_dev *dev)
>         pci_pm_power_up_and_verify_state(dev);
>         pm_runtime_forbid(&dev->dev);
>         pm_runtime_set_active(&dev->dev);

Actually, I think that the two statements above can be moved too.

The pm_runtime_forbid() call doesn't matter until runtime PM is
enabled and it is better to do pm_runtime_set_active() right before
enabling runtime PM.

> -       pm_runtime_enable(&dev->dev);
>  }
>
>  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> --
Re: [PATCH v3] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Brian Norris 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 11:41:38AM +0200, Rafael J. Wysocki wrote:
> On Wed, Oct 22, 2025 at 11:15 PM Brian Norris <briannorris@chromium.org> wrote:
> >
> > Today, it's possible for a PCI device to be created and
> > runtime-suspended before it is fully initialized. When that happens, the
> > device will remain in D0, but the suspend process may save an
> > intermediate version of that device's state -- for example, without
> > appropriate BAR configuration. When the device later resumes, we'll
> > restore invalid PCI state and the device may not function.
> >
> > Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> > until we've fully initialized the device.
> >
> > More details on how exactly this may occur:
> >
> > 1. PCI device is created by pci_scan_slot() or similar
> > 2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
> >    device starts "active" and we initially prevent (pm_runtime_forbid())
> >    suspend -- but see [*] footnote
> > 3. Underlying 'struct device' is added to the system (device_add());
> >    runtime PM can now be configured by user space
> > 4. PCI device receives BAR configuration
> >    (pci_assign_unassigned_bus_resources(), etc.)
> > 5. PCI device is added to the system in pci_bus_add_device()
> >
> > The device may potentially suspend between #3 and #4.
> >
> > [*] By default, pm_runtime_forbid() prevents suspending a device; but by
> > design [**], this can be overridden by user space policy via
> >
> >   echo auto > /sys/bus/pci/devices/.../power/control
> >
> > Thus, the above #3/#4 sequence is racy with user space (udev or
> > similar).
> >
> > Notably, many PCI devices are enumerated at subsys_initcall time and so
> > will not race with user space. However, there are several scenarios
> > where PCI devices are created later on, such as with hotplug or when
> > drivers (pwrctrl or controller drivers) are built as modules.
> >
> > [**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
> > /sys/.../power/control, and the runtime PM usage counter can be subtle.
> > It appears that the intention of pm_runtime_forbid() /
> > pm_runtime_allow() is twofold:
> >
> > 1. Allow the user to disable runtime_pm (force device to always be
> >    powered on) through sysfs.
> > 2. Allow the driver to start with runtime_pm disabled (device forced
> >    on) and user space could later enable runtime_pm.
> >
> > This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
> > specifically the section starting "The user space can effectively
> > disallow".
> >
> > This means that while pm_runtime_forbid() does technically increase the
> > runtime PM usage counter, this usage counter is not a guarantee of
> > functional correctness, because sysfs can decrease that count again.
> >
> > Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > Cc: <stable@vger.kernel.org>
> > Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> > ---
> >
> > Changes in v3:
> >  * Add Link to initial discussion
> >  * Add Rafael's Reviewed-by
> >  * Add lengthier footnotes about forbid vs allow vs sysfs
> >
> > Changes in v2:
> >  * Update CC list
> >  * Rework problem description
> >  * Update solution: defer pm_runtime_enable(), instead of trying to
> >    get()/put()
> >
> >  drivers/pci/bus.c | 3 +++
> >  drivers/pci/pci.c | 1 -
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > index f26aec6ff588..fc66b6cb3a54 100644
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -14,6 +14,7 @@
> >  #include <linux/of.h>
> >  #include <linux/of_platform.h>
> >  #include <linux/platform_device.h>
> > +#include <linux/pm_runtime.h>
> >  #include <linux/proc_fs.h>
> >  #include <linux/slab.h>
> >
> > @@ -375,6 +376,8 @@ void pci_bus_add_device(struct pci_dev *dev)
> >                 put_device(&pdev->dev);
> >         }
> >
> > +       pm_runtime_enable(&dev->dev);
> > +
> >         if (!dn || of_device_is_available(dn))
> >                 pci_dev_allow_binding(dev);
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b14dd064006c..f792164fa297 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3226,7 +3226,6 @@ void pci_pm_init(struct pci_dev *dev)
> >         pci_pm_power_up_and_verify_state(dev);
> >         pm_runtime_forbid(&dev->dev);
> >         pm_runtime_set_active(&dev->dev);
> 
> Actually, I think that the two statements above can be moved too.
> 
> The pm_runtime_forbid() call doesn't matter until runtime PM is
> enabled and it is better to do pm_runtime_set_active() right before
> enabling runtime PM.

Ehh, sure, I can do that I suppose. That signs me up for fixing some
excessive Documentation/ that spells out exactly what goes on in
pci_pm_init() for some reason, but I can do that too :)

Brian

> > -       pm_runtime_enable(&dev->dev);
> >  }
> >
> >  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> > --
Re: [PATCH v3] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Brian Norris 3 months, 2 weeks ago
Hi again,

On Thu, Oct 23, 2025 at 11:34:57AM -0700, Brian Norris wrote:
> On Thu, Oct 23, 2025 at 11:41:38AM +0200, Rafael J. Wysocki wrote:
> > On Wed, Oct 22, 2025 at 11:15 PM Brian Norris <briannorris@chromium.org> wrote:
> > >
> > > Today, it's possible for a PCI device to be created and
> > > runtime-suspended before it is fully initialized. When that happens, the
> > > device will remain in D0, but the suspend process may save an
> > > intermediate version of that device's state -- for example, without
> > > appropriate BAR configuration. When the device later resumes, we'll
> > > restore invalid PCI state and the device may not function.
> > >
> > > Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> > > until we've fully initialized the device.
> > >
> > > More details on how exactly this may occur:
> > >
> > > 1. PCI device is created by pci_scan_slot() or similar
> > > 2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
> > >    device starts "active" and we initially prevent (pm_runtime_forbid())
> > >    suspend -- but see [*] footnote
> > > 3. Underlying 'struct device' is added to the system (device_add());
> > >    runtime PM can now be configured by user space
> > > 4. PCI device receives BAR configuration
> > >    (pci_assign_unassigned_bus_resources(), etc.)
> > > 5. PCI device is added to the system in pci_bus_add_device()
> > >
> > > The device may potentially suspend between #3 and #4.
> > >
> > > [*] By default, pm_runtime_forbid() prevents suspending a device; but by
> > > design [**], this can be overridden by user space policy via
> > >
> > >   echo auto > /sys/bus/pci/devices/.../power/control
> > >
> > > Thus, the above #3/#4 sequence is racy with user space (udev or
> > > similar).
> > >
> > > Notably, many PCI devices are enumerated at subsys_initcall time and so
> > > will not race with user space. However, there are several scenarios
> > > where PCI devices are created later on, such as with hotplug or when
> > > drivers (pwrctrl or controller drivers) are built as modules.
> > >
> > > [**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
> > > /sys/.../power/control, and the runtime PM usage counter can be subtle.
> > > It appears that the intention of pm_runtime_forbid() /
> > > pm_runtime_allow() is twofold:
> > >
> > > 1. Allow the user to disable runtime_pm (force device to always be
> > >    powered on) through sysfs.
> > > 2. Allow the driver to start with runtime_pm disabled (device forced
> > >    on) and user space could later enable runtime_pm.
> > >
> > > This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
> > > specifically the section starting "The user space can effectively
> > > disallow".
> > >
> > > This means that while pm_runtime_forbid() does technically increase the
> > > runtime PM usage counter, this usage counter is not a guarantee of
> > > functional correctness, because sysfs can decrease that count again.
> > >
> > > Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > > Cc: <stable@vger.kernel.org>
> > > Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> > > ---
> > >
> > > Changes in v3:
> > >  * Add Link to initial discussion
> > >  * Add Rafael's Reviewed-by
> > >  * Add lengthier footnotes about forbid vs allow vs sysfs
> > >
> > > Changes in v2:
> > >  * Update CC list
> > >  * Rework problem description
> > >  * Update solution: defer pm_runtime_enable(), instead of trying to
> > >    get()/put()
> > >
> > >  drivers/pci/bus.c | 3 +++
> > >  drivers/pci/pci.c | 1 -
> > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > > index f26aec6ff588..fc66b6cb3a54 100644
> > > --- a/drivers/pci/bus.c
> > > +++ b/drivers/pci/bus.c
> > > @@ -14,6 +14,7 @@
> > >  #include <linux/of.h>
> > >  #include <linux/of_platform.h>
> > >  #include <linux/platform_device.h>
> > > +#include <linux/pm_runtime.h>
> > >  #include <linux/proc_fs.h>
> > >  #include <linux/slab.h>
> > >
> > > @@ -375,6 +376,8 @@ void pci_bus_add_device(struct pci_dev *dev)
> > >                 put_device(&pdev->dev);
> > >         }
> > >
> > > +       pm_runtime_enable(&dev->dev);
> > > +
> > >         if (!dn || of_device_is_available(dn))
> > >                 pci_dev_allow_binding(dev);
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index b14dd064006c..f792164fa297 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -3226,7 +3226,6 @@ void pci_pm_init(struct pci_dev *dev)
> > >         pci_pm_power_up_and_verify_state(dev);
> > >         pm_runtime_forbid(&dev->dev);
> > >         pm_runtime_set_active(&dev->dev);
> > 
> > Actually, I think that the two statements above can be moved too.
> > 
> > The pm_runtime_forbid() call doesn't matter until runtime PM is
> > enabled and it is better to do pm_runtime_set_active() right before
> > enabling runtime PM.
> 
> Ehh, sure, I can do that I suppose. That signs me up for fixing some
> excessive Documentation/ that spells out exactly what goes on in
> pci_pm_init() for some reason, but I can do that too :)

Hmm, never mind, I think I agreed with you too soon.

It should be ok to move pm_runtime_set_active(), but I believe moving
pm_runtime_forbid() would introduce a similar race condition problem to
the bug I'm resolving -- pci_bus_add_device() is run after device_add(),
and so user space might already see the device and try to configure
power/control. If we forbid() after that point, user space would be
confused.

Documentation/power/runtime_pm.rst even calls this out:

  "It should be noted, however, that if the user space has already
  intentionally changed the value of /sys/devices/.../power/control to
  "auto" to allow the driver to power manage the device at run time, the
  driver may confuse it by using pm_runtime_forbid() this way."

So I'd rather not make this change.

Brian

> > > -       pm_runtime_enable(&dev->dev);
> > >  }
> > >
> > >  static unsigned long pci_ea_flags(struct pci_dev *dev, u8 prop)
> > > --
> 
Re: [PATCH v3] PCI/PM: Prevent runtime suspend before devices are fully initialized
Posted by Rafael J. Wysocki 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 8:42 PM Brian Norris <briannorris@chromium.org> wrote:
>
> Hi again,
>
> On Thu, Oct 23, 2025 at 11:34:57AM -0700, Brian Norris wrote:
> > On Thu, Oct 23, 2025 at 11:41:38AM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Oct 22, 2025 at 11:15 PM Brian Norris <briannorris@chromium.org> wrote:
> > > >
> > > > Today, it's possible for a PCI device to be created and
> > > > runtime-suspended before it is fully initialized. When that happens, the
> > > > device will remain in D0, but the suspend process may save an
> > > > intermediate version of that device's state -- for example, without
> > > > appropriate BAR configuration. When the device later resumes, we'll
> > > > restore invalid PCI state and the device may not function.
> > > >
> > > > Prevent runtime suspend for PCI devices by deferring pm_runtime_enable()
> > > > until we've fully initialized the device.
> > > >
> > > > More details on how exactly this may occur:
> > > >
> > > > 1. PCI device is created by pci_scan_slot() or similar
> > > > 2. As part of pci_scan_slot(), pci_pm_init() enables runtime PM; the
> > > >    device starts "active" and we initially prevent (pm_runtime_forbid())
> > > >    suspend -- but see [*] footnote
> > > > 3. Underlying 'struct device' is added to the system (device_add());
> > > >    runtime PM can now be configured by user space
> > > > 4. PCI device receives BAR configuration
> > > >    (pci_assign_unassigned_bus_resources(), etc.)
> > > > 5. PCI device is added to the system in pci_bus_add_device()
> > > >
> > > > The device may potentially suspend between #3 and #4.
> > > >
> > > > [*] By default, pm_runtime_forbid() prevents suspending a device; but by
> > > > design [**], this can be overridden by user space policy via
> > > >
> > > >   echo auto > /sys/bus/pci/devices/.../power/control
> > > >
> > > > Thus, the above #3/#4 sequence is racy with user space (udev or
> > > > similar).
> > > >
> > > > Notably, many PCI devices are enumerated at subsys_initcall time and so
> > > > will not race with user space. However, there are several scenarios
> > > > where PCI devices are created later on, such as with hotplug or when
> > > > drivers (pwrctrl or controller drivers) are built as modules.
> > > >
> > > > [**] The relationship between pm_runtime_forbid(), pm_runtime_allow(),
> > > > /sys/.../power/control, and the runtime PM usage counter can be subtle.
> > > > It appears that the intention of pm_runtime_forbid() /
> > > > pm_runtime_allow() is twofold:
> > > >
> > > > 1. Allow the user to disable runtime_pm (force device to always be
> > > >    powered on) through sysfs.
> > > > 2. Allow the driver to start with runtime_pm disabled (device forced
> > > >    on) and user space could later enable runtime_pm.
> > > >
> > > > This conclusion comes from reading `Documentation/power/runtime_pm.rst`,
> > > > specifically the section starting "The user space can effectively
> > > > disallow".
> > > >
> > > > This means that while pm_runtime_forbid() does technically increase the
> > > > runtime PM usage counter, this usage counter is not a guarantee of
> > > > functional correctness, because sysfs can decrease that count again.
> > > >
> > > > Link: https://lore.kernel.org/all/20251016155335.1.I60a53c170a8596661883bd2b4ef475155c7aa72b@changeid/
> > > > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > > > Cc: <stable@vger.kernel.org>
> > > > Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> > > > ---
> > > >
> > > > Changes in v3:
> > > >  * Add Link to initial discussion
> > > >  * Add Rafael's Reviewed-by
> > > >  * Add lengthier footnotes about forbid vs allow vs sysfs
> > > >
> > > > Changes in v2:
> > > >  * Update CC list
> > > >  * Rework problem description
> > > >  * Update solution: defer pm_runtime_enable(), instead of trying to
> > > >    get()/put()
> > > >
> > > >  drivers/pci/bus.c | 3 +++
> > > >  drivers/pci/pci.c | 1 -
> > > >  2 files changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > > > index f26aec6ff588..fc66b6cb3a54 100644
> > > > --- a/drivers/pci/bus.c
> > > > +++ b/drivers/pci/bus.c
> > > > @@ -14,6 +14,7 @@
> > > >  #include <linux/of.h>
> > > >  #include <linux/of_platform.h>
> > > >  #include <linux/platform_device.h>
> > > > +#include <linux/pm_runtime.h>
> > > >  #include <linux/proc_fs.h>
> > > >  #include <linux/slab.h>
> > > >
> > > > @@ -375,6 +376,8 @@ void pci_bus_add_device(struct pci_dev *dev)
> > > >                 put_device(&pdev->dev);
> > > >         }
> > > >
> > > > +       pm_runtime_enable(&dev->dev);
> > > > +
> > > >         if (!dn || of_device_is_available(dn))
> > > >                 pci_dev_allow_binding(dev);
> > > >
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index b14dd064006c..f792164fa297 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -3226,7 +3226,6 @@ void pci_pm_init(struct pci_dev *dev)
> > > >         pci_pm_power_up_and_verify_state(dev);
> > > >         pm_runtime_forbid(&dev->dev);
> > > >         pm_runtime_set_active(&dev->dev);
> > >
> > > Actually, I think that the two statements above can be moved too.
> > >
> > > The pm_runtime_forbid() call doesn't matter until runtime PM is
> > > enabled and it is better to do pm_runtime_set_active() right before
> > > enabling runtime PM.
> >
> > Ehh, sure, I can do that I suppose. That signs me up for fixing some
> > excessive Documentation/ that spells out exactly what goes on in
> > pci_pm_init() for some reason, but I can do that too :)
>
> Hmm, never mind, I think I agreed with you too soon.
>
> It should be ok to move pm_runtime_set_active(), but I believe moving
> pm_runtime_forbid() would introduce a similar race condition problem to
> the bug I'm resolving -- pci_bus_add_device() is run after device_add(),
> and so user space might already see the device and try to configure
> power/control. If we forbid() after that point, user space would be
> confused.
>
> Documentation/power/runtime_pm.rst even calls this out:
>
>   "It should be noted, however, that if the user space has already
>   intentionally changed the value of /sys/devices/.../power/control to
>   "auto" to allow the driver to power manage the device at run time, the
>   driver may confuse it by using pm_runtime_forbid() this way."
>
> So I'd rather not make this change.

OK, fair enough, so leave the pm_runtime_forbid() as is, but
pm_runtime_set_active() should really be done right prior to enabling.