[PATCH v2] PM: runtime: Return properly from rpm_resume() if dev->power.needs_force_resume flag is set

Liu Ying posted 1 patch 1 year, 7 months ago
Documentation/power/runtime_pm.rst | 14 +++++++-------
drivers/base/power/runtime.c       |  2 ++
2 files changed, 9 insertions(+), 7 deletions(-)
[PATCH v2] PM: runtime: Return properly from rpm_resume() if dev->power.needs_force_resume flag is set
Posted by Liu Ying 1 year, 7 months ago
After a device transitions to sleep state through it's system suspend
callback pm_runtime_force_suspend(), the device's driver may still try
to do runtime PM for the device(runtime suspend first and then runtime
resume) although runtime PM is disabled by that callback.  The runtime
PM operations would not touch the device effectively and the device is
assumed to be resumed through it's system resume callback
pm_runtime_force_resume().

The problem is that since the device's runtime PM status is RPM_SUSPENDED
in the sleep state, rpm_resume() would not take the device as in already
active status and would return -EACCES instead of 1.  The error code
-EACCES may make the device's driver put the device into runtime suspend
status with the RPM_GET_PUT flag set and hence drop the device's usage
count to 1.  Then, at dpm_complete stage, device_complete() would call
pm_runtime_put() to drop the device's usage count again to 0 and call
rpm_idle() to try to put the device into runtime PM suspend status.  So,
the device could eventually stay at the runtime PM suspend status.

A real problematic case is the panel-simple.c driver(works with a
downstream DRM device driver), where the optional enable_gpio(controlled
by the runtime PM callbacks) will be disabled through pm_runtime_put()
called in device_complete() if a DRM atomic commit(triggered by a DRM
device's system resume callback) tries to do runtime PM resume for the
panel before the panel is active with force resume:

1) System suspend:
 - pm_runtime_force_suspend()
  - panel_simple_suspend() // enable_gpio is disabled

2) Runtime suspend with a DRM atomic commit:
 - panel_simple_unprepare()
  - pm_runtime_put_autosuspend() // drop device usage count to 1

3) Runtime resume with a DRM atomic commit:
 - panel_simple_prepare()
  - pm_runtime_get_sync() // increase device usage count to 2
    - rpm_resume() // return -EACCES
  - pm_runtime_put_autosuspend() // drop device usage count to 1

4) System resume:
 - pm_runtime_force_resume()
  - panel_simple_resume() // enable_gpio is enabled

5) PM transition complete:
 - dpm_complete()
  - device_complete()
   - pm_runtime_put() // drop device usage count to 0
    - rpm_idle()
     - rpm_suspend() // start hrtimer with expires

6) hrtimer expires:
 - pm_suspend_timer_fn()
  - rpm_suspend() // queue work on pm_wq

7) work function is called:
 - pm_runtime_work()
  - rpm_suspend()
   - panel_simple_suspend() // enable_gpio is disabled

Fix the issue by checking dev->power.needs_force_resume flag in
rpm_resume() so that it returns 1 instead of -EACCES in this scenario,
since the flag is set in pm_runtime_force_suspend().  Then, device
usage count will be 1 after pm_runtime_put() is called at dpm_complete
stage.

Also, update the documentation to change the description of
pm_runtime_resume() to reflect the new behavior of rpm_resume().

Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Liu Ying <victor.liu@nxp.com>
---
v1->v2:
* Fix commit message to tell the reason why the issue happens, that is,
  zeroed device usage count in pm_runtime_put() at dpm_complete stage
  eventually makes the device be in runtime PM suspend status.

 Documentation/power/runtime_pm.rst | 14 +++++++-------
 drivers/base/power/runtime.c       |  2 ++
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst
index 65b86e487afe..6266f0ac02a8 100644
--- a/Documentation/power/runtime_pm.rst
+++ b/Documentation/power/runtime_pm.rst
@@ -337,13 +337,13 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
 
   `int pm_runtime_resume(struct device *dev);`
     - execute the subsystem-level resume callback for the device; returns 0 on
-      success, 1 if the device's runtime PM status is already 'active' (also if
-      'power.disable_depth' is nonzero, but the status was 'active' when it was
-      changing from 0 to 1) or error code on failure, where -EAGAIN means it may
-      be safe to attempt to resume the device again in future, but
-      'power.runtime_error' should be checked additionally, and -EACCES means
-      that the callback could not be run, because 'power.disable_depth' was
-      different from 0
+      success, 1 if the device's runtime PM status is assumed to be 'active'
+      with force resume or is already 'active' (also if 'power.disable_depth' is
+      nonzero, but the status was 'active' when it was changing from 0 to 1) or
+      error code on failure, where -EAGAIN means it may be safe to attempt to
+      resume the device again in future, but 'power.runtime_error' should be
+      checked additionally, and -EACCES means that the callback could not be
+      run, because 'power.disable_depth' was different from 0
 
   `int pm_runtime_resume_and_get(struct device *dev);`
     - run pm_runtime_resume(dev) and if successful, increment the device's
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 997be3ac20a7..0bce66ea0036 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -762,6 +762,8 @@ static int rpm_resume(struct device *dev, int rpmflags)
  repeat:
 	if (dev->power.runtime_error) {
 		retval = -EINVAL;
+	} else if (dev->power.needs_force_resume) {
+		retval = 1;
 	} else if (dev->power.disable_depth > 0) {
 		if (dev->power.runtime_status == RPM_ACTIVE &&
 		    dev->power.last_status == RPM_ACTIVE)
-- 
2.37.1
Re: [PATCH v2] PM: runtime: Return properly from rpm_resume() if dev->power.needs_force_resume flag is set
Posted by Ulf Hansson 1 year, 7 months ago
On Fri, 23 Sept 2022 at 14:47, Liu Ying <victor.liu@nxp.com> wrote:
>
> After a device transitions to sleep state through it's system suspend
> callback pm_runtime_force_suspend(), the device's driver may still try
> to do runtime PM for the device(runtime suspend first and then runtime
> resume) although runtime PM is disabled by that callback.  The runtime
> PM operations would not touch the device effectively and the device is
> assumed to be resumed through it's system resume callback
> pm_runtime_force_resume().

This sounds like a fragile use case to me. In principle you want to
allow the device to be runtime resumed/suspended, after the device has
already been put into a low power state through the regular system
suspend callback. Normally it seems better to prevent this from
happening, completely.

That said, in this case, I wonder if a better option would be to point
->suspend_late() to pm_runtime_force_suspend() and ->resume_early() to
pm_runtime_force_resume(), rather than using the regular
->suspend|resume() callbacks. This should avoid the problem, I think,
no?

Note that, the PM core also disables runtime PM for the device in
__device_suspend_late(). For good reasons.

[...]

Kind regards
Uffe
Re: [PATCH v2] PM: runtime: Return properly from rpm_resume() if dev->power.needs_force_resume flag is set
Posted by Liu Ying 1 year, 7 months ago
On Fri, 2022-09-23 at 15:48 +0200, Ulf Hansson wrote:
> On Fri, 23 Sept 2022 at 14:47, Liu Ying <victor.liu@nxp.com> wrote:
> > 
> > After a device transitions to sleep state through it's system
> > suspend
> > callback pm_runtime_force_suspend(), the device's driver may still
> > try
> > to do runtime PM for the device(runtime suspend first and then
> > runtime
> > resume) although runtime PM is disabled by that callback.  The
> > runtime
> > PM operations would not touch the device effectively and the device
> > is
> > assumed to be resumed through it's system resume callback
> > pm_runtime_force_resume().
> 
> This sounds like a fragile use case to me. In principle you want to
> allow the device to be runtime resumed/suspended, after the device
> has
> already been put into a low power state through the regular system
> suspend callback. Normally it seems better to prevent this from
> happening, completely.

Not sure if we really may prevent this from happening completely.

> 
> That said, in this case, I wonder if a better option would be to
> point
> ->suspend_late() to pm_runtime_force_suspend() and ->resume_early()
> to
> pm_runtime_force_resume(), rather than using the regular
> ->suspend|resume() callbacks. This should avoid the problem, I think,
> no?

I thought about this and it actually works for my particular
panel-simple case.  What worries me is that the device(DRM device in my
case) which triggers the runtime PM operations may also use            
->suspend_late/resume_early() callbacks for whatever reasons, hence no
fixed order to suspend/resume the two devices(like panel device and DRM
device).

Also, not sure if there is any sequence issue by using the            
->suspend_late/resume_early() callbacks in the panel-simple driver,
since it's written for quite a few display panels which may work with
various DRM devices - don't want to break any of them.

Regards,
Liu Ying

> 
> Note that, the PM core also disables runtime PM for the device in
> __device_suspend_late(). For good reasons.
> 
> [...]
> 
> Kind regards
> Uffe
Re: [PATCH v2] PM: runtime: Return properly from rpm_resume() if dev->power.needs_force_resume flag is set
Posted by Ulf Hansson 1 year, 7 months ago
On Fri, 23 Sept 2022 at 17:23, Liu Ying <victor.liu@nxp.com> wrote:
>
> On Fri, 2022-09-23 at 15:48 +0200, Ulf Hansson wrote:
> > On Fri, 23 Sept 2022 at 14:47, Liu Ying <victor.liu@nxp.com> wrote:
> > >
> > > After a device transitions to sleep state through it's system
> > > suspend
> > > callback pm_runtime_force_suspend(), the device's driver may still
> > > try
> > > to do runtime PM for the device(runtime suspend first and then
> > > runtime
> > > resume) although runtime PM is disabled by that callback.  The
> > > runtime
> > > PM operations would not touch the device effectively and the device
> > > is
> > > assumed to be resumed through it's system resume callback
> > > pm_runtime_force_resume().
> >
> > This sounds like a fragile use case to me. In principle you want to
> > allow the device to be runtime resumed/suspended, after the device
> > has
> > already been put into a low power state through the regular system
> > suspend callback. Normally it seems better to prevent this from
> > happening, completely.
>
> Not sure if we really may prevent this from happening completely.
>
> >
> > That said, in this case, I wonder if a better option would be to
> > point
> > ->suspend_late() to pm_runtime_force_suspend() and ->resume_early()
> > to
> > pm_runtime_force_resume(), rather than using the regular
> > ->suspend|resume() callbacks. This should avoid the problem, I think,
> > no?
>
> I thought about this and it actually works for my particular
> panel-simple case.  What worries me is that the device(DRM device in my
> case) which triggers the runtime PM operations may also use
> ->suspend_late/resume_early() callbacks for whatever reasons, hence no
> fixed order to suspend/resume the two devices(like panel device and DRM
> device).
>
> Also, not sure if there is any sequence issue by using the
> ->suspend_late/resume_early() callbacks in the panel-simple driver,
> since it's written for quite a few display panels which may work with
> various DRM devices - don't want to break any of them.

What you are describing here, is the classical problem we have with
suspend/resume ordering of devices.

There are in principle two ways to solve this.
1. If it makes sense, the devices might be assigned as parent/child.
2. If it's more a consumer/supplier thing, we can add a device-link
between them.

In this way, the PM core can guarantee that the order becomes correct.

Kind regards
Uffe
Re: [PATCH v2] PM: runtime: Return properly from rpm_resume() if dev->power.needs_force_resume flag is set
Posted by Liu Ying 1 year, 7 months ago
On Mon, 2022-09-26 at 11:47 +0200, Ulf Hansson wrote:
> On Fri, 23 Sept 2022 at 17:23, Liu Ying <victor.liu@nxp.com> wrote:
> > On Fri, 2022-09-23 at 15:48 +0200, Ulf Hansson wrote:
> > > On Fri, 23 Sept 2022 at 14:47, Liu Ying <victor.liu@nxp.com> wrote:
> > > > After a device transitions to sleep state through it's system
> > > > suspend
> > > > callback pm_runtime_force_suspend(), the device's driver may still
> > > > try
> > > > to do runtime PM for the device(runtime suspend first and then
> > > > runtime
> > > > resume) although runtime PM is disabled by that callback.  The
> > > > runtime
> > > > PM operations would not touch the device effectively and the device
> > > > is
> > > > assumed to be resumed through it's system resume callback
> > > > pm_runtime_force_resume().
> > > 
> > > This sounds like a fragile use case to me. In principle you want to
> > > allow the device to be runtime resumed/suspended, after the device
> > > has
> > > already been put into a low power state through the regular system
> > > suspend callback. Normally it seems better to prevent this from
> > > happening, completely.
> > 
> > Not sure if we really may prevent this from happening completely.
> > 
> > > That said, in this case, I wonder if a better option would be to
> > > point
> > > ->suspend_late() to pm_runtime_force_suspend() and ->resume_early()
> > > to
> > > pm_runtime_force_resume(), rather than using the regular
> > > ->suspend|resume() callbacks. This should avoid the problem, I think,
> > > no?
> > 
> > I thought about this and it actually works for my particular
> > panel-simple case.  What worries me is that the device(DRM device in my
> > case) which triggers the runtime PM operations may also use
> > ->suspend_late/resume_early() callbacks for whatever reasons, hence no
> > fixed order to suspend/resume the two devices(like panel device and DRM
> > device).
> > 
> > Also, not sure if there is any sequence issue by using the
> > ->suspend_late/resume_early() callbacks in the panel-simple driver,
> > since it's written for quite a few display panels which may work with
> > various DRM devices - don't want to break any of them.
> 
> What you are describing here, is the classical problem we have with
> suspend/resume ordering of devices.
> 
> There are in principle two ways to solve this.
> 1. If it makes sense, the devices might be assigned as parent/child.
> 2. If it's more a consumer/supplier thing, we can add a device-link
> between them.

I thought about the two ways for my particular panel-simple case and
the first impression is that it's not straightforward to use them. For
DSI panels(with DRM_MODE_CONNECTOR_DSI connector type), it looks like
panel device's parent is DSI host device(set in mipi_dsi_device_alloc()
). For other types of panels, like DPI panels, many show up in device
tree as child-node of root node and connect a display controller or a
display bridge through OF graph.  Seems that DRM architecture level
lacks some sort of glue code to use the two ways.

Regards,
Liu Ying 

> 
> In this way, the PM core can guarantee that the order becomes correct.
> 
> Kind regards
> Uffe
Re: [PATCH v2] PM: runtime: Return properly from rpm_resume() if dev->power.needs_force_resume flag is set
Posted by Rafael J. Wysocki 1 year, 6 months ago
On Tue, Sep 27, 2022 at 9:47 AM Liu Ying <victor.liu@nxp.com> wrote:
>
> On Mon, 2022-09-26 at 11:47 +0200, Ulf Hansson wrote:
> > On Fri, 23 Sept 2022 at 17:23, Liu Ying <victor.liu@nxp.com> wrote:
> > > On Fri, 2022-09-23 at 15:48 +0200, Ulf Hansson wrote:
> > > > On Fri, 23 Sept 2022 at 14:47, Liu Ying <victor.liu@nxp.com> wrote:
> > > > > After a device transitions to sleep state through it's system
> > > > > suspend
> > > > > callback pm_runtime_force_suspend(), the device's driver may still
> > > > > try
> > > > > to do runtime PM for the device(runtime suspend first and then
> > > > > runtime
> > > > > resume) although runtime PM is disabled by that callback.  The
> > > > > runtime
> > > > > PM operations would not touch the device effectively and the device
> > > > > is
> > > > > assumed to be resumed through it's system resume callback
> > > > > pm_runtime_force_resume().
> > > >
> > > > This sounds like a fragile use case to me. In principle you want to
> > > > allow the device to be runtime resumed/suspended, after the device
> > > > has
> > > > already been put into a low power state through the regular system
> > > > suspend callback. Normally it seems better to prevent this from
> > > > happening, completely.
> > >
> > > Not sure if we really may prevent this from happening completely.
> > >
> > > > That said, in this case, I wonder if a better option would be to
> > > > point
> > > > ->suspend_late() to pm_runtime_force_suspend() and ->resume_early()
> > > > to
> > > > pm_runtime_force_resume(), rather than using the regular
> > > > ->suspend|resume() callbacks. This should avoid the problem, I think,
> > > > no?
> > >
> > > I thought about this and it actually works for my particular
> > > panel-simple case.  What worries me is that the device(DRM device in my
> > > case) which triggers the runtime PM operations may also use
> > > ->suspend_late/resume_early() callbacks for whatever reasons, hence no
> > > fixed order to suspend/resume the two devices(like panel device and DRM
> > > device).
> > >
> > > Also, not sure if there is any sequence issue by using the
> > > ->suspend_late/resume_early() callbacks in the panel-simple driver,
> > > since it's written for quite a few display panels which may work with
> > > various DRM devices - don't want to break any of them.
> >
> > What you are describing here, is the classical problem we have with
> > suspend/resume ordering of devices.
> >
> > There are in principle two ways to solve this.
> > 1. If it makes sense, the devices might be assigned as parent/child.
> > 2. If it's more a consumer/supplier thing, we can add a device-link
> > between them.
>
> I thought about the two ways for my particular panel-simple case and
> the first impression is that it's not straightforward to use them. For
> DSI panels(with DRM_MODE_CONNECTOR_DSI connector type), it looks like
> panel device's parent is DSI host device(set in mipi_dsi_device_alloc()
> ). For other types of panels, like DPI panels, many show up in device
> tree as child-node of root node and connect a display controller or a
> display bridge through OF graph.  Seems that DRM architecture level
> lacks some sort of glue code to use the two ways.

Well, apparently, the ordering of power management operations
regarding the components in question cannot be arbitrary, but without
any information on the correct ordering in place, there is no way to
guarantee that ordering in every possible code path.  Addressing one
of them is generally insufficient and you will see problems sooner or
later.