On Monday, December 1, 2025 7:47:46 PM CET Rafael J. Wysocki wrote:
> On Mon, Dec 1, 2025 at 10:46 AM YangYang <yang.yang@vivo.com> wrote:
[cut]
> If blk_queue_enter() or __bio_queue_enter() is allowed to race with
> disabling runtime PM for q->dev, failure to resume q->dev is always
> possible and there are no changes that can be made to
> pm_runtime_disable() to prevent that from happening. If
> __pm_runtime_disable() wins the race, it will increment
> power.disable_depth and rpm_resume() will bail out when it sees that
> no matter what.
>
> You should not conflate "runtime PM doesn't work when it is disabled"
> with "asynchronous runtime PM doesn't work after freezing the PM
> workqueue". They are both true, but they are not the same.
So I've been testing the patch below for a few days and it will eliminate
the latter, but even after this patch runtime PM will be disabled in
device_suspend_late() and if the problem you are facing is still there
after this patch, it will need to be dealt with at the driver level.
Generally speaking, driver involvement is needed to make runtime PM and
system suspend/resume work together in the majority of cases.
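
The distinction between the two failure modes can be illustrated with a toy user-space model. This is a minimal sketch, not kernel code: all toy_* names are hypothetical, the workqueue is reduced to a single "frozen" flag, and returning -EACCES whenever disable_depth is nonzero is a simplification of what rpm_resume() does.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy user-space model; NOT kernel code. All toy_* names are hypothetical. */
struct toy_dev {
	int disable_depth;	/* > 0 means runtime PM is disabled */
	bool suspended;		/* device is runtime-suspended */
	bool resume_pending;	/* an asynchronous resume request is queued */
};

struct toy_wq {
	bool frozen;		/* models a freezable workqueue that has been frozen */
};

/* Synchronous resume: fails only when runtime PM is disabled. */
static int toy_resume_sync(struct toy_dev *dev)
{
	if (dev->disable_depth > 0)
		return -EACCES;
	dev->suspended = false;
	dev->resume_pending = false;
	return 0;
}

/* Asynchronous resume: only queues a request for the workqueue. */
static int toy_request_resume(struct toy_dev *dev)
{
	if (dev->disable_depth > 0)
		return -EACCES;
	if (dev->suspended)
		dev->resume_pending = true;
	return 0;
}

/* Worker: handles the queued request unless the workqueue is frozen. */
static void toy_wq_run(struct toy_wq *wq, struct toy_dev *dev)
{
	if (wq->frozen || !dev->resume_pending)
		return;
	toy_resume_sync(dev);
}
```

In this model a frozen workqueue leaves a queued request unprocessed while toy_resume_sync() still succeeds, whereas a nonzero disable_depth makes both paths fail regardless of the workqueue state.
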
---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject:
Till now, the runtime PM workqueue has been flagged as freezable, so it
does not process work items during system-wide PM transitions like
system suspend and resume. The original reason to do that was to
reduce the likelihood of runtime PM getting in the way of system-wide
PM processing, but now it is mostly an optimization because (1) runtime
suspend of devices is prevented by bumping up their runtime PM usage
counters in device_prepare() and (2) device drivers are expected to
disable runtime PM for the devices handled by them before they embark
on system-wide PM activities that may change the state of the hardware
or otherwise interfere with runtime PM. However, it prevents
asynchronous runtime resume of devices from working during system-wide
PM transitions, which is confusing because synchronous runtime resume
is not prevented at the same time, and it also sometimes turns out to
be problematic.
For example, it has been reported that blk_queue_enter() may deadlock
during a system suspend transition because of the pm_request_resume()
usage in it [1]. That happens because the asynchronous runtime resume
of the given device is not processed due to the freezing of the runtime
PM workqueue. While it may be better to address this particular issue
in the block layer, the very presence of it means that similar problems
may be expected to occur elsewhere.
For this reason, remove the WQ_FREEZABLE flag from the runtime PM
workqueue and make device_suspend_late() use the generic variant of
pm_runtime_disable() that will carry out runtime resume of the device
synchronously if there is pending resume work for it.
Also update the comment before the pm_runtime_disable() call in
device_suspend_late() to document the fact that the runtime PM
should not be expected to work for the device until the end of
device_resume_early().
This change may, even though it is not expected to, uncover some
latent issues related to queuing up asynchronous runtime resume
work items during system suspend or hibernation. However, they
should be limited to the interference between runtime resume and
system-wide PM callbacks in the cases when device drivers start
to handle system-wide PM before disabling runtime PM as described
above.
Link: https://lore.kernel.org/linux-pm/20251126101636.205505-2-yang.yang@vivo.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/base/power/main.c | 7 ++++---
kernel/power/main.c | 2 +-
2 files changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1647,10 +1647,11 @@ static void device_suspend_late(struct d
goto Complete;
/*
- * Disable runtime PM for the device without checking if there is a
- * pending resume request for it.
+ * After this point, any runtime PM operations targeting the device
+ * will fail until the corresponding pm_runtime_enable() call in
+ * device_resume_early().
*/
- __pm_runtime_disable(dev, false);
+ pm_runtime_disable(dev);
if (dev->power.syscore)
goto Skip;
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -1125,7 +1125,7 @@ EXPORT_SYMBOL_GPL(pm_wq);
static int __init pm_start_workqueues(void)
{
- pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0);
+ pm_wq = alloc_workqueue("pm", WQ_UNBOUND, 0);
if (!pm_wq)
return -ENOMEM;
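
The behavioral difference between __pm_runtime_disable(dev, false) and pm_runtime_disable(dev) can be sketched with a toy user-space model. This is a simplified illustration, not kernel code: toy_device and toy_runtime_disable() are hypothetical names, and locking, accounting and the real barrier logic are left out.

```c
#include <stdbool.h>

/* Toy user-space model; NOT kernel code. All toy_* names are hypothetical. */
struct toy_device {
	int disable_depth;	/* > 0 means runtime PM is disabled */
	bool suspended;		/* device is runtime-suspended */
	bool resume_pending;	/* an asynchronous resume request is queued */
};

/*
 * Models the check_resume argument: when it is true and a resume request
 * is pending, the resume is carried out synchronously before runtime PM
 * is disabled. When it is false, the pending request is simply dropped.
 */
static void toy_runtime_disable(struct toy_device *dev, bool check_resume)
{
	if (dev->disable_depth > 0) {
		/* Already disabled: only the nesting depth changes. */
		dev->disable_depth++;
		return;
	}

	if (check_resume && dev->resume_pending)
		dev->suspended = false;	/* pending resume done synchronously */

	dev->resume_pending = false;	/* any queued request is cancelled */
	dev->disable_depth++;
}
```

With check_resume set, a device that had a resume request queued ends up active when runtime PM gets disabled. Without it, the device stays suspended and the request is lost.
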
On Mon, 1 Dec 2025 at 20:58, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Monday, December 1, 2025 7:47:46 PM CET Rafael J. Wysocki wrote:
> > On Mon, Dec 1, 2025 at 10:46 AM YangYang <yang.yang@vivo.com> wrote:
>
> [cut]
>
> > If blk_queue_enter() or __bio_queue_enter() is allowed to race with
> > disabling runtime PM for q->dev, failure to resume q->dev is always
> > possible and there are no changes that can be made to
> > pm_runtime_disable() to prevent that from happening. If
> > __pm_runtime_disable() wins the race, it will increment
> > power.disable_depth and rpm_resume() will bail out when it sees that
> > no matter what.
> >
> > You should not conflate "runtime PM doesn't work when it is disabled"
> > with "asynchronous runtime PM doesn't work after freezing the PM
> > workqueue". They are both true, but they are not the same.
>
> So I've been testing the patch below for a few days and it will eliminate
> the latter, but even after this patch runtime PM will be disabled in
> device_suspend_late() and if the problem you are facing is still there
> after this patch, it will need to be dealt with at the driver level.
>
> Generally speaking, driver involvement is needed to make runtime PM and
> system suspend/resume work together in the majority of cases.
>
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject:
>
> Till now, the runtime PM workqueue has been flagged as freezable, so it
> does not process work items during system-wide PM transitions like
> system suspend and resume. The original reason to do that was to
> reduce the likelihood of runtime PM getting in the way of system-wide
> PM processing, but now it is mostly an optimization because (1) runtime
> suspend of devices is prevented by bumping up their runtime PM usage
> counters in device_prepare() and (2) device drivers are expected to
> disable runtime PM for the devices handled by them before they embark
> on system-wide PM activities that may change the state of the hardware
> or otherwise interfere with runtime PM. However, it prevents
> asynchronous runtime resume of devices from working during system-wide
> PM transitions, which is confusing because synchronous runtime resume
> is not prevented at the same time, and it also sometimes turns out to
> be problematic.
>
> For example, it has been reported that blk_queue_enter() may deadlock
> during a system suspend transition because of the pm_request_resume()
> usage in it [1]. That happens because the asynchronous runtime resume
> of the given device is not processed due to the freezing of the runtime
> PM workqueue. While it may be better to address this particular issue
> in the block layer, the very presence of it means that similar problems
> may be expected to occur elsewhere.
>
> For this reason, remove the WQ_FREEZABLE flag from the runtime PM
> workqueue and make device_suspend_late() use the generic variant of
> pm_runtime_disable() that will carry out runtime resume of the device
> synchronously if there is pending resume work for it.
>
> Also update the comment before the pm_runtime_disable() call in
> device_suspend_late() to document the fact that the runtime PM
> should not be expected to work for the device until the end of
> device_resume_early().
>
> This change may, even though it is not expected to, uncover some
> latent issues related to queuing up asynchronous runtime resume
> work items during system suspend or hibernation. However, they
> should be limited to the interference between runtime resume and
> system-wide PM callbacks in the cases when device drivers start
> to handle system-wide PM before disabling runtime PM as described
> above.
>
> Link: https://lore.kernel.org/linux-pm/20251126101636.205505-2-yang.yang@vivo.com/
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
I agree with the above and this seems like a reasonable change to me.
Yep, it's not entirely easy to know whether all users of
pm_request_resume() (and similar) are fine with this too, but in
general I think they should be.
So, feel free to add:
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Kind regards
Uffe
> ---
> drivers/base/power/main.c | 7 ++++---
> kernel/power/main.c | 2 +-
> 2 files changed, 5 insertions(+), 4 deletions(-)
>
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1647,10 +1647,11 @@ static void device_suspend_late(struct d
> goto Complete;
>
> /*
> - * Disable runtime PM for the device without checking if there is a
> - * pending resume request for it.
> + * After this point, any runtime PM operations targeting the device
> + * will fail until the corresponding pm_runtime_enable() call in
> + * device_resume_early().
> */
> - __pm_runtime_disable(dev, false);
> + pm_runtime_disable(dev);
>
> if (dev->power.syscore)
> goto Skip;
> --- a/kernel/power/main.c
> +++ b/kernel/power/main.c
> @@ -1125,7 +1125,7 @@ EXPORT_SYMBOL_GPL(pm_wq);
>
> static int __init pm_start_workqueues(void)
> {
> - pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0);
> + pm_wq = alloc_workqueue("pm", WQ_UNBOUND, 0);
> if (!pm_wq)
> return -ENOMEM;
On 2025/12/2 3:58, Rafael J. Wysocki wrote:
> On Monday, December 1, 2025 7:47:46 PM CET Rafael J. Wysocki wrote:
>> On Mon, Dec 1, 2025 at 10:46 AM YangYang <yang.yang@vivo.com> wrote:
>
> [cut]
>
>> If blk_queue_enter() or __bio_queue_enter() is allowed to race with
>> disabling runtime PM for q->dev, failure to resume q->dev is always
>> possible and there are no changes that can be made to
>> pm_runtime_disable() to prevent that from happening. If
>> __pm_runtime_disable() wins the race, it will increment
>> power.disable_depth and rpm_resume() will bail out when it sees that
>> no matter what.
>>
>> You should not conflate "runtime PM doesn't work when it is disabled"
>> with "asynchronous runtime PM doesn't work after freezing the PM
>> workqueue". They are both true, but they are not the same.
>
> So I've been testing the patch below for a few days and it will eliminate
> the latter, but even after this patch runtime PM will be disabled in
> device_suspend_late() and if the problem you are facing is still there
> after this patch, it will need to be dealt with at the driver level.
>
> Generally speaking, driver involvement is needed to make runtime PM and
> system suspend/resume work together in the majority of cases.
>
Thank you. I'll perform some tests with this patch applied.
On 12/1/25 11:58 AM, Rafael J. Wysocki wrote:
> So I've been testing the patch below for a few days and it will eliminate
> the latter, but even after this patch runtime PM will be disabled in
> device_suspend_late() and if the problem you are facing is still there
> after this patch, it will need to be dealt with at the driver level.
>
> Generally speaking, driver involvement is needed to make runtime PM and
> system suspend/resume work together in the majority of cases.
Thank you for having developed and shared this patch. Is the following
quote from the Linux kernel documentation still correct with this patch
applied or should an update for Documentation/power/runtime_pm.rst
perhaps be included in this patch?
"The power management workqueue pm_wq in which bus types and device
drivers can put their PM-related work items. It is strongly recommended
that pm_wq be used for queuing all work items related to runtime PM,
because this allows them to be synchronized with system-wide power
transitions (suspend to RAM, hibernation and resume from system sleep
states). pm_wq is declared in include/linux/pm_runtime.h and defined in
kernel/power/main.c."
Bart.
On Tue, Dec 2, 2025 at 2:06 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 12/1/25 11:58 AM, Rafael J. Wysocki wrote:
> > So I've been testing the patch below for a few days and it will eliminate
> > the latter, but even after this patch runtime PM will be disabled in
> > device_suspend_late() and if the problem you are facing is still there
> > after this patch, it will need to be dealt with at the driver level.
> >
> > Generally speaking, driver involvement is needed to make runtime PM and
> > system suspend/resume work together in the majority of cases.
>
> Thank you for having developed and shared this patch. Is the following
> quote from the Linux kernel documentation still correct with this patch
> applied or should an update for Documentation/power/runtime_pm.rst
> perhaps be included in this patch?
>
> "The power management workqueue pm_wq in which bus types and device
> drivers can put their PM-related work items. It is strongly recommended
> that pm_wq be used for queuing all work items related to runtime PM,
> because this allows them to be synchronized with system-wide power
> transitions (suspend to RAM, hibernation and resume from system sleep
> states). pm_wq is declared in include/linux/pm_runtime.h and defined in
> kernel/power/main.c."
It doesn't say what the synchronization mechanism is in particular and
some synchronization is still provided after this patch, via the
pm_runtime_barrier() in device_suspend(), for example.
On Tue, Dec 2, 2025 at 12:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Dec 2, 2025 at 2:06 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 12/1/25 11:58 AM, Rafael J. Wysocki wrote:
> > > So I've been testing the patch below for a few days and it will eliminate
> > > the latter, but even after this patch runtime PM will be disabled in
> > > device_suspend_late() and if the problem you are facing is still there
> > > after this patch, it will need to be dealt with at the driver level.
> > >
> > > Generally speaking, driver involvement is needed to make runtime PM and
> > > system suspend/resume work together in the majority of cases.
> >
> > Thank you for having developed and shared this patch. Is the following
> > quote from the Linux kernel documentation still correct with this patch
> > applied or should an update for Documentation/power/runtime_pm.rst
> > perhaps be included in this patch?
> >
> > "The power management workqueue pm_wq in which bus types and device
> > drivers can put their PM-related work items. It is strongly recommended
> > that pm_wq be used for queuing all work items related to runtime PM,
> > because this allows them to be synchronized with system-wide power
> > transitions (suspend to RAM, hibernation and resume from system sleep
> > states). pm_wq is declared in include/linux/pm_runtime.h and defined in
> > kernel/power/main.c."
>
> It doesn't say what the synchronization mechanism is in particular and
> some synchronization is still provided after this patch, via the
> pm_runtime_barrier() in device_suspend(), for example.
Though there is another piece of documentation that needs updating to
reflect the changes in this patch, so I'll send a v2 at some point.