[PATCH] PM: sleep: core: Fix sync issues between work_in_progress and must_resume

Xuewen Yan posted 1 patch 4 days, 7 hours ago
drivers/base/power/main.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
[PATCH] PM: sleep: core: Fix sync issues between work_in_progress and must_resume
Posted by Xuewen Yan 4 days, 7 hours ago
There is a synchronization issue when suspend async:

     suspend-task:                           async-child-kworker
dpm_noirq_suspend_devices
 mutex_lock(dpm_list_mtx)                   async_suspend_noirq
 list_for_each(dpm_late_early_list)         device_suspend_noirq
  dpm_clear_async_state(parent);             dpm_run_callback()
   reinit_completion()                       dpm_superior_set_must_resume(dev)
   parent->power.work_in_progress = false;     dev->parent->power.must_resume = true;

Because the power.work_in_progress and power.must_resume use the
same byte:
struct dev_pm_info {
   ....
  [56] struct completion completion;
  [104] struct wakeup_source *wakeup;
  [112] bool wakeup_path : 1;
  [112] bool syscore : 1;
  [112] bool no_pm_callbacks : 1;
  [112] bool work_in_progress : 1;
  [112] bool smart_suspend : 1;
  [112] bool must_resume : 1;
  [112] bool may_skip_resume : 1;
  [112] bool strict_midlayer : 1;
  ...
}

So, if suspend-task and child-kworker modify these two variables
simultaneously, it will cause mutual overwriting issues.
More severely, this may result in the work_in_progress variable
not being set to false, preventing the __dpm_async() function from
queuing work to execute the parent’s suspend function.
Consequently, the completion event will never be finalized,
ultimately causing the suspend process to be blocked.

To resolve the aforementioned issue, the must_resume variable
should be protected using dpm_list_mtx.

Fixes: aa7a9275ab81 ("PM: sleep: Suspend async parents after suspending children")
Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous")
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
 drivers/base/power/main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 97a8b4fcf471..7ab42e065074 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1407,8 +1407,11 @@ static void dpm_superior_set_must_resume(struct device *dev)
 	struct device_link *link;
 	int idx;
 
-	if (dev->parent)
+	if (dev->parent) {
+		mutex_lock(&dpm_list_mtx);
 		dev->parent->power.must_resume = true;
+		mutex_unlock(&dpm_list_mtx);
+	}
 
 	idx = device_links_read_lock();
 
-- 
2.25.1

Re: [PATCH] PM: sleep: core: Fix sync issues between work_in_progress and must_resume
Posted by Rafael J. Wysocki 3 days, 17 hours ago
On Tue, Feb 3, 2026 at 7:36 AM Xuewen Yan <xuewen.yan@unisoc.com> wrote:
>
> There is a synchronization issue when suspend async:
>
>      suspend-task:                           async-child-kworker
> dpm_noirq_suspend_devices
>  mutex_lock(dpm_list_mtx)                   async_suspend_noirq
>  list_for_each(dpm_late_early_list)         device_suspend_noirq
>   dpm_clear_async_state(parent);             dpm_run_callback()
>    reinit_completion()                       dpm_superior_set_must_resume(dev)
>    parent->power.work_in_progress = false;     dev->parent->power.must_resume = true;

Good catch!

The issue would affect suppliers too though, AFAICS.

> Because the power.work_in_progress and power.must_resume use the
> same byte:
> struct dev_pm_info {
>    ....
>   [56] struct completion completion;
>   [104] struct wakeup_source *wakeup;
>   [112] bool wakeup_path : 1;
>   [112] bool syscore : 1;
>   [112] bool no_pm_callbacks : 1;
>   [112] bool work_in_progress : 1;
>   [112] bool smart_suspend : 1;
>   [112] bool must_resume : 1;
>   [112] bool may_skip_resume : 1;
>   [112] bool strict_midlayer : 1;
>   ...
> }
>
> So, if suspend-task and child-kworker modify these two variables
> simultaneously, it will cause mutual overwriting issues.
> More severely, this may result in the work_in_progress variable
> not being set to false, preventing the __dpm_async() function from
> queuing work to execute the parent’s suspend function.
> Consequently, the completion event will never be finalized,
> ultimately causing the suspend process to be blocked.
>
> To resolve the aforementioned issue, the must_resume variable
> should be protected using dpm_list_mtx.
>
> Fixes: aa7a9275ab81 ("PM: sleep: Suspend async parents after suspending children")

No, the patch below doesn't fix this one AFAICS.

> Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous")
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> ---
>  drivers/base/power/main.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 97a8b4fcf471..7ab42e065074 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1407,8 +1407,11 @@ static void dpm_superior_set_must_resume(struct device *dev)
>         struct device_link *link;
>         int idx;
>
> -       if (dev->parent)
> +       if (dev->parent) {
> +               mutex_lock(&dpm_list_mtx);
>                 dev->parent->power.must_resume = true;
> +               mutex_unlock(&dpm_list_mtx);
> +       }

This is not sufficient because the suppliers can also be affected and
there are analogous issues in the other suspend phases.

>
>         idx = device_links_read_lock();
>
> --

So let me send my version of the fix.

Thanks!