Optimize async device suspend/resume

[PATCH v1 0/5] Optimize async device suspend/resume

Posted by Saravana Kannan 1 week ago

A lot of the details are in patch 4/5 and 5/5. The summary is that
there's a lot of overhead and wasted work in how async device
suspend/resume is handled today. I talked about this and other
suspend/resume issues at LPC 2024[1].

You can remove a lot of the overhead by queuing async suspend/resume
work breadth-first. That's what this patch series does. I also noticed
that during resume, because of EAS, we don't use the bigger CPUs as
quickly. This led to a lot of scheduling latency and preemption of
runnable threads, which increased the resume latency. So, we also
disable EAS for that tiny period of resume where we know there'll be a
lot of parallelism.
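
For readers who want a concrete picture of "breadth-first queuing", here
is a minimal user-space sketch in Python. It is not the code from this
series (the real logic is in patch 4/5) and it only models the idea under
stated assumptions: each device lists the devices it must wait for on
resume, and a device is queued only once all of those have finished, so
queued work should never have to block on a dependency. The function name
breadth_first_resume_order and the example topology are hypothetical.

```python
from collections import deque


def breadth_first_resume_order(suppliers):
    """Group devices into waves that could be resumed in parallel.

    `suppliers` maps each device name to the list of devices it must wait
    for. Every device that appears in a list must also be a key.
    Illustrative model only; not the kernel's data structures.
    """
    consumers = {dev: [] for dev in suppliers}
    pending = {}
    for dev, deps in suppliers.items():
        pending[dev] = len(deps)
        for dep in deps:
            consumers[dep].append(dev)

    # Start with devices that wait on nothing.
    ready = deque(dev for dev, count in pending.items() if count == 0)
    waves = []
    while ready:
        wave = list(ready)
        ready.clear()
        waves.append(wave)
        for dev in wave:
            # Finishing `dev` may unblock the devices that depend on it.
            for consumer in consumers[dev]:
                pending[consumer] -= 1
                if pending[consumer] == 0:
                    ready.append(consumer)
    return waves


# Hypothetical topology: an SoC bus with two controllers and two leaf devices.
topology = {
    "soc": [],
    "i2c": ["soc"],
    "usb": ["soc"],
    "touch": ["i2c"],
    "hub": ["usb"],
}
waves = breadth_first_resume_order(topology)
for level, wave in enumerate(waves):
    print(level, wave)
```

Suspend would walk the same graph in the opposite direction, starting
from the devices that nothing else depends on.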

On a Pixel 6, averaging over 100 suspend/resume cycles, this patch
series yields significant improvements:
+---------------------------+---------------+----------------+----------------+
| Phase                     | Old full sync | Old full async | New full async |
|                           |               |                | + EAS disabled |
+---------------------------+---------------+----------------+----------------+
| Total dpm_suspend*() time |        107 ms |          72 ms |          62 ms |
+---------------------------+---------------+----------------+----------------+
| Total dpm_resume*() time  |         75 ms |          90 ms |          61 ms |
+---------------------------+---------------+----------------+----------------+
| Sum                       |        182 ms |         162 ms |         123 ms |
+---------------------------+---------------+----------------+----------------+
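
To put the table in relative terms, the short Python snippet below
computes the percentage saved on the summed suspend and resume time. The
numbers are copied from the table; the dictionary keys and the helper
names total and saving_pct are made up for this illustration.

```python
# (suspend ms, resume ms) per configuration, copied from the table above.
timings_ms = {
    "old_sync": (107, 75),
    "old_async": (72, 90),
    "new_async_no_eas": (62, 61),
}


def total(config):
    """Summed dpm_suspend*() + dpm_resume*() time in ms."""
    return sum(timings_ms[config])


def saving_pct(old, new):
    """Percentage of the old total that the new configuration saves."""
    return 100.0 * (total(old) - total(new)) / total(old)


for baseline in ("old_sync", "old_async"):
    print(f"{baseline}: {saving_pct(baseline, 'new_async_no_eas'):.1f}% saved")
```

That works out to roughly 32% less total time than the old full sync
path and roughly 24% less than the old full async path.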

There might be room for some more optimizations in the future, but I'm
keeping this patch series simple enough that it's easier to review and
check that it's not breaking anything. If this series lands and is
stable, with no bug reports for a few months, I can work on optimizing
this a bit further.

Thanks,
Saravana
P.S: Cc-ing some usual suspects who might be interested in testing this
out.

[1] - https://lpc.events/event/18/contributions/1845/

Saravana Kannan (5):
  PM: sleep: Fix runtime PM issue in dpm_resume()
  PM: sleep: Remove unnecessary mutex lock when waiting on parent
  PM: sleep: Add helper functions to loop through superior/subordinate
    devs
  PM: sleep: Do breadth first suspend/resume for async suspend/resume
  PM: sleep: Spread out async kworker threads during dpm_resume*()
    phases

 drivers/base/power/main.c | 325 +++++++++++++++++++++++++++++---------
 kernel/power/suspend.c    |  16 ++
 kernel/sched/topology.c   |  13 ++
 3 files changed, 276 insertions(+), 78 deletions(-)

-- 
2.47.0.338.g60cca15819-goog

Re: [PATCH v1 0/5] Optimize async device suspend/resume

Posted by Saravana Kannan 3 days, 17 hours ago

On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> [...]

Hi Rafael/Greg,

I'm waiting for one of your reviews before I send out the next version.

-Saravana

Re: [PATCH v1 0/5] Optimize async device suspend/resume

Posted by Greg Kroah-Hartman 3 days, 11 hours ago

On Mon, Nov 18, 2024 at 08:04:26PM -0800, Saravana Kannan wrote:
> On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > [...]
> 
> Hi Rafael/Greg,
> 
> I'm waiting for one of your reviews before I send out the next version.

Please feel free to send, it's the middle of the merge window now, and
I'm busy with that for the next 2 weeks, so I can't do anything until
after that.

thanks,

greg k-h