As of today, the scheduler doesn't spread out all the kworker threads
across all the available CPUs during suspend/resume. This causes
significant resume latency during the dpm_resume*() phases.
System resume latency is a very user-visible event. Reducing the
latency is more important than trying to be energy aware during that
period.
Since there are no userspace processes running during this time and
this is a very short time window, we can simply disable EAS during
resume so that the parallel resume of the devices is spread across all
the CPUs.
On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic
plus disabling EAS for resume yields significant improvements:
+---------------------------+-----------+------------+------------------+
| Phase | Old full sync | New full async | % change |
| | | + EAS disabled | |
+---------------------------+-----------+------------+------------------+
| Total dpm_suspend*() time | 107 ms | 62 ms | -42% |
+---------------------------+-----------+------------+------------------+
| Total dpm_resume*() time | 75 ms | 61 ms | -19% |
+---------------------------+-----------+------------+------------------+
| Sum | 182 ms | 123 ms | -32% |
+---------------------------+-----------+------------+------------------+
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
kernel/power/suspend.c | 16 ++++++++++++++++
kernel/sched/topology.c | 13 +++++++++++++
2 files changed, 29 insertions(+)
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 09f8397bae15..7304dc39958f 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void)
local_irq_enable();
}
+/*
+ * Intentionally not part of a header file to avoid risk of abuse by other
+ * drivers.
+ */
+void sched_set_energy_aware(unsigned int enable);
+
/**
* suspend_enter - Make the system enter the given sleep state.
* @state: System sleep state to enter.
@@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
Platform_wake:
platform_resume_noirq(state);
+ /*
+ * We do this only for resume instead of suspend and resume for these
+ * reasons:
+ * - Performance is more important than power for resume.
+ * - Power spent entering suspend is more important for suspend. Also,
+ * stangely, disabling EAS was making suspent a few milliseconds
+ * slower in my testing.
+ */
+ sched_set_energy_aware(0);
dpm_resume_noirq(PMSG_RESUME);
Platform_early_resume:
@@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state)
Resume_devices:
suspend_test_start();
dpm_resume_end(PMSG_RESUME);
+ sched_set_energy_aware(1);
suspend_test_finish("resume devices");
trace_suspend_resume(TPS("resume_console"), state, true);
resume_console();
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9748a4c8d668..c069c0b17cbf 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void)
mutex_unlock(&sched_energy_mutex);
}
+void sched_set_energy_aware(unsigned int enable)
+{
+ int state;
+
+ if (!sched_is_eas_possible(cpu_active_mask))
+ return;
+
+ sysctl_sched_energy_aware = enable;
+ state = static_branch_unlikely(&sched_energy_present);
+ if (state != sysctl_sched_energy_aware)
+ rebuild_sched_domains_energy();
+}
+
#ifdef CONFIG_PROC_SYSCTL
static int sched_energy_aware_handler(const struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
--
2.47.0.338.g60cca15819-goog
On 11/14/24 22:09, Saravana Kannan wrote: > As of today, the scheduler doesn't spread out all the kworker threads > across all the available CPUs during suspend/resume. This causes > significant resume latency during the dpm_resume*() phases. > > System resume latency is a very user-visible event. Reducing the > latency is more important than trying to be energy aware during that > period. > > Since there are no userspace processes running during this time and > this is a very short time window, we can simply disable EAS during > resume so that the parallel resume of the devices is spread across all > the CPUs. > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > plus disabling EAS for resume yields significant improvements: > +---------------------------+-----------+------------+------------------+ > | Phase | Old full sync | New full async | % change | > | | | + EAS disabled | | > +---------------------------+-----------+------------+------------------+ > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > +---------------------------+-----------+------------+------------------+ > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > +---------------------------+-----------+------------+------------------+ > | Sum | 182 ms | 123 ms | -32% | > +---------------------------+-----------+------------+------------------+ > > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > kernel/power/suspend.c | 16 ++++++++++++++++ > kernel/sched/topology.c | 13 +++++++++++++ > 2 files changed, 29 insertions(+) > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 09f8397bae15..7304dc39958f 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > local_irq_enable(); > } > > +/* > + * Intentionally not part of a header file to avoid risk of abuse by other > + * drivers. > + */ > +void sched_set_energy_aware(unsigned int enable); > + > /** > * suspend_enter - Make the system enter the given sleep state. > * @state: System sleep state to enter. > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > Platform_wake: > platform_resume_noirq(state); > + /* > + * We do this only for resume instead of suspend and resume for these > + * reasons: > + * - Performance is more important than power for resume. > + * - Power spent entering suspend is more important for suspend. Also, > + * stangely, disabling EAS was making suspent a few milliseconds > + * slower in my testing. s/stangely/strangely s/suspent/suspend I'd also be curious why that is. Disabling EAS shouldn't be that expensive. What if you just hack the static branch switch (without the sd rebuild)? > + */ > + sched_set_energy_aware(0); > dpm_resume_noirq(PMSG_RESUME); > > Platform_early_resume: > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > Resume_devices: > suspend_test_start(); > dpm_resume_end(PMSG_RESUME); > + sched_set_energy_aware(1); > suspend_test_finish("resume devices"); > trace_suspend_resume(TPS("resume_console"), state, true); > resume_console(); > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 9748a4c8d668..c069c0b17cbf 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > mutex_unlock(&sched_energy_mutex); > } > > +void sched_set_energy_aware(unsigned int enable) bool enable? > +{ > + int state; > + > + if (!sched_is_eas_possible(cpu_active_mask)) > + return; > + > + sysctl_sched_energy_aware = enable; > + state = static_branch_unlikely(&sched_energy_present); > + if (state != sysctl_sched_energy_aware) > + rebuild_sched_domains_energy(); > +} > + This definitely shouldn't just overwrite sysctl_sched_energy_aware, otherwise you enable EAS for users that explicitly disabled it. If it ever comes to other users wanting this we might need a eas_pause counter so this can be nested, but let's just hope that's never needed. Regards, Christian
On Mon, Nov 18, 2024 at 1:52 AM Christian Loehle <christian.loehle@arm.com> wrote: > > On 11/14/24 22:09, Saravana Kannan wrote: > > As of today, the scheduler doesn't spread out all the kworker threads > > across all the available CPUs during suspend/resume. This causes > > significant resume latency during the dpm_resume*() phases. > > > > System resume latency is a very user-visible event. Reducing the > > latency is more important than trying to be energy aware during that > > period. > > > > Since there are no userspace processes running during this time and > > this is a very short time window, we can simply disable EAS during > > resume so that the parallel resume of the devices is spread across all > > the CPUs. > > > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > > plus disabling EAS for resume yields significant improvements: > > +---------------------------+-----------+------------+------------------+ > > | Phase | Old full sync | New full async | % change | > > | | | + EAS disabled | | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > > +---------------------------+-----------+------------+------------------+ > > | Sum | 182 ms | 123 ms | -32% | > > +---------------------------+-----------+------------+------------------+ > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > --- > > kernel/power/suspend.c | 16 ++++++++++++++++ > > kernel/sched/topology.c | 13 +++++++++++++ > > 2 files changed, 29 insertions(+) > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 09f8397bae15..7304dc39958f 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > > local_irq_enable(); > > } > > > > +/* > > + * Intentionally not part of a header file to avoid risk of abuse by other > > + * drivers. > > + */ > > +void sched_set_energy_aware(unsigned int enable); > > + > > /** > > * suspend_enter - Make the system enter the given sleep state. > > * @state: System sleep state to enter. > > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > > > Platform_wake: > > platform_resume_noirq(state); > > + /* > > + * We do this only for resume instead of suspend and resume for these > > + * reasons: > > + * - Performance is more important than power for resume. > > + * - Power spent entering suspend is more important for suspend. Also, > > + * stangely, disabling EAS was making suspent a few milliseconds > > + * slower in my testing. > > s/stangely/strangely > s/suspent/suspend Will fix it in the next version. > I'd also be curious why that is. Disabling EAS shouldn't be that expensive. > What if you just hack the static branch switch (without the sd rebuild)? I don't think the enabling/disabling is the expensive part. Because I do it around dpm_resume*() and it helps performance. I tried to see if I could spot a reason, looking at the trace. But nothing stood out. My educated guess is that when going into suspend, the "thundering herd" happens early (all the leaf nodes suspend first) and then peters out. Whereas, during resume it's a slow ramp up until the "thundering herd" happens at the end (all the leaf nodes resume last). Spreading out the threads immediately (no EAS) probably has a different impact on these two styles of thundering herds. > > > + */ > > + sched_set_energy_aware(0); > > dpm_resume_noirq(PMSG_RESUME); > > > > Platform_early_resume: > > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > > Resume_devices: > > suspend_test_start(); > > dpm_resume_end(PMSG_RESUME); > > + sched_set_energy_aware(1); > > suspend_test_finish("resume devices"); > > trace_suspend_resume(TPS("resume_console"), state, true); > > resume_console(); > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > index 9748a4c8d668..c069c0b17cbf 100644 > > --- a/kernel/sched/topology.c > > +++ b/kernel/sched/topology.c > > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > > mutex_unlock(&sched_energy_mutex); > > } > > > > +void sched_set_energy_aware(unsigned int enable) > > bool enable? Will do. > > > +{ > > + int state; > > + > > + if (!sched_is_eas_possible(cpu_active_mask)) > > + return; > > + > > + sysctl_sched_energy_aware = enable; > > + state = static_branch_unlikely(&sched_energy_present); > > + if (state != sysctl_sched_energy_aware) > > + rebuild_sched_domains_energy(); > > +} > > + > > This definitely shouldn't just overwrite > sysctl_sched_energy_aware, otherwise you enable EAS > for users that explicitly disabled it. Good point. Will fix it in the next version. Thanks for the review! -Saravana > > If it ever comes to other users wanting this we might > need a eas_pause counter so this can be nested, but > let's just hope that's never needed. > > Regards, > Christian >
Hi Saravana, kernel test robot noticed the following build warnings: [auto build test WARNING on rafael-pm/linux-next] [also build test WARNING on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next patch link: https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases config: arm-randconfig-003-20241117 (https://download.01.org/0day-ci/archive/20241117/202411172344.QqFap290-lkp@intel.com/config) compiler: arm-linux-gnueabi-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411172344.QqFap290-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411172344.QqFap290-lkp@intel.com/ All warnings (new ones prefixed by >>): In file included from kernel/sched/build_utility.c:88: >> kernel/sched/topology.c:287:6: warning: no previous prototype for 'sched_set_energy_aware' [-Wmissing-prototypes] 287 | void sched_set_energy_aware(unsigned int enable) | ^~~~~~~~~~~~~~~~~~~~~~ vim +/sched_set_energy_aware +287 kernel/sched/topology.c 286 > 287 void sched_set_energy_aware(unsigned int enable) 288 { 289 int state; 290 291 if (!sched_is_eas_possible(cpu_active_mask)) 292 return; 293 294 sysctl_sched_energy_aware = enable; 295 state = static_branch_unlikely(&sched_energy_present); 296 if (state != sysctl_sched_energy_aware) 297 rebuild_sched_domains_energy(); 298 } 299 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Hi Saravana, kernel test robot noticed the following build errors: [auto build test ERROR on rafael-pm/linux-next] [also build test ERROR on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next patch link: https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases config: powerpc-tqm8560_defconfig (https://download.01.org/0day-ci/archive/20241117/202411170920.Pv29JceD-lkp@intel.com/config) compiler: powerpc-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411170920.Pv29JceD-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411170920.Pv29JceD-lkp@intel.com/ All errors (new ones prefixed by >>): powerpc-linux-ld: kernel/power/suspend.o: in function `suspend_enter': suspend.c:(.text+0x414): undefined reference to `sched_set_energy_aware' >> powerpc-linux-ld: suspend.c:(.text+0x63c): undefined reference to `sched_set_energy_aware' powerpc-linux-ld: kernel/power/suspend.o: in function `suspend_devices_and_enter': suspend.c:(.text+0x834): undefined reference to `sched_set_energy_aware' powerpc-linux-ld: suspend.c:(.text+0x950): undefined reference to `sched_set_energy_aware' powerpc-linux-ld: suspend.c:(.text+0x970): undefined reference to `sched_set_energy_aware' -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Hi Saravana, kernel test robot noticed the following build errors: [auto build test ERROR on rafael-pm/linux-next] [also build test ERROR on rafael-pm/bleeding-edge tip/sched/core amd-pstate/linux-next amd-pstate/bleeding-edge linus/master v6.12-rc7 next-20241115] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Saravana-Kannan/PM-sleep-Fix-runtime-PM-issue-in-dpm_resume/20241115-183855 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next patch link: https://lore.kernel.org/r/20241114220921.2529905-6-saravanak%40google.com patch subject: [PATCH v1 5/5] PM: sleep: Spread out async kworker threads during dpm_resume*() phases config: arm-s3c6400_defconfig (https://download.01.org/0day-ci/archive/20241117/202411170802.SnHuptE7-lkp@intel.com/config) compiler: arm-linux-gnueabi-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241117/202411170802.SnHuptE7-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411170802.SnHuptE7-lkp@intel.com/ All errors (new ones prefixed by >>): arm-linux-gnueabi-ld: kernel/power/suspend.o: in function `suspend_enter': >> kernel/power/suspend.c:485:(.text+0x2f4): undefined reference to `sched_set_energy_aware' >> arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x48c): undefined reference to `sched_set_energy_aware' arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x4c0): undefined reference to `sched_set_energy_aware' arm-linux-gnueabi-ld: kernel/power/suspend.c:485:(.text+0x4d0): undefined reference to `sched_set_energy_aware' arm-linux-gnueabi-ld: kernel/power/suspend.o: in function `suspend_devices_and_enter': kernel/power/suspend.c:538:(.text+0x748): undefined reference to `sched_set_energy_aware' vim +485 kernel/power/suspend.c 401 402 /** 403 * suspend_enter - Make the system enter the given sleep state. 404 * @state: System sleep state to enter. 405 * @wakeup: Returns information that the sleep state should not be re-entered. 406 * 407 * This function should be called after devices have been suspended. 408 */ 409 static int suspend_enter(suspend_state_t state, bool *wakeup) 410 { 411 int error; 412 413 error = platform_suspend_prepare(state); 414 if (error) 415 goto Platform_finish; 416 417 error = dpm_suspend_late(PMSG_SUSPEND); 418 if (error) { 419 pr_err("late suspend of devices failed\n"); 420 goto Platform_finish; 421 } 422 error = platform_suspend_prepare_late(state); 423 if (error) 424 goto Devices_early_resume; 425 426 error = dpm_suspend_noirq(PMSG_SUSPEND); 427 if (error) { 428 pr_err("noirq suspend of devices failed\n"); 429 goto Platform_early_resume; 430 } 431 error = platform_suspend_prepare_noirq(state); 432 if (error) 433 goto Platform_wake; 434 435 if (suspend_test(TEST_PLATFORM)) 436 goto Platform_wake; 437 438 if (state == PM_SUSPEND_TO_IDLE) { 439 s2idle_loop(); 440 goto Platform_wake; 441 } 442 443 error = pm_sleep_disable_secondary_cpus(); 444 if (error || suspend_test(TEST_CPUS)) 445 goto Enable_cpus; 446 447 arch_suspend_disable_irqs(); 448 BUG_ON(!irqs_disabled()); 449 450 system_state = SYSTEM_SUSPEND; 451 452 error = syscore_suspend(); 453 if (!error) { 454 *wakeup = pm_wakeup_pending(); 455 if (!(suspend_test(TEST_CORE) || *wakeup)) { 456 trace_suspend_resume(TPS("machine_suspend"), 457 state, true); 458 error = suspend_ops->enter(state); 459 trace_suspend_resume(TPS("machine_suspend"), 460 state, false); 461 } else if (*wakeup) { 462 error = -EBUSY; 463 } 464 syscore_resume(); 465 } 466 467 system_state = SYSTEM_RUNNING; 468 469 arch_suspend_enable_irqs(); 470 BUG_ON(irqs_disabled()); 471 472 Enable_cpus: 473 pm_sleep_enable_secondary_cpus(); 474 475 Platform_wake: 476 platform_resume_noirq(state); 477 /* 478 * We do this only for resume instead of suspend and resume for these 479 * reasons: 480 * - Performance is more important than power for resume. 481 * - Power spent entering suspend is more important for suspend. Also, 482 * stangely, disabling EAS was making suspent a few milliseconds 483 * slower in my testing. 484 */ > 485 sched_set_energy_aware(0); 486 dpm_resume_noirq(PMSG_RESUME); 487 488 Platform_early_resume: 489 platform_resume_early(state); 490 491 Devices_early_resume: 492 dpm_resume_early(PMSG_RESUME); 493 494 Platform_finish: 495 platform_resume_finish(state); 496 return error; 497 } 498 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote: > > As of today, the scheduler doesn't spread out all the kworker threads > across all the available CPUs during suspend/resume. This causes > significant resume latency during the dpm_resume*() phases. > > System resume latency is a very user-visible event. Reducing the > latency is more important than trying to be energy aware during that > period. > > Since there are no userspace processes running during this time and > this is a very short time window, we can simply disable EAS during > resume so that the parallel resume of the devices is spread across all > the CPUs. > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > plus disabling EAS for resume yields significant improvements: > +---------------------------+-----------+------------+------------------+ > | Phase | Old full sync | New full async | % change | > | | | + EAS disabled | | > +---------------------------+-----------+------------+------------------+ > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > +---------------------------+-----------+------------+------------------+ > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > +---------------------------+-----------+------------+------------------+ > | Sum | 182 ms | 123 ms | -32% | > +---------------------------+-----------+------------+------------------+ in cover letter you have figures for - Old full sync - New full async - New full async + EAS disabled you should better use the figures for New full async vs New full async + EAS disabled to show EAS disabled impact I would be interested to get figures about the impact of disabling it during full suspend sequence as I'm not convince that it's worth the complexity especially with fix OPP during suspend > > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > kernel/power/suspend.c | 16 ++++++++++++++++ > kernel/sched/topology.c | 13 +++++++++++++ > 2 files changed, 29 insertions(+) > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 09f8397bae15..7304dc39958f 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > local_irq_enable(); > } > > +/* > + * Intentionally not part of a header file to avoid risk of abuse by other > + * drivers. > + */ > +void sched_set_energy_aware(unsigned int enable); > + > /** > * suspend_enter - Make the system enter the given sleep state. > * @state: System sleep state to enter. > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > Platform_wake: > platform_resume_noirq(state); > + /* > + * We do this only for resume instead of suspend and resume for these > + * reasons: > + * - Performance is more important than power for resume. > + * - Power spent entering suspend is more important for suspend. Also, > + * stangely, disabling EAS was making suspent a few milliseconds > + * slower in my testing. > + */ > + sched_set_energy_aware(0); > dpm_resume_noirq(PMSG_RESUME); > > Platform_early_resume: > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > Resume_devices: > suspend_test_start(); > dpm_resume_end(PMSG_RESUME); > + sched_set_energy_aware(1); If we end up having a special scheduling mode during suspend, we should make the function more generic and not only EAS/ smartphone specific Like a sched_suspend and sched_resume > suspend_test_finish("resume devices"); > trace_suspend_resume(TPS("resume_console"), state, true); > resume_console(); > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 9748a4c8d668..c069c0b17cbf 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > mutex_unlock(&sched_energy_mutex); > } > > +void sched_set_energy_aware(unsigned int enable) This is a copy/paste of sched_energy_aware_handler() below, we should have 1 helper for both > +{ > + int state; > + > + if (!sched_is_eas_possible(cpu_active_mask)) > + return; > + > + sysctl_sched_energy_aware = enable; > + state = static_branch_unlikely(&sched_energy_present); > + if (state != sysctl_sched_energy_aware) > + rebuild_sched_domains_energy(); > +} > + > #ifdef CONFIG_PROC_SYSCTL > static int sched_energy_aware_handler(const struct ctl_table *table, int write, > void *buffer, size_t *lenp, loff_t *ppos) > -- > 2.47.0.338.g60cca15819-goog >
On Fri, Nov 15, 2024 at 8:13 AM Vincent Guittot <vincent.guittot@linaro.org> wrote: > > On Thu, 14 Nov 2024 at 23:09, Saravana Kannan <saravanak@google.com> wrote: > > > > As of today, the scheduler doesn't spread out all the kworker threads > > across all the available CPUs during suspend/resume. This causes > > significant resume latency during the dpm_resume*() phases. > > > > System resume latency is a very user-visible event. Reducing the > > latency is more important than trying to be energy aware during that > > period. > > > > Since there are no userspace processes running during this time and > > this is a very short time window, we can simply disable EAS during > > resume so that the parallel resume of the devices is spread across all > > the CPUs. > > > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > > plus disabling EAS for resume yields significant improvements: > > +---------------------------+-----------+------------+------------------+ > > | Phase | Old full sync | New full async | % change | > > | | | + EAS disabled | | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > > +---------------------------+-----------+------------+------------------+ > > | Sum | 182 ms | 123 ms | -32% | > > +---------------------------+-----------+------------+------------------+ > > in cover letter you have figures for > - Old full sync > - New full async > - New full async + EAS disabled > > you should better use the figures for New full async vs New full > async + EAS disabled to show EAS disabled impact I do give those numbers in the commit text of each patch making the changes. Patch 4 commit text shows how it's improving things compared to the older logic full sync (this is the baseline) - resume is 1% faster. Patch 5 commit text shows you how disabling EAS is improving numbers compared to baseline - resume 19% faster. So, yeah, all the numbers are there in one of these emails. Patch 5 (which is the only one touching EAS) is the one that has the comparison you are asking for. > I would be interested to get figures about the impact of disabling it > during full suspend sequence as I'm not convince that it's worth the > complexity especially with fix OPP during suspend 1. Device suspend actually got worse by 5ms or so. I already provided that. 2. As I said in the Patch 5, suspend is more about reducing the energy going into suspend. It's a balance of how quick you can be to how much power you use to be quick. So, disabling EAS across all of suspend/resume will have a huge impact on power because userspace is still running, there are a ton of threads and userspace could get preempted between disabling suspend and kicking off suspend. Lots of obvious power concerns overall. Thanks, Saravana > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > --- > > kernel/power/suspend.c | 16 ++++++++++++++++ > > kernel/sched/topology.c | 13 +++++++++++++ > > 2 files changed, 29 insertions(+) > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 09f8397bae15..7304dc39958f 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > > local_irq_enable(); > > } > > > > +/* > > + * Intentionally not part of a header file to avoid risk of abuse by other > > + * drivers. > > + */ > > +void sched_set_energy_aware(unsigned int enable); > > + > > /** > > * suspend_enter - Make the system enter the given sleep state. > > * @state: System sleep state to enter. > > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > > > Platform_wake: > > platform_resume_noirq(state); > > + /* > > + * We do this only for resume instead of suspend and resume for these > > + * reasons: > > + * - Performance is more important than power for resume. > > + * - Power spent entering suspend is more important for suspend. Also, > > + * stangely, disabling EAS was making suspent a few milliseconds > > + * slower in my testing. > > + */ > > + sched_set_energy_aware(0); > > dpm_resume_noirq(PMSG_RESUME); > > > > Platform_early_resume: > > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > > Resume_devices: > > suspend_test_start(); > > dpm_resume_end(PMSG_RESUME); > > + sched_set_energy_aware(1); > > If we end up having a special scheduling mode during suspend, we > should make the function more generic and not only EAS/ smartphone > specific > > Like a sched_suspend and sched_resume > > > suspend_test_finish("resume devices"); > > trace_suspend_resume(TPS("resume_console"), state, true); > > resume_console(); > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > index 9748a4c8d668..c069c0b17cbf 100644 > > --- a/kernel/sched/topology.c > > +++ b/kernel/sched/topology.c > > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > > mutex_unlock(&sched_energy_mutex); > > } > > > > +void sched_set_energy_aware(unsigned int enable) > > This is a copy/paste of sched_energy_aware_handler() below, we should > have 1 helper for both > > > +{ > > + int state; > > + > > + if (!sched_is_eas_possible(cpu_active_mask)) > > + return; > > + > > + sysctl_sched_energy_aware = enable; > > + state = static_branch_unlikely(&sched_energy_present); > > + if (state != sysctl_sched_energy_aware) > > + rebuild_sched_domains_energy(); > > +} > > + > > #ifdef CONFIG_PROC_SYSCTL > > static int sched_energy_aware_handler(const struct ctl_table *table, int write, > > void *buffer, size_t *lenp, loff_t *ppos) > > -- > > 2.47.0.338.g60cca15819-goog > >
On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote: > > As of today, the scheduler doesn't spread out all the kworker threads > across all the available CPUs during suspend/resume. This causes > significant resume latency during the dpm_resume*() phases. > > System resume latency is a very user-visible event. Reducing the > latency is more important than trying to be energy aware during that > period. > > Since there are no userspace processes running during this time and > this is a very short time window, we can simply disable EAS during > resume so that the parallel resume of the devices is spread across all > the CPUs. > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > plus disabling EAS for resume yields significant improvements: > +---------------------------+-----------+------------+------------------+ > | Phase | Old full sync | New full async | % change | > | | | + EAS disabled | | > +---------------------------+-----------+------------+------------------+ > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > +---------------------------+-----------+------------+------------------+ > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > +---------------------------+-----------+------------+------------------+ > | Sum | 182 ms | 123 ms | -32% | > +---------------------------+-----------+------------+------------------+ > > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > kernel/power/suspend.c | 16 ++++++++++++++++ > kernel/sched/topology.c | 13 +++++++++++++ > 2 files changed, 29 insertions(+) > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > index 09f8397bae15..7304dc39958f 100644 > --- a/kernel/power/suspend.c > +++ b/kernel/power/suspend.c > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > local_irq_enable(); > } > > +/* > + * Intentionally not part of a header file to avoid risk of abuse by other > + * drivers. > + */ > +void sched_set_energy_aware(unsigned int enable); > + > /** > * suspend_enter - Make the system enter the given sleep state. > * @state: System sleep state to enter. > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > Platform_wake: > platform_resume_noirq(state); > + /* > + * We do this only for resume instead of suspend and resume for these > + * reasons: > + * - Performance is more important than power for resume. > + * - Power spent entering suspend is more important for suspend. Also, > + * stangely, disabling EAS was making suspent a few milliseconds > + * slower in my testing. > + */ > + sched_set_energy_aware(0); > dpm_resume_noirq(PMSG_RESUME); > > Platform_early_resume: > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > Resume_devices: > suspend_test_start(); > dpm_resume_end(PMSG_RESUME); > + sched_set_energy_aware(1); > suspend_test_finish("resume devices"); > trace_suspend_resume(TPS("resume_console"), state, true); > resume_console(); > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 9748a4c8d668..c069c0b17cbf 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > mutex_unlock(&sched_energy_mutex); > } > > +void sched_set_energy_aware(unsigned int enable) CC kernel/sched/build_utility.o In file included from kernel/sched/build_utility.c:88: kernel/sched/topology.c:287:6: warning: no previous prototype for ‘sched_set_energy_aware’ [-Wmissing-prototypes] 287 | void sched_set_energy_aware(unsigned int enable) | ^~~~~~~~~~~~~~~~~~~~~~ Peter/Vincent, I noticed that I'm getting a warning for this line. But I'm not sure what to do about it. I intentionally didn't put this in a header file because I'm guessing we don't want to make this available to drivers/frameworks in general. Let me know how you want me to handle this. -Saravana > +{ > + int state; > + > + if (!sched_is_eas_possible(cpu_active_mask)) > + return; > + > + sysctl_sched_energy_aware = enable; > + state = static_branch_unlikely(&sched_energy_present); > + if (state != sysctl_sched_energy_aware) > + rebuild_sched_domains_energy(); > +} > + > #ifdef CONFIG_PROC_SYSCTL > static int sched_energy_aware_handler(const struct ctl_table *table, int write, > void *buffer, size_t *lenp, loff_t *ppos) > -- > 2.47.0.338.g60cca15819-goog >
On Fri, 15 Nov 2024 at 06:25, Saravana Kannan <saravanak@google.com> wrote: > > On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote: > > > > As of today, the scheduler doesn't spread out all the kworker threads > > across all the available CPUs during suspend/resume. This causes > > significant resume latency during the dpm_resume*() phases. > > > > System resume latency is a very user-visible event. Reducing the > > latency is more important than trying to be energy aware during that > > period. > > > > Since there are no userspace processes running during this time and > > this is a very short time window, we can simply disable EAS during > > resume so that the parallel resume of the devices is spread across all > > the CPUs. > > > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > > plus disabling EAS for resume yields significant improvements: > > +---------------------------+-----------+------------+------------------+ > > | Phase | Old full sync | New full async | % change | > > | | | + EAS disabled | | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > > +---------------------------+-----------+------------+------------------+ > > | Sum | 182 ms | 123 ms | -32% | > > +---------------------------+-----------+------------+------------------+ > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > --- > > kernel/power/suspend.c | 16 ++++++++++++++++ > > kernel/sched/topology.c | 13 +++++++++++++ > > 2 files changed, 29 insertions(+) > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 09f8397bae15..7304dc39958f 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > > local_irq_enable(); > > } > > > > +/* > > + * Intentionally not part of a header file to avoid risk of abuse by other > > + * drivers. > > + */ > > +void sched_set_energy_aware(unsigned int enable); extern void sched_set_energy_aware(unsigned int enable); clear the warning > > + > > /** > > * suspend_enter - Make the system enter the given sleep state. > > * @state: System sleep state to enter. > > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > > > Platform_wake: > > platform_resume_noirq(state); > > + /* > > + * We do this only for resume instead of suspend and resume for these > > + * reasons: > > + * - Performance is more important than power for resume. > > + * - Power spent entering suspend is more important for suspend. Also, > > + * stangely, disabling EAS was making suspent a few milliseconds > > + * slower in my testing. > > + */ > > + sched_set_energy_aware(0); > > dpm_resume_noirq(PMSG_RESUME); > > > > Platform_early_resume: > > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > > Resume_devices: > > suspend_test_start(); > > dpm_resume_end(PMSG_RESUME); > > + sched_set_energy_aware(1); > > suspend_test_finish("resume devices"); > > trace_suspend_resume(TPS("resume_console"), state, true); > > resume_console(); > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > index 9748a4c8d668..c069c0b17cbf 100644 > > --- a/kernel/sched/topology.c > > +++ b/kernel/sched/topology.c > > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > > mutex_unlock(&sched_energy_mutex); > > } > > > > +void sched_set_energy_aware(unsigned int enable) > > CC kernel/sched/build_utility.o > In file included from kernel/sched/build_utility.c:88: > kernel/sched/topology.c:287:6: warning: no previous prototype for > ‘sched_set_energy_aware’ [-Wmissing-prototypes] > 287 | void sched_set_energy_aware(unsigned int enable) > | ^~~~~~~~~~~~~~~~~~~~~~ > > Peter/Vincent, > > I noticed that I'm getting a warning for this line. But I'm not sure > what to do about it. I intentionally didn't put this in a header file > because I'm guessing we don't want to make this available to > drivers/frameworks in general. > > Let me know how you want me to handle this. > > -Saravana > > > +{ > > + int state; > > + > > + if (!sched_is_eas_possible(cpu_active_mask)) > > + return; > > + > > + sysctl_sched_energy_aware = enable; > > + state = static_branch_unlikely(&sched_energy_present); > > + if (state != sysctl_sched_energy_aware) > > + rebuild_sched_domains_energy(); > > +} > > + > > #ifdef CONFIG_PROC_SYSCTL > > static int sched_energy_aware_handler(const struct ctl_table *table, int write, > > void *buffer, size_t *lenp, loff_t *ppos) > > -- > > 2.47.0.338.g60cca15819-goog > >
Hi Saravana, On Fri, Nov 15, 2024 at 6:25 AM Saravana Kannan <saravanak@google.com> wrote: > On Thu, Nov 14, 2024 at 2:09 PM Saravana Kannan <saravanak@google.com> wrote: > > As of today, the scheduler doesn't spread out all the kworker threads > > across all the available CPUs during suspend/resume. This causes > > significant resume latency during the dpm_resume*() phases. > > > > System resume latency is a very user-visible event. Reducing the > > latency is more important than trying to be energy aware during that > > period. > > > > Since there are no userspace processes running during this time and > > this is a very short time window, we can simply disable EAS during > > resume so that the parallel resume of the devices is spread across all > > the CPUs. > > > > On a Pixel 6, averaging over 100 suspend/resume cycles, the new logic > > plus disabling EAS for resume yields significant improvements: > > +---------------------------+-----------+------------+------------------+ > > | Phase | Old full sync | New full async | % change | > > | | | + EAS disabled | | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_suspend*() time | 107 ms | 62 ms | -42% | > > +---------------------------+-----------+------------+------------------+ > > | Total dpm_resume*() time | 75 ms | 61 ms | -19% | > > +---------------------------+-----------+------------+------------------+ > > | Sum | 182 ms | 123 ms | -32% | > > +---------------------------+-----------+------------+------------------+ > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > --- > > kernel/power/suspend.c | 16 ++++++++++++++++ > > kernel/sched/topology.c | 13 +++++++++++++ > > 2 files changed, 29 insertions(+) > > > > diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c > > index 09f8397bae15..7304dc39958f 100644 > > --- a/kernel/power/suspend.c > > +++ b/kernel/power/suspend.c > > @@ -393,6 +393,12 @@ void __weak arch_suspend_enable_irqs(void) > > local_irq_enable(); > > } > > > > +/* > > + * Intentionally not part of a header file to avoid risk of abuse by other > > + * drivers. > > + */ > > +void sched_set_energy_aware(unsigned int enable); > > + > > /** > > * suspend_enter - Make the system enter the given sleep state. > > * @state: System sleep state to enter. > > @@ -468,6 +474,15 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) > > > > Platform_wake: > > platform_resume_noirq(state); > > + /* > > + * We do this only for resume instead of suspend and resume for these > > + * reasons: > > + * - Performance is more important than power for resume. > > + * - Power spent entering suspend is more important for suspend. Also, > > + * stangely, disabling EAS was making suspent a few milliseconds > > + * slower in my testing. > > + */ > > + sched_set_energy_aware(0); > > dpm_resume_noirq(PMSG_RESUME); > > > > Platform_early_resume: > > @@ -520,6 +535,7 @@ int suspend_devices_and_enter(suspend_state_t state) > > Resume_devices: > > suspend_test_start(); > > dpm_resume_end(PMSG_RESUME); > > + sched_set_energy_aware(1); > > suspend_test_finish("resume devices"); > > trace_suspend_resume(TPS("resume_console"), state, true); > > resume_console(); > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > > index 9748a4c8d668..c069c0b17cbf 100644 > > --- a/kernel/sched/topology.c > > +++ b/kernel/sched/topology.c > > @@ -284,6 +284,19 @@ void rebuild_sched_domains_energy(void) > > mutex_unlock(&sched_energy_mutex); > > } > > > > +void sched_set_energy_aware(unsigned int enable) > > CC kernel/sched/build_utility.o > In file included from kernel/sched/build_utility.c:88: > kernel/sched/topology.c:287:6: warning: no previous prototype for > ‘sched_set_energy_aware’ [-Wmissing-prototypes] > 287 | void sched_set_energy_aware(unsigned int enable) > | ^~~~~~~~~~~~~~~~~~~~~~ > > Peter/Vincent, > > I noticed that I'm getting a warning for this line. But I'm not sure > what to do about it. I intentionally didn't put this in a header file > because I'm guessing we don't want to make this available to > drivers/frameworks in general. > > Let me know how you want me to handle this. Put the prototype in kernel/sched/sched.h, and include ../sched/sched.h from kernel/power/suspend.c? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
© 2016 - 2024 Red Hat, Inc.