PM: QoS: Introduce a CPU system-wakeup QoS limit for s2idle

[PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS

Posted by Ulf Hansson

Let's document how the new CPU system-wakeup latency QoS can be used from
user space, along with how the corresponding latency constraint gets
respected during s2idle.

Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---

Changes in v2:
	- New patch.

---
 Documentation/admin-guide/pm/cpuidle.rst | 7 +++++++
 Documentation/power/pm_qos_interface.rst | 9 +++++----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index 0c090b076224..3f6f79a9bc8f 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -580,6 +580,13 @@ the given CPU as the upper limit for the exit latency of the idle states that
 they are allowed to select for that CPU.  They should never select any idle
 states with exit latency beyond that limit.
 
+While the above CPU QoS constraints apply to a running system, user space may
+also request a CPU system-wakeup latency QoS limit via the `cpu_wakeup_latency`
+file.  This QoS constraint is respected when selecting a suitable idle state
+for the CPUs while entering the system-wide suspend-to-idle sleep state.
+
+Note that QoS requests for the `cpu_wakeup_latency` file are managed from user
+space in the same way as for the `cpu_dma_latency` file.
 
 Idle States Control Via Kernel Command Line
 ===========================================
diff --git a/Documentation/power/pm_qos_interface.rst b/Documentation/power/pm_qos_interface.rst
index 5019c79c7710..723f4714691a 100644
--- a/Documentation/power/pm_qos_interface.rst
+++ b/Documentation/power/pm_qos_interface.rst
@@ -55,7 +55,8 @@ int cpu_latency_qos_request_active(handle):
 
 From user space:
 
-The infrastructure exposes one device node, /dev/cpu_dma_latency, for the CPU
+The infrastructure exposes two separate device nodes, /dev/cpu_dma_latency for
+the CPU latency QoS and /dev/cpu_wakeup_latency for the CPU system-wakeup
 latency QoS.
 
 Only processes can register a PM QoS request.  To provide for automatic
@@ -63,15 +64,15 @@ cleanup of a process, the interface requires the process to register its
 parameter requests as follows.
 
 To register the default PM QoS target for the CPU latency QoS, the process must
-open /dev/cpu_dma_latency.
+open /dev/cpu_dma_latency. To register a CPU system-wakeup QoS limit, the
+process must open /dev/cpu_wakeup_latency.
 
 As long as the device node is held open that process has a registered
 request on the parameter.
 
 To change the requested target value, the process needs to write an s32 value to
 the open device node.  Alternatively, it can write a hex string for the value
-using the 10 char long format e.g. "0x12345678".  This translates to a
-cpu_latency_qos_update_request() call.
+using the 10 char long format e.g. "0x12345678".
 
 To remove the user mode request for a target value simply close the device
 node.
-- 
2.43.0

Re: [PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS

Posted by Dhruva Gole

Hi Ulf,

On Oct 16, 2025 at 17:19:24 +0200, Ulf Hansson wrote:
> Let's document how the new CPU system-wakeup latency QoS can be used from
> user space, along with how the corresponding latency constraint gets
> respected during s2idle.
> 
> Cc: Jonathan Corbet <corbet@lwn.net>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
> 
> Changes in v2:
> 	- New patch.

As I did for the v1 RFC, I have applied this series on a ti-linux-6.12
branch [1] and have been testing it on the TI K3 AM62L device. My 2 cents:

> 
> ---
>  Documentation/admin-guide/pm/cpuidle.rst | 7 +++++++
>  Documentation/power/pm_qos_interface.rst | 9 +++++----
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
> index 0c090b076224..3f6f79a9bc8f 100644
> --- a/Documentation/admin-guide/pm/cpuidle.rst
> +++ b/Documentation/admin-guide/pm/cpuidle.rst
> @@ -580,6 +580,13 @@ the given CPU as the upper limit for the exit latency of the idle states that
>  they are allowed to select for that CPU.  They should never select any idle
>  states with exit latency beyond that limit.
>  
> +While the above CPU QoS constraints apply to a running system, user space may
> +also request a CPU system-wakeup latency QoS limit via the `cpu_wakeup_latency`
> +file.  This QoS constraint is respected when selecting a suitable idle state
> +for the CPUs while entering the system-wide suspend-to-idle sleep state.
> +
> +Note that QoS requests for the `cpu_wakeup_latency` file are managed from user
> +space in the same way as for the `cpu_dma_latency` file.
>  
>  Idle States Control Via Kernel Command Line
>  ===========================================
> diff --git a/Documentation/power/pm_qos_interface.rst b/Documentation/power/pm_qos_interface.rst
> index 5019c79c7710..723f4714691a 100644
> --- a/Documentation/power/pm_qos_interface.rst
> +++ b/Documentation/power/pm_qos_interface.rst
> @@ -55,7 +55,8 @@ int cpu_latency_qos_request_active(handle):
>  
>  From user space:
>  
> -The infrastructure exposes one device node, /dev/cpu_dma_latency, for the CPU
> +The infrastructure exposes two separate device nodes, /dev/cpu_dma_latency for
> +the CPU latency QoS and /dev/cpu_wakeup_latency for the CPU system-wakeup

If others are interested in testing this out, I have a quick and dirty C
program [2] that you can compile on the target to test setting
constraints.

>  latency QoS.
>  
>  Only processes can register a PM QoS request.  To provide for automatic
> @@ -63,15 +64,15 @@ cleanup of a process, the interface requires the process to register its
>  parameter requests as follows.
>  
>  To register the default PM QoS target for the CPU latency QoS, the process must
> -open /dev/cpu_dma_latency.
> +open /dev/cpu_dma_latency. To register a CPU system-wakeup QoS limit, the
> +process must open /dev/cpu_wakeup_latency.
>  
>  As long as the device node is held open that process has a registered
>  request on the parameter.
>  
>  To change the requested target value, the process needs to write an s32 value to
>  the open device node.  Alternatively, it can write a hex string for the value
> -using the 10 char long format e.g. "0x12345678".  This translates to a
> -cpu_latency_qos_update_request() call.
> +using the 10 char long format e.g. "0x12345678".

Here, can we please also mention the unit (nsec or usec)? I see that you
might have changed it from usec to nsec between v1 and v2, which may not be
obvious to everyone at first glance.

Also, in my local setup I have a single-CPU system with the following
low-power states:

8<----------------------------------------------------------------------------
	idle-states {
		entry-method = "psci";

		CLST_STBY: STBY {
			compatible = "arm,idle-state";
			idle-state-name = "Standby";
			arm,psci-suspend-param = <0x00000001>;
			entry-latency-us = <300>;
			exit-latency-us = <600>;
			min-residency-us = <1000>;
		};
	};
[...]
	domain-idle-states {
		main_sleep_0: main-deep-sleep {
			compatible = "domain-idle-state";
			arm,psci-suspend-param = <0x13333>;
			entry-latency-us = <1000>;
			exit-latency-us = <1000>;
			min-residency-us = <500000>;
			local-timer-stop;
		};

		main_sleep_1: main-sleep-rtcddr {
			compatible = "domain-idle-state";
			arm,psci-suspend-param = <0x12333>;
			local-timer-stop;
			entry-latency-us = <300000>;
			exit-latency-us = <600000>;
			min-residency-us = <1000000>;
		};
	};


---------------------------------------------------------------------->8

Now, when I write the latency constraint 0x7a110 (499984) to
cpu_wakeup_latency, I expect it _not_ to pick main_sleep_0, because that
state has a min-residency of 0x7a120 (500000 us). Since 0x7a110 < 0x7a120,
I expect the governor to pick the lowest-latency state of the CPU, which is
CLST_STBY, or maybe just the kernel's WFI (which is the default shallowest
idle state?).

I decided to go even lower with just setting 0x1000 (4096), but even
then s2idle picked main_sleep_0!

Only after I set something very low, like 0x1 or 0x10, did it pick a state
shallower than main_sleep_0...

I haven't yet dug into where things might be getting miscalculated, but I
thought I would share my experiments with you before you respin the next
revision. I am curious whether I am just confusing the units or missing
something obvious here.

One of the other things I tried that _did_ work was setting the constraint
to 0x1312D00 (20000000), which is obviously much higher than the highest
min-residency; s2idle then picked the deepest state, i.e. main_sleep_1. So
that worked as expected.

In conclusion, I am happy that this works well enough for me to switch
between low-power states, just not in the most explainable way :(

[1] https://github.com/DhruvaG2000/dbg-linux/tree/tiL6.12-am62l-s2idle-prep-v2
[2] https://gist.github.com/DhruvaG2000/a902b815b5db296bb7096ad7cb093929

-- 
Best regards,
Dhruva Gole
Texas Instruments Incorporated

Re: [PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS

Posted by Ulf Hansson

On Fri, 31 Oct 2025 at 11:57, Dhruva Gole <d-gole@ti.com> wrote:
>
> Hi Ulf,
>
> On Oct 16, 2025 at 17:19:24 +0200, Ulf Hansson wrote:
> > Let's document how the new CPU system-wakeup latency QoS can be used from
> > user space, along with how the corresponding latency constraint gets
> > respected during s2idle.
> >
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >
> > Changes in v2:
> >       - New patch.
>
> As I did for the v1 RFC, I have applied this series on a ti-linux-6.12
> branch [1] and have been testing it on the TI K3 AM62L device. My 2 cents:
>
> >
> > ---
> >  Documentation/admin-guide/pm/cpuidle.rst | 7 +++++++
> >  Documentation/power/pm_qos_interface.rst | 9 +++++----
> >  2 files changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
> > index 0c090b076224..3f6f79a9bc8f 100644
> > --- a/Documentation/admin-guide/pm/cpuidle.rst
> > +++ b/Documentation/admin-guide/pm/cpuidle.rst
> > @@ -580,6 +580,13 @@ the given CPU as the upper limit for the exit latency of the idle states that
> >  they are allowed to select for that CPU.  They should never select any idle
> >  states with exit latency beyond that limit.
> >
> > +While the above CPU QoS constraints apply to a running system, user space may
> > +also request a CPU system-wakeup latency QoS limit via the `cpu_wakeup_latency`
> > +file.  This QoS constraint is respected when selecting a suitable idle state
> > +for the CPUs while entering the system-wide suspend-to-idle sleep state.
> > +
> > +Note that QoS requests for the `cpu_wakeup_latency` file are managed from user
> > +space in the same way as for the `cpu_dma_latency` file.
> >
> >  Idle States Control Via Kernel Command Line
> >  ===========================================
> > diff --git a/Documentation/power/pm_qos_interface.rst b/Documentation/power/pm_qos_interface.rst
> > index 5019c79c7710..723f4714691a 100644
> > --- a/Documentation/power/pm_qos_interface.rst
> > +++ b/Documentation/power/pm_qos_interface.rst
> > @@ -55,7 +55,8 @@ int cpu_latency_qos_request_active(handle):
> >
> >  From user space:
> >
> > -The infrastructure exposes one device node, /dev/cpu_dma_latency, for the CPU
> > +The infrastructure exposes two separate device nodes, /dev/cpu_dma_latency for
> > +the CPU latency QoS and /dev/cpu_wakeup_latency for the CPU system-wakeup
>
> If others are interested in testing this out, I have a quick and dirty C
> program [2] that you can compile on the target to test setting
> constraints.
>
> >  latency QoS.
> >
> >  Only processes can register a PM QoS request.  To provide for automatic
> > @@ -63,15 +64,15 @@ cleanup of a process, the interface requires the process to register its
> >  parameter requests as follows.
> >
> >  To register the default PM QoS target for the CPU latency QoS, the process must
> > -open /dev/cpu_dma_latency.
> > +open /dev/cpu_dma_latency. To register a CPU system-wakeup QoS limit, the
> > +process must open /dev/cpu_wakeup_latency.
> >
> >  As long as the device node is held open that process has a registered
> >  request on the parameter.
> >
> >  To change the requested target value, the process needs to write an s32 value to
> >  the open device node.  Alternatively, it can write a hex string for the value
> > -using the 10 char long format e.g. "0x12345678".  This translates to a
> > -cpu_latency_qos_update_request() call.
> > +using the 10 char long format e.g. "0x12345678".
>
> Here, can we please also mention the unit (nsec or usec)? I see that you
> might have changed it from usec to nsec between v1 and v2, which may not be
> obvious to everyone at first glance.

I haven't changed the unit between the versions; it simply uses the same
format as cpu_dma_latency.

Yes, I agree the unit deserves to be described, but I suggest we make
that a separate change, as the unit should be described for the
existing cpu_dma_latency too.

>
> Also, in my local setup I have a single-CPU system with the following
> low-power states:
>
> 8<----------------------------------------------------------------------------
>         idle-states {
>                 entry-method = "psci";
>
>                 CLST_STBY: STBY {
>                         compatible = "arm,idle-state";
>                         idle-state-name = "Standby";
>                         arm,psci-suspend-param = <0x00000001>;
>                         entry-latency-us = <300>;
>                         exit-latency-us = <600>;
>                         min-residency-us = <1000>;
>                 };
>         };
> [...]
>         domain-idle-states {
>                 main_sleep_0: main-deep-sleep {
>                         compatible = "domain-idle-state";
>                         arm,psci-suspend-param = <0x13333>;
>                         entry-latency-us = <1000>;
>                         exit-latency-us = <1000>;
>                         min-residency-us = <500000>;
>                         local-timer-stop;
>                 };
>
>                 main_sleep_1: main-sleep-rtcddr {
>                         compatible = "domain-idle-state";
>                         arm,psci-suspend-param = <0x12333>;
>                         local-timer-stop;
>                         entry-latency-us = <300000>;
>                         exit-latency-us = <600000>;
>                         min-residency-us = <1000000>;
>                 };
>         };
>
>
> ---------------------------------------------------------------------->8
>
> Now, when I write the latency constraint 0x7a110 (499984) to
> cpu_wakeup_latency, I expect it _not_ to pick main_sleep_0, because that
> state has a min-residency of 0x7a120 (500000 us). Since 0x7a110 < 0x7a120,
> I expect the governor to pick the lowest-latency state of the CPU, which is
> CLST_STBY, or maybe just the kernel's WFI (which is the default shallowest
> idle state?).
>
> I decided to go even lower with just setting 0x1000 (4096), but even
> then s2idle picked main_sleep_0!
>
> Only after I set something very low, like 0x1 or 0x10, did it pick a state
> shallower than main_sleep_0...

The residency has nothing to do with QoS.

It's only the entry+exit latency that matters during state selection.

>
> I haven't yet dug into where things might be getting miscalculated, but I
> thought I would share my experiments with you before you respin the next
> revision. I am curious whether I am just confusing the units or missing
> something obvious here.

See above.

>
> One of the other things I tried that _did_ work was setting the constraint
> to 0x1312D00 (20000000), which is obviously much higher than the highest
> min-residency; s2idle then picked the deepest state, i.e. main_sleep_1. So
> that worked as expected.
>
> In conclusion, I am happy that this works well enough for me to switch
> between low-power states, just not in the most explainable way :(

I hope this above makes sense to you - and thanks a lot for helping
out with testing!

Kind regards
Uffe


[PATCH v2 1/4] PM: QoS: Introduce a CPU system-wakeup QoS limit
[PATCH v2 2/4] pmdomain: Respect the CPU system-wakeup QoS limit during s2idle
[PATCH v2 3/4] sched: idle: Respect the CPU system-wakeup QoS limit for s2idle
[PATCH v2 4/4] Documentation: power/cpuidle: Document the CPU system-wakeup latency QoS