[PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls

Konrad Dybcio posted 3 patches 1 year, 3 months ago
Posted by Konrad Dybcio 1 year, 3 months ago
Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
CPU_SUSPEND instead. Inform Linux about that.
Please see the commit messages for a more detailed explanation.

This is effectively a more educated follow-up to [1].

The ultimate goal is to stop making Linux think that certain states
only concern cores/clusters, and consequently setting
pm_set_suspend/resume_via_firmware(), so that client drivers (such as
NVMe, see related discussion over at [2]) can make informed decisions
about assuming the power state of the device they govern.

If this series gets green light, I'll push a follow-up one that wires
up said sleep state on Qualcomm SoCs across the board.

[1] https://lore.kernel.org/linux-arm-kernel/20231227-topic-psci_fw_sus-v1-0-6910add70bf3@linaro.org/
[2] https://lore.kernel.org/linux-nvme/20241024-topic-nvmequirk-v1-1-51249999d409@oss.qualcomm.com/

Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
---
Konrad Dybcio (3):
      dt-bindings: arm,psci: Allow S2RAM power_state parameter description
      firmware/psci: Set pm_set_resume/suspend_via_firmware() for SYSTEM_SUSPEND
      firmware/psci: Allow specifying an S2RAM state through CPU_SUSPEND

 Documentation/devicetree/bindings/arm/psci.yaml |  6 ++++
 drivers/firmware/psci/psci.c                    | 44 ++++++++++++++++++++++---
 2 files changed, 46 insertions(+), 4 deletions(-)
---
base-commit: a39230ecf6b3057f5897bc4744a790070cfbe7a8
change-id: 20241028-topic-cpu_suspend_s2ram-28fc095d0aa4

Best regards,
-- 
Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Ulf Hansson 1 year, 2 months ago
On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@kernel.org> wrote:
>
> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> CPU_SUSPEND instead. Inform Linux about that.
> Please see the commit messages for a more detailed explanation.
>
> This is effectively a more educated follow-up to [1].
>
> The ultimate goal is to stop making Linux think that certain states
> only concern cores/clusters, and consequently setting
> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> NVMe, see related discussion over at [2]) can make informed decisions
> about assuming the power state of the device they govern.

In my opinion, this is not really the correct way to do it. Using
pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not
for PSCI like this. Let me elaborate. If the NVMe storage device is
sharing the same power-rail as the CPU cluster, then yes we should use
PSCI to control it. But is that really the case? If so, there are in
principle two ways forward to deal with this correctly.

1) If PSCI OSI mode is being used, the corresponding NVMe storage
device should be hooked up to the CPU PM cluster domain via genpd and
controlled as any other devices sharing the cluster-rail. In this way,
genpd together with the cpuidle-psci-domain can decide whether it's
okay to turn off the cluster. I believe this is the preferred way, but
2) would work fine too.

2) If PSCI PC mode is being used, a separate channel/interface to the
FW (like SCMI or rpmh in the QC case), should inform the FW whether
NVMe needs the power to it. This information should then be taken into
account by the PSCI FW when it decides what low-power-state to enter,
which ultimately means whether the cluster-rail can be turned off or
not.
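
As a rough illustration of option 1), here is a minimal userspace
sketch of the "last man" rule that a shared power domain enforces: the
cluster may only be powered off once no device attached to the domain
is still in use. The `model_*` names and the plain user counter are
assumptions made for illustration only; this is not the genpd or
cpuidle-psci-domain implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cluster power domain shared by CPUs and devices. */
struct model_cluster_domain {
	unsigned int active_users;	/* devices/CPUs that still need the rail */
};

/* A device (e.g. a storage controller) becomes active in the domain. */
static void model_domain_get(struct model_cluster_domain *d)
{
	d->active_users++;
}

/* A device goes idle; dropping below zero would be a refcount bug. */
static void model_domain_put(struct model_cluster_domain *d)
{
	assert(d->active_users > 0);
	d->active_users--;
}

/* The cluster rail may only be turned off when nobody needs it. */
static bool model_cluster_may_power_off(const struct model_cluster_domain *d)
{
	return d->active_users == 0;
}
```

With a device hooked up this way, a deeper cluster state is simply not
selected while the device still holds a reference on the domain.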

Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you,
please elaborate on why, so we can help to make it work, as it should.

[...]

Kind regards
Uffe
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Konrad Dybcio 1 year, 2 months ago
On 14.11.2024 4:30 PM, Ulf Hansson wrote:
> On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@kernel.org> wrote:
>>
>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>> CPU_SUSPEND instead. Inform Linux about that.
>> Please see the commit messages for a more detailed explanation.
>>
>> This is effectively a more educated follow-up to [1].
>>
>> The ultimate goal is to stop making Linux think that certain states
>> only concern cores/clusters, and consequently setting
>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>> NVMe, see related discussion over at [2]) can make informed decisions
>> about assuming the power state of the device they govern.
> 
> In my opinion, this is not really the correct way to do it. Using
> pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not
> for PSCI like this. Let me elaborate. If the NVMe storage device is
> sharing the same power-rail as the CPU cluster, then yes we should use
> PSCI to control it. But is that really the case? If so, there are in
> principle two ways forward to deal with this correctly.
> 
> 1) If PSCI OSI mode is being used, the corresponding NVMe storage
> device should be hooked up to the CPU PM cluster domain via genpd and
> controlled as any other devices sharing the cluster-rail. In this way,
> genpd together with the cpuidle-psci-domain can decide whether it's
> okay to turn off the cluster. I believe this is the preferred way, but
> 2) would work fine too.
> 
> 2) If PSCI PC mode is being used, a separate channel/interface to the
> FW (like SCMI or rpmh in the QC case), should inform the FW whether
> NVMe needs the power to it. This information should then be taken into
> account by the PSCI FW when it decides what low-power-state to enter,
> which ultimately means whether the cluster-rail can be turned off or
> not.

This assumes PSCI only governs the CPU power rail. But what I'd
guesstimate is that in most implementations if system-level suspend is
there at all (no matter through which call), as per the spec, it at
least also projects onto the DDR power state (like in this i.mx
impl here [1]), or some uncore peripherals (like in Tegra's case with
some secure element being toggled at [2])

> Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you,
> please elaborate on why, so we can help to make it work, as it should.

On Qualcomm platforms, RPMh is the central authority when it comes
to power governance, but by design, the CPUs must be off (and with a
specific magic cookie) for the RPMh hardware to consider powering off
very power hungry parts of the system, such as general i/o rails.

So again, PSCI must be fed a specific value for the rest of the hw
to react. The "S2RAM state" isn't really a cpuidle state, because
it doesn't differ from many shallower states as far as the cpu/cluster
are concerned. If that all isn't in place, the platform never actually
enters any "real" sleep state, other than "CPU and some controllable
IP blocks are runtime-suspended".

This effectively is very close to what ACPI+x86 do - there's a
co-processor/firmware that does a lot of things behind your back and
all you can do is *ask* it to change some handwavily-defined P/Cstate
that affects a huge chunk of silicon.

Konrad

[1] https://github.com/nxp-imx/imx-atf/blob/lf_v2.6/plat/imx/imx8m/imx8mp/imx8mp_lpa_psci.c#L474
[2] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/nvidia/tegra/soc/t210/plat_psci_handlers.c#L214
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Ulf Hansson 1 year, 2 months ago
+ Maulik, Vincent

On Thu, 5 Dec 2024 at 21:34, Konrad Dybcio
<konrad.dybcio@oss.qualcomm.com> wrote:
>
> On 14.11.2024 4:30 PM, Ulf Hansson wrote:
> > On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@kernel.org> wrote:
> >>
> >> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> >> CPU_SUSPEND instead. Inform Linux about that.
> >> Please see the commit messages for a more detailed explanation.
> >>
> >> This is effectively a more educated follow-up to [1].
> >>
> >> The ultimate goal is to stop making Linux think that certain states
> >> only concern cores/clusters, and consequently setting
> >> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> >> NVMe, see related discussion over at [2]) can make informed decisions
> >> about assuming the power state of the device they govern.
> >
> > In my opinion, this is not really the correct way to do it. Using
> > pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not
> > for PSCI like this. Let me elaborate. If the NVMe storage device is
> > sharing the same power-rail as the CPU cluster, then yes we should use
> > PSCI to control it. But is that really the case? If so, there are in
> > principle two ways forward to deal with this correctly.
> >
> > 1) If PSCI OSI mode is being used, the corresponding NVMe storage
> > device should be hooked up to the CPU PM cluster domain via genpd and
> > controlled as any other devices sharing the cluster-rail. In this way,
> > genpd together with the cpuidle-psci-domain can decide whether it's
> > okay to turn off the cluster. I believe this is the preferred way, but
> > 2) would work fine too.
> >
> > 2) If PSCI PC mode is being used, a separate channel/interface to the
> > FW (like SCMI or rpmh in the QC case), should inform the FW whether
> > NVMe needs the power to it. This information should then be taken into
> > account by the PSCI FW when it decides what low-power-state to enter,
> > which ultimately means whether the cluster-rail can be turned off or
> > not.
>
> This assumes PSCI only governs the CPU power rail. But what I'd
> guesstimate is that in most implementations if system-level suspend is
> there at all (no matter through which call), as per the spec, it at
> least also projects onto the DDR power state (like in this i.mx
> impl here [1]), or some uncore peripherals (like in Tegra's case with
> some secure element being toggled at [2])

Right, I certainly understand the above. There are different parts of
an SoC that may be sharing the same power-island as the CPUs.

The question here is whether the NVMe storage device is part of that
power-island too on some QC SoCs?

>
> > Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you,
> > please elaborate on why, so we can help to make it work, as it should.
>
> On Qualcomm platforms, RPMh is the central authority when it comes
> to power governance, but by design, the CPUs must be off (and with a
> specific magic cookie) for the RPMh hardware to consider powering off
> very power hungry parts of the system, such as general i/o rails.

Right, that is why the "qcom,rpmh-rsc" device in many cases belongs to
the cluster-power-domain (for PSCI). This allows "qcom,rpmh-rsc" to
control the "last-man" activities and prevent deeper PSCI states
if/when necessary.

>
> So again, PSCI must be fed a specific value for the rest of the hw
> to react. The "S2RAM state" isn't really a cpuidle state, because
> it doesn't differ from many shallower states as far as the cpu/cluster
> are concerned. If that all isn't in place, the platform never actually
> enters any "real" sleep state, other than "CPU and some controllable
> IP blocks are runtime-suspended".

We recently discussed this, offlist, with Maulik - and I think we need
some more clarity around what is actually going on here.

In principle, it looks to me that using S2I with just another deeper
idlestate specified (with another psci-suspend-parameter, representing
a deeper state) should work fine, at least theoretically. Of course,
we may not be able to use that idlestate during regular
cpuidle/runtime but only during S2I, which we need to control in a
smooth way and that is not currently supported (but can be fixed
easily, I think).

In the end, it's the psci-suspend-parameter that is given to the PSCI
FW that informs about what state we can enter.
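
To make that concrete, below is a small sketch of how a power_state
parameter can be split into its fields, assuming the PSCI "original
format" layout (StateID in bits [15:0], StateType in bit 16, power
level in bits [25:24]). The `model_*` names and the example value used
further down are hypothetical and do not describe any particular
platform; firmware that uses the extended StateID format lays the
fields out differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed "original format" power_state field layout. */
#define MODEL_PSTATE_ID_MASK		0x0000ffffu	/* bits [15:0]: StateID */
#define MODEL_PSTATE_TYPE_SHIFT		16		/* bit 16: 1 = powerdown */
#define MODEL_PSTATE_LEVEL_SHIFT	24		/* bits [25:24]: power level */
#define MODEL_PSTATE_LEVEL_MASK		0x3u

struct model_power_state {
	uint16_t state_id;	/* platform-defined cookie */
	bool powerdown;		/* false = standby, true = powerdown */
	uint8_t level;		/* deepest power level affected */
};

/* Split a raw power_state value into its three fields. */
static struct model_power_state model_decode_power_state(uint32_t param)
{
	struct model_power_state s;

	s.state_id = (uint16_t)(param & MODEL_PSTATE_ID_MASK);
	s.powerdown = ((param >> MODEL_PSTATE_TYPE_SHIFT) & 0x1u) != 0;
	s.level = (uint8_t)((param >> MODEL_PSTATE_LEVEL_SHIFT) &
			    MODEL_PSTATE_LEVEL_MASK);
	return s;
}

/* Pack the fields back into a raw power_state value. */
static uint32_t model_encode_power_state(struct model_power_state s)
{
	assert(s.level <= MODEL_PSTATE_LEVEL_MASK);
	return ((uint32_t)s.level << MODEL_PSTATE_LEVEL_SHIFT) |
	       ((uint32_t)(s.powerdown ? 1u : 0u) << MODEL_PSTATE_TYPE_SHIFT) |
	       s.state_id;
}
```

For a made-up value such as 0x01010033, this yields StateID 0x33, a
powerdown state and power level 1. The StateID is the part whose
meaning is entirely up to the platform firmware.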

That said, using S2I may not work without updating the PSCI FW, of
course. For example, there may be FW limitations that require the
boot CPU (CPU0) to be the last one for these deeper low-power-states.
Whether that is just a FW limitation or whether there are some
additional HW constraints that enforce this, needs to be clarified.

>
> This effectively is very close to what ACPI+x86 do - there's a
> co-processor/firmware that does a lot of things behind your back and
> all you can do is *ask* it to change some handwavily-defined P/Cstate
> that affects a huge chunk of silicon.

Yep, there are similarities.

However, ACPI is for generic device power management. PSCI requires
something additional, such as ARM SCMI or QC's rpm/rsc interface.

>
> Konrad
>
> [1] https://github.com/nxp-imx/imx-atf/blob/lf_v2.6/plat/imx/imx8m/imx8mp/imx8mp_lpa_psci.c#L474
> [2] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/nvidia/tegra/soc/t210/plat_psci_handlers.c#L214
>

Kind regards
Uffe
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Konrad Dybcio 1 year, 1 month ago
On 6.12.2024 10:53 AM, Ulf Hansson wrote:
> + Maulik, Vincent
> 
> On Thu, 5 Dec 2024 at 21:34, Konrad Dybcio
> <konrad.dybcio@oss.qualcomm.com> wrote:
>>
>> On 14.11.2024 4:30 PM, Ulf Hansson wrote:
>>> On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@kernel.org> wrote:
>>>>
>>>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>>>> CPU_SUSPEND instead. Inform Linux about that.
>>>> Please see the commit messages for a more detailed explanation.
>>>>
>>>> This is effectively a more educated follow-up to [1].
>>>>
>>>> The ultimate goal is to stop making Linux think that certain states
>>>> only concern cores/clusters, and consequently setting
>>>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>>>> NVMe, see related discussion over at [2]) can make informed decisions
>>>> about assuming the power state of the device they govern.
>>>
>>> In my opinion, this is not really the correct way to do it. Using
>>> pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not
>>> for PSCI like this. Let me elaborate. If the NVMe storage device is
>>> sharing the same power-rail as the CPU cluster, then yes we should use
>>> PSCI to control it. But is that really the case? If so, there are in
>>> principle two ways forward to deal with this correctly.
>>>
>>> 1) If PSCI OSI mode is being used, the corresponding NVMe storage
>>> device should be hooked up to the CPU PM cluster domain via genpd and
>>> controlled as any other devices sharing the cluster-rail. In this way,
>>> genpd together with the cpuidle-psci-domain can decide whether it's
>>> okay to turn off the cluster. I believe this is the preferred way, but
>>> 2) would work fine too.
>>>
>>> 2) If PSCI PC mode is being used, a separate channel/interface to the
>>> FW (like SCMI or rpmh in the QC case), should inform the FW whether
>>> NVMe needs the power to it. This information should then be taken into
>>> account by the PSCI FW when it decides what low-power-state to enter,
>>> which ultimately means whether the cluster-rail can be turned off or
>>> not.
>>
>> This assumes PSCI only governs the CPU power rail. But what I'd
>> guesstimate is that in most implementations if system-level suspend is
>> there at all (no matter through which call), as per the spec, it at
>> least also projects onto the DDR power state (like in this i.mx
>> impl here [1]), or some uncore peripherals (like in Tegra's case with
>> some secure element being toggled at [2])
> 
> Right, I certainly understand the above. There are different parts of
> an SoC that may be sharing the same power-island as the CPUs.
> 
> The question here is whether the NVMe storage device is part of that
> power-island too on some QC SoCs?

Yes, but not exclusively (i.e. there can also be other voltage rails or
similar that may or may not be managed by Linux, depending on the SoC).

>>> Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you,
>>> please elaborate on why, so we can help to make it work, as it should.
>>
>> On Qualcomm platforms, RPMh is the central authority when it comes
>> to power governance, but by design, the CPUs must be off (and with a
>> specific magic cookie) for the RPMh hardware to consider powering off
>> very power hungry parts of the system, such as general i/o rails.
> 
> Right, that is why the "qcom,rpmh-rsc" device in many cases belongs to
> the cluster-power-domain (for PSCI). This allows "qcom,rpmh-rsc" to
> control the "last-man" activities and prevent deeper PSCI states
> if/when necessary.

Problem is, today we only describe the RSC connected to the CPU cluster.
Newer SoCs have multiple RSCs, which long story short allow for certain
IP blocks to operate and have their power managed without the CPU block
being involved, or even online.

The CPU RSC can only reliably probe the CPU online status, as all other
IPs can be requested to stay powered from an external entity (e.g. a DSP,
secure world and similar), so the driver can only do its best to try and
prevent obviously-going-to-fail idle entries when CPUs are online.

>> So again, PSCI must be fed a specific value for the rest of the hw
>> to react. The "S2RAM state" isn't really a cpuidle state, because
>> it doesn't differ from many shallower states as far as the cpu/cluster
>> are concerned. If that all isn't in place, the platform never actually
>> enters any "real" sleep state, other than "CPU and some controllable
>> IP blocks are runtime-suspended".
> 
> We recently discussed this, offlist, with Maulik - and I think we need
> some more clarity around what is actually going on here.
> 
> In principle, it looks to me that using S2I with just another deeper
> idlestate specified (with another psci-suspend-parameter, representing
> a deeper state) should work fine, at least theoretically. Of course,
> we may not be able to use that idlestate during regular
> cpuidle/runtime but only during S2I, which we need to control in a
> smooth way and that is not currently supported (but can be fixed
> easily, I think).
> 
> In the end, it's the psci-suspend-parameter that is given to the PSCI
> FW that informs about what state we can enter.
> 
> That said, using S2I may not work without updating the PSCI FW, of
> course. For example, there may be FW limitations that require the
> boot CPU (CPU0) to be the last one for these deeper low-power-states.
> Whether that is just a FW limitation or whether there are some
> additional HW constraints that enforce this, needs to be clarified.

Yeah, not being able to runtime-idle into that state is one issue.
Another is that successfully entering the S2RAM state may require us
to reinitialize some hardware afterwards. Currently, Linux has no way
of knowing that this state is any different from the rest, but marking
it as S2RAM would allow checking for PM_SUSPEND_MEM vs
PM_SUSPEND_TO_IDLE.

>> This effectively is very close to what ACPI+x86 do - there's a
>> co-processor/firmware that does a lot of things behind your back and
>> all you can do is *ask* it to change some handwavily-defined P/Cstate
>> that affects a huge chunk of silicon.
> 
> Yep, there are similarities.
> 
> However, ACPI is for generic device power management. PSCI requires
> something additional, such as ARM SCMI or QC's rpm/rsc interface.

Right, we're not yet fully there with "for_each_device(fw_shut_down())"

Konrad

> 
>>
>> Konrad
>>
>> [1] https://github.com/nxp-imx/imx-atf/blob/lf_v2.6/plat/imx/imx8m/imx8mp/imx8mp_lpa_psci.c#L474
>> [2] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/nvidia/tegra/soc/t210/plat_psci_handlers.c#L214
>>
> 
> Kind regards
> Uffe
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Elliot Berman 1 year, 2 months ago
On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> CPU_SUSPEND instead. Inform Linux about that.
> Please see the commit messages for a more detailed explanation.
> 
> This is effectively a more educated follow-up to [1].
> 
> The ultimate goal is to stop making Linux think that certain states
> only concern cores/clusters, and consequently setting
> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> NVMe, see related discussion over at [2]) can make informed decisions
> about assuming the power state of the device they govern.
> 
> If this series gets green light, I'll push a follow-up one that wires
> up said sleep state on Qualcomm SoCs across the board.
> 
> [1] https://lore.kernel.org/linux-arm-kernel/20231227-topic-psci_fw_sus-v1-0-6910add70bf3@linaro.org/
> [2] https://lore.kernel.org/linux-nvme/20241024-topic-nvmequirk-v1-1-51249999d409@oss.qualcomm.com/
> 

I got a bit confused, but I think I might've pieced it together. Konrad
wants to support s2ram (not clear why) on Qualcomm SoCs from 2015-2023.
On these SoCs, PSCI_SYSTEM_SUSPEND (s2ram) isn't supported but doing
s2idle gets you the same effect. You'd like s2ram to work, so you
provide a way to replace the PSCI_SYSTEM_SUSPEND param with
(effectively) the CPU_SUSPEND command. If this is the wrong
understanding, please correct me.

Could patch 2 be sent separately? I think it seems fine without the
rest of the series.

I'm not sure why you'd like to support s2ram. Is it *only* that you'd
like to be able to set pm_set_suspend/resume_via_firmware()? I hope this
doesn't sound silly: what if you register a platform_s2idle_ops for the
relevant SoCs which calls pm_set_suspend/resume_via_firmware()?

- Elliot
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Konrad Dybcio 1 year, 1 month ago
On 14.11.2024 2:10 AM, Elliot Berman wrote:
> On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>> CPU_SUSPEND instead. Inform Linux about that.
>> Please see the commit messages for a more detailed explanation.
>>
>> This is effectively a more educated follow-up to [1].
>>
>> The ultimate goal is to stop making Linux think that certain states
>> only concern cores/clusters, and consequently setting
>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>> NVMe, see related discussion over at [2]) can make informed decisions
>> about assuming the power state of the device they govern.
>>
>> If this series gets green light, I'll push a follow-up one that wires
>> up said sleep state on Qualcomm SoCs across the board.
>>
>> [1] https://lore.kernel.org/linux-arm-kernel/20231227-topic-psci_fw_sus-v1-0-6910add70bf3@linaro.org/
>> [2] https://lore.kernel.org/linux-nvme/20241024-topic-nvmequirk-v1-1-51249999d409@oss.qualcomm.com/
>>
> 
> I got a bit confused, but I think I might've pieced it together. Konrad
> wants to support s2ram (not clear why) on Qualcomm SoCs from 2015-2023.
> On these SoCs, PSCI_SYSTEM_SUSPEND (s2ram) isn't supported but doing
> s2idle gets you the same effect. You'd like s2ram to work, so you
> provide a way to replace the PSCI_SYSTEM_SUSPEND param with
> (effectively) the CPU_SUSPEND command. If this is the wrong
> understanding, please correct me.
> 
> Could patch 2 be sent separately? I think it seems fine without the
> rest of the series.
> 
> I'm not sure why you'd like to support s2ram. Is it *only* that you'd
> like to be able to set pm_set_suspend/resume_via_firmware()? I hope this
> doesn't sound silly: what if you register a platform_s2idle_ops for the
> relevant SoCs which calls pm_set_suspend/resume_via_firmware()?

S2RAM is what you get after entering a certain state, but currently
it's presented as just another (s2idle) idle state.

That means some hardware that may need to be reinitialized, isn't as
Linux has no clue it might have lost power.

One of such cases is the PCIe block, with storage drivers specifically
looking for pm_suspend_via_firmware, but that's unfortunately not the
whole list.
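
A minimal userspace sketch of that dependency, under stated
assumptions: the kernel helpers pm_set_suspend_via_firmware() and
pm_suspend_via_firmware() are modelled here by a plain flag, and every
`model_*` name is hypothetical. Whether a given platform sets the flag
for a given suspend target is exactly the point under discussion, so
the policy in model_suspend_begin() is an assumption and not existing
behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's global "suspend via firmware" flag. */
static bool model_fw_suspend_flag;

enum model_suspend_target {
	MODEL_SUSPEND_TO_IDLE,	/* s2idle: devices assumed to keep power */
	MODEL_SUSPEND_MEM,	/* S2RAM: firmware may cut power to devices */
};

enum model_suspend_action {
	MODEL_KEEP_DEVICE_STATE,	/* light-weight suspend, context preserved */
	MODEL_FULL_SHUTDOWN,		/* shut down now, reinitialize on resume */
};

/* Platform hook: only a real S2RAM entry is flagged as "via firmware". */
static void model_suspend_begin(enum model_suspend_target target)
{
	assert(target == MODEL_SUSPEND_TO_IDLE || target == MODEL_SUSPEND_MEM);
	model_fw_suspend_flag = (target == MODEL_SUSPEND_MEM);
}

/* Models the pm_suspend_via_firmware() query made by client drivers. */
static bool model_pm_suspend_via_firmware(void)
{
	return model_fw_suspend_flag;
}

/* A storage-like driver picks its suspend path based on the flag. */
static enum model_suspend_action model_storage_suspend(void)
{
	return model_pm_suspend_via_firmware() ? MODEL_FULL_SHUTDOWN
					       : MODEL_KEEP_DEVICE_STATE;
}
```

If the deep state is only ever entered as an ordinary idle state, the
flag is never set and the driver takes the light-weight path even
though its hardware may lose power.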

Konrad
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Sudeep Holla 1 year, 1 month ago
On Thu, Dec 19, 2024 at 08:26:51PM +0100, Konrad Dybcio wrote:
> On 14.11.2024 2:10 AM, Elliot Berman wrote:
>
> > I'm not sure why you'd like to support s2ram. Is it *only* that you'd
> > like to be able to set pm_set_suspend/resume_via_firmware()? I hope this
> > doesn't sound silly: what if you register a platform_s2idle_ops for the
> > relevant SoCs which calls pm_set_suspend/resume_via_firmware()?
>
> S2RAM is what you get after entering a certain state, but currently
> it's presented as just another (s2idle) idle state.
>

Just to be clear, I assume you mean CPU_SUSPEND idle state. There is
no special or different s2idle idle states IIUC.

> That means some hardware that may need to be reinitialized, isn't as
> Linux has no clue it might have lost power.
>

Interesting, so this means firmware doesn't automatically save and restore
states yet exposes it as CPU_SUSPEND idle state.

> One of such cases is the PCIe block, with storage drivers specifically
> looking for pm_suspend_via_firmware, but that's unfortunately not the
> whole list.
>

Well I can now imagine and I understand what's wrong here. An idle state
is exposed to OS with an expectation that OS saves and restores certain
state. Unless you tie it to some other power domains that these devices
share, it is hard for OS to know the state is being lost and it needs
to save and restore them. It is simply wrong to assume that OS needs
to take care of them even though the power domain hierarchy doesn't
represent this dependency to enter such a state. cpuidle-psci-domain.c
takes care of this IIUC. Ulf can provide details if you are interested.

--
Regards,
Sudeep
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Konrad Dybcio 1 year, 1 month ago
On 20.12.2024 12:39 PM, Sudeep Holla wrote:
> On Thu, Dec 19, 2024 at 08:26:51PM +0100, Konrad Dybcio wrote:
>> On 14.11.2024 2:10 AM, Elliot Berman wrote:
>>
>>> I'm not sure why you'd like to support s2ram. Is it *only* that you'd
>>> like to be able to set pm_set_suspend/resume_via_firmware()? I hope this
>>> doesn't sound silly: what if you register a platform_s2idle_ops for the
>>> relevant SoCs which calls pm_set_suspend/resume_via_firmware()?
>>
>> S2RAM is what you get after entering a certain state, but currently
>> it's presented as just another (s2idle) idle state.
>>
> 
> Just to be clear, I assume you mean CPU_SUSPEND idle state. There is
> no special or different s2idle idle states IIUC.

Yeah, right.

>> That means some hardware that may need to be reinitialized, isn't as
>> Linux has no clue it might have lost power.
>>
> 
> Interesting, so this means firmware doesn't automatically save and restore
> states yet exposes it as CPU_SUSPEND idle state.

Reading the spec, I'm pretty sure PSCI calls should only mess with the
power state of the cores, core-adjacent peripherals and GIC.

Reading section 5.20.1 (SYSTEM_SUSPEND / Intended use) I think it says
mostly what I'm trying to convey:


"In a typical implementation, the semantics are equivalent to a
CPU_SUSPEND to the deepest low-power state. However, it is possible that
an implementation might reserve a deeper state for SYSTEM_SUSPEND than
those used with CPU_SUSPEND."

- this is the situation on QC platforms, with the case of not reserving a
deeper state for SYSTEM_SUSPEND

>> One of such cases is the PCIe block, with storage drivers specifically
>> looking for pm_suspend_via_firmware, but that's unfortunately not the
>> whole list.
>>
> 
> Well I can now imagine and I understand what's wrong here. An idle state
> is exposed to OS with an expectation that OS saves and restores certain
> state. Unless you tie it to some other power domains that these devices
> share, it is hard for OS to know the state is being lost and it needs
> to save and restore them. It is simply wrong to assume that OS needs
> to take care of them even though the power domain hierarchy doesn't
> represent this dependency to enter such a state. cpuidle-psci-domain.c
> takes care of this IIUC. Ulf can provide details if you are interested.

The spec disagrees:

"Note that entering the system into S2 or S3 carries with it several
preconditions. For example, all devices in the system must be in a state
that is compatible with entry into the system state"

- this also happens to be relevant here, given PSCI is not supposed to
power-govern the entire SoC, but only the CPU block. We have specialty
hardware that does power management for non-CPU IPs, but to request
a system power rail disablement, it must be done in conjunction with the
CPU requesting such CPU_SUSPEND state. And only after the required hardware
is de-initialized.

Konrad
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Sudeep Holla 1 year, 1 month ago
On Fri, Dec 20, 2024 at 01:42:04PM +0100, Konrad Dybcio wrote:
> On 20.12.2024 12:39 PM, Sudeep Holla wrote:
> > On Thu, Dec 19, 2024 at 08:26:51PM +0100, Konrad Dybcio wrote:
> >> On 14.11.2024 2:10 AM, Elliot Berman wrote:
> >>
> >>> I'm not sure why you'd like to support s2ram. Is it *only* that you'd
> >>> like to be able to set pm_set_suspend/resume_via_firmware()? I hope this
> >>> doesn't sound silly: what if you register a platform_s2idle_ops for the
> >>> relevant SoCs which calls pm_set_suspend/resume_via_firmware()?
> >>
> >> S2RAM is what you get after entering a certain state, but currently
> >> it's presented as just another (s2idle) idle state.
> >>
> >
> > Just to be clear, I assume you mean CPU_SUSPEND idle state. There is
> > no special or different s2idle idle states IIUC.
>
> Yeah, right.
>
> >> That means some hardware that may need to be reinitialized, isn't as
> >> Linux has no clue it might have lost power.
> >>
> >
> > Interesting, so this means firmware doesn't automatically save and restore
> > states yet exposes it as CPU_SUSPEND idle state.
>
> Reading the spec, I'm pretty sure PSCI calls should only mess with the
> power state of the cores, core-adjacent peripherals and GIC.
>
> Reading section 5.20.1 (SYSTEM_SUSPEND / Intended use) I think it says
> mostly what I'm trying to convey:
>
>
> "In a typical implementation, the semantics are equivalent to a
> CPU_SUSPEND to the deepest low-power state. However, it is possible that
> an implementation might reserve a deeper state for SYSTEM_SUSPEND than
> those used with CPU_SUSPEND."
>

Yes, this text helps to understand the interface easily. If they were the
same, do you think we would have defined 2 different interfaces?

--
Regards,
Sudeep
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Konrad Dybcio 1 year, 1 month ago
On 20.12.2024 2:58 PM, Sudeep Holla wrote:
> On Fri, Dec 20, 2024 at 01:42:04PM +0100, Konrad Dybcio wrote:
>> On 20.12.2024 12:39 PM, Sudeep Holla wrote:
>>> On Thu, Dec 19, 2024 at 08:26:51PM +0100, Konrad Dybcio wrote:
>>>> On 14.11.2024 2:10 AM, Elliot Berman wrote:
>>>>
>>>>> I'm not sure why you'd like to support s2ram. Is it *only* that you'd
>>>>> like to be able to set pm_set_suspend/resume_via_firmware()? I hope this
>>>>> doesn't sound silly: what if you register a platform_s2idle_ops for the
>>>>> relevant SoCs which calls pm_set_suspend/resume_via_firmware()?
>>>>
>>>> S2RAM is what you get after entering a certain state, but currently
>>>> it's presented as just another (s2idle) idle state.
>>>>
>>>
>>> Just to be clear, I assume you mean CPU_SUSPEND idle state. There is
>>> no special or different s2idle idle states IIUC.
>>
>> Yeah, right.
>>
>>>> That means some hardware that may need to be reinitialized, isn't as
>>>> Linux has no clue it might have lost power.
>>>>
>>>
>>> Interesting, so this means firmware doesn't automatically save and restore
>>> states yet exposes it as CPU_SUSPEND idle state.
>>
>> Reading the spec, I'm pretty sure PSCI calls should only mess with the
>> power state of the cores, core-adjacent peripherals and GIC.
>>
>> Reading section 5.20.1 (SYSTEM_SUSPEND / Intended use) I think it says
>> mostly what I'm trying to convey:
>>
>>
>> "In a typical implementation, the semantics are equivalent to a
>> CPU_SUSPEND to the deepest low-power state. However, it is possible that
>> an implementation might reserve a deeper state for SYSTEM_SUSPEND than
>> those used with CPU_SUSPEND."
>>
> 
> Yes these text help to understand the interface easily. If they were same,
> do you think we would have defined 2 different interfaces.

I would happen to think that, yes. Especially since the reference firmware
implementation does *exactly this*:

https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/psci/psci_main.c#L179-L221

PSCI_SYSTEM_SUSPEND seems to be simply meant as a wrapper around a specific
CPU_SUSPEND state (which may or may not be only callable from inside the
firmware when SYSTEM_SUSPEND specifically is requested, for reasons),
in a platform-agnostic way, so that the OS can enter suspend without
providing that magic StateID on all supported platforms.
But since it already requires more elbow grease on the peripheral IP side,
I'm not really convinced it's all that useful.
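
To make the wrapper relationship concrete, here is a minimal model of it in C. It is a sketch under stated assumptions, not the TF-A code: the state IDs, `struct fw_model` and the `fw_*` functions are all made up for illustration, and only the return values mirror the PSCI SUCCESS / NOT_SUPPORTED / INVALID_PARAMETERS codes.

```c
#include <stdint.h>

/* Return values mirror PSCI SUCCESS / NOT_SUPPORTED / INVALID_PARAMETERS. */
#define FW_SUCCESS          0
#define FW_NOT_SUPPORTED   (-1)
#define FW_INVALID_PARAMS  (-2)

/* Hypothetical state IDs; real values are platform-specific. */
#define STATE_CORE_RET     0x1u
#define STATE_CLUSTER_OFF  0x2u
#define STATE_SYSTEM_OFF   0x3u  /* deepest state: SoC down, RAM retained */

struct fw_model {
	uint32_t last_entered;           /* state the common path entered last */
	int system_suspend_implemented;  /* does the firmware expose the wrapper? */
};

/* Common low-power entry path shared by both calls. */
static int fw_enter_low_power(struct fw_model *fw, uint32_t state_id)
{
	if (state_id < STATE_CORE_RET || state_id > STATE_SYSTEM_OFF)
		return FW_INVALID_PARAMS;
	fw->last_entered = state_id;
	return FW_SUCCESS;
}

/* CPU_SUSPEND: the caller has to know the platform-specific state ID. */
int fw_cpu_suspend(struct fw_model *fw, uint32_t state_id)
{
	return fw_enter_low_power(fw, state_id);
}

/* SYSTEM_SUSPEND: no parameter, the firmware picks the deepest state itself. */
int fw_system_suspend(struct fw_model *fw)
{
	if (!fw->system_suspend_implemented)
		return FW_NOT_SUPPORTED;
	return fw_enter_low_power(fw, STATE_SYSTEM_OFF);
}
```

In this model, a firmware that implements the wrapper and one that only accepts the deepest state through CPU_SUSPEND end up in the same place; the second one merely answers NOT_SUPPORTED to the parameterless call.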

Plus, the optional bit of doing more work behind the scenes doesn't seem
to be very widely used across TF-A supported platforms.

So please, stop making the argument that it's any different. The firmware
I'm dealing with simply didn't expose the same thing twice, in perfect
accordance with the spec.

Konrad
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Sudeep Holla 1 year, 1 month ago
On Fri, Dec 20, 2024 at 03:20:37PM +0100, Konrad Dybcio wrote:
>
> I would happen to think that, yes. Especially since the reference firmware
> implementation does *exactly this*:
>
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/psci/psci_main.c#L179-L221
>
> PSCI_SYSTEM_SUSPEND seems to be simply meant as a wrapper around a specific
> CPU_SUSPEND state (which may or may not be only callable from inside the
> firmware when SYSTEM_SUSPEND specifically is requested, for reasons),
> in a platform-agnostic way, so that the OS can enter suspend without
> providing that magic StateID on all supported platforms.

Exactly, that's how it can be OS and platform agnostic. Yet this platform
chose to optimise by not providing it as a wrapper (if it was that simple
on your platform too), without running any tests, leaving it to interested
parties like you to mess around to get it working. That practice needs to be
fixed, and this change won't help: once we fix this, more such
special-treatment fixes will be needed on newer platforms.
So let's stop and ensure things are fixed properly.

> But since it already requires more elbow grease on the peripheral IP side,
> I'm not really convinced it's that much useful.
>
> Plus, the optional bit of doing more work behind the scenes doesn't seem
> to be very widely used across TF-A supported platforms.
>
> So please, stop making the argument that it's any different. The firmware
> I'm dealing with simply didn't expose the same thing twice, in perfect
> accordance with the spec.
>

So that it can continue to do so in the future?
Thanks but no thanks. NACK with no arguments as requested.

--
Regards,
Sudeep
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Konrad Dybcio 1 year, 1 month ago
On 20.12.2024 3:36 PM, Sudeep Holla wrote:
> On Fri, Dec 20, 2024 at 03:20:37PM +0100, Konrad Dybcio wrote:
>>
>> I would happen to think that, yes. Especially since the reference firmware
>> implementation does *exactly this*:
>>
>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/psci/psci_main.c#L179-L221
>>
>> PSCI_SYSTEM_SUSPEND seems to be simply meant as a wrapper around a specific
>> CPU_SUSPEND state (which may or may not be only callable from inside the
>> firmware when SYSTEM_SUSPEND specifically is requested, for reasons),
>> in a platform-agnostic way, so that the OS can enter suspend without
>> providing that magic StateID on all supported platforms.
> 
> Exactly, that's how it can be OS and platform agnostic. Yet this platform
> considered to optimise by not just providing it as a wrapper(if it was
> that simple on your platform too) without running any tests and leaving
> it to interested parties like you to mess around to get it working.
> That practice needs to be fixed and this change won't help and once we
> fix this, more such special treatment fixes are needed on newer platforms.
> So let's stop and ensure things are fixed properly.

And then remove CPU_SUSPEND support if CPU_SUSPEND2 comes in a spec update
because it's not generic enough? Sorry, this is not acceptable.

If you enforce PSCI as the only way of doing SMP/cpuidle/platform suspend
upstream on arm64, you should not gatekeep existing implementations that are
actually in line with the written spec, just because you don't happen to
like them.

If you want to start the process of getting rid of those, amend the spec
to deprecate and/or forbid system-level suspend in CPU_SUSPEND in future
PSCI versions. But you can't retroactively change your decisions like that.

>> But since it already requires more elbow grease on the peripheral IP side,
>> I'm not really convinced it's that much useful.
>>
>> Plus, the optional bit of doing more work behind the scenes doesn't seem
>> to be very widely used across TF-A supported platforms.
>>
>> So please, stop making the argument that it's any different. The firmware
>> I'm dealing with simply didn't expose the same thing twice, in perfect
>> accordance with the spec.
>>
> 
> So that it can continue to do so in the future ?
> Thanks but no thanks. NACK with no arguments as requested.

That's already been "fixed" on QC platforms starting around 2022, as
mentioned in this series.

Konrad
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Manivannan Sadhasivam 1 year, 2 months ago
On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> CPU_SUSPEND instead. Inform Linux about that.
> Please see the commit messages for a more detailed explanation.
> 

It is still not PSCI_SYSTEM_SUSPEND though...

> This is effectively a more educated follow-up to [1].
> 
> The ultimate goal is to stop making Linux think that certain states
> only concern cores/clusters, and consequently setting
> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> NVMe, see related discussion over at [2]) can make informed decisions
> about assuming the power state of the device they govern.
> 
> If this series gets green light, I'll push a follow-up one that wires
> up said sleep state on Qualcomm SoCs across the board.
> 

Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
firmware across all segments (mostly), so there is no S2R involved and only
S2Idle. If you use PSCI to implement suspend_via_firmware(), then all the SoCs
making use of the PSCI implementation will have the same behavior. I don't think
we would want that.

For instance, if a Qcom SoC is used in an Android tablet with the same firmware,
then this would allow the NVMe device to be turned off during system suspend
every time the user presses the lock button. And this will cause the NVMe device
to wear out faster. The said approach will work fine for non-Android use cases
though.

I have a couple of ideas in mind that I will post to the NVMe list itself.

- Mani

> [1] https://lore.kernel.org/linux-arm-kernel/20231227-topic-psci_fw_sus-v1-0-6910add70bf3@linaro.org/
> [2] https://lore.kernel.org/linux-nvme/20241024-topic-nvmequirk-v1-1-51249999d409@oss.qualcomm.com/
> 
> Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
> ---
> Konrad Dybcio (3):
>       dt-bindings: arm,psci: Allow S2RAM power_state parameter description
>       firmware/psci: Set pm_set_resume/suspend_via_firmware() for SYSTEM_SUSPEND
>       firmware/psci: Allow specifying an S2RAM state through CPU_SUSPEND
> 
>  Documentation/devicetree/bindings/arm/psci.yaml |  6 ++++
>  drivers/firmware/psci/psci.c                    | 44 ++++++++++++++++++++++---
>  2 files changed, 46 insertions(+), 4 deletions(-)
> ---
> base-commit: a39230ecf6b3057f5897bc4744a790070cfbe7a8
> change-id: 20241028-topic-cpu_suspend_s2ram-28fc095d0aa4
> 
> Best regards,
> -- 
> Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
> 

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Konrad Dybcio 1 year, 2 months ago

On 11/12/24 19:01, Manivannan Sadhasivam wrote:
> On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>> CPU_SUSPEND instead. Inform Linux about that.
>> Please see the commit messages for a more detailed explanation.
>>
> 
> It is still not PSCI_SYSTEM_SUSPEND though...

It *literally* does the same thing on devices where it's exposed.

> 
>> This is effectively a more educated follow-up to [1].
>>
>> The ultimate goal is to stop making Linux think that certain states
>> only concern cores/clusters, and consequently setting
>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>> NVMe, see related discussion over at [2]) can make informed decisions
>> about assuming the power state of the device they govern.
>>
>> If this series gets green light, I'll push a follow-up one that wires
>> up said sleep state on Qualcomm SoCs across the board.
>>
> 
> Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
> firmware across all segments (mostly), 

This ^

> so there is no S2R involved and only S2Idle.

is not at all related to this ^, the "so" makes no sense.

(also you're wrong, this *is* S2RAM)

> If you use PSCI to implement suspend_via_firmware(), then all the SoCs
> making use of the PSCI implementation will have the same behavior. I don't think
> we would want that.

This is an issue with the NVMe framework that is totally unrelated to this
change, see below. Also, the code only sets that on targets where such state
exists and is described.
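
As a rough illustration of that last point, the sketch below models the flag in plain C. Everything here is a stand-in: `struct platform_desc`, the `model_*` functions and the `dev_action` enum are hypothetical, and the global only mimics the role that pm_set_suspend_via_firmware() and pm_suspend_via_firmware() play in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical platform description, e.g. as parsed from the devicetree. */
struct platform_desc {
	bool has_s2ram_param;   /* is an S2RAM power_state parameter described? */
	uint32_t s2ram_param;   /* its value, meaningful only if described */
};

/* Stand-in for the kernel's global "suspend goes through firmware" flag. */
static bool suspend_via_firmware;

/* Suspend entry: the flag is set only where the S2RAM state is described. */
void model_suspend_begin(const struct platform_desc *p)
{
	suspend_via_firmware = p->has_s2ram_param;
}

/* Resume: clear the flag again. */
void model_suspend_end(void)
{
	suspend_via_firmware = false;
}

enum dev_action { DEV_LOW_POWER_STATE, DEV_FULL_SHUTDOWN };

/* A client driver picks its suspend path based on the flag. */
enum dev_action model_storage_suspend(void)
{
	return suspend_via_firmware ? DEV_FULL_SHUTDOWN : DEV_LOW_POWER_STATE;
}
```

A platform that does not describe such a state never sets the flag, so its client drivers keep their current behavior.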

> For instance, if a Qcom SoC is used in an android tablet with the same firmware,
> then this would allow the NVMe device to be turned off during system suspend all
> the time when user presses the lock button. And this will cause NVMe device to
> wear out faster. The said approach will work fine for non-android usecases
> though.

The NVMe framework doesn't make a distinction between "phone screen off" and
"laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
describing is not supported as of today since nobody *actually* has NVMe on a
phone that also happens to run upstream Linux.
I'm not going to solve imaginary problems.

Besides, userspace already has sysfs to tune device power state knobs. Which
Android uses very extensively on market devices.

Konrad
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Manivannan Sadhasivam 1 year, 2 months ago
On Tue, Nov 12, 2024 at 07:32:36PM +0100, Konrad Dybcio wrote:
> 
> 
> On 11/12/24 19:01, Manivannan Sadhasivam wrote:
> > On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
> > > Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> > > CPU_SUSPEND instead. Inform Linux about that.
> > > Please see the commit messages for a more detailed explanation.
> > > 
> > 
> > It is still not PSCI_SYSTEM_SUSPEND though...
> 
> It *literally* does the same thing on devices where it's exposed.
> 

But still...

> > 
> > > This is effectively a more educated follow-up to [1].
> > > 
> > > The ultimate goal is to stop making Linux think that certain states
> > > only concern cores/clusters, and consequently setting
> > > pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> > > NVMe, see related discussion over at [2]) can make informed decisions
> > > about assuming the power state of the device they govern.
> > > 
> > > If this series gets green light, I'll push a follow-up one that wires
> > > up said sleep state on Qualcomm SoCs across the board.
> > > 
> > 
> > Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
> > firmware across all segments (mostly),
> 
> This ^
> 
> > so there is no S2R involved and only S2Idle.
> 
> is not at all related to this ^, the "so" makes no sense.
> 
> (also you're wrong, this *is* S2RAM)
> 

What? Qcom SoCs supporting S2R? I've never heard of that.

> > If you use PSCI to implement suspend_via_firmware(), then all the SoCs
> > making use of the PSCI implementation will have the same behavior. I don't think
> > we would want that.
> 
> This is an issue with the NVMe framework that is totally unrelated to this
> change, see below. Also, the code only sets that on targets where such state
> exists and is described.
> 

Well, you are doing it just because you want the NVMe device to learn about the
platform requirement.

> > For instance, if a Qcom SoC is used in an android tablet with the same firmware,
> > then this would allow the NVMe device to be turned off during system suspend all
> > the time when user presses the lock button. And this will cause NVMe device to
> > wear out faster. The said approach will work fine for non-android usecases
> > though.
> 
> The NVMe framework doesn't make a distinction between "phone screen off" and
> "laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
> describing is not supported as of today since nobody *actually* has NVMe on a
> phone that also happens to run upstream Linux.
> I'm not going to solve imaginary problems.
> 

Not just phones; an NVMe device could be running on an Android tablet. I'm not
talking about an imaginary problem, but a real problem in the foreseeable
future (that is also the reason why NVMe developers don't want to always put the
device into power down mode during system suspend).

And with this change, you are just going to make the NVMe lifetime miserable on
those platforms.

- Mani

> Besides, userspace already has sysfs to tune device power state knobs. Which
> Android uses very extensively on market devices.
> 
> Konrad

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Konrad Dybcio 1 year, 2 months ago

On 11/12/24 19:43, Manivannan Sadhasivam wrote:
> On Tue, Nov 12, 2024 at 07:32:36PM +0100, Konrad Dybcio wrote:
>>
>>
>> On 11/12/24 19:01, Manivannan Sadhasivam wrote:
>>> On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
>>>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>>>> CPU_SUSPEND instead. Inform Linux about that.
>>>> Please see the commit messages for a more detailed explanation.
>>>>
>>>
>>> It is still not PSCI_SYSTEM_SUSPEND though...
>>
>> It *literally* does the same thing on devices where it's exposed.
>>
> 
> But still...

Still what? We can't replace the signed firmware on (unironically) tens
of millions of devices in the wild and this is how it exposes that sleep
state. This is how Arm platforms did it before the PSCI spec was
updated and SYSTEM_SUSPEND is *still optional today*.


>>>> This is effectively a more educated follow-up to [1].
>>>>
>>>> The ultimate goal is to stop making Linux think that certain states
>>>> only concern cores/clusters, and consequently setting
>>>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>>>> NVMe, see related discussion over at [2]) can make informed decisions
>>>> about assuming the power state of the device they govern.
>>>>
>>>> If this series gets green light, I'll push a follow-up one that wires
>>>> up said sleep state on Qualcomm SoCs across the board.
>>>>
>>>
>>> Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
>>> firmware across all segments (mostly),
>>
>> This ^
>>
>>> so there is no S2R involved and only S2Idle.
>>
>> is not at all related to this ^, the "so" makes no sense.
>>
>> (also you're wrong, this *is* S2RAM)
>>
> 
> What? Qcom SoCs supporting S2R? I'm unheard of.

Maybe you're thinking of hibernation, which is not widely (if at all)
supported.


>>> If you use PSCI to implement suspend_via_firmware(), then all the SoCs
>>> making use of the PSCI implementation will have the same behavior. I don't think
>>> we would want that.
>>
>> This is an issue with the NVMe framework that is totally unrelated to this
>> change, see below. Also, the code only sets that on targets where such state
>> exists and is described.
>>
> 
> Well, you are doing it just because you want the NVMe device to learn about the
> platform requirement.

And I can't see why you're having a problem with this. It's exactly how it
works on x86 too. Modern Standby also shuts down storage on Windows,
regardless of the CPU architecture.
  
>>> For instance, if a Qcom SoC is used in an android tablet with the same firmware,
>>> then this would allow the NVMe device to be turned off during system suspend all
>>> the time when user presses the lock button. And this will cause NVMe device to
>>> wear out faster. The said approach will work fine for non-android usecases
>>> though.
>>
>> The NVMe framework doesn't make a distinction between "phone screen off" and
>> "laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
>> describing is not supported as of today since nobody *actually* has NVMe on a
>> phone that also happens to run upstream Linux.
>> I'm not going to solve imaginary problems.
>>
> 
> Not just phone, NVMe device could be running on an android tablet.

'Could' very much makes it imaginary. There are no supported devices that
fall into this category.

> I'm not
> talking about an imaginary problem, but a real problem that is in a foreseeable
> future

Keyword: future. This issue has been on hold for years because of 'issues'
that are pinky promised to happen eventually, without anyone suggesting any
actually acceptable solutions. This just undermines progress.

> (that is also the reason why NVMe developers doesn't want to put the
> device into power down mode always during system suspend).

This is the current behavior on any new x86 laptop, and has been for a
couple of years.

> And with this change, you are just going to make the NVMe lifetime miserable on
> those platforms.

Fearmongering and hearsay. See above.

Konrad
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Manivannan Sadhasivam 1 year, 2 months ago
On Tue, Nov 12, 2024 at 08:04:34PM +0100, Konrad Dybcio wrote:
> 
> 
> On 11/12/24 19:43, Manivannan Sadhasivam wrote:
> > On Tue, Nov 12, 2024 at 07:32:36PM +0100, Konrad Dybcio wrote:
> > > 
> > > 
> > > On 11/12/24 19:01, Manivannan Sadhasivam wrote:
> > > > On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
> > > > > Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> > > > > CPU_SUSPEND instead. Inform Linux about that.
> > > > > Please see the commit messages for a more detailed explanation.
> > > > > 
> > > > 
> > > > It is still not PSCI_SYSTEM_SUSPEND though...
> > > 
> > > It *literally* does the same thing on devices where it's exposed.
> > > 
> > 
> > But still...
> 
> Still-what? We can't replace the signed firmware on (unironically) tens
> of millions of devices in the wild and this is how it exposes that sleep
> state. This is how arm platforms did it before the PSCI spec was
> updated and SYSTEM_SUSPEND is *still optional today*.
> 

I never asked you to replace the firmware in the first place, so don't attribute
to me something I never said. I see this approach as a way of abusing/faking PSCI
system suspend.

Moreover, I heard from Bjorn that Qcom doesn't want to put the PCIe devices into
D3Cold during system suspend for future platforms (based on their
experimentation). So if drivers rely on this static information, then even Qcom
cannot achieve what they want.

> 
> > > > > This is effectively a more educated follow-up to [1].
> > > > > 
> > > > > The ultimate goal is to stop making Linux think that certain states
> > > > > only concern cores/clusters, and consequently setting
> > > > > pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> > > > > NVMe, see related discussion over at [2]) can make informed decisions
> > > > > about assuming the power state of the device they govern.
> > > > > 
> > > > > If this series gets green light, I'll push a follow-up one that wires
> > > > > up said sleep state on Qualcomm SoCs across the board.
> > > > > 
> > > > 
> > > > Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
> > > > firmware across all segments (mostly),
> > > 
> > > This ^
> > > 
> > > > so there is no S2R involved and only S2Idle.
> > > 
> > > is not at all related to this ^, the "so" makes no sense.
> > > 
> > > (also you're wrong, this *is* S2RAM)
> > > 
> > 
> > What? Qcom SoCs supporting S2R? I'm unheard of.
> 
> Maybe you're thinking of hibernation, which is not widely (if at all)
> supported.
> 

Not hibernation. The Qcom platforms I'm aware of all support only S2Idle. I
don't work for Qcom, so I may be missing some insider information.

> 
> > > > If you use PSCI to implement suspend_via_firmware(), then all the SoCs
> > > > making use of the PSCI implementation will have the same behavior. I don't think
> > > > we would want that.
> > > 
> > > This is an issue with the NVMe framework that is totally unrelated to this
> > > change, see below. Also, the code only sets that on targets where such state
> > > exists and is described.
> > > 
> > 
> > Well, you are doing it just because you want the NVMe device to learn about the
> > platform requirement.
> 
> And I can't see why you're having a problem with this. It's exactly how it
> works on x86 too. Modern Standby also shuts down storage on Windows,
> regardless of the CPU architecture.

It is not just my problem. I'm expressing the concern that the NVMe folks have,
and have already expressed, about the similar solutions I proposed. And I cannot
just overrule them.

> > > > For instance, if a Qcom SoC is used in an android tablet with the same firmware,
> > > > then this would allow the NVMe device to be turned off during system suspend all
> > > > the time when user presses the lock button. And this will cause NVMe device to
> > > > wear out faster. The said approach will work fine for non-android usecases
> > > > though.
> > > 
> > > The NVMe framework doesn't make a distinction between "phone screen off" and
> > > "laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
> > > describing is not supported as of today since nobody *actually* has NVMe on a
> > > phone that also happens to run upstream Linux.
> > > I'm not going to solve imaginary problems.
> > > 
> > 
> > Not just phone, NVMe device could be running on an android tablet.
> 
> 'Could' very much makes it imaginary. There are no supported devices that
> fall into this category.
> 

Agreed that there are no products on the market (yet). But having NVMe on
handheld devices is not something I would call 'imaginary'.

> > I'm not
> > talking about an imaginary problem, but a real problem that is in a foreseeable
> > future
> 
> Keyword: future. This issue has been on hold for years because of 'issues'
> that are pinky promised to happen eventually, without anyone suggesting any
> actually acceptable solutions. This just undermines progress.
> 

Not true. There are solutions suggested, but then it always takes time to reach
consensus. One of the approaches that I'm about to propose is to have a userspace
knob that specifies whether the device can be powered down or not (leaving the
default behavior to put them in low power state). That is because the decision
to put the devices into power down or low power state sounds more like a
userspace policy. It was discussed at LPC 2023.

> > (that is also the reason why NVMe developers doesn't want to put the
> > device into power down mode always during system suspend).
> 
> This is the current behavior on any new x86 laptop, and has been for a
> couple of years.
> 
> > And with this change, you are just going to make the NVMe lifetime miserable on
> > those platforms.
> 
> Fearmongering and hearsay. See above.
> 

I can only wish you the best of luck with this approach!

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH 0/3] Allow specifying an S2RAM sleep on pre-SYSTEM_SUSPEND PSCI impls
Posted by Konrad Dybcio 1 year, 1 month ago
On 13.11.2024 9:05 AM, Manivannan Sadhasivam wrote:
> On Tue, Nov 12, 2024 at 08:04:34PM +0100, Konrad Dybcio wrote:
>>
>>
>> On 11/12/24 19:43, Manivannan Sadhasivam wrote:
>>> On Tue, Nov 12, 2024 at 07:32:36PM +0100, Konrad Dybcio wrote:
>>>>
>>>>
>>>> On 11/12/24 19:01, Manivannan Sadhasivam wrote:
>>>>> On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
>>>>>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>>>>>> CPU_SUSPEND instead. Inform Linux about that.
>>>>>> Please see the commit messages for a more detailed explanation.
>>>>>>
>>>>>
>>>>> It is still not PSCI_SYSTEM_SUSPEND though...
>>>>
>>>> It *literally* does the same thing on devices where it's exposed.
>>>>
>>>
>>> But still...
>>
>> Still-what? We can't replace the signed firmware on (unironically) tens
>> of millions of devices in the wild and this is how it exposes that sleep
>> state. This is how arm platforms did it before the PSCI spec was
>> updated and SYSTEM_SUSPEND is *still optional today*.
>>
> 
> I never asked you to replace the firmware in first place, so don't quote the
> fact I never said.

Never implied you did. I'm stressing the fact that we can't
update the firmware on such devices to expose PSCI_SYSTEM_SUSPEND.

> I see this approach as a way of abusing/faking PSCI system
> suspend.

And I disagree. I can't stress this enough, calling PSCI_SYSTEM_SUSPEND
is literally internally equivalent to calling PSCI_CPU_SUSPEND(magicval).
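
For reference, this is roughly what such a value looks like when taken apart. The sketch assumes the extended StateID format from PSCI 1.0 (StateID in bits [27:0], StateType in bit 30); whether a given firmware uses that format is an assumption, and the helper names and any example values are made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT_STATE_ID_MASK   0x0fffffffu   /* bits [27:0] */
#define EXT_STATE_TYPE_BIT  (1u << 30)    /* 0 = standby/retention, 1 = powerdown */

struct power_state {
	uint32_t state_id;   /* platform-specific, opaque to the OS */
	bool power_down;     /* true if context is lost in this state */
};

/* Split a CPU_SUSPEND power_state parameter into its two fields. */
struct power_state decode_power_state(uint32_t param)
{
	struct power_state ps;

	ps.state_id = param & EXT_STATE_ID_MASK;
	ps.power_down = (param & EXT_STATE_TYPE_BIT) != 0;
	return ps;
}

/* Build the parameter back from its fields. */
uint32_t encode_power_state(uint32_t state_id, bool power_down)
{
	return (state_id & EXT_STATE_ID_MASK) |
	       (power_down ? EXT_STATE_TYPE_BIT : 0u);
}
```

The StateID part carries no architectural meaning, which is why the OS cannot tell from the value alone whether a state only affects cores and clusters or the whole system.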

> 
> Moreover, I heard from Bjorn that Qcom doesn't want to put the PCIe devices into
> D3Cold during system suspend for future platforms (based on their
> experimentation). So if drivers rely on this static information, then even Qcom
> cannot achieve what they want.
> 
>>
>>>>>> This is effectively a more educated follow-up to [1].
>>>>>>
>>>>>> The ultimate goal is to stop making Linux think that certain states
>>>>>> only concern cores/clusters, and consequently setting
>>>>>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>>>>>> NVMe, see related discussion over at [2]) can make informed decisions
>>>>>> about assuming the power state of the device they govern.
>>>>>>
>>>>>> If this series gets green light, I'll push a follow-up one that wires
>>>>>> up said sleep state on Qualcomm SoCs across the board.
>>>>>>
>>>>>
>>>>> Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
>>>>> firmware across all segments (mostly),
>>>>
>>>> This ^
>>>>
>>>>> so there is no S2R involved and only S2Idle.
>>>>
>>>> is not at all related to this ^, the "so" makes no sense.
>>>>
>>>> (also you're wrong, this *is* S2RAM)
>>>>
>>>
>>> What? Qcom SoCs supporting S2R? I'm unheard of.
>>
>> Maybe you're thinking of hibernation, which is not widely (if at all)
>> supported.
>>
> 
> Not hibernation. The Qcom platforms I've aware of all support only S2Idle. I
> don't work for Qcom, so I may be missing some insider information.

I think this is the main source of misunderstanding in this entire thread.

CXPC is S2RAM. Not S2idle.

Shallower sleep states on QC platforms are S2idle.

>>>>> If you use PSCI to implement suspend_via_firmware(), then all the SoCs
>>>>> making use of the PSCI implementation will have the same behavior. I don't think
>>>>> we would want that.
>>>>
>>>> This is an issue with the NVMe framework that is totally unrelated to this
>>>> change, see below. Also, the code only sets that on targets where such state
>>>> exists and is described.
>>>>
>>>
>>> Well, you are doing it just because you want the NVMe device to learn about the
>>> platform requirement.
>>
>> And I can't see why you're having a problem with this. It's exactly how it
>> works on x86 too. Modern Standby also shuts down storage on Windows,
>> regardless of the CPU architecture.
> 
> It is not just my problem. I'm expressing the concern that NVMe folks have and
> already expressed over the similar solutions I proposed. And I cannot just
> overrule them.

Sure, but if PSCI_SYSTEM_SUSPEND implies S2RAM, why should the behavior be
different purely based on the architectural idle implementation?

Moreover, if the same platform can be booted with ACPI or DT, why should
power state switching work differently, considering both would describe
the hardware accurately?

>>>>> For instance, if a Qcom SoC is used in an android tablet with the same firmware,
>>>>> then this would allow the NVMe device to be turned off during system suspend all
>>>>> the time when user presses the lock button. And this will cause NVMe device to
>>>>> wear out faster. The said approach will work fine for non-android usecases
>>>>> though.
>>>>
>>>> The NVMe framework doesn't make a distinction between "phone screen off" and
>>>> "laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
>>>> describing is not supported as of today since nobody *actually* has NVMe on a
>>>> phone that also happens to run upstream Linux.
>>>> I'm not going to solve imaginary problems.
>>>>
>>>
>>> Not just phone, NVMe device could be running on an android tablet.
>>
>> 'Could' very much makes it imaginary. There are no supported devices that
>> fall into this category.
>>
> 
> Agree that there are no products in the market (yet). But having NVMe on
> handheld devices is not something I would quote as 'imaginary'.
> 
>>> I'm not
>>> talking about an imaginary problem, but a real problem that is in a foreseeable
>>> future
>>
>> Keyword: future. This issue has been on hold for years because of 'issues'
>> that are pinky promised to happen eventually, without anyone suggesting any
>> actually acceptable solutions. This just undermines progress.
>>
> 
> Not true. There are solutions suggested, but then it always takes time to reach
> consensus. One of the approach that I'm about to propose is to have a userspace
> knob that specifies whether the device can be powered down or not (leaving the
> default behavior to put them in low power state). Because, the decision to put
> the devices into power down or low power state sounds more like an userspace
> policy. It was discussed at LPC 2023.

Sure, however I believe it is perfectly reasonable to change the
default setting there based on platform capabilities.
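
A possible shape for that, sketched in C with hypothetical names (`enum suspend_knob`, `may_power_down()`). No such knob exists today, so this only illustrates the policy split: an explicit userspace setting wins, and the platform capability supplies the default when userspace has not expressed a preference.

```c
#include <stdbool.h>

/* Hypothetical tri-state userspace knob. */
enum suspend_knob {
	KNOB_UNSET,             /* userspace expressed no preference */
	KNOB_KEEP_POWERED,      /* only use the device's low power state */
	KNOB_ALLOW_POWER_DOWN   /* the device may be fully powered down */
};

/*
 * Decide whether the device may be powered down during system suspend.
 * platform_loses_power: true if the platform is known to cut power to the
 * device in its deepest sleep state.
 */
bool may_power_down(enum suspend_knob user, bool platform_loses_power)
{
	switch (user) {
	case KNOB_ALLOW_POWER_DOWN:
		return true;
	case KNOB_KEEP_POWERED:
		return false;
	default:
		/* No explicit policy: fall back to the platform default. */
		return platform_loses_power;
	}
}
```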

Konrad

> 
>>> (that is also the reason why NVMe developers doesn't want to put the
>>> device into power down mode always during system suspend).
>>
>> This is the current behavior on any new x86 laptop, and has been for a
>> couple of years.
>>
>>> And with this change, you are just going to make the NVMe lifetime miserable on
>>> those platforms.
>>
>> Fearmongering and hearsay. See above.
>>
> 
> I can only wish you best of luck with this approach!
> 
> - Mani
>