Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
CPU_SUSPEND instead. Inform Linux about that.
Please see the commit messages for a more detailed explanation.
This is effectively a more educated follow-up to [1].
The ultimate goal is to stop making Linux think that certain states
only concern cores/clusters and, as a consequence, to call
pm_set_suspend/resume_via_firmware(), so that client drivers (such as
NVMe, see the related discussion at [2]) can make informed decisions
about the power state of the device they govern.
If this series gets the green light, I'll push a follow-up that wires
up said sleep state on Qualcomm SoCs across the board.
[1] https://lore.kernel.org/linux-arm-kernel/20231227-topic-psci_fw_sus-v1-0-6910add70bf3@linaro.org/
[2] https://lore.kernel.org/linux-nvme/20241024-topic-nvmequirk-v1-1-51249999d409@oss.qualcomm.com/
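To illustrate the goal stated above, here is a minimal userspace sketch of how a client driver might act on the "suspend via firmware" hint. It is not kernel code: the model_* helpers stand in for pm_set_suspend_via_firmware() and pm_suspend_via_firmware(), and storage_pick_action() and its enum are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's global "suspend via firmware" flag. */
static bool model_fw_suspend;

static void model_set_suspend_via_firmware(void) { model_fw_suspend = true; }
static void model_clear_suspend_flags(void)      { model_fw_suspend = false; }
static bool model_suspend_via_firmware(void)     { return model_fw_suspend; }

enum storage_action {
	STORAGE_KEEP_LOW_POWER,	/* leave the device in its own low-power state */
	STORAGE_FULL_SHUTDOWN,	/* assume power may be cut: shut down, re-init later */
};

/*
 * Hypothetical driver policy: only rely on device-managed low-power states
 * when the platform has not announced a firmware-managed suspend.
 */
static enum storage_action storage_pick_action(bool has_low_power_states)
{
	if (model_suspend_via_firmware() || !has_low_power_states)
		return STORAGE_FULL_SHUTDOWN;
	return STORAGE_KEEP_LOW_POWER;
}
```

With the flag clear, a device that has low-power states keeps using them; once the platform sets the flag, the same device takes the full-shutdown path.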
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
---
Konrad Dybcio (3):
dt-bindings: arm,psci: Allow S2RAM power_state parameter description
firmware/psci: Set pm_set_resume/suspend_via_firmware() for SYSTEM_SUSPEND
firmware/psci: Allow specifying an S2RAM state through CPU_SUSPEND
Documentation/devicetree/bindings/arm/psci.yaml | 6 ++++
drivers/firmware/psci/psci.c | 44 ++++++++++++++++++++++---
2 files changed, 46 insertions(+), 4 deletions(-)
---
base-commit: a39230ecf6b3057f5897bc4744a790070cfbe7a8
change-id: 20241028-topic-cpu_suspend_s2ram-28fc095d0aa4
Best regards,
--
Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@kernel.org> wrote:
>
> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> CPU_SUSPEND instead. Inform Linux about that.
> Please see the commit messages for a more detailed explanation.
>
> This is effectively a more educated follow-up to [1].
>
> The ultimate goal is to stop making Linux think that certain states
> only concern cores/clusters, and consequently setting
> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> NVMe, see related discussion over at [2]) can make informed decisions
> about assuming the power state of the device they govern.

In my opinion, this is not really the correct way to do it. Using
pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not
for PSCI like this. Let me elaborate.

If the NVMe storage device is sharing the same power-rail as the CPU
cluster, then yes we should use PSCI to control it. But is that really
the case? If so, there are in principle two ways forward to deal with
this correctly.

1) If PSCI OSI mode is being used, the corresponding NVMe storage
device should be hooked up to the CPU PM cluster domain via genpd and
controlled as any other devices sharing the cluster-rail. In this way,
genpd together with the cpuidle-psci-domain can decide whether it's
okay to turn off the cluster. I believe this is the preferred way, but
2) would work fine too.

2) If PSCI PC mode is being used, a separate channel/interface to the
FW (like SCMI or rpmh in the QC case), should inform the FW whether
NVMe needs the power to it. This information should then be taken into
account by the PSCI FW when it decides what low-power-state to enter,
which ultimately means whether the cluster-rail can be turned off or
not.

Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you,
please elaborate on why, so we can help to make it work, as it should.

[...]

Kind regards
Uffe
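Option 1) above can be pictured with a small userspace model of a power-domain hierarchy. This is only a sketch of the idea, not the genpd API: struct model_device, struct model_domain and model_domain_can_power_off() are hypothetical names, and real genpd also weighs governors, latency constraints and more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_device {
	const char *name;
	bool active;		/* true while the device still needs power */
};

struct model_domain {
	const char *name;
	struct model_device **devices;	/* devices attached directly */
	size_t num_devices;
	struct model_domain **children;	/* subdomains, e.g. per-CPU domains */
	size_t num_children;
};

/*
 * A domain may only be turned off when every attached device is idle and
 * every subdomain could be turned off as well.
 */
static bool model_domain_can_power_off(const struct model_domain *pd)
{
	assert(pd != NULL);

	for (size_t i = 0; i < pd->num_devices; i++)
		if (pd->devices[i]->active)
			return false;

	for (size_t i = 0; i < pd->num_children; i++)
		if (!model_domain_can_power_off(pd->children[i]))
			return false;

	return true;
}
```

In this model, attaching the NVMe device to the cluster domain is what lets an active NVMe veto the cluster power-off, which is the behaviour option 1) describes.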
On 14.11.2024 4:30 PM, Ulf Hansson wrote: > On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@kernel.org> wrote: >> >> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through >> CPU_SUSPEND instead. Inform Linux about that. >> Please see the commit messages for a more detailed explanation. >> >> This is effectively a more educated follow-up to [1]. >> >> The ultimate goal is to stop making Linux think that certain states >> only concern cores/clusters, and consequently setting >> pm_set_suspend/resume_via_firmware(), so that client drivers (such as >> NVMe, see related discussion over at [2]) can make informed decisions >> about assuming the power state of the device they govern. > > In my opinion, this is not really the correct way to do it. Using > pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not > for PSCI like this. Let me elaborate. If the NVMe storage device is > sharing the same power-rail as the CPU cluster, then yes we should use > PSCI to control it. But is that really the case? If so, there are in > principle two ways forward to deal with this correctly. > > 1) If PSCI OSI mode is being used, the corresponding NVMe storage > device should be hooked up to the CPU PM cluster domain via genpd and > controlled as any other devices sharing the cluster-rail. In this way, > genpd together with the cpuidle-psci-domain can decide whether it's > okay to turn off the cluster. I believe this is the preferred way, but > 2) would work fine too. > > 2) If PSCI PC mode is being used, a separate channel/interface to the > FW (like SCMI or rpmh in the QC case), should inform the FW whether > NVMe needs the power to it. This information should then be taken into > account by the PSCI FW when it decides what low-power-state to enter, > which ultimately means whether the cluster-rail can be turned off or > not. This assumes PSCI only governs the CPU power rail. 
But what I'd guesstimate is that in most implementations if system-level suspend is there at all (no matter through which call), as per the spec, it at least also projects onto the DDR power state (like in this i.mx impl here [1]), or some uncore peripherals (like in Tegra's case with some secure element being toggled at [2]) > Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you, > please elaborate on why, so we can help to make it work, as it should. On Qualcomm platforms, RPMh is the central authority when it comes to power governance, but by design, the CPUs must be off (and with a specific magic cookie) for the RPMh hardware to consider powering off very power hungry parts of the system, such as general i/o rails. So again, PSCI must be fed a specific value for the rest of the hw to react. The "S2RAM state" isn't really a cpuidle state, because it doesn't differ from many shallower states as far as the cpu/cluster are concerned. If that all isn't in place, the platform never actually enters any "real" sleep state, other than "CPU and some controllable IP blocks are runtime-suspended". This effectively is very close to what ACPI+x86 do - there's a co-processor/firmware that does a lot of things behind your back and all you can do is *ask* it to change some handwavily-defined P/Cstate that affects a huge chunk of silicon. Konrad [1] https://github.com/nxp-imx/imx-atf/blob/lf_v2.6/plat/imx/imx8m/imx8mp/imx8mp_lpa_psci.c#L474 [2] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/nvidia/tegra/soc/t210/plat_psci_handlers.c#L214
+ Maulik, Vincent On Thu, 5 Dec 2024 at 21:34, Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> wrote: > > On 14.11.2024 4:30 PM, Ulf Hansson wrote: > > On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@kernel.org> wrote: > >> > >> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through > >> CPU_SUSPEND instead. Inform Linux about that. > >> Please see the commit messages for a more detailed explanation. > >> > >> This is effectively a more educated follow-up to [1]. > >> > >> The ultimate goal is to stop making Linux think that certain states > >> only concern cores/clusters, and consequently setting > >> pm_set_suspend/resume_via_firmware(), so that client drivers (such as > >> NVMe, see related discussion over at [2]) can make informed decisions > >> about assuming the power state of the device they govern. > > > > In my opinion, this is not really the correct way to do it. Using > > pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not > > for PSCI like this. Let me elaborate. If the NVMe storage device is > > sharing the same power-rail as the CPU cluster, then yes we should use > > PSCI to control it. But is that really the case? If so, there are in > > principle two ways forward to deal with this correctly. > > > > 1) If PSCI OSI mode is being used, the corresponding NVMe storage > > device should be hooked up to the CPU PM cluster domain via genpd and > > controlled as any other devices sharing the cluster-rail. In this way, > > genpd together with the cpuidle-psci-domain can decide whether it's > > okay to turn off the cluster. I believe this is the preferred way, but > > 2) would work fine too. > > > > 2) If PSCI PC mode is being used, a separate channel/interface to the > > FW (like SCMI or rpmh in the QC case), should inform the FW whether > > NVMe needs the power to it. 
This information should then be taken into > > account by the PSCI FW when it decides what low-power-state to enter, > > which ultimately means whether the cluster-rail can be turned off or > > not. > > This assumes PSCI only governs the CPU power rail. But what I'd > guesstimate is that in most implementations if system-level suspend is > there at all (no matter through which call), as per the spec, it at > least also projects onto the DDR power state (like in this i.mx > impl here [1]), or some uncore peripherals (like in Tegra's case with > some secure element being toggled at [2]) Right, I certainly understand the above. There are different parts of an SoC that may be sharing the same power-island as the CPUs. The question here is whether the NVMe storage device is part of that power-island too on some QC SoCs? > > > Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you, > > please elaborate on why, so we can help to make it work, as it should. > > On Qualcomm platforms, RPMh is the central authority when it comes > to power governance, but by design, the CPUs must be off (and with a > specific magic cookie) for the RPMh hardware to consider powering off > very power hungry parts of the system, such as general i/o rails. Right, that is why the "qcom,rpmh-rsc" device in many cases belongs to the cluster-power-domain (for PSCI). This allows "qcom,rpmh-rsc" to control the "last-man" activities and prevent deeper PSCI states if/when necessary. > > So again, PSCI must be fed a specific value for the rest of the hw > to react. The "S2RAM state" isn't really a cpuidle state, because > it doesn't differ from many shallower states as far as the cpu/cluster > are concerned. If that all isn't in place, the platform never actually > enters any "real" sleep state, other than "CPU and some controllable > IP blocks are runtime-suspended". 
We recently discussed this, offlist, with Maulik - and I think we need some more clarity around what is actually going on here. In principle, it looks to me that using S2I with just another deeper idlestate specified (with another psci-suspend-parameter, representing a deeper state) should work fine, at least theoretically. Of course, we may not be able to use that idlestate during regular cpuidle/runtime but only during S2I, which we need to control in a smooth way and that is not currently supported (but can be fixed easily, I think). In the end, it's the psci-suspend-parameter that is given to the PSCI FW that informs about what state we can enter. That said, using S2I may not work without updating the PSCI FW, of course. For example, there may be FW limitations that require the boot-CPU( CPU0) to be the last one for these deeper low-power-states. Whether that is just a FW limitation or whether there are some additional HW constraints that enforce this, needs to be clarified. > > This effectively is very close to what ACPI+x86 do - there's a > co-processor/firmware that does a lot of things behind your back and > all you can do is *ask* it to change some handwavily-defined P/Cstate > that affects a huge chunk of silicon. Yep, there are similarities. However, ACPI is for generic device power management. PSCI requires something additional, such as ARM SCMI or QC's rpm/rsc interface. > > Konrad > > [1] https://github.com/nxp-imx/imx-atf/blob/lf_v2.6/plat/imx/imx8m/imx8mp/imx8mp_lpa_psci.c#L474 > [2] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/nvidia/tegra/soc/t210/plat_psci_handlers.c#L214 > Kind regards Uffe
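Since the discussion keeps coming back to the psci-suspend-parameter, here is a small sketch of how that 32-bit power_state value is laid out. The bit positions follow the original and extended formats in the PSCI specification as I understand them; the struct and function names are made up for this example, and any concrete values fed to them are illustrative only, not taken from a real platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a PSCI power_state parameter. */
struct model_power_state {
	uint32_t state_id;	/* platform-specific StateID */
	bool     powerdown;	/* StateType: false = standby, true = powerdown */
	uint8_t  power_level;	/* original format only, 0 otherwise */
};

/* Original format: StateID [15:0], StateType [16], PowerLevel [25:24]. */
static struct model_power_state model_decode_original(uint32_t v)
{
	struct model_power_state s = {
		.state_id    = v & 0xffffu,
		.powerdown   = (v >> 16) & 1u,
		.power_level = (v >> 24) & 0x3u,
	};
	return s;
}

/* Extended format: StateID [27:0], StateType [30]. */
static struct model_power_state model_decode_extended(uint32_t v)
{
	struct model_power_state s = {
		.state_id    = v & 0x0fffffffu,
		.powerdown   = (v >> 30) & 1u,
		.power_level = 0,
	};
	return s;
}
```

Nothing in either layout says whether a given StateID affects only the CPUs or a larger part of the SoC, which is why the OS cannot tell an S2RAM-like state from an ordinary idle state by looking at the parameter alone.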
On 6.12.2024 10:53 AM, Ulf Hansson wrote: > + Maulik, Vincent > > On Thu, 5 Dec 2024 at 21:34, Konrad Dybcio > <konrad.dybcio@oss.qualcomm.com> wrote: >> >> On 14.11.2024 4:30 PM, Ulf Hansson wrote: >>> On Mon, 28 Oct 2024 at 15:24, Konrad Dybcio <konradybcio@kernel.org> wrote: >>>> >>>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through >>>> CPU_SUSPEND instead. Inform Linux about that. >>>> Please see the commit messages for a more detailed explanation. >>>> >>>> This is effectively a more educated follow-up to [1]. >>>> >>>> The ultimate goal is to stop making Linux think that certain states >>>> only concern cores/clusters, and consequently setting >>>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as >>>> NVMe, see related discussion over at [2]) can make informed decisions >>>> about assuming the power state of the device they govern. >>> >>> In my opinion, this is not really the correct way to do it. Using >>> pm_set_suspend/resume_via_firmware() works fine for x86/ACPI, but not >>> for PSCI like this. Let me elaborate. If the NVMe storage device is >>> sharing the same power-rail as the CPU cluster, then yes we should use >>> PSCI to control it. But is that really the case? If so, there are in >>> principle two ways forward to deal with this correctly. >>> >>> 1) If PSCI OSI mode is being used, the corresponding NVMe storage >>> device should be hooked up to the CPU PM cluster domain via genpd and >>> controlled as any other devices sharing the cluster-rail. In this way, >>> genpd together with the cpuidle-psci-domain can decide whether it's >>> okay to turn off the cluster. I believe this is the preferred way, but >>> 2) would work fine too. >>> >>> 2) If PSCI PC mode is being used, a separate channel/interface to the >>> FW (like SCMI or rpmh in the QC case), should inform the FW whether >>> NVMe needs the power to it. 
>>> This information should then be taken into
>>> account by the PSCI FW when it decides what low-power-state to enter,
>>> which ultimately means whether the cluster-rail can be turned off or
>>> not.
>>
>> This assumes PSCI only governs the CPU power rail. But what I'd
>> guesstimate is that in most implementations if system-level suspend is
>> there at all (no matter through which call), as per the spec, it at
>> least also projects onto the DDR power state (like in this i.mx
>> impl here [1]), or some uncore peripherals (like in Tegra's case with
>> some secure element being toggled at [2])
>
> Right, I certainly understand the above. There are different parts of
> an SoC that may be sharing the same power-island as the CPUs.
>
> The question here is whether the NVMe storage device is part of that
> power-island too on some QC SoCs?

Yes, but not exclusively (i.e. there can also be other voltage rails or
similar that may or may not be managed by Linux, depending on the SoC)

>>> Assuming PSCI OSI mode is used here. Then if 1) doesn't work for you,
>>> please elaborate on why, so we can help to make it work, as it should.
>>
>> On Qualcomm platforms, RPMh is the central authority when it comes
>> to power governance, but by design, the CPUs must be off (and with a
>> specific magic cookie) for the RPMh hardware to consider powering off
>> very power hungry parts of the system, such as general i/o rails.
>
> Right, that is why the "qcom,rpmh-rsc" device in many cases belongs to
> the cluster-power-domain (for PSCI). This allows "qcom,rpmh-rsc" to
> control the "last-man" activities and prevent deeper PSCI states
> if/when necessary.

Problem is, today we only describe the RSC connected to the CPU cluster.
Newer SoCs have multiple RSCs, which long story short allow for certain
IP blocks to operate and have their power managed without the CPU block
being involved, or even online.
The CPU RSC can only reliably probe the CPU online status, as all other IPs can be requested to stay powered from an external entity (e.g. a DSP, secure world and similar), so the driver can only do its best to try and prevent obviously-going-to-fail idle entries when CPUs are online. >> So again, PSCI must be fed a specific value for the rest of the hw >> to react. The "S2RAM state" isn't really a cpuidle state, because >> it doesn't differ from many shallower states as far as the cpu/cluster >> are concerned. If that all isn't in place, the platform never actually >> enters any "real" sleep state, other than "CPU and some controllable >> IP blocks are runtime-suspended". > > We recently discussed this, offlist, with Maulik - and I think we need > some more clarity around what is actually going on here. > > In principle, it looks to me that using S2I with just another deeper > idlestate specified (with another psci-suspend-parameter, representing > a deeper state) should work fine, at least theoretically. Of course, > we may not be able to use that idlestate during regular > cpuidle/runtime but only during S2I, which we need to control in a > smooth way and that is not currently supported (but can be fixed > easily, I think). > > In the end, it's the psci-suspend-parameter that is given to the PSCI > FW that informs about what state we can enter. > > That said, using S2I may not work without updating the PSCI FW, of > course. For example, there may be FW limitations that require the > boot-CPU( CPU0) to be the last one for these deeper low-power-states. > Whether that is just a FW limitation or whether there are some > additional HW constraints that enforce this, needs to be clarified. Yeah, not being able to runtime-idle into that state is one issue, and another one being successfully entering the S2RAM state may require us to reinitialize some hardware. 
Currently, Linux has no way of knowing that this state is any different
from the rest, but marking it as S2RAM would allow checking for
PM_SUSPEND_MEM vs PM_SUSPEND_TO_IDLE.

>> This effectively is very close to what ACPI+x86 do - there's a
>> co-processor/firmware that does a lot of things behind your back and
>> all you can do is *ask* it to change some handwavily-defined P/Cstate
>> that affects a huge chunk of silicon.
>
> Yep, there are similarities.
>
> However, ACPI is for generic device power management. PSCI requires
> something additional, such as ARM SCMI or QC's rpm/rsc interface.

Right, we're not yet fully there with "for_each_device(fw_shut_down())"

Konrad

>
>>
>> Konrad
>>
>> [1] https://github.com/nxp-imx/imx-atf/blob/lf_v2.6/plat/imx/imx8m/imx8mp/imx8mp_lpa_psci.c#L474
>> [2] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/nvidia/tegra/soc/t210/plat_psci_handlers.c#L214
>>
>
> Kind regards
> Uffe
On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> CPU_SUSPEND instead. Inform Linux about that.
> Please see the commit messages for a more detailed explanation.
>
> This is effectively a more educated follow-up to [1].
>
> The ultimate goal is to stop making Linux think that certain states
> only concern cores/clusters, and consequently setting
> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> NVMe, see related discussion over at [2]) can make informed decisions
> about assuming the power state of the device they govern.
>
> If this series gets green light, I'll push a follow-up one that wires
> up said sleep state on Qualcomm SoCs across the board.
>
> [1] https://lore.kernel.org/linux-arm-kernel/20231227-topic-psci_fw_sus-v1-0-6910add70bf3@linaro.org/
> [2] https://lore.kernel.org/linux-nvme/20241024-topic-nvmequirk-v1-1-51249999d409@oss.qualcomm.com/
>

I got a bit confused, but I think I might've pieced it together. Konrad
wants to support s2ram (not clear why) on Qualcomm SoCs from 2015-2023.
On these SoCs, PSCI_SYSTEM_SUSPEND (s2ram) isn't supported but doing
s2idle gets you the same effect. You'd like s2ram to work, so you
provide a way to replace the PSCI_SYSTEM_SUSPEND param with
(effectively) the CPU_SUSPEND command. If this is the wrong
understanding, please correct me.

Could patch 2 be sent separately? I think it seems fine without the
rest of the series.

I'm not sure why you'd like to support s2ram. Is it *only* that you'd
like to be able to set pm_set_suspend/resume_via_firmware()? I hope this
doesn't sound silly: what if you register a platform_s2idle_ops for the
relevant SoCs which calls pm_set_suspend/resume_via_firmware()?

- Elliot
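For reference, the platform_s2idle_ops suggestion could look roughly like the userspace model below. It is a sketch only: struct model_s2idle_ops is a cut-down stand-in for the kernel's struct platform_s2idle_ops, model_s2idle_set_ops() stands in for s2idle_set_ops(), and the soc_* callbacks are hypothetical. Which callbacks a real SoC driver would use is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct platform_s2idle_ops. */
struct model_s2idle_ops {
	int  (*begin)(void);
	void (*restore)(void);
};

static bool model_fw_suspend, model_fw_resume;
static const struct model_s2idle_ops *model_ops;

static int soc_s2idle_begin(void)
{
	model_fw_suspend = true;	/* models pm_set_suspend_via_firmware() */
	return 0;
}

static void soc_s2idle_restore(void)
{
	model_fw_resume = true;		/* models pm_set_resume_via_firmware() */
}

static const struct model_s2idle_ops soc_s2idle_ops = {
	.begin   = soc_s2idle_begin,
	.restore = soc_s2idle_restore,
};

static void model_s2idle_set_ops(const struct model_s2idle_ops *ops)
{
	model_ops = ops;
}

struct model_cycle_result {
	bool suspend_seen;	/* flag state visible to device suspend callbacks */
	bool resume_seen;	/* flag state visible to device resume callbacks */
};

/* One modelled s2idle cycle: flags start clear, then the hooks run. */
static struct model_cycle_result model_s2idle_cycle(void)
{
	struct model_cycle_result r;

	model_fw_suspend = model_fw_resume = false;
	if (model_ops && model_ops->begin)
		model_ops->begin();
	r.suspend_seen = model_fw_suspend;
	if (model_ops && model_ops->restore)
		model_ops->restore();
	r.resume_seen = model_fw_resume;
	return r;
}
```

Without registered ops, drivers in this model never see either flag; after model_s2idle_set_ops(&soc_s2idle_ops) they see both, even though the suspend type is still s2idle.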
On 14.11.2024 2:10 AM, Elliot Berman wrote: > On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote: >> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through >> CPU_SUSPEND instead. Inform Linux about that. >> Please see the commit messages for a more detailed explanation. >> >> This is effectively a more educated follow-up to [1]. >> >> The ultimate goal is to stop making Linux think that certain states >> only concern cores/clusters, and consequently setting >> pm_set_suspend/resume_via_firmware(), so that client drivers (such as >> NVMe, see related discussion over at [2]) can make informed decisions >> about assuming the power state of the device they govern. >> >> If this series gets green light, I'll push a follow-up one that wires >> up said sleep state on Qualcomm SoCs across the board. >> >> [1] https://lore.kernel.org/linux-arm-kernel/20231227-topic-psci_fw_sus-v1-0-6910add70bf3@linaro.org/ >> [2] https://lore.kernel.org/linux-nvme/20241024-topic-nvmequirk-v1-1-51249999d409@oss.qualcomm.com/ >> > > I got a bit confused, but I think I might've pieced it together. Konrad > wants to support s2ram (not clear why) on Qualcomm SoCs from 2015-2023. > On these SoCs, PSCI_SYSTEM_SUSPEND (s2ram) isn't supported but doing > s2idle gets you the same effect. You'd like s2ram to work, so you > provide a way to replace the PSCI_SYSTEM_SUSPEND param with > (effectively) the CPU_SUSPEND command. If this is the wrong > understanding, please correct me. > > Could patch 2 be sent separately? I think it seems fine without the > rest of the series. > > I'm not sure why you'd like to support s2ram. Is it *only* that you'd > like to be able to set pm_set_supend/resume_via_firmware()? I hope this > doesn't sound silly: what if you register a platform_s2idle_ops for the > relevant SoCs which calls pm_set_suspend/resume_via_firwmare()? S2RAM is what you get after entering a certain state, but currently it's presented as just another (s2idle) idle state. 
That means some hardware that may need to be reinitialized, isn't as Linux has no clue it might have lost power. One of such cases is the PCIe block, with storage drivers specifically looking for pm_suspend_via_firmware, but that's unfortunately not the whole list. Konrad
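As a rough illustration of what the series is after (not the actual patch, whose details are in the commit messages), the decision could be modelled as follows. All names are hypothetical, and the precedence of SYSTEM_SUSPEND over a designated CPU_SUSPEND parameter is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum model_call { MODEL_NO_S2RAM, MODEL_SYSTEM_SUSPEND, MODEL_CPU_SUSPEND };

/* What the (modelled) firmware and its description offer. */
struct model_psci_fw {
	bool     has_system_suspend;	/* firmware implements SYSTEM_SUSPEND */
	bool     has_s2ram_param;	/* an S2RAM power_state was described */
	uint32_t s2ram_param;		/* hypothetical power_state value */
};

struct model_plan {
	enum model_call call;		/* which PSCI call would be issued */
	uint32_t        power_state;	/* only meaningful for CPU_SUSPEND */
	bool            via_firmware;	/* would drivers see the firmware hint? */
};

static struct model_plan model_plan_s2ram(const struct model_psci_fw *fw)
{
	struct model_plan p = { MODEL_NO_S2RAM, 0, false };

	assert(fw != NULL);

	if (fw->has_system_suspend) {
		p.call = MODEL_SYSTEM_SUSPEND;
		p.via_firmware = true;
	} else if (fw->has_s2ram_param) {
		/* Same effect as SYSTEM_SUSPEND, reached through CPU_SUSPEND. */
		p.call = MODEL_CPU_SUSPEND;
		p.power_state = fw->s2ram_param;
		p.via_firmware = true;
	}
	return p;
}
```

The point of the sketch is the last branch: the state is entered through CPU_SUSPEND, yet drivers are told that power may be lost, so they know to reinitialize their hardware on resume.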
On Thu, Dec 19, 2024 at 08:26:51PM +0100, Konrad Dybcio wrote:
> On 14.11.2024 2:10 AM, Elliot Berman wrote:
>
> > I'm not sure why you'd like to support s2ram. Is it *only* that you'd
> > like to be able to set pm_set_suspend/resume_via_firmware()? I hope this
> > doesn't sound silly: what if you register a platform_s2idle_ops for the
> > relevant SoCs which calls pm_set_suspend/resume_via_firmware()?
>
> S2RAM is what you get after entering a certain state, but currently
> it's presented as just another (s2idle) idle state.
>

Just to be clear, I assume you mean CPU_SUSPEND idle state. There are
no special or different s2idle idle states IIUC.

> That means some hardware that may need to be reinitialized, isn't as
> Linux has no clue it might have lost power.
>

Interesting, so this means firmware doesn't automatically save and restore
states yet exposes it as CPU_SUSPEND idle state.

> One of such cases is the PCIe block, with storage drivers specifically
> looking for pm_suspend_via_firmware, but that's unfortunately not the
> whole list.
>

Well I can now imagine and I understand what's wrong here. An idle state
is exposed to OS with an expectation that OS saves and restores certain
state. Unless you tie it to some other power domains that these devices
share, it is hard for OS to know the state is being lost and it needs
to save and restore them. It is simply wrong to assume that OS needs
to take care of them even though the power domain hierarchy doesn't
represent this dependency to enter such a state. cpuidle-psci-domain.c
takes care of this IIUC. Ulf can provide details if you are interested.

--
Regards,
Sudeep
On 20.12.2024 12:39 PM, Sudeep Holla wrote: > On Thu, Dec 19, 2024 at 08:26:51PM +0100, Konrad Dybcio wrote: >> On 14.11.2024 2:10 AM, Elliot Berman wrote: >> >>> I'm not sure why you'd like to support s2ram. Is it *only* that you'd >>> like to be able to set pm_set_supend/resume_via_firmware()? I hope this >>> doesn't sound silly: what if you register a platform_s2idle_ops for the >>> relevant SoCs which calls pm_set_suspend/resume_via_firwmare()? >> >> S2RAM is what you get after entering a certain state, but currently >> it's presented as just another (s2idle) idle state. >> > > Just to be clear, I assume you mean CPU_SUSPEND idle state. There is > no special or different s2idle idle states IIUC. Yeah, right. >> That means some hardware that may need to be reinitialized, isn't as >> Linux has no clue it might have lost power. >> > > Interesting, so this means firmware doesn't automatically save and restore > states yet exposes it as CPU_SUSPEND idle state. Reading the spec, I'm pretty sure PSCI calls should only mess with the power state of the cores, core-adjacent peripherals and GIC. Reading section 5.20.1 (SYSTEM_SUSPEND / Intended use) I think it says mostly what I'm trying to convey: "In a typical implementation, the semantics are equivalent to a CPU_SUSPEND to the deepest low-power state. However, it is possible that an implementation might reserve a deeper state for SYSTEM_SUSPEND than those used with CPU_SUSPEND." - this is the situation on QC platforms, with the case of not reserving a deeper state for SYSTEM_SUSPEND >> One of such cases is the PCIe block, with storage drivers specifically >> looking for pm_suspend_via_firmware, but that's unfortunately not the >> whole list. >> > > Well I can now imagine and I understand what's wrong here. An idle state > is exposed to OS with an expectation that OS saves and restores certain > state. 
Unless you tie it some other power domains that theses devices > share, it is hard for OS to know the state is being lost and it needs > to save and restore them. It is simple wrong to assume that OS needs > to take care of them even though the power domain hierarchy doesn't > represent this dependency to enter such a state. cpuidle-psci-domain.c > takes care of this IIUC. Ulf can provide details if you are interested. The spec disagrees: "Note that entering the system into S2 or S3 carries with it several preconditions. For example, all devices in the system must be in a state that is compatible with entry into the system state" - this also happens to be relevant here, given PSCI is not supposed to power-govern the entire SoC, but only the CPU block. We have specialty hardware that does power management for non-CPU IPs, but to request a system power rail disablement, it must be done in conjunction with the CPU requesting such CPU_SUSPEND state. And only after the required hardware is de-initialized. Konrad
On Fri, Dec 20, 2024 at 01:42:04PM +0100, Konrad Dybcio wrote:
> On 20.12.2024 12:39 PM, Sudeep Holla wrote:
> > On Thu, Dec 19, 2024 at 08:26:51PM +0100, Konrad Dybcio wrote:
> >> On 14.11.2024 2:10 AM, Elliot Berman wrote:
> >>
> >>> I'm not sure why you'd like to support s2ram. Is it *only* that you'd
> >>> like to be able to set pm_set_suspend/resume_via_firmware()? I hope this
> >>> doesn't sound silly: what if you register a platform_s2idle_ops for the
> >>> relevant SoCs which calls pm_set_suspend/resume_via_firmware()?
> >>
> >> S2RAM is what you get after entering a certain state, but currently
> >> it's presented as just another (s2idle) idle state.
> >>
> >
> > Just to be clear, I assume you mean CPU_SUSPEND idle state. There is
> > no special or different s2idle idle states IIUC.
>
> Yeah, right.
>
> >> That means some hardware that may need to be reinitialized, isn't as
> >> Linux has no clue it might have lost power.
> >>
> >
> > Interesting, so this means firmware doesn't automatically save and restore
> > states yet exposes it as CPU_SUSPEND idle state.
>
> Reading the spec, I'm pretty sure PSCI calls should only mess with the
> power state of the cores, core-adjacent peripherals and GIC.
>
> Reading section 5.20.1 (SYSTEM_SUSPEND / Intended use) I think it says
> mostly what I'm trying to convey:
>
>
> "In a typical implementation, the semantics are equivalent to a
> CPU_SUSPEND to the deepest low-power state. However, it is possible that
> an implementation might reserve a deeper state for SYSTEM_SUSPEND than
> those used with CPU_SUSPEND."
>

Yes, this text helps to understand the interface easily. If they were the
same, do you think we would have defined 2 different interfaces?

--
Regards,
Sudeep
On 20.12.2024 2:58 PM, Sudeep Holla wrote: > On Fri, Dec 20, 2024 at 01:42:04PM +0100, Konrad Dybcio wrote: >> On 20.12.2024 12:39 PM, Sudeep Holla wrote: >>> On Thu, Dec 19, 2024 at 08:26:51PM +0100, Konrad Dybcio wrote: >>>> On 14.11.2024 2:10 AM, Elliot Berman wrote: >>>> >>>>> I'm not sure why you'd like to support s2ram. Is it *only* that you'd >>>>> like to be able to set pm_set_supend/resume_via_firmware()? I hope this >>>>> doesn't sound silly: what if you register a platform_s2idle_ops for the >>>>> relevant SoCs which calls pm_set_suspend/resume_via_firwmare()? >>>> >>>> S2RAM is what you get after entering a certain state, but currently >>>> it's presented as just another (s2idle) idle state. >>>> >>> >>> Just to be clear, I assume you mean CPU_SUSPEND idle state. There is >>> no special or different s2idle idle states IIUC. >> >> Yeah, right. >> >>>> That means some hardware that may need to be reinitialized, isn't as >>>> Linux has no clue it might have lost power. >>>> >>> >>> Interesting, so this means firmware doesn't automatically save and restore >>> states yet exposes it as CPU_SUSPEND idle state. >> >> Reading the spec, I'm pretty sure PSCI calls should only mess with the >> power state of the cores, core-adjacent peripherals and GIC. >> >> Reading section 5.20.1 (SYSTEM_SUSPEND / Intended use) I think it says >> mostly what I'm trying to convey: >> >> >> "In a typical implementation, the semantics are equivalent to a >> CPU_SUSPEND to the deepest low-power state. However, it is possible that >> an implementation might reserve a deeper state for SYSTEM_SUSPEND than >> those used with CPU_SUSPEND." >> > > Yes these text help to understand the interface easily. If they were same, > do you think we would have defined 2 different interfaces. I would happen to think that, yes. 
Especially since the reference firmware
implementation does *exactly this*:

https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/psci/psci_main.c#L179-L221

PSCI_SYSTEM_SUSPEND seems to be simply meant as a wrapper around a specific
CPU_SUSPEND state (which may or may not be only callable from inside the
firmware when SYSTEM_SUSPEND specifically is requested, for reasons),
in a platform-agnostic way, so that the OS can enter suspend without
providing that magic StateID on all supported platforms.

But since it already requires more elbow grease on the peripheral IP side,
I'm not really convinced it's all that useful.

Plus, the optional bit of doing more work behind the scenes doesn't seem
to be very widely used across TF-A supported platforms.

So please, stop making the argument that it's any different. The firmware
I'm dealing with simply didn't expose the same thing twice, in perfect
accordance with the spec.

Konrad
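The "wrapper" reading above can be sketched as a tiny firmware model. This is a simplification written for illustration and not TF-A code: the struct, the function names and the single deepest_state field are hypothetical, and only the two return values (0 for success, -3 for DENIED) follow the PSCI return-code convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PSCI_SUCCESS	0
#define MODEL_PSCI_DENIED	(-3)

/* Hypothetical firmware-side bookkeeping. */
struct model_fw {
	unsigned int online_cpus;	/* CPUs currently powered on */
	uint32_t     deepest_state;	/* platform's deepest suspend state */
	uint32_t     entered_state;	/* last state actually entered */
};

/* Common suspend path shared by both entry points in this model. */
static int model_cpu_suspend(struct model_fw *fw, uint32_t power_state)
{
	assert(fw != NULL);
	fw->entered_state = power_state;
	return MODEL_PSCI_SUCCESS;
}

/*
 * SYSTEM_SUSPEND modelled as: refuse unless the caller is the last CPU
 * online, then take the common path with the platform-chosen deepest state,
 * so the OS never has to know the platform-specific StateID.
 */
static int model_system_suspend(struct model_fw *fw)
{
	assert(fw != NULL);
	if (fw->online_cpus != 1)
		return MODEL_PSCI_DENIED;
	return model_cpu_suspend(fw, fw->deepest_state);
}
```

In this model an OS that already knows the StateID reaches the same state by calling the common path directly, which is the equivalence being argued for here.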
On Fri, Dec 20, 2024 at 03:20:37PM +0100, Konrad Dybcio wrote:
>
> I would happen to think that, yes. Especially since the reference firmware
> implementation does *exactly this*:
>
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/psci/psci_main.c#L179-L221
>
> PSCI_SYSTEM_SUSPEND seems to be simply meant as a wrapper around a specific
> CPU_SUSPEND state (which may or may not be only callable from inside the
> firmware when SYSTEM_SUSPEND specifically is requested, for reasons),
> in a platform-agnostic way, so that the OS can enter suspend without
> providing that magic StateID on all supported platforms.

Exactly, that's how it can be OS and platform agnostic. Yet this platform
considered to optimise by not just providing it as a wrapper(if it was
that simple on your platform too) without running any tests and leaving
it to interested parties like you to mess around to get it working.
That practice needs to be fixed and this change won't help and once we
fix this, more such special treatment fixes are needed on newer platforms.
So lets stop and ensure things are fixed properly.

> But since it already requires more elbow grease on the peripheral IP side,
> I'm not really convinced it's that much useful.
>
> Plus, the optional bit of doing more work behind the scenes doesn't seem
> to be very wildly used across TF-A supported platforms.
>
> So please, stop making the argument that it's any different. The firmware
> I'm dealing with simply didn't expose the same thing twice, in perfect
> accordance with the spec.
>

So that it can continue to do so in the future ?
Thanks but no thanks. NACK with no arguments as requested.

--
Regards,
Sudeep
On 20.12.2024 3:36 PM, Sudeep Holla wrote:
> On Fri, Dec 20, 2024 at 03:20:37PM +0100, Konrad Dybcio wrote:
>>
>> I would happen to think that, yes. Especially since the reference firmware
>> implementation does *exactly this*:
>>
>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/psci/psci_main.c#L179-L221
>>
>> PSCI_SYSTEM_SUSPEND seems to be simply meant as a wrapper around a specific
>> CPU_SUSPEND state (which may or may not be only callable from inside the
>> firmware when SYSTEM_SUSPEND specifically is requested, for reasons),
>> in a platform-agnostic way, so that the OS can enter suspend without
>> providing that magic StateID on all supported platforms.
>
> Exactly, that's how it can be OS and platform agnostic. Yet this platform
> considered to optimise by not just providing it as a wrapper(if it was
> that simple on your platform too) without running any tests and leaving
> it to interested parties like you to mess around to get it working.
> That practice needs to be fixed and this change won't help and once we
> fix this, more such special treatment fixes are needed on newer platforms.
> So lets stop and ensure things are fixed properly.

And then remove CPU_SUSPEND support if CPU_SUSPEND2 comes in a spec update
because it's not generic enough?

Sorry, this is not acceptable. If you enforce PSCI as the only way of doing
SMP/cpuidle/platform suspend upstream on arm64, you should not gatekeep
existing implementations that are actually in line with the written spec,
just because you don't happen to like them.

If you want to start the process of getting rid of those, amend the spec to
deprecate and/or forbid system-level suspend in CPU_SUSPEND in future PSCI
versions. But you can't retroactively change your decisions like that.

>> But since it already requires more elbow grease on the peripheral IP side,
>> I'm not really convinced it's that much useful.
>>
>> Plus, the optional bit of doing more work behind the scenes doesn't seem
>> to be very wildly used across TF-A supported platforms.
>>
>> So please, stop making the argument that it's any different. The firmware
>> I'm dealing with simply didn't expose the same thing twice, in perfect
>> accordance with the spec.
>>
>
> So that it can continue to do so in the future ?
> Thanks but no thanks. NACK with no arguments as requested.

That's already been "fixed" on QC platforms starting around 2022, as
mentioned in this series.

Konrad
On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> CPU_SUSPEND instead. Inform Linux about that.
> Please see the commit messages for a more detailed explanation.
>

It is still not PSCI_SYSTEM_SUSPEND though...

> This is effectively a more educated follow-up to [1].
>
> The ultimate goal is to stop making Linux think that certain states
> only concern cores/clusters, and consequently setting
> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> NVMe, see related discussion over at [2]) can make informed decisions
> about assuming the power state of the device they govern.
>
> If this series gets green light, I'll push a follow-up one that wires
> up said sleep state on Qualcomm SoCs across the board.
>

Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
firmware across all segments (mostly), so there is no S2R involved and only S2Idle.
If you use PSCI to implement suspend_via_firmware(), then all the SoCs
making use of the PSCI implementation will have the same behavior. I don't think
we would want that.

For instance, if a Qcom SoC is used in an android tablet with the same firmware,
then this would allow the NVMe device to be turned off during system suspend all
the time when user presses the lock button. And this will cause NVMe device to
wear out faster. The said approach will work fine for non-android usecases
though.

I have a couple of ideas in mind that I will post to NVMe list itself.

- Mani

> [1] https://lore.kernel.org/linux-arm-kernel/20231227-topic-psci_fw_sus-v1-0-6910add70bf3@linaro.org/
> [2] https://lore.kernel.org/linux-nvme/20241024-topic-nvmequirk-v1-1-51249999d409@oss.qualcomm.com/
>
> Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
> ---
> Konrad Dybcio (3):
>       dt-bindings: arm,psci: Allow S2RAM power_state parameter description
>       firmware/psci: Set pm_set_resume/suspend_via_firmware() for SYSTEM_SUSPEND
>       firmware/psci: Allow specifying an S2RAM state through CPU_SUSPEND
>
>  Documentation/devicetree/bindings/arm/psci.yaml | 6 ++++
>  drivers/firmware/psci/psci.c | 44 ++++++++++++++++++++++---
>  2 files changed, 46 insertions(+), 4 deletions(-)
> ---
> base-commit: a39230ecf6b3057f5897bc4744a790070cfbe7a8
> change-id: 20241028-topic-cpu_suspend_s2ram-28fc095d0aa4
>
> Best regards,
> --
> Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
>

--
மணிவண்ணன் சதாசிவம்
On 11/12/24 19:01, Manivannan Sadhasivam wrote:
> On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>> CPU_SUSPEND instead. Inform Linux about that.
>> Please see the commit messages for a more detailed explanation.
>>
>
> It is still not PSCI_SYSTEM_SUSPEND though...

It *literally* does the same thing on devices where it's exposed.

>
>> This is effectively a more educated follow-up to [1].
>>
>> The ultimate goal is to stop making Linux think that certain states
>> only concern cores/clusters, and consequently setting
>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>> NVMe, see related discussion over at [2]) can make informed decisions
>> about assuming the power state of the device they govern.
>>
>> If this series gets green light, I'll push a follow-up one that wires
>> up said sleep state on Qualcomm SoCs across the board.
>>
>
> Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
> firmware across all segments (mostly),

This ^

> so there is no S2R involved and only S2Idle.

is not at all related to this ^, the "so" makes no sense.

(also you're wrong, this *is* S2RAM)

> If you use PSCI to implement suspend_via_firmware(), then all the SoCs
> making use of the PSCI implementation will have the same behavior. I don't think
> we would want that.

This is an issue with the NVMe framework that is totally unrelated to this
change, see below. Also, the code only sets that on targets where such state
exists and is described.

> For instance, if a Qcom SoC is used in an android tablet with the same firmware,
> then this would allow the NVMe device to be turned off during system suspend all
> the time when user presses the lock button. And this will cause NVMe device to
> wear out faster. The said approach will work fine for non-android usecases
> though.

The NVMe framework doesn't make a distinction between "phone screen off" and
"laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
describing is not supported as of today since nobody *actually* has NVMe on a
phone that also happens to run upstream Linux.
I'm not going to solve imaginary problems.

Besides, userspace already has sysfs to tune device power state knobs. Which
Android uses very extensively on market devices.

Konrad
On Tue, Nov 12, 2024 at 07:32:36PM +0100, Konrad Dybcio wrote:
>
>
> On 11/12/24 19:01, Manivannan Sadhasivam wrote:
> > On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
> > > Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> > > CPU_SUSPEND instead. Inform Linux about that.
> > > Please see the commit messages for a more detailed explanation.
> > >
> >
> > It is still not PSCI_SYSTEM_SUSPEND though...
>
> It *literally* does the same thing on devices where it's exposed.
>

But still...

> >
> > > This is effectively a more educated follow-up to [1].
> > >
> > > The ultimate goal is to stop making Linux think that certain states
> > > only concern cores/clusters, and consequently setting
> > > pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> > > NVMe, see related discussion over at [2]) can make informed decisions
> > > about assuming the power state of the device they govern.
> > >
> > > If this series gets green light, I'll push a follow-up one that wires
> > > up said sleep state on Qualcomm SoCs across the board.
> > >
> >
> > Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
> > firmware across all segments (mostly),
>
> This ^
>
> > so there is no S2R involved and only S2Idle.
>
> is not at all related to this ^, the "so" makes no sense.
>
> (also you're wrong, this *is* S2RAM)
>

What? Qcom SoCs supporting S2R? I'm unheard of.

> > If you use PSCI to implement suspend_via_firmware(), then all the SoCs
> > making use of the PSCI implementation will have the same behavior. I don't think
> > we would want that.
>
> This is an issue with the NVMe framework that is totally unrelated to this
> change, see below. Also, the code only sets that on targets where such state
> exists and is described.
>

Well, you are doing it just because you want the NVMe device to learn about the
platform requirement.

> > For instance, if a Qcom SoC is used in an android tablet with the same firmware,
> > then this would allow the NVMe device to be turned off during system suspend all
> > the time when user presses the lock button. And this will cause NVMe device to
> > wear out faster. The said approach will work fine for non-android usecases
> > though.
>
> The NVMe framework doesn't make a distinction between "phone screen off" and
> "laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
> describing is not supported as of today since nobody *actually* has NVMe on a
> phone that also happens to run upstream Linux.
> I'm not going to solve imaginary problems.
>

Not just phone, NVMe device could be running on an android tablet. I'm not
talking about an imaginary problem, but a real problem that is in a forseeable
future (that is also the reason why NVMe developers doesn't want to put the
device into power down mode always during system suspend).

And with this change, you are just going to make the NVMe lifetime miserable on
those platforms.

- Mani

> Besides, userspace already has sysfs to tune device power state knobs. Which
> Android uses very extensively on market devices.
>
> Konrad

--
மணிவண்ணன் சதாசிவம்
On 11/12/24 19:43, Manivannan Sadhasivam wrote:
> On Tue, Nov 12, 2024 at 07:32:36PM +0100, Konrad Dybcio wrote:
>>
>>
>> On 11/12/24 19:01, Manivannan Sadhasivam wrote:
>>> On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
>>>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>>>> CPU_SUSPEND instead. Inform Linux about that.
>>>> Please see the commit messages for a more detailed explanation.
>>>>
>>>
>>> It is still not PSCI_SYSTEM_SUSPEND though...
>>
>> It *literally* does the same thing on devices where it's exposed.
>>
>
> But still...

Still-what? We can't replace the signed firmware on (unironically) tens
of millions of devices in the wild and this is how it exposes that sleep
state. This is how arm platforms did it before the PSCI spec was
updated and SYSTEM_SUSPEND is *still optional today*.

>>>> This is effectively a more educated follow-up to [1].
>>>>
>>>> The ultimate goal is to stop making Linux think that certain states
>>>> only concern cores/clusters, and consequently setting
>>>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>>>> NVMe, see related discussion over at [2]) can make informed decisions
>>>> about assuming the power state of the device they govern.
>>>>
>>>> If this series gets green light, I'll push a follow-up one that wires
>>>> up said sleep state on Qualcomm SoCs across the board.
>>>>
>>>
>>> Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
>>> firmware across all segments (mostly),
>>
>> This ^
>>
>>> so there is no S2R involved and only S2Idle.
>>
>> is not at all related to this ^, the "so" makes no sense.
>>
>> (also you're wrong, this *is* S2RAM)
>>
>
> What? Qcom SoCs supporting S2R? I'm unheard of.

Maybe you're thinking of hibernation, which is not widely (if at all)
supported.

>>> If you use PSCI to implement suspend_via_firmware(), then all the SoCs
>>> making use of the PSCI implementation will have the same behavior. I don't think
>>> we would want that.
>>
>> This is an issue with the NVMe framework that is totally unrelated to this
>> change, see below. Also, the code only sets that on targets where such state
>> exists and is described.
>>
>
> Well, you are doing it just because you want the NVMe device to learn about the
> platform requirement.

And I can't see why you're having a problem with this. It's exactly how it
works on x86 too. Modern Standby also shuts down storage on Windows,
regardless of the CPU architecture.

>>> For instance, if a Qcom SoC is used in an android tablet with the same firmware,
>>> then this would allow the NVMe device to be turned off during system suspend all
>>> the time when user presses the lock button. And this will cause NVMe device to
>>> wear out faster. The said approach will work fine for non-android usecases
>>> though.
>>
>> The NVMe framework doesn't make a distinction between "phone screen off" and
>> "laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
>> describing is not supported as of today since nobody *actually* has NVMe on a
>> phone that also happens to run upstream Linux.
>> I'm not going to solve imaginary problems.
>>
>
> Not just phone, NVMe device could be running on an android tablet.

'Could' very much makes it imaginary. There are no supported devices that
fall into this category.

> I'm not
> talking about an imaginary problem, but a real problem that is in a forseeable
> future

Keyword: future. This issue has been on hold for years because of 'issues'
that are pinky promised to happen eventually, without anyone suggesting any
actually acceptable solutions. This just undermines progress.

> (that is also the reason why NVMe developers doesn't want to put the
> device into power down mode always during system suspend).

This is the current behavior on any new x86 laptop, and has been for a
couple of years.

> And with this change, you are just going to make the NVMe lifetime miserable on
> those platforms.

Fearmongering and hearsay. See above.

Konrad
On Tue, Nov 12, 2024 at 08:04:34PM +0100, Konrad Dybcio wrote:
>
>
> On 11/12/24 19:43, Manivannan Sadhasivam wrote:
> > On Tue, Nov 12, 2024 at 07:32:36PM +0100, Konrad Dybcio wrote:
> > >
> > >
> > > On 11/12/24 19:01, Manivannan Sadhasivam wrote:
> > > > On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
> > > > > Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
> > > > > CPU_SUSPEND instead. Inform Linux about that.
> > > > > Please see the commit messages for a more detailed explanation.
> > > > >
> > > >
> > > > It is still not PSCI_SYSTEM_SUSPEND though...
> > >
> > > It *literally* does the same thing on devices where it's exposed.
> > >
> >
> > But still...
>
> Still-what? We can't replace the signed firmware on (unironically) tens
> of millions of devices in the wild and this is how it exposes that sleep
> state. This is how arm platforms did it before the PSCI spec was
> updated and SYSTEM_SUSPEND is *still optional today*.
>

I never asked you to replace the firmware in first place, so don't quote the
fact I never said. I see this approach as a way of abusing/faking PSCI system
suspend.

Moreover, I heard from Bjorn that Qcom doesn't want to put the PCIe devices into
D3Cold during system suspend for future platforms (based on their
experimentation). So if drivers rely on this static information, then even Qcom
cannot achieve what they want.

>
> > > > > This is effectively a more educated follow-up to [1].
> > > > >
> > > > > The ultimate goal is to stop making Linux think that certain states
> > > > > only concern cores/clusters, and consequently setting
> > > > > pm_set_suspend/resume_via_firmware(), so that client drivers (such as
> > > > > NVMe, see related discussion over at [2]) can make informed decisions
> > > > > about assuming the power state of the device they govern.
> > > > >
> > > > > If this series gets green light, I'll push a follow-up one that wires
> > > > > up said sleep state on Qualcomm SoCs across the board.
> > > > >
> > > >
> > > > Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
> > > > firmware across all segments (mostly),
> > >
> > > This ^
> > >
> > > > so there is no S2R involved and only S2Idle.
> > >
> > > is not at all related to this ^, the "so" makes no sense.
> > >
> > > (also you're wrong, this *is* S2RAM)
> > >
> >
> > What? Qcom SoCs supporting S2R? I'm unheard of.
>
> Maybe you're thinking of hibernation, which is not widely (if at all)
> supported.
>

Not hibernation. The Qcom platforms I've aware of all support only S2Idle. I
don't work for Qcom, so I may be missing some insider information.

> > > > If you use PSCI to implement suspend_via_firmware(), then all the SoCs
> > > > making use of the PSCI implementation will have the same behavior. I don't think
> > > > we would want that.
> > >
> > > This is an issue with the NVMe framework that is totally unrelated to this
> > > change, see below. Also, the code only sets that on targets where such state
> > > exists and is described.
> > >
> >
> > Well, you are doing it just because you want the NVMe device to learn about the
> > platform requirement.
>
> And I can't see why you're having a problem with this. It's exactly how it
> works on x86 too. Modern Standby also shuts down storage on Windows,
> regardless of the CPU architecture.

It is not just my problem. I'm expressing the concern that NVMe folks have and
already expressed over the similar solutions I proposed. And I cannot just
overrule them.

> > > > For instance, if a Qcom SoC is used in an android tablet with the same firmware,
> > > > then this would allow the NVMe device to be turned off during system suspend all
> > > > the time when user presses the lock button. And this will cause NVMe device to
> > > > wear out faster. The said approach will work fine for non-android usecases
> > > > though.
> > >
> > > The NVMe framework doesn't make a distinction between "phone screen off" and
> > > "laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
> > > describing is not supported as of today since nobody *actually* has NVMe on a
> > > phone that also happens to run upstream Linux.
> > > I'm not going to solve imaginary problems.
> > >
> >
> > Not just phone, NVMe device could be running on an android tablet.
>
> 'Could' very much makes it imaginary. There are no supported devices that
> fall into this category.
>

Agree that there are no products in the market (yet). But having NMVe on
handheld devices is not something I would quote as 'imaginary'.

> > I'm not
> > talking about an imaginary problem, but a real problem that is in a forseeable
> > future
>
> Keyword: future. This issue has been on hold for years because of 'issues'
> that are pinky promised to happen eventually, without anyone suggesting any
> actually acceptable solutions. This just undermines progress.
>

Not true. There are solutions suggested, but then it always takes time to reach
consensus. One of the approach that I'm about to propose is to have a userspace
knob that specifies whether the device can be powered down or not (leaving the
default behavior to put them in low power state). Because, the decision to put
the devices into power down or low power state sounds more like an userspace
policy. It was discussed at LPC 2023.

> > (that is also the reason why NVMe developers doesn't want to put the
> > device into power down mode always during system suspend).
>
> This is the current behavior on any new x86 laptop, and has been for a
> couple of years.
>
> > And with this change, you are just going to make the NVMe lifetime miserable on
> > those platforms.
>
> Fearmongering and hearsay. See above.
>

I can only wish you best of luck with this approach!

- Mani

--
மணிவண்ணன் சதாசிவம்
On 13.11.2024 9:05 AM, Manivannan Sadhasivam wrote:
> On Tue, Nov 12, 2024 at 08:04:34PM +0100, Konrad Dybcio wrote:
>>
>>
>> On 11/12/24 19:43, Manivannan Sadhasivam wrote:
>>> On Tue, Nov 12, 2024 at 07:32:36PM +0100, Konrad Dybcio wrote:
>>>>
>>>>
>>>> On 11/12/24 19:01, Manivannan Sadhasivam wrote:
>>>>> On Mon, Oct 28, 2024 at 03:22:56PM +0100, Konrad Dybcio wrote:
>>>>>> Certain firmwares expose exactly what PSCI_SYSTEM_SUSPEND does through
>>>>>> CPU_SUSPEND instead. Inform Linux about that.
>>>>>> Please see the commit messages for a more detailed explanation.
>>>>>>
>>>>>
>>>>> It is still not PSCI_SYSTEM_SUSPEND though...
>>>>
>>>> It *literally* does the same thing on devices where it's exposed.
>>>>
>>>
>>> But still...
>>
>> Still-what? We can't replace the signed firmware on (unironically) tens
>> of millions of devices in the wild and this is how it exposes that sleep
>> state. This is how arm platforms did it before the PSCI spec was
>> updated and SYSTEM_SUSPEND is *still optional today*.
>>
>
> I never asked you to replace the firmware in first place, so don't quote the
> fact I never said.

Never implied you did. I'm putting pressure on the fact that we can't
update the firmware on such devices to expose PSCI_SYSTEM_SUSPEND.

> I see this approach as a way of abusing/faking PSCI system
> suspend.

And I disagree. I can't stress this enough, calling PSCI_SYSTEM_SUSPEND is
literally internally equivalent to calling PSCI_CPU_SUSPEND(magicval).

>
> Moreover, I heard from Bjorn that Qcom doesn't want to put the PCIe devices into
> D3Cold during system suspend for future platforms (based on their
> experimentation). So if drivers rely on this static information, then even Qcom
> cannot achieve what they want.
>
>>
>>>>>> This is effectively a more educated follow-up to [1].
>>>>>>
>>>>>> The ultimate goal is to stop making Linux think that certain states
>>>>>> only concern cores/clusters, and consequently setting
>>>>>> pm_set_suspend/resume_via_firmware(), so that client drivers (such as
>>>>>> NVMe, see related discussion over at [2]) can make informed decisions
>>>>>> about assuming the power state of the device they govern.
>>>>>>
>>>>>> If this series gets green light, I'll push a follow-up one that wires
>>>>>> up said sleep state on Qualcomm SoCs across the board.
>>>>>>
>>>>>
>>>>> Sorry. I don't think PSCI is the right place for this. Qcom SoCs have a common
>>>>> firmware across all segments (mostly),
>>>>
>>>> This ^
>>>>
>>>>> so there is no S2R involved and only S2Idle.
>>>>
>>>> is not at all related to this ^, the "so" makes no sense.
>>>>
>>>> (also you're wrong, this *is* S2RAM)
>>>>
>>>
>>> What? Qcom SoCs supporting S2R? I'm unheard of.
>>
>> Maybe you're thinking of hibernation, which is not widely (if at all)
>> supported.
>>
>
> Not hibernation. The Qcom platforms I've aware of all support only S2Idle. I
> don't work for Qcom, so I may be missing some insider information.

I think this is the main source of misunderstanding in this entire thread.

CXPC is S2RAM. Not S2idle. Shallower sleep states on QC platforms are S2idle.

>>>>> If you use PSCI to implement suspend_via_firmware(), then all the SoCs
>>>>> making use of the PSCI implementation will have the same behavior. I don't think
>>>>> we would want that.
>>>>
>>>> This is an issue with the NVMe framework that is totally unrelated to this
>>>> change, see below. Also, the code only sets that on targets where such state
>>>> exists and is described.
>>>>
>>>
>>> Well, you are doing it just because you want the NVMe device to learn about the
>>> platform requirement.
>>
>> And I can't see why you're having a problem with this. It's exactly how it
>> works on x86 too. Modern Standby also shuts down storage on Windows,
>> regardless of the CPU architecture.
>
> It is not just my problem. I'm expressing the concern that NVMe folks have and
> already expressed over the similar solutions I proposed. And I cannot just
> overrule them.

Sure, but if PSCI_SYSTEM_SUSPEND implies S2ram, why should the behavior be
different purely based on the architectural idle implementation? Moreover,
if the same platform can be booted with ACPI or DT, why should power state
switching work differently, considering both would describe the hardware
accurately?

>>>>> For instance, if a Qcom SoC is used in an android tablet with the same firmware,
>>>>> then this would allow the NVMe device to be turned off during system suspend all
>>>>> the time when user presses the lock button. And this will cause NVMe device to
>>>>> wear out faster. The said approach will work fine for non-android usecases
>>>>> though.
>>>>
>>>> The NVMe framework doesn't make a distinction between "phone screen off" and
>>>> "laptop lid closed & thrown in a bag" on *any* platform. The usecase you're
>>>> describing is not supported as of today since nobody *actually* has NVMe on a
>>>> phone that also happens to run upstream Linux.
>>>> I'm not going to solve imaginary problems.
>>>>
>>>
>>> Not just phone, NVMe device could be running on an android tablet.
>>
>> 'Could' very much makes it imaginary. There are no supported devices that
>> fall into this category.
>>
>
> Agree that there are no products in the market (yet). But having NMVe on
> handheld devices is not something I would quote as 'imaginary'.
>
>>> I'm not
>>> talking about an imaginary problem, but a real problem that is in a forseeable
>>> future
>>
>> Keyword: future. This issue has been on hold for years because of 'issues'
>> that are pinky promised to happen eventually, without anyone suggesting any
>> actually acceptable solutions. This just undermines progress.
>>
>
> Not true. There are solutions suggested, but then it always takes time to reach
> consensus. One of the approach that I'm about to propose is to have a userspace
> knob that specifies whether the device can be powered down or not (leaving the
> default behavior to put them in low power state). Because, the decision to put
> the devices into power down or low power state sounds more like an userspace
> policy. It was discussed at LPC 2023.

Sure, however I believe it is perfectly reasonable to change the default
setting there based on platform capabilities.

Konrad

>
>>> (that is also the reason why NVMe developers doesn't want to put the
>>> device into power down mode always during system suspend).
>>
>> This is the current behavior on any new x86 laptop, and has been for a
>> couple of years.
>>
>>> And with this change, you are just going to make the NVMe lifetime miserable on
>>> those platforms.
>>
>> Fearmongering and hearsay. See above.
>>
>
> I can only wish you best of luck with this approach!
>
> - Mani
>