drivers/soc/qcom/rpmh-rsc.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
Hi,
This is just an attempt to let the device drivers know of the quirky platform
behavior of mimicking s2ram in s2idle for older Qcom SoCs (pre-Hamoa and
non-chromebooks) using RPMh. This information is important for the device
drivers as they need to prepare for the possible power loss during system
suspend by shutting down or resetting the devices.
For implementation, s2idle_ops is registered during the boot of the rpmh-rsc
driver based on the machine compatible limited to Makena (SC8280XP) as a
proof-of-concept. If this approach gets consensus, I plan to have a helper
that lists the compatibles of all SoCs exhibiting this behavior. Since there is
no reliable way to find out whether s2idle is the only low power state supported
or not during boot, I resorted to compatible based matching.
One could argue that this s2idle_ops should be registered in the PSCI driver
similar to s2ram [1]. But I didn't prefer that since from PSCI point of view,
only CPUs should be parked in low power states during s2idle (CPU_SUSPEND) and
the peripherals should not be affected. Though in the past, an argument [2] was
raised citing the PSCI spec wording that allows the vendors to implement system
level low power states during CPU_SUSPEND. But that argument was not well
received by the PSCI maintainers.
Moreover, RPMh is the entity that implements the s2ram like deeper low power
state during system suspend. So it made sense to add the ops in this driver.
Note: This series is compile tested only. If one tests this series on Makena
platform, NVMe should get shutdown during suspend as confirmed by the dmesg log
similar to below after resume:
nvme nvme0: 12/0/0 default/read/poll queues
This series, together with the upcoming PCIe D3Cold support should allow Makena
(and other similar SoCs once added) to enter the deep low power mode a.k.a CXPC.
[1] https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com
[2] https://lore.kernel.org/all/54cc4221-ba5f-4741-9033-20874265ca01@oss.qualcomm.com
Manivannan Sadhasivam (1):
soc: qcom: rpmh-rsc: Register s2idle_ops to indicate s2ram behavior in
s2idle
drivers/soc/qcom/rpmh-rsc.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
--
2.48.1
On Thu, Jan 01, 2026 at 10:21:57PM +0530, Manivannan Sadhasivam wrote: > Hi, > Many thanks for looking into this problem, Mani. > This is just an attempt to let the device drivers know of the quirky platform > behavior of mimicking s2ram in s2idle for older Qcom SoCs (pre-Hamoa and > non-chromebooks) using RPMh. This information is important for the device > drivers as they need to prepare for the possible power loss during system > suspend by shutting down or resetting the devices. > Is this really what happens? What _exactly_ is the purpose of pm_suspend_via_firmware() and pm_resume_via_firmware()? The kernel-doc for pm_suspend_via_firmware() states "return true if platform firmware is going to be invoked at end of system suspend...in order to complete it". To me this indicates that the purpose of this mechanism is to signal Linux that we're running on a target where platform firmware will cut power once we reach the bottom of the suspend chain, so the individual drivers has to save/restore state in order to prepare for this. But if I understand the bottom of the s2idle sequence, is that all we do is enter idle and the kernel will pick the lowest idle idle-state, which based on the psci-suspend-param will signal the system that the resources needed by the CPU subsystem can be powered down. But it's not really powering things down, it's releasing the vote of the CPU subsystem - which if the driver didn't vote means power will be lost. So for any drivers that relies on the implicit vote from the CPU subsystem to sustain their operation this flag is correct. For many IP blocks, the register state and/or some functionality will be retained in this state - e.g. register state will be retained, or functionality will be sustained by sleep power. So they can safely ignore the flag. In some cases state isn't retained when this happens, so e.g. PHY drivers does perform save and restore operations, despite the flag currently not indicating that power will be lost. What's unique with Makena in this regard, is that in this state the PCIe link can not be sustained, so we either need to tear things down, or the PCIe controller needs to vote directly on those resources it need in order to sustain its link. It seems to me that this patch uses the global flag, in order to signal the PCIe stack specifically, that it needs to save/restore state. I'm concerned that support for retaining state in the PCIe subsystem isn't a global question. Then parallel to this discussion, we have the actual s2ram feature showing up on some targets, where the SoC actually do power off. Here the automatic retaining of register and functional state is not going to happen - device drivers must save and restore state. Do we force all "pre-Hamoa" targets into this same behavior? Or do we have a different flag for saying "at the end of suspend power will be lost" there? PS. Think the commit message of the change itself falls short in capturing the problem. The commit message of this change will become the documentation on which many future discussions will be based. > For implementation, s2idle_ops is registered during the boot of the rpmh-rsc > driver based on the machine compatible limited to Makena (SC8280XP) as a > proof-of-concept. If this approach gets consensus, I plan to have a helper > that lists the compatibles of all SoCs exhibiting this behavior. Since there is > no reliable way to find out whether s2idle is the only low power state supported > or not during boot, I resorted to compatible based matching. > > One could argue that this s2idle_ops should be registered in the PSCI driver > similar to s2ram [1]. But I didn't prefer that since from PSCI point of view, > only CPUs should be parked in low power states during s2idle (CPU_SUSPEND) and > the peripherals should not be affected. Though in the past, an argument [2] was > raised citing the PSCI spec wording that allows the vendors to implement system > level low power states during CPU_SUSPEND. But that argument was not well > received by the PSCI maintainers. > It makes sense to me to have PSCI only manage CPUs and its necessary peripherals. But the definition of pm_suspend_via_firmware() in our case would then be, "if your driver piggybacks on e.g. the CPU subsystem's vote for critical resources, you might be in for a surprise". Which makes me feel that the Makena quirk should be in the PCIe controller driver... Regards, Bjorn > Moreover, RPMh is the entity that implements the s2ram like deeper low power > state during system suspend. So it made sense to add the ops in this driver. > > Note: This series is compile tested only. If one tests this series on Makena > platform, NVMe should get shutdown during suspend as confirmed by the dmesg log > similar to below after resume: > > nvme nvme0: 12/0/0 default/read/poll queues > > This series, together with the upcoming PCIe D3Cold support should allow Makena > (and other similar SoCs once added) to enter the deep low power mode a.k.a CXPC. > > [1] https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com > [2] https://lore.kernel.org/all/54cc4221-ba5f-4741-9033-20874265ca01@oss.qualcomm.com > > Manivannan Sadhasivam (1): > soc: qcom: rpmh-rsc: Register s2idle_ops to indicate s2ram behavior in > s2idle > > drivers/soc/qcom/rpmh-rsc.c | 23 +++++++++++++++++++++++ > 1 file changed, 23 insertions(+) > > -- > 2.48.1 >
On Mon, Jan 05, 2026 at 08:04:54AM -0600, Bjorn Andersson wrote: > On Thu, Jan 01, 2026 at 10:21:57PM +0530, Manivannan Sadhasivam wrote: > > Hi, > > > > Many thanks for looking into this problem, Mani. > > > This is just an attempt to let the device drivers know of the quirky platform > > behavior of mimicking s2ram in s2idle for older Qcom SoCs (pre-Hamoa and > > non-chromebooks) using RPMh. This information is important for the device > > drivers as they need to prepare for the possible power loss during system > > suspend by shutting down or resetting the devices. > > > > Is this really what happens? What _exactly_ is the purpose of > pm_suspend_via_firmware() and pm_resume_via_firmware()? > > The kernel-doc for pm_suspend_via_firmware() states "return true if > platform firmware is going to be invoked at end of system suspend...in > order to complete it". > > To me this indicates that the purpose of this mechanism is to signal > Linux that we're running on a target where platform firmware will cut > power once we reach the bottom of the suspend chain, so the individual > drivers has to save/restore state in order to prepare for this. > Yes! > But if I understand the bottom of the s2idle sequence, is that all we do > is enter idle and the kernel will pick the lowest idle idle-state, which > based on the psci-suspend-param will signal the system that the > resources needed by the CPU subsystem can be powered down. > > But it's not really powering things down, it's releasing the vote of the > CPU subsystem - which if the driver didn't vote means power will be > lost. So for any drivers that relies on the implicit vote from the CPU > subsystem to sustain their operation this flag is correct. > > For many IP blocks, the register state and/or some functionality will be > retained in this state - e.g. register state will be retained, or > functionality will be sustained by sleep power. So they can safely > ignore the flag. > > In some cases state isn't retained when this happens, so e.g. PHY > drivers does perform save and restore operations, despite the flag > currently not indicating that power will be lost. > > > What's unique with Makena in this regard, is that in this state the PCIe > link can not be sustained, so we either need to tear things down, or the > PCIe controller needs to vote directly on those resources it need in > order to sustain its link. > > It seems to me that this patch uses the global flag, in order to signal > the PCIe stack specifically, that it needs to save/restore state. I'm > concerned that support for retaining state in the PCIe subsystem isn't a > global question. > No, this is not a PCIe stack limitation, but PCIe client driver limitation like NVMe. PCIe stack does save and restore the device config space during system suspend (even during runtime suspend sometimes). But the NVMe driver will only shutdown the controller if this global flag is set or other conditions are satisfied. Otherwise, it will just park the device in low power state. So if the target enters s2ram (deep sleep) or CXPC in Makena, controller context will be lost, leading to resume failure. We tried to fix the issue in the NVMe driver so far, but all efforts ended up in vain. NVMe maintainers preferred to rely on some PCI/PM APIs for this. Initially I looked into creating one such API, but then figured out that I can just make use of this global flag and not touch the NVMe driver at all. This worked well for s2ram, so I resent a patch submitted by Konrad in 2024 [1]. But for s2idle and Makena, I thought I could reuse the same global flag and achieve the same net result. [1] https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com > > > Then parallel to this discussion, we have the actual s2ram feature > showing up on some targets, where the SoC actually do power off. Here > the automatic retaining of register and functional state is not going to > happen - device drivers must save and restore state. > AFAIK, atleast on compute targets, s2ram == s2idle + CXPC. Only difference is that the system wide low power mode transition will happen in PSCI SYSTEM_SUSPEND phase as opposed to CPU_SUSPEND. Only on auto targets, we have the 'deep sleep' feature which behaves like x86 s2ram i.e., turning off power to most of the components. But from Linux POV, both are just s2ram. This is one more motivation for me to use this global flag for s2ram. Oh there is one more limitation with doing CXPC without D3Cold (link retention). Linux cannot handle PCIe wakeup reliably. So if you have WLAN or USB keyboard connected to a PCIe-USB hub, wakeup will not work and will result in PCIe link down with upstream. We need to have a hw mechanism to handle PCIe wakeup. But that is not currently supported and I don't know when that support will land. So if wakeup is not supported, I thought doing D3Cold + CXPC would be better in terms of power saving (not much difference, but still...). This is the only motivation for me to use this flag for all pre-Hamoa SoCs. > Do we force all "pre-Hamoa" targets into this same behavior? Or do we > have a different flag for saying "at the end of suspend power will be > lost" there? > If you don't agree with the above reasoning, I can just limit its usage to Makena and s2ram. > > PS. Think the commit message of the change itself falls short in > capturing the problem. The commit message of this change will become the > documentation on which many future discussions will be based. > Sorry about that. I will add more context while sending non-RFC version. > > For implementation, s2idle_ops is registered during the boot of the rpmh-rsc > > driver based on the machine compatible limited to Makena (SC8280XP) as a > > proof-of-concept. If this approach gets consensus, I plan to have a helper > > that lists the compatibles of all SoCs exhibiting this behavior. Since there is > > no reliable way to find out whether s2idle is the only low power state supported > > or not during boot, I resorted to compatible based matching. > > > > One could argue that this s2idle_ops should be registered in the PSCI driver > > similar to s2ram [1]. But I didn't prefer that since from PSCI point of view, > > only CPUs should be parked in low power states during s2idle (CPU_SUSPEND) and > > the peripherals should not be affected. Though in the past, an argument [2] was > > raised citing the PSCI spec wording that allows the vendors to implement system > > level low power states during CPU_SUSPEND. But that argument was not well > > received by the PSCI maintainers. > > > > It makes sense to me to have PSCI only manage CPUs and its necessary > peripherals. > > But the definition of pm_suspend_via_firmware() in our case would then > be, "if your driver piggybacks on e.g. the CPU subsystem's vote for > critical resources, you might be in for a surprise". > > Which makes me feel that the Makena quirk should be in the PCIe > controller driver... > Do you still think so? - Mani -- மணிவண்ணன் சதாசிவம்
On Mon, Jan 05, 2026 at 10:27:41PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Jan 05, 2026 at 08:04:54AM -0600, Bjorn Andersson wrote:
> > On Thu, Jan 01, 2026 at 10:21:57PM +0530, Manivannan Sadhasivam wrote:
> > > Hi,
> > >
> >
> > Many thanks for looking into this problem, Mani.
> >
> > > This is just an attempt to let the device drivers know of the quirky platform
> > > behavior of mimicking s2ram in s2idle for older Qcom SoCs (pre-Hamoa and
> > > non-chromebooks) using RPMh. This information is important for the device
> > > drivers as they need to prepare for the possible power loss during system
> > > suspend by shutting down or resetting the devices.
> > >
> >
> > Is this really what happens? What _exactly_ is the purpose of
> > pm_suspend_via_firmware() and pm_resume_via_firmware()?
> >
> > The kernel-doc for pm_suspend_via_firmware() states "return true if
> > platform firmware is going to be invoked at end of system suspend...in
> > order to complete it".
> >
> > To me this indicates that the purpose of this mechanism is to signal
> > Linux that we're running on a target where platform firmware will cut
> > power once we reach the bottom of the suspend chain, so the individual
> > drivers has to save/restore state in order to prepare for this.
> >
>
> Yes!
>
But that's not the case here. In fact, iirc even on Makena the PCIe
controller is retained, it's just the link that can't be sustained
without CX - but I might misremember.
> > But if I understand the bottom of the s2idle sequence, is that all we do
> > is enter idle and the kernel will pick the lowest idle idle-state, which
> > based on the psci-suspend-param will signal the system that the
> > resources needed by the CPU subsystem can be powered down.
> >
> > But it's not really powering things down, it's releasing the vote of the
> > CPU subsystem - which if the driver didn't vote means power will be
> > lost. So for any drivers that relies on the implicit vote from the CPU
> > subsystem to sustain their operation this flag is correct.
> >
> > For many IP blocks, the register state and/or some functionality will be
> > retained in this state - e.g. register state will be retained, or
> > functionality will be sustained by sleep power. So they can safely
> > ignore the flag.
> >
> > In some cases state isn't retained when this happens, so e.g. PHY
> > drivers does perform save and restore operations, despite the flag
> > currently not indicating that power will be lost.
> >
> >
> > What's unique with Makena in this regard, is that in this state the PCIe
> > link can not be sustained, so we either need to tear things down, or the
> > PCIe controller needs to vote directly on those resources it need in
> > order to sustain its link.
> >
> > It seems to me that this patch uses the global flag, in order to signal
> > the PCIe stack specifically, that it needs to save/restore state. I'm
> > concerned that support for retaining state in the PCIe subsystem isn't a
> > global question.
> >
>
> No, this is not a PCIe stack limitation, but PCIe client driver limitation
> like NVMe. PCIe stack does save and restore the device config space during
> system suspend (even during runtime suspend sometimes).
>
> But the NVMe driver will only shutdown the controller if this global flag is set
> or other conditions are satisfied. Otherwise, it will just park the device in
> low power state. So if the target enters s2ram (deep sleep) or CXPC in Makena,
> controller context will be lost, leading to resume failure.
>
So, you're saying that this is the only PCIe client driver that needs
this special handling?
> We tried to fix the issue in the NVMe driver so far, but all efforts ended up in
> vain. NVMe maintainers preferred to rely on some PCI/PM APIs for this. Initially
> I looked into creating one such API, but then figured out that I can just make
> use of this global flag and not touch the NVMe driver at all.
>
At the same time acpi_storage_d3() tells me that this problem is already
accepted by the community, we just don't have a way to signal the same
in our system.
> This worked well for s2ram, so I resent a patch submitted by Konrad in 2024 [1].
> But for s2idle and Makena, I thought I could reuse the same global flag and
> achieve the same net result.
>
My problem remains that we're using the global "power to all IP-blocks
will be lost"-flag to say "on Makena, the PCIe controller doesn't retain
state in low power mode", to a single driver.
And in particular, when taking DeepSleep into consideration, there's
going to be a lot of work performed by drivers when "power to all
IP-blocks will be lost".
> [1] https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com
>
> >
> >
> > Then parallel to this discussion, we have the actual s2ram feature
> > showing up on some targets, where the SoC actually do power off. Here
> > the automatic retaining of register and functional state is not going to
> > happen - device drivers must save and restore state.
> >
>
> AFAIK, atleast on compute targets, s2ram == s2idle + CXPC. Only difference is
> that the system wide low power mode transition will happen in PSCI
> SYSTEM_SUSPEND phase as opposed to CPU_SUSPEND.
>
But this means that our normal assumptions about retention holds, except
for PCIe on Makena.
> Only on auto targets, we have the 'deep sleep' feature which behaves like x86
> s2ram i.e., turning off power to most of the components. But from Linux POV,
> both are just s2ram. This is one more motivation for me to use this global flag
> for s2ram.
>
Not only on auto targets, the DeepSleep feature exists in IoT and other
places as well.
As I said in my other reply, we can decide that DeepSleep == s2ram from
Linux PoV, but that has implications - beyond what I think you're
willing to accept on your laptop.
> Oh there is one more limitation with doing CXPC without D3Cold (link retention).
> Linux cannot handle PCIe wakeup reliably. So if you have WLAN or USB keyboard
> connected to a PCIe-USB hub, wakeup will not work and will result in PCIe link
> down with upstream. We need to have a hw mechanism to handle PCIe wakeup. But
> that is not currently supported and I don't know when that support will land.
>
> So if wakeup is not supported, I thought doing D3Cold + CXPC would be better in
> terms of power saving (not much difference, but still...). This is the only
> motivation for me to use this flag for all pre-Hamoa SoCs.
>
I'm glad that you're looking into this part, we want that.
> > Do we force all "pre-Hamoa" targets into this same behavior? Or do we
> > have a different flag for saying "at the end of suspend power will be
> > lost" there?
> >
>
> If you don't agree with the above reasoning, I can just limit its usage to
> Makena and s2ram.
>
I don't think I have all the details, but if we're saying that Makena is
broken and need special treatment in NVMe, it would be better to say
just that, with a patch in check_vendor_combination_bug()
/* Qualcomm SC8280XP can not enter low-power with PCIe link active */
if (of_machine_is_compatible("qcom,sc8280xp"))
return NVME_QUIRK_SIMPLE_SUSPEND;
Alternatively, make it possible to set StorageD3Enable in DeviceTree.
This flag exists in ACPI for a reason, why wouldn't we see the same
problems when describing the system using DeviceTree?
> >
> > PS. Think the commit message of the change itself falls short in
> > capturing the problem. The commit message of this change will become the
> > documentation on which many future discussions will be based.
> >
>
> Sorry about that. I will add more context while sending non-RFC version.
>
> > > For implementation, s2idle_ops is registered during the boot of the rpmh-rsc
> > > driver based on the machine compatible limited to Makena (SC8280XP) as a
> > > proof-of-concept. If this approach gets consensus, I plan to have a helper
> > > that lists the compatibles of all SoCs exhibiting this behavior. Since there is
> > > no reliable way to find out whether s2idle is the only low power state supported
> > > or not during boot, I resorted to compatible based matching.
> > >
> > > One could argue that this s2idle_ops should be registered in the PSCI driver
> > > similar to s2ram [1]. But I didn't prefer that since from PSCI point of view,
> > > only CPUs should be parked in low power states during s2idle (CPU_SUSPEND) and
> > > the peripherals should not be affected. Though in the past, an argument [2] was
> > > raised citing the PSCI spec wording that allows the vendors to implement system
> > > level low power states during CPU_SUSPEND. But that argument was not well
> > > received by the PSCI maintainers.
> > >
> >
> > It makes sense to me to have PSCI only manage CPUs and its necessary
> > peripherals.
> >
> > But the definition of pm_suspend_via_firmware() in our case would then
> > be, "if your driver piggybacks on e.g. the CPU subsystem's vote for
> > critical resources, you might be in for a surprise".
> >
> > Which makes me feel that the Makena quirk should be in the PCIe
> > controller driver...
> >
>
> Do you still think so?
>
Yes, not based on the current state of the kernel. But all the
implications that will follow, in particular in relation to DeepSleep.
Regards,
Bjorn
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்
On Tue, Jan 06, 2026 at 11:21:59AM -0600, Bjorn Andersson wrote:
> On Mon, Jan 05, 2026 at 10:27:41PM +0530, Manivannan Sadhasivam wrote:
> > On Mon, Jan 05, 2026 at 08:04:54AM -0600, Bjorn Andersson wrote:
> > > On Thu, Jan 01, 2026 at 10:21:57PM +0530, Manivannan Sadhasivam wrote:
> > > > Hi,
> > > >
> > >
> > > Many thanks for looking into this problem, Mani.
> > >
> > > > This is just an attempt to let the device drivers know of the quirky platform
> > > > behavior of mimicking s2ram in s2idle for older Qcom SoCs (pre-Hamoa and
> > > > non-chromebooks) using RPMh. This information is important for the device
> > > > drivers as they need to prepare for the possible power loss during system
> > > > suspend by shutting down or resetting the devices.
> > > >
> > >
> > > Is this really what happens? What _exactly_ is the purpose of
> > > pm_suspend_via_firmware() and pm_resume_via_firmware()?
> > >
> > > The kernel-doc for pm_suspend_via_firmware() states "return true if
> > > platform firmware is going to be invoked at end of system suspend...in
> > > order to complete it".
> > >
> > > To me this indicates that the purpose of this mechanism is to signal
> > > Linux that we're running on a target where platform firmware will cut
> > > power once we reach the bottom of the suspend chain, so the individual
> > > drivers has to save/restore state in order to prepare for this.
> > >
> >
> > Yes!
> >
>
> But that's not the case here. In fact, iirc even on Makena the PCIe
> controller is retained, it's just the link that can't be sustained
> without CX - but I might misremember.
>
There is no MX retention for PCIe on Makena.
> > > But if I understand the bottom of the s2idle sequence, is that all we do
> > > is enter idle and the kernel will pick the lowest idle idle-state, which
> > > based on the psci-suspend-param will signal the system that the
> > > resources needed by the CPU subsystem can be powered down.
> > >
> > > But it's not really powering things down, it's releasing the vote of the
> > > CPU subsystem - which if the driver didn't vote means power will be
> > > lost. So for any drivers that relies on the implicit vote from the CPU
> > > subsystem to sustain their operation this flag is correct.
> > >
> > > For many IP blocks, the register state and/or some functionality will be
> > > retained in this state - e.g. register state will be retained, or
> > > functionality will be sustained by sleep power. So they can safely
> > > ignore the flag.
> > >
> > > In some cases state isn't retained when this happens, so e.g. PHY
> > > drivers does perform save and restore operations, despite the flag
> > > currently not indicating that power will be lost.
> > >
> > >
> > > What's unique with Makena in this regard, is that in this state the PCIe
> > > link can not be sustained, so we either need to tear things down, or the
> > > PCIe controller needs to vote directly on those resources it need in
> > > order to sustain its link.
> > >
> > > It seems to me that this patch uses the global flag, in order to signal
> > > the PCIe stack specifically, that it needs to save/restore state. I'm
> > > concerned that support for retaining state in the PCIe subsystem isn't a
> > > global question.
> > >
> >
> > No, this is not a PCIe stack limitation, but PCIe client driver limitation
> > like NVMe. PCIe stack does save and restore the device config space during
> > system suspend (even during runtime suspend sometimes).
> >
> > But the NVMe driver will only shutdown the controller if this global flag is set
> > or other conditions are satisfied. Otherwise, it will just park the device in
> > low power state. So if the target enters s2ram (deep sleep) or CXPC in Makena,
> > controller context will be lost, leading to resume failure.
> >
>
> So, you're saying that this is the only PCIe client driver that needs
> this special handling?
>
As of now, yes.
> > We tried to fix the issue in the NVMe driver so far, but all efforts ended up in
> > vain. NVMe maintainers preferred to rely on some PCI/PM APIs for this. Initially
> > I looked into creating one such API, but then figured out that I can just make
> > use of this global flag and not touch the NVMe driver at all.
> >
>
> At the same time acpi_storage_d3() tells me that this problem is already
> accepted by the community, we just don't have a way to signal the same
> in our system.
>
There are many solutions initially accepted for x86/ACPI which are not getting
accepted now.
> > This worked well for s2ram, so I resent a patch submitted by Konrad in 2024 [1].
> > But for s2idle and Makena, I thought I could reuse the same global flag and
> > achieve the same net result.
> >
>
> My problem remains that we're using the global "power to all IP-blocks
> will be lost"-flag to say "on Makena, the PCIe controller doesn't retain
> state in low power mode", to a single driver.
>
I agree with your concern. If not global, we need to either fix it in PCI core
or in the NVMe driver. Unfortunately, for an one-off issue like this, making
change in both proves difficult. As you may know, our initial work was to get it
fixed in the NVMe driver, but that didn't fly.
> And in particular, when taking DeepSleep into consideration, there's
> going to be a lot of work performed by drivers when "power to all
> IP-blocks will be lost".
>
> > [1] https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com
> >
> > >
> > >
> > > Then parallel to this discussion, we have the actual s2ram feature
> > > showing up on some targets, where the SoC actually do power off. Here
> > > the automatic retaining of register and functional state is not going to
> > > happen - device drivers must save and restore state.
> > >
> >
> > AFAIK, atleast on compute targets, s2ram == s2idle + CXPC. Only difference is
> > that the system wide low power mode transition will happen in PSCI
> > SYSTEM_SUSPEND phase as opposed to CPU_SUSPEND.
> >
>
> But this means that our normal assumptions about retention holds, except
> for PCIe on Makena.
>
> > Only on auto targets, we have the 'deep sleep' feature which behaves like x86
> > s2ram i.e., turning off power to most of the components. But from Linux POV,
> > both are just s2ram. This is one more motivation for me to use this global flag
> > for s2ram.
> >
>
> Not only on auto targets, the DeepSleep feature exists in IoT and other
> places as well.
>
> As I said in my other reply, we can decide that DeepSleep == s2ram from
> Linux PoV, but that has implications - beyond what I think you're
> willing to accept on your laptop.
>
Agree.
> > Oh there is one more limitation with doing CXPC without D3Cold (link retention).
> > Linux cannot handle PCIe wakeup reliably. So if you have WLAN or USB keyboard
> > connected to a PCIe-USB hub, wakeup will not work and will result in PCIe link
> > down with upstream. We need to have a hw mechanism to handle PCIe wakeup. But
> > that is not currently supported and I don't know when that support will land.
> >
> > So if wakeup is not supported, I thought doing D3Cold + CXPC would be better in
> > terms of power saving (not much difference, but still...). This is the only
> > motivation for me to use this flag for all pre-Hamoa SoCs.
> >
>
> I'm glad that you're looking into this part, we want that.
>
> > > Do we force all "pre-Hamoa" targets into this same behavior? Or do we
> > > have a different flag for saying "at the end of suspend power will be
> > > lost" there?
> > >
> >
> > If you don't agree with the above reasoning, I can just limit its usage to
> > Makena and s2ram.
> >
>
> I don't think I have all the details, but if we're saying that Makena is
> broken and need special treatment in NVMe, it would be better to say
> just that, with a patch in check_vendor_combination_bug()
>
> /* Qualcomm SC8280XP can not enter low-power with PCIe link active */
> if (of_machine_is_compatible("qcom,sc8280xp"))
> return NVME_QUIRK_SIMPLE_SUSPEND;
>
I tried it in multiple attempts. Recent one is this:
https://lore.kernel.org/all/20250126050309.7243-1-manivannan.sadhasivam@linaro.org/
And here is the reply:
https://lore.kernel.org/all/20250210040446.GA2823@lst.de/
>
> Alternatively, make it possible to set StorageD3Enable in DeviceTree.
> This flag exists in ACPI for a reason, why wouldn't we see the same
> problems when describing the system using DeviceTree?
>
This is not a ACPI spec defined object, but just a MSFT way of telling the OS
when to enter D3 (D3Hot or D3Cold) for storage devices. If we want to go with
the DT property, it has to be something like 'qcom,no-retention'. I don't think
a generic property makes sense here.
Even then, we would need a PCI API to translate that for client drivers. I can
look into this direction.
- Mani
--
மணிவண்ணன் சதாசிவம்
On Thu, Jan 08, 2026 at 10:55:16AM +0530, Manivannan Sadhasivam wrote:
> On Tue, Jan 06, 2026 at 11:21:59AM -0600, Bjorn Andersson wrote:
> > On Mon, Jan 05, 2026 at 10:27:41PM +0530, Manivannan Sadhasivam wrote:
> > > On Mon, Jan 05, 2026 at 08:04:54AM -0600, Bjorn Andersson wrote:
> > > > On Thu, Jan 01, 2026 at 10:21:57PM +0530, Manivannan Sadhasivam wrote:
> > > > > Hi,
> > > > >
> > > >
> > > > Many thanks for looking into this problem, Mani.
> > > >
> > > > > This is just an attempt to let the device drivers know of the quirky platform
> > > > > behavior of mimicking s2ram in s2idle for older Qcom SoCs (pre-Hamoa and
> > > > > non-chromebooks) using RPMh. This information is important for the device
> > > > > drivers as they need to prepare for the possible power loss during system
> > > > > suspend by shutting down or resetting the devices.
> > > > >
> > > >
> > > > Is this really what happens? What _exactly_ is the purpose of
> > > > pm_suspend_via_firmware() and pm_resume_via_firmware()?
> > > >
> > > > The kernel-doc for pm_suspend_via_firmware() states "return true if
> > > > platform firmware is going to be invoked at end of system suspend...in
> > > > order to complete it".
> > > >
> > > > To me this indicates that the purpose of this mechanism is to signal
> > > > Linux that we're running on a target where platform firmware will cut
> > > > power once we reach the bottom of the suspend chain, so the individual
> > > > drivers has to save/restore state in order to prepare for this.
> > > >
> > >
> > > Yes!
> > >
> >
> > But that's not the case here. In fact, iirc even on Makena the PCIe
> > controller is retained, it's just the link that can't be sustained
> > without CX - but I might misremember.
> >
>
> There is no MX retention for PCIe on Makena.
>
Okay, thanks for correcting my memory.
[..]
> > > We tried to fix the issue in the NVMe driver so far, but all efforts ended up in
> > > vain. NVMe maintainers preferred to rely on some PCI/PM APIs for this. Initially
> > > I looked into creating one such API, but then figured out that I can just make
> > > use of this global flag and not touch the NVMe driver at all.
> > >
> >
> > At the same time acpi_storage_d3() tells me that this problem is already
> > accepted by the community, we just don't have a way to signal the same
> > in our system.
> >
>
> There are many solutions initially accepted for x86/ACPI which are not getting
> accepted now.
>
It's inspiring to see the whole x86 ecosystem moving beyond such issues.
> > > This worked well for s2ram, so I resent a patch submitted by Konrad in 2024 [1].
> > > But for s2idle and Makena, I thought I could reuse the same global flag and
> > > achieve the same net result.
> > >
> >
> > My problem remains that we're using the global "power to all IP-blocks
> > will be lost"-flag to say "on Makena, the PCIe controller doesn't retain
> > state in low power mode", to a single driver.
> >
>
> I agree with your concern. If not global, we need to either fix it in PCI core
> or in the NVMe driver. Unfortunately, for an one-off issue like this, making
> change in both proves difficult. As you may know, our initial work was to get it
> fixed in the NVMe driver, but that didn't fly.
>
The fact that the PCIe controller is "broken" does not imply that the
whole SoC is broken, and claiming so will come with side effects.
[..]
> > > > Do we force all "pre-Hamoa" targets into this same behavior? Or do we
> > > > have a different flag for saying "at the end of suspend power will be
> > > > lost" there?
> > > >
> > >
> > > If you don't agree with the above reasoning, I can just limit its usage to
> > > Makena and s2ram.
> > >
> >
> > I don't think I have all the details, but if we're saying that Makena is
> > broken and need special treatment in NVMe, it would be better to say
> > just that, with a patch in check_vendor_combination_bug()
> >
> > /* Qualcomm SC8280XP can not enter low-power with PCIe link active */
> > if (of_machine_is_compatible("qcom,sc8280xp"))
> > return NVME_QUIRK_SIMPLE_SUSPEND;
> >
>
> I tried it in multiple attempts. Recent one is this:
> https://lore.kernel.org/all/20250126050309.7243-1-manivannan.sadhasivam@linaro.org/
>
> And here is the reply:
> https://lore.kernel.org/all/20250210040446.GA2823@lst.de/
I presume "core" doesn't necessarily imply "system-wide", but you know
the PCIe stack better than me, so I'm not sure where in the "core" this
would be better implemented.
>
> >
> > Alternatively, make it possible to set StorageD3Enable in DeviceTree.
> > This flag exists in ACPI for a reason, why wouldn't we see the same
> > problems when describing the system using DeviceTree?
> >
>
> This is not a ACPI spec defined object, but just a MSFT way of telling the OS
> when to enter D3 (D3Hot or D3Cold) for storage devices. If we want to go with
> the DT property, it has to be something like 'qcom,no-retention'. I don't think
> a generic property makes sense here.
>
Unless I'm reading the nvme driver incorrectly, I don't think the need
for such property is vendor-specific. But conceptually it doesn't really
belong in system description, given that the user might swap NVMe and
introduce/remove the need for this quirk.
> Even then, we would need a PCI API to translate that for client drivers. I can
> look into this direction.
>
I hope that we can agree that this is a property of the PCIe controller,
for Makena. If we can't move the problem with NVMe on Makena out of the
way of progress, how about we work our way around it (for now
hopefully...)?
Change qcom_pcie_suspend_noirq() to explicitly carry a vote for CX (e.g.
using the bw-vote as today), describe the problem explictily in a
comment, and then move forward with releasing the CX vote for all other
targets.
Regards,
Bjorn
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்
On Fri, Jan 09, 2026 at 01:36:32PM -0600, Bjorn Andersson wrote:
> On Thu, Jan 08, 2026 at 10:55:16AM +0530, Manivannan Sadhasivam wrote:
> > On Tue, Jan 06, 2026 at 11:21:59AM -0600, Bjorn Andersson wrote:
> > > On Mon, Jan 05, 2026 at 10:27:41PM +0530, Manivannan Sadhasivam wrote:
> > > > On Mon, Jan 05, 2026 at 08:04:54AM -0600, Bjorn Andersson wrote:
> > > > > On Thu, Jan 01, 2026 at 10:21:57PM +0530, Manivannan Sadhasivam wrote:
> > > > > > Hi,
> > > > > >
> > > > >
> > > > > Many thanks for looking into this problem, Mani.
> > > > >
> > > > > > This is just an attempt to let the device drivers know of the quirky platform
> > > > > > behavior of mimicking s2ram in s2idle for older Qcom SoCs (pre-Hamoa and
> > > > > > non-chromebooks) using RPMh. This information is important for the device
> > > > > > drivers as they need to prepare for the possible power loss during system
> > > > > > suspend by shutting down or resetting the devices.
> > > > > >
> > > > >
> > > > > Is this really what happens? What _exactly_ is the purpose of
> > > > > pm_suspend_via_firmware() and pm_resume_via_firmware()?
> > > > >
> > > > > The kernel-doc for pm_suspend_via_firmware() states "return true if
> > > > > platform firmware is going to be invoked at end of system suspend...in
> > > > > order to complete it".
> > > > >
> > > > > To me this indicates that the purpose of this mechanism is to signal
> > > > > Linux that we're running on a target where platform firmware will cut
> > > > > power once we reach the bottom of the suspend chain, so the individual
> > > > > drivers has to save/restore state in order to prepare for this.
> > > > >
> > > >
> > > > Yes!
> > > >
> > >
> > > But that's not the case here. In fact, iirc even on Makena the PCIe
> > > controller is retained, it's just the link that can't be sustained
> > > without CX - but I might misremember.
> > >
> >
> > There is no MX retention for PCIe on Makena.
> >
>
> Okay, thanks for correcting my memory.
>
> [..]
> > > > We tried to fix the issue in the NVMe driver so far, but all efforts ended up in
> > > > vain. NVMe maintainers preferred to rely on some PCI/PM APIs for this. Initially
> > > > I looked into creating one such API, but then figured out that I can just make
> > > > use of this global flag and not touch the NVMe driver at all.
> > > >
> > >
> > > At the same time acpi_storage_d3() tells me that this problem is already
> > > accepted by the community, we just don't have a way to signal the same
> > > in our system.
> > >
> >
> > There are many solutions initially accepted for x86/ACPI which are not getting
> > accepted now.
> >
>
> It's inspiring to see the whole x86 ecosystem moving beyond such issues.
>
> > > > This worked well for s2ram, so I resent a patch submitted by Konrad in 2024 [1].
> > > > But for s2idle and Makena, I thought I could reuse the same global flag and
> > > > achieve the same net result.
> > > >
> > >
> > > My problem remains that we're using the global "power to all IP-blocks
> > > will be lost"-flag to say "on Makena, the PCIe controller doesn't retain
> > > state in low power mode", to a single driver.
> > >
> >
> > I agree with your concern. If not global, we need to either fix it in PCI core
> > or in the NVMe driver. Unfortunately, for an one-off issue like this, making
> > change in both proves difficult. As you may know, our initial work was to get it
> > fixed in the NVMe driver, but that didn't fly.
> >
>
> The fact that the PCIe controller is "broken" does not imply that the
> whole SoC is broken, and claiming so will come with side effects.
>
I don't "disagree" with you.
> [..]
> > > > > Do we force all "pre-Hamoa" targets into this same behavior? Or do we
> > > > > have a different flag for saying "at the end of suspend power will be
> > > > > lost" there?
> > > > >
> > > >
> > > > If you don't agree with the above reasoning, I can just limit its usage to
> > > > Makena and s2ram.
> > > >
> > >
> > > I don't think I have all the details, but if we're saying that Makena is
> > > broken and need special treatment in NVMe, it would be better to say
> > > just that, with a patch in check_vendor_combination_bug()
> > >
> > > /* Qualcomm SC8280XP can not enter low-power with PCIe link active */
> > > if (of_machine_is_compatible("qcom,sc8280xp"))
> > > return NVME_QUIRK_SIMPLE_SUSPEND;
> > >
> >
> > I tried it in multiple attempts. Recent one is this:
> > https://lore.kernel.org/all/20250126050309.7243-1-manivannan.sadhasivam@linaro.org/
> >
> > And here is the reply:
> > https://lore.kernel.org/all/20250210040446.GA2823@lst.de/
>
> I presume "core" doesn't necessarily imply "system-wide", but you know
> the PCIe stack better than me, so I'm not sure where in the "core" this
> would be better implemented.
>
"Core" here is just the PCI stack. If the stack can detect that this platform
Root Complex is broken, the it may possible indicate that to its client drivers.
> >
> > >
> > > Alternatively, make it possible to set StorageD3Enable in DeviceTree.
> > > This flag exists in ACPI for a reason, why wouldn't we see the same
> > > problems when describing the system using DeviceTree?
> > >
> >
> > This is not a ACPI spec defined object, but just a MSFT way of telling the OS
> > when to enter D3 (D3Hot or D3Cold) for storage devices. If we want to go with
> > the DT property, it has to be something like 'qcom,no-retention'. I don't think
> > a generic property makes sense here.
> >
>
> Unless I'm reading the nvme driver incorrectly, I don't think the need
> for such property is vendor-specific. But conceptually it doesn't really
> belong in system description, given that the user might swap NVMe and
> introduce/remove the need for this quirk.
>
Here, I meant "Qcom" as the vendor for the property, not NVMe vendor. Since this
behavior of loosing context during s2idle is only specific to Qcom, that too
only for a specific SoC, if we were going for a DT property, I will use Qcom
specific property to mark the RC as broken in some way.
> > Even then, we would need a PCI API to translate that for client drivers. I can
> > look into this direction.
> >
>
> I hope that we can agree that this is a property of the PCIe controller,
> for Makena. If we can't move the problem with NVMe on Makena out of the
> way of progress, how about we work our way around it (for now
> hopefully...)?
>
> Change qcom_pcie_suspend_noirq() to explicitly carry a vote for CX (e.g.
> using the bw-vote as today), describe the problem explictily in a
> comment, and then move forward with releasing the CX vote for all other
> targets.
>
Sure. That should work as a last resort.
- Mani
--
மணிவண்ணன் சதாசிவம்
© 2016 - 2026 Red Hat, Inc.