[PATCH 0/4] PCI: Introduce pci_dev_suspend_retention_supported() API

Manivannan Sadhasivam via B4 Relay posted 4 patches 2 months ago
[PATCH 0/4] PCI: Introduce pci_dev_suspend_retention_supported() API
Posted by Manivannan Sadhasivam via B4 Relay 2 months ago
Hi all,

This series introduces a new PCI API pci_dev_suspend_retention_supported() to
let the client drivers know whether they can expect context retention across
suspend/resume or not and uses it in the NVMe PCI host driver.

This new API is targeted to abstract the PCI power management details away from
the client drivers. This is needed because client drivers like NVMe make use of
APIs such as pm_suspend_via_firmware() and decide to keep the device in low
power mode if this API returns 'false'. But some platforms may have other
limitations like in the case of Qcom, where if the RC driver removes the
resource vote to allow the SoC to enter low power mode, it cannot reliably exit
the L1ss state when the endpoint asserts CLKREQ#. So in this case also, the
client drivers cannot keep the device in low power state during suspend and
expect context retention.
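
A rough user-space sketch of the decision described above, not the code from the patches. The struct, its fields, and both function names are hypothetical stand-ins: `suspend_via_firmware` models what pm_suspend_via_firmware() reports, and `l1ss_exit_broken` models the Qcom-style RC limitation. Whether the real pci_dev_suspend_retention_supported() folds in the firmware check exactly this way is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the platform facts a client driver cares about. */
struct platform_state {
	bool suspend_via_firmware;	/* models pm_suspend_via_firmware() */
	bool l1ss_exit_broken;		/* models an RC that cannot exit L1ss reliably */
};

enum suspend_action {
	KEEP_LOW_POWER,		/* leave the device in a low power state */
	FULL_SHUTDOWN,		/* save context and shut the device down */
};

/*
 * Assumed semantics of the proposed API: retention is supported only
 * when no known platform limitation can destroy the device context.
 */
static bool suspend_retention_supported(const struct platform_state *p)
{
	assert(p);

	if (p->suspend_via_firmware)
		return false;
	if (p->l1ss_exit_broken)
		return false;

	return true;
}

/* A client driver asks one question instead of checking each limitation. */
static enum suspend_action client_suspend_action(const struct platform_state *p)
{
	return suspend_retention_supported(p) ? KEEP_LOW_POWER : FULL_SHUTDOWN;
}
```

In this model, a new platform limitation becomes one more check inside suspend_retention_supported(), and the client-side decision stays unchanged.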

And such limitations may keep growing in the future. Without a unified
API, the client drivers have to implement their own logic which may cause code
duplication and may also lead to drivers missing some of the platform
limitations.

Once this series gets merged, we can extend this API usage to other client
drivers as well.

Testing
=======

This series was tested on a Qualcomm Hamoa based Lenovo ThinkPad T14s laptop
with an NVMe drive.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
Manivannan Sadhasivam (4):
      PCI: Introduce an API to check if RC/platform can retain device context during suspend
      PCI: Indicate context lost if L1ss exit is broken during resume from system suspend
      PCI: qcom: Indicate broken L1ss exit during resume from system suspend
      nvme-pci: Use pci_dev_suspend_retention_supported() API during suspend

 drivers/nvme/host/pci.c                |  3 ++-
 drivers/pci/controller/dwc/pcie-qcom.c | 11 +++++++++++
 drivers/pci/pci.c                      | 34 ++++++++++++++++++++++++++++++++++
 include/linux/pci.h                    |  9 +++++++++
 4 files changed, 56 insertions(+), 1 deletion(-)
---
base-commit: 591cd656a1bf5ea94a222af5ef2ee76df029c1d2
change-id: 20260414-l1ss-fix-6c9cf2451944

Best regards,
--  
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Re: [PATCH 0/4] PCI: Introduce pci_dev_suspend_retention_supported() API
Posted by Bjorn Helgaas 2 months ago
[+cc Rafael]

On Tue, Apr 14, 2026 at 09:29:38PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> This series introduces a new PCI API
> pci_dev_suspend_retention_supported() to let the client drivers know
> whether they can expect context retention across suspend/resume or
> not and uses it in the NVMe PCI host driver.
> 
> This new API is targeted to abstract the PCI power management
> details away from the client drivers. This is needed because client
> drivers like NVMe make use of APIs such as pm_suspend_via_firmware()
> and decide to keep the device in low power mode if this API returns
> 'false'. But some platforms may have other limitations like in the
> case of Qcom, where if the RC driver removes the resource vote to
> allow the SoC to enter low power mode, it cannot reliably exit the
> L1ss state when the endpoint asserts CLKREQ#. So in this case also,
> the client drivers cannot keep the device in low power state during
> suspend and expect context retention.

I don't know what pm_suspend_via_firmware() means.  The kernel-doc
says "platform firmware is going to be invoked at the end of the
system-wide power management transition," but that doesn't say
anything about what firmware might do or what it means to drivers.

Based on d916b1be94b6 ("nvme-pci: use host managed power state for
suspend"), which used it in nvme_suspend(), I guess the assumption is
that pm_suspend_via_firmware() means the device might be put in D3cold
and lose all its internal state, and conversely,
!pm_suspend_via_firmware() means the device will *never* be put in a
low-power state that loses internal state.

> And such limitations may keep growing in the future. Without a
> unified API, the client drivers have to implement their own logic
> which may cause code duplication and may also lead to drivers
> missing some of the platform limitations.
> 
> Once this series gets merged, we can extend this API usage to other
> client drivers as well.

> 
> Testing
> =======
> 
> This series was tested on a Qualcomm Hamoa based Lenovo ThinkPad T14s laptop
> with an NVMe drive.
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> Manivannan Sadhasivam (4):
>       PCI: Introduce an API to check if RC/platform can retain device context during suspend
>       PCI: Indicate context lost if L1ss exit is broken during resume from system suspend
>       PCI: qcom: Indicate broken L1ss exit during resume from system suspend
>       nvme-pci: Use pci_dev_suspend_retention_supported() API during suspend
> 
>  drivers/nvme/host/pci.c                |  3 ++-
>  drivers/pci/controller/dwc/pcie-qcom.c | 11 +++++++++++
>  drivers/pci/pci.c                      | 34 ++++++++++++++++++++++++++++++++++
>  include/linux/pci.h                    |  9 +++++++++
>  4 files changed, 56 insertions(+), 1 deletion(-)
> ---
> base-commit: 591cd656a1bf5ea94a222af5ef2ee76df029c1d2
> change-id: 20260414-l1ss-fix-6c9cf2451944
> 
> Best regards,
> --  
> Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> 
>
Re: [PATCH 0/4] PCI: Introduce pci_dev_suspend_retention_supported() API
Posted by Manivannan Sadhasivam 1 month, 4 weeks ago
On Thu, Apr 16, 2026 at 02:11:11PM -0500, Bjorn Helgaas wrote:
> [+cc Rafael]
> 
> On Tue, Apr 14, 2026 at 09:29:38PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > This series introduces a new PCI API
> > pci_dev_suspend_retention_supported() to let the client drivers know
> > whether they can expect context retention across suspend/resume or
> > not and uses it in the NVMe PCI host driver.
> > 
> > This new API is targeted to abstract the PCI power management
> > details away from the client drivers. This is needed because client
> > drivers like NVMe make use of APIs such as pm_suspend_via_firmware()
> > and decide to keep the device in low power mode if this API returns
> > 'false'. But some platforms may have other limitations like in the
> > case of Qcom, where if the RC driver removes the resource vote to
> > allow the SoC to enter low power mode, it cannot reliably exit the
> > L1ss state when the endpoint asserts CLKREQ#. So in this case also,
> > the client drivers cannot keep the device in low power state during
> > suspend and expect context retention.
> 
> I don't know what pm_suspend_via_firmware() means.  The kernel-doc
> says "platform firmware is going to be invoked at the end of the
> system-wide power management transition," but that doesn't say
> anything about what firmware might do or what it means to drivers.
> 

It's hard to predict what the firmware might do after it gains control from the
OS. But as far as the API goes, it just expects the drivers to save the context
and reset the device so that the firmware can do anything it wants.

> Based on d916b1be94b6 ("nvme-pci: use host managed power state for
> suspend"), which used it in nvme_suspend(), I guess the assumption is
> that pm_suspend_via_firmware() means the device might be put in D3cold
> and lose all its internal state, and conversely,
> !pm_suspend_via_firmware() means the device will *never* be put in a
> low-power state that loses internal state.
> 

Yes, that's the assumption. Though, the firmware might not do D3Cold at all,
but the drivers should be prepared for that to be compatible with all firmware
implementations.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH 0/4] PCI: Introduce pci_dev_suspend_retention_supported() API
Posted by Bjorn Helgaas 1 month, 4 weeks ago
On Fri, Apr 17, 2026 at 04:34:53PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Apr 16, 2026 at 02:11:11PM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 14, 2026 at 09:29:38PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > This series introduces a new PCI API
> > > pci_dev_suspend_retention_supported() to let the client drivers
> > > know whether they can expect context retention across
> > > suspend/resume or not and uses it in the NVMe PCI host driver.
> > > 
> > > This new API is targeted to abstract the PCI power management
> > > details away from the client drivers. This is needed because
> > > client drivers like NVMe make use of APIs such as
> > > pm_suspend_via_firmware() and decide to keep the device in low
> > > power mode if this API returns 'false'. But some platforms may
> > > have other limitations like in the case of Qcom, where if the RC
> > > driver removes the resource vote to allow the SoC to enter low
> > > power mode, it cannot reliably exit the L1ss state when the
> > > endpoint asserts CLKREQ#. So in this case also, the client
> > > drivers cannot keep the device in low power state during suspend
> > > and expect context retention.
> > 
> > I don't know what pm_suspend_via_firmware() means.  The kernel-doc
> > says "platform firmware is going to be invoked at the end of the
> > system-wide power management transition," but that doesn't say
> > anything about what firmware might do or what it means to drivers.
> 
> It's hard to predict what the firmware might do after it gains
> control from the OS. But as far as the API goes, it just expects the
> drivers to save the context and reset the device so that the
> firmware can do anything it wants.

I don't see anything about the driver needing to reset the device.
(Kernel-doc says "driver *may* need to reset it" but no hint about how
to know.)

Adding something like "device internal state is not preserved" would
go a long ways here.

> > Based on d916b1be94b6 ("nvme-pci: use host managed power state for
> > suspend"), which used it in nvme_suspend(), I guess the assumption
> > is that pm_suspend_via_firmware() means the device might be put in
> > D3cold and lose all its internal state, and conversely,
> > !pm_suspend_via_firmware() means the device will *never* be put in
> > a low-power state that loses internal state.
> 
> Yes, that's the assumption. Though, the firmware might not do D3Cold
> at all, but the drivers should be prepared for that to be compatible
> with all firmware implementations.

I don't think it's useful for a driver to know "firmware might not do
D3cold".  What could a driver do with that?  Unless the driver *knows*
internal state will be preserved, it must act as though the state is
lost.
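
The rule in the last paragraph can be written as a tiny sketch. The enum and the function name are hypothetical and exist only to illustrate the reasoning; nothing like them is part of the series.

```c
#include <stdbool.h>

/* Hypothetical: what a driver knows about state retention across suspend. */
enum retention_knowledge {
	RETENTION_UNKNOWN,	/* e.g. firmware may or may not use D3cold */
	RETENTION_GUARANTEED,	/* driver knows internal state is preserved */
	RETENTION_LOST,		/* driver knows internal state is lost */
};

/*
 * Only a positive guarantee lets the driver skip the save/restore path.
 * "Might not be lost" is handled exactly like "lost".
 */
static bool must_treat_state_as_lost(enum retention_knowledge k)
{
	return k != RETENTION_GUARANTEED;
}
```

Here RETENTION_UNKNOWN and RETENTION_LOST lead to the same driver behavior, which is why "firmware might not do D3cold" gives a driver nothing to act on.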
Re: [PATCH 0/4] PCI: Introduce pci_dev_suspend_retention_supported() API
Posted by Manivannan Sadhasivam 1 month, 4 weeks ago
On Fri, Apr 17, 2026 at 05:29:04PM -0500, Bjorn Helgaas wrote:
> On Fri, Apr 17, 2026 at 04:34:53PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Apr 16, 2026 at 02:11:11PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Apr 14, 2026 at 09:29:38PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > > This series introduces a new PCI API
> > > > pci_dev_suspend_retention_supported() to let the client drivers
> > > > know whether they can expect context retention across
> > > > suspend/resume or not and uses it in the NVMe PCI host driver.
> > > > 
> > > > This new API is targeted to abstract the PCI power management
> > > > details away from the client drivers. This is needed because
> > > > client drivers like NVMe make use of APIs such as
> > > > pm_suspend_via_firmware() and decide to keep the device in low
> > > > power mode if this API returns 'false'. But some platforms may
> > > > have other limitations like in the case of Qcom, where if the RC
> > > > driver removes the resource vote to allow the SoC to enter low
> > > > power mode, it cannot reliably exit the L1ss state when the
> > > > endpoint asserts CLKREQ#. So in this case also, the client
> > > > drivers cannot keep the device in low power state during suspend
> > > > and expect context retention.
> > > 
> > > I don't know what pm_suspend_via_firmware() means.  The kernel-doc
> > > says "platform firmware is going to be invoked at the end of the
> > > system-wide power management transition," but that doesn't say
> > > anything about what firmware might do or what it means to drivers.
> > 
> > It's hard to predict what the firmware might do after it gains
> > control from the OS. But as far as the API goes, it just expects the
> > > drivers to save the context and reset the device so that the
> > > firmware can do anything it wants.
> 
> I don't see anything about the driver needing to reset the device.
> (Kernel-doc says "driver *may* need to reset it" but no hint about how
> to know.)
> 
> Adding something like "device internal state is not preserved" would
> go a long ways here.
> 

IIUC, 'may' is used in the description because not all firmware is going to
turn off or do something with the device. But a driver that is supposed to
work with all firmware implementations, like a NIC/storage client driver,
should save the internal state and prepare for a possible power loss. This is
what the NVMe driver does currently.

> > > Based on d916b1be94b6 ("nvme-pci: use host managed power state for
> > > suspend"), which used it in nvme_suspend(), I guess the assumption
> > > is that pm_suspend_via_firmware() means the device might be put in
> > > D3cold and lose all its internal state, and conversely,
> > > !pm_suspend_via_firmware() means the device will *never* be put in
> > > a low-power state that loses internal state.
> > 
> > Yes, that's the assumption. Though, the firmware might not do D3Cold
> > at all, but the drivers should be prepared for that to be compatible
> > with all firmware implementations.
> 
> I don't think it's useful for a driver to know "firmware might not do
> D3cold".  What could a driver do with that?  Unless the driver *knows*
> internal state will be preserved, it must act as though the state is
> lost.

A driver doesn't need to know whether the device will be put into D3Cold or not.
But it does need to know whether that is a possibility, because, AFAIK, there is
no way for the OS to query what the firmware is going to do at the end of
suspend. So to be on the conservative side, this API gives an indication to
the client drivers saying 'hey, firmware is going to be invoked at the end of
suspend and it may do something with the device state like invoking D3Cold or
doing something else. So be prepared for that.'

And 'be prepared' means saving the context and resetting the device.

@Rafael: Please correct me if my above understanding is wrong.

- Mani

-- 
மணிவண்ணன் சதாசிவம்