[PATCH] nvme-pci: Use NVME_QUIRK_SIMPLE_SUSPEND for Qualcomm Snapdragon 8cx Gen 3 platforms

Manivannan Sadhasivam posted 1 patch 1 year ago
drivers/nvme/host/pci.c | 10 ++++++++++
1 file changed, 10 insertions(+)
[PATCH] nvme-pci: Use NVME_QUIRK_SIMPLE_SUSPEND for Qualcomm Snapdragon 8cx Gen 3 platforms
Posted by Manivannan Sadhasivam 1 year ago
On these platforms, power to the PCIe bus is not retained if the SoC enters
its own deep low power state called, CX power collapse state during system
suspend. Once the SoC resumes after going through CX power collapse, all
the PCIe bus state will be lost. So the NVMe devices on these platforms
won't resume properly, rendering the machines useless until forcefully
restarted by the users.

To workaround this issue, the PCIe controller driver is keeping a minimal
vote on the PCIe-MEM interconnect path in its system suspend callback
current currently. While this gives a working system suspend, it also
results in increased power consumption during suspend as the SoC never
enters its low power state. So with this change, the workaround can finally
be removed.

Also it should be noted that the actual fix to this issue lies in the
PCI/PM core. But currently there is no such fix exist as of now and the
consensus to reach it is also taking a lot of time.

So use this quirk to allow users to save battery life on these platforms
until the PCI/PM core comes up with an API that tells the PCI client
drivers when to turn off the devices.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---

Hi,

I'm sending this patch as there seems to be no consensus so far (as been
discussed in this thread [1]) in coming up with a PCI/PM API that tells the
client drivers when to safely turn off the devices. Since coming up with these
kind of generic APIs are bound to take time (I do intend to pursue), it makes
sense to allow these platforms to make use of the existing
NVME_QUIRK_SIMPLE_SUSPEND quirk in the meantime. Once such API is upstreamed and
used in this driver, this quirk can be reverted.

It should be noted that this CX power collapse issue only exist on the platforms
based on the Qcom Snapdragon 8cx Gen 3 SoC. On other platforms, it is a more of
a policy decision to decide whether to shutdown the devices or not. And these
other platforms can wait for the generic API to be made available by the PCI/PM
core.

[1] https://lore.kernel.org/linux-pci/20241118082344.8146-1-manivannan.sadhasivam@linaro.org

 drivers/nvme/host/pci.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e2634f437f33..c52dced952a1 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3162,6 +3162,16 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
 	if (dmi_match(DMI_BOARD_NAME, "LXKT-ZXEG-N6"))
 		return NVME_QUIRK_NO_APST;
 
+	/*
+	 * Qualcomm Snapdragon 8cx Gen 3 (SC8280XP) platforms doesn't retain
+	 * power to the PCIe bus after entering low power CX power collapse
+	 * state during system suspend. So shutdown the NVMe devices to have a
+	 * working system suspend on these platforms.
+	 */
+	if (dmi_match(DMI_PRODUCT_FAMILY, "SCP_MAKENA") ||
+	    dmi_match(DMI_PRODUCT_FAMILY, "ThinkPad X13s Gen 1"))
+		return NVME_QUIRK_SIMPLE_SUSPEND;
+
 	return 0;
 }
 
-- 
2.25.1
Re: [PATCH] nvme-pci: Use NVME_QUIRK_SIMPLE_SUSPEND for Qualcomm Snapdragon 8cx Gen 3 platforms
Posted by Bjorn Helgaas 1 year ago
On Sun, Jan 26, 2025 at 10:33:09AM +0530, Manivannan Sadhasivam wrote:
> On these platforms, power to the PCIe bus is not retained if the SoC enters
> its own deep low power state called, CX power collapse state during system
> suspend. Once the SoC resumes after going through CX power collapse, all
> the PCIe bus state will be lost. So the NVMe devices on these platforms
> won't resume properly, rendering the machines useless until forcefully
> restarted by the users.

I guess "forcefully restarted" means a power cycle?

> +++ b/drivers/nvme/host/pci.c
> @@ -3162,6 +3162,16 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
>  	if (dmi_match(DMI_BOARD_NAME, "LXKT-ZXEG-N6"))
>  		return NVME_QUIRK_NO_APST;
>  
> +	/*
> +	 * Qualcomm Snapdragon 8cx Gen 3 (SC8280XP) platforms doesn't retain
> +	 * power to the PCIe bus after entering low power CX power collapse
> +	 * state during system suspend. So shutdown the NVMe devices to have a
> +	 * working system suspend on these platforms.
> +	 */
> +	if (dmi_match(DMI_PRODUCT_FAMILY, "SCP_MAKENA") ||
> +	    dmi_match(DMI_PRODUCT_FAMILY, "ThinkPad X13s Gen 1"))
> +		return NVME_QUIRK_SIMPLE_SUSPEND;

I certainly acknowledge that this is a big problem for users.  At the
same time, this seems like a maintenance nightmare of
platform-specific hacks scattered through endpoint drivers.
Re: [PATCH] nvme-pci: Use NVME_QUIRK_SIMPLE_SUSPEND for Qualcomm Snapdragon 8cx Gen 3 platforms
Posted by Manivannan Sadhasivam 12 months ago
On Sat, Feb 08, 2025 at 12:51:24PM -0600, Bjorn Helgaas wrote:
> On Sun, Jan 26, 2025 at 10:33:09AM +0530, Manivannan Sadhasivam wrote:
> > On these platforms, power to the PCIe bus is not retained if the SoC enters
> > its own deep low power state called, CX power collapse state during system
> > suspend. Once the SoC resumes after going through CX power collapse, all
> > the PCIe bus state will be lost. So the NVMe devices on these platforms
> > won't resume properly, rendering the machines useless until forcefully
> > restarted by the users.
> 
> I guess "forcefully restarted" means a power cycle?
> 

Yes!

> > +++ b/drivers/nvme/host/pci.c
> > @@ -3162,6 +3162,16 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
> >  	if (dmi_match(DMI_BOARD_NAME, "LXKT-ZXEG-N6"))
> >  		return NVME_QUIRK_NO_APST;
> >  
> > +	/*
> > +	 * Qualcomm Snapdragon 8cx Gen 3 (SC8280XP) platforms doesn't retain
> > +	 * power to the PCIe bus after entering low power CX power collapse
> > +	 * state during system suspend. So shutdown the NVMe devices to have a
> > +	 * working system suspend on these platforms.
> > +	 */
> > +	if (dmi_match(DMI_PRODUCT_FAMILY, "SCP_MAKENA") ||
> > +	    dmi_match(DMI_PRODUCT_FAMILY, "ThinkPad X13s Gen 1"))
> > +		return NVME_QUIRK_SIMPLE_SUSPEND;
> 
> I certainly acknowledge that this is a big problem for users.  At the
> same time, this seems like a maintenance nightmare of
> platform-specific hacks scattered through endpoint drivers.

This chipset is only used in Laptop form factors where there are no open PCIe
slots. And we have seen the issue with only the NVMe driver as the rest of the
endpoint drivers (WLAN, Modem) are all behaving fine.

But as I mentioned comments section, this is only an interim solution.
Obviously, we would like to have a generic solution available in PCI/PM core and
we'd certainly work towards that. Once that happens, this quirk can be reverted.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH] nvme-pci: Use NVME_QUIRK_SIMPLE_SUSPEND for Qualcomm Snapdragon 8cx Gen 3 platforms
Posted by Christoph Hellwig 12 months ago
On Sat, Feb 08, 2025 at 12:51:24PM -0600, Bjorn Helgaas wrote:
> > +	/*
> > +	 * Qualcomm Snapdragon 8cx Gen 3 (SC8280XP) platforms doesn't retain
> > +	 * power to the PCIe bus after entering low power CX power collapse
> > +	 * state during system suspend. So shutdown the NVMe devices to have a
> > +	 * working system suspend on these platforms.
> > +	 */
> > +	if (dmi_match(DMI_PRODUCT_FAMILY, "SCP_MAKENA") ||
> > +	    dmi_match(DMI_PRODUCT_FAMILY, "ThinkPad X13s Gen 1"))
> > +		return NVME_QUIRK_SIMPLE_SUSPEND;
> 
> I certainly acknowledge that this is a big problem for users.  At the
> same time, this seems like a maintenance nightmare of
> platform-specific hacks scattered through endpoint drivers.

Yes.  And it's what we stated that we won't do multiple times.
This needs to be taken on in the firmware or at least core PCI/PM
code.
Re: [PATCH] nvme-pci: Use NVME_QUIRK_SIMPLE_SUSPEND for Qualcomm Snapdragon 8cx Gen 3 platforms
Posted by Manivannan Sadhasivam 12 months ago
On Mon, Feb 10, 2025 at 05:04:46AM +0100, Christoph Hellwig wrote:
> On Sat, Feb 08, 2025 at 12:51:24PM -0600, Bjorn Helgaas wrote:
> > > +	/*
> > > +	 * Qualcomm Snapdragon 8cx Gen 3 (SC8280XP) platforms doesn't retain
> > > +	 * power to the PCIe bus after entering low power CX power collapse
> > > +	 * state during system suspend. So shutdown the NVMe devices to have a
> > > +	 * working system suspend on these platforms.
> > > +	 */
> > > +	if (dmi_match(DMI_PRODUCT_FAMILY, "SCP_MAKENA") ||
> > > +	    dmi_match(DMI_PRODUCT_FAMILY, "ThinkPad X13s Gen 1"))
> > > +		return NVME_QUIRK_SIMPLE_SUSPEND;
> > 
> > I certainly acknowledge that this is a big problem for users.  At the
> > same time, this seems like a maintenance nightmare of
> > platform-specific hacks scattered through endpoint drivers.
> 
> Yes.  And it's what we stated that we won't do multiple times.

Multiple times? As I stated in the commit message, this quirk is *not* going to
be extended for any Qcom platforms as the rest of them can retain the PCIe power
state during CX Power Collapse.

And we don't need this quirk in any other endpoint drivers as this chipset is
mostly used in Laptop form factors and only the NVMe driver is found to be
causing the issues. Rest of the endpoint drivers (WLAN, Modem) are all coping
with PCIe power going down during system suspend.

> This needs to be taken on in the firmware or at least core PCI/PM
> code.

That would be the end goal. Once that happens, we can revert this patch.
Unfortunately, we need an interim solution to satisfy users.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH] nvme-pci: Use NVME_QUIRK_SIMPLE_SUSPEND for Qualcomm Snapdragon 8cx Gen 3 platforms
Posted by Christoph Hellwig 12 months ago
On Mon, Feb 10, 2025 at 12:06:05PM +0530, Manivannan Sadhasivam wrote:
> Multiple times? As I stated in the commit message, this quirk is *not* going to
> be extended for any Qcom platforms as the rest of them can retain the PCIe power
> state during CX Power Collapse.
> 
> And we don't need this quirk in any other endpoint drivers as this chipset is
> mostly used in Laptop form factors and only the NVMe driver is found to be
> causing the issues. Rest of the endpoint drivers (WLAN, Modem) are all coping
> with PCIe power going down during system suspend.

No matter where you need it - the place is the core and not the driver.
Re: [PATCH] nvme-pci: Use NVME_QUIRK_SIMPLE_SUSPEND for Qualcomm Snapdragon 8cx Gen 3 platforms
Posted by Ulf Hansson 1 year ago
On Sun, 26 Jan 2025 at 06:03, Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On these platforms, power to the PCIe bus is not retained if the SoC enters
> its own deep low power state called, CX power collapse state during system
> suspend. Once the SoC resumes after going through CX power collapse, all
> the PCIe bus state will be lost. So the NVMe devices on these platforms
> won't resume properly, rendering the machines useless until forcefully
> restarted by the users.
>
> To workaround this issue, the PCIe controller driver is keeping a minimal
> vote on the PCIe-MEM interconnect path in its system suspend callback
> current currently. While this gives a working system suspend, it also
> results in increased power consumption during suspend as the SoC never
> enters its low power state. So with this change, the workaround can finally
> be removed.
>
> Also it should be noted that the actual fix to this issue lies in the
> PCI/PM core. But currently there is no such fix exist as of now and the
> consensus to reach it is also taking a lot of time.
>
> So use this quirk to allow users to save battery life on these platforms
> until the PCI/PM core comes up with an API that tells the PCI client
> drivers when to turn off the devices.
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

FWIW, feel free to add:

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

> ---
>
> Hi,
>
> I'm sending this patch as there seems to be no consensus so far (as been
> discussed in this thread [1]) in coming up with a PCI/PM API that tells the
> client drivers when to safely turn off the devices. Since coming up with these
> kind of generic APIs are bound to take time (I do intend to pursue), it makes
> sense to allow these platforms to make use of the existing

In whatever form you intend to continue to try to move this forward,
feel free to keep me in the loop. I am interested and will try to
provide as much feedback as I possibly can.

I can also keep an eye for a conference where we can bring this up
again, last time we discussed this was in LPC in Richmond 2023, I
think. It looks like we have some more data for discussion now, but
let's see how far we can get on LKML.

> NVME_QUIRK_SIMPLE_SUSPEND quirk in the meantime. Once such API is upstreamed and
> used in this driver, this quirk can be reverted.
>
> It should be noted that this CX power collapse issue only exist on the platforms
> based on the Qcom Snapdragon 8cx Gen 3 SoC. On other platforms, it is a more of
> a policy decision to decide whether to shutdown the devices or not. And these
> other platforms can wait for the generic API to be made available by the PCI/PM
> core.
>
> [1] https://lore.kernel.org/linux-pci/20241118082344.8146-1-manivannan.sadhasivam@linaro.org
>
>  drivers/nvme/host/pci.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index e2634f437f33..c52dced952a1 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3162,6 +3162,16 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
>         if (dmi_match(DMI_BOARD_NAME, "LXKT-ZXEG-N6"))
>                 return NVME_QUIRK_NO_APST;
>
> +       /*
> +        * Qualcomm Snapdragon 8cx Gen 3 (SC8280XP) platforms doesn't retain
> +        * power to the PCIe bus after entering low power CX power collapse
> +        * state during system suspend. So shutdown the NVMe devices to have a
> +        * working system suspend on these platforms.
> +        */
> +       if (dmi_match(DMI_PRODUCT_FAMILY, "SCP_MAKENA") ||
> +           dmi_match(DMI_PRODUCT_FAMILY, "ThinkPad X13s Gen 1"))
> +               return NVME_QUIRK_SIMPLE_SUSPEND;
> +
>         return 0;
>  }
>
> --
> 2.25.1
>

Kind regards
Uffe
Re: [PATCH] nvme-pci: Use NVME_QUIRK_SIMPLE_SUSPEND for Qualcomm Snapdragon 8cx Gen 3 platforms
Posted by Manivannan Sadhasivam 1 year ago
On Tue, Feb 04, 2025 at 05:03:33PM +0100, Ulf Hansson wrote:
> On Sun, 26 Jan 2025 at 06:03, Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org> wrote:
> >
> > On these platforms, power to the PCIe bus is not retained if the SoC enters
> > its own deep low power state called, CX power collapse state during system
> > suspend. Once the SoC resumes after going through CX power collapse, all
> > the PCIe bus state will be lost. So the NVMe devices on these platforms
> > won't resume properly, rendering the machines useless until forcefully
> > restarted by the users.
> >
> > To workaround this issue, the PCIe controller driver is keeping a minimal
> > vote on the PCIe-MEM interconnect path in its system suspend callback
> > current currently. While this gives a working system suspend, it also
> > results in increased power consumption during suspend as the SoC never
> > enters its low power state. So with this change, the workaround can finally
> > be removed.
> >
> > Also it should be noted that the actual fix to this issue lies in the
> > PCI/PM core. But currently there is no such fix exist as of now and the
> > consensus to reach it is also taking a lot of time.
> >
> > So use this quirk to allow users to save battery life on these platforms
> > until the PCI/PM core comes up with an API that tells the PCI client
> > drivers when to turn off the devices.
> >
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> 
> FWIW, feel free to add:
> 
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Thanks!

> 
> > ---
> >
> > Hi,
> >
> > I'm sending this patch as there seems to be no consensus so far (as been
> > discussed in this thread [1]) in coming up with a PCI/PM API that tells the
> > client drivers when to safely turn off the devices. Since coming up with these
> > kind of generic APIs are bound to take time (I do intend to pursue), it makes
> > sense to allow these platforms to make use of the existing
> 
> In whatever form you intend to continue to try to move this forward,
> feel free to keep me in the loop. I am interested and will try to
> provide as much feedback as I possibly can.
> 

Sure, will keep you in the loop.

> I can also keep an eye for a conference where we can bring this up
> again, last time we discussed this was in LPC in Richmond 2023, I
> think. It looks like we have some more data for discussion now, but
> let's see how far we can get on LKML.
> 

Ack.

- Mani

-- 
மணிவண்ணன் சதாசிவம்