[PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms

Manivannan Sadhasivam via B4 Relay posted 2 patches 4 months, 2 weeks ago
[PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Manivannan Sadhasivam via B4 Relay 4 months, 2 weeks ago
From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

So far, the PCI subsystem has honored the ASPM and Clock PM states set by
the BIOS (through LNKCTL) during device initialization, provided it relies
on the default policy selected using:

* Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
* cmdline: "pcie_aspm=off", or
* FADT: ACPI_FADT_NO_ASPM

This was done conservatively to avoid issues with buggy devices that
advertise ASPM capabilities but behave erratically if the ASPM states are
enabled. So the PCI subsystem ended up trusting the BIOS to enable only the
ASPM states known to work for the devices.

But this turned out to be a problem for devicetree platforms, especially
the ARM based devicetree platforms powering embedded and *some* compute
devices, as they tend to run without any standard BIOS. So the ASPM states
on these platforms were left disabled during boot and the PCI subsystem
never bothered to enable them, unless the user forcefully enabled them
through Kconfig, cmdline, or sysfs, or the device drivers themselves
enabled them through the pci_enable_link_state() APIs.

This caused runtime power issues on those platforms. So a couple of
approaches were tried to remove this BIOS dependency without user
intervention: enabling the ASPM states in the PCI controller drivers
after device enumeration, and overriding the ASPM/Clock PM states
from the PCI controller drivers through an API before enumeration.

But it has been concluded that none of these mitigations should really be
required, and that the PCI subsystem should enable the ASPM states
advertised by the devices without relying on the BIOS or the PCI controller
drivers. If any device misbehaves after enabling the ASPM states it
advertised, it should be quirked to disable the problematic ASPM/Clock PM
states.

In an effort to do so, start by overriding the ASPM and Clock PM states set
by the BIOS for devicetree platforms first. Separate helper functions are
introduced to override the BIOS set states by enabling all of them if
of_have_populated_dt() returns true. To aid debugging, print the overridden
ASPM and Clock PM states as well.

In the future, these helpers could be extended to allow other platforms,
such as VMD or newer ACPI systems with a cutoff year, to follow the same
path.

Link: https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Link: https://patch.msgid.link/20250916-pci-dt-aspm-v1-1-778fe907c9ad@oss.qualcomm.com
---
 drivers/pci/pcie/aspm.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 919a05b9764791c3cc469c9ada62ba5b2c405118..cda31150aec1b67b6a48b60569222ea3d1c3d41f 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -15,6 +15,7 @@
 #include <linux/math.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/of.h>
 #include <linux/pci.h>
 #include <linux/pci_regs.h>
 #include <linux/errno.h>
@@ -235,13 +236,15 @@ struct pcie_link_state {
 	u32 aspm_support:7;		/* Supported ASPM state */
 	u32 aspm_enabled:7;		/* Enabled ASPM state */
 	u32 aspm_capable:7;		/* Capable ASPM state with latency */
-	u32 aspm_default:7;		/* Default ASPM state by BIOS */
+	u32 aspm_default:7;		/* Default ASPM state by BIOS or
+					   override */
 	u32 aspm_disable:7;		/* Disabled ASPM state */
 
 	/* Clock PM state */
 	u32 clkpm_capable:1;		/* Clock PM capable? */
 	u32 clkpm_enabled:1;		/* Current Clock PM state */
-	u32 clkpm_default:1;		/* Default Clock PM state by BIOS */
+	u32 clkpm_default:1;		/* Default Clock PM state by BIOS or
+					   override */
 	u32 clkpm_disable:1;		/* Clock PM disabled */
 };
 
@@ -373,6 +376,18 @@ static void pcie_set_clkpm(struct pcie_link_state *link, int enable)
 	pcie_set_clkpm_nocheck(link, enable);
 }
 
+static void pcie_clkpm_override_default_link_state(struct pcie_link_state *link,
+						   int enabled)
+{
+	struct pci_dev *pdev = link->downstream;
+
+	/* Override the BIOS disabled Clock PM state for devicetree platforms */
+	if (of_have_populated_dt() && !enabled) {
+		link->clkpm_default = 1;
+		pci_info(pdev, "Clock PM state overridden: ClockPM+\n");
+	}
+}
+
 static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
 {
 	int capable = 1, enabled = 1;
@@ -395,6 +410,7 @@ static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
 	}
 	link->clkpm_enabled = enabled;
 	link->clkpm_default = enabled;
+	pcie_clkpm_override_default_link_state(link, enabled);
 	link->clkpm_capable = capable;
 	link->clkpm_disable = blacklist ? 1 : 0;
 }
@@ -788,6 +804,26 @@ static void aspm_l1ss_init(struct pcie_link_state *link)
 		aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
 }
 
+static void pcie_aspm_override_default_link_state(struct pcie_link_state *link)
+{
+	struct pci_dev *pdev = link->downstream;
+	u32 override;
+
+	/* Override the BIOS disabled ASPM states for devicetree platforms */
+	if (of_have_populated_dt()) {
+		link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
+		override = link->aspm_default & ~link->aspm_enabled;
+		if (override)
+			pci_info(pdev, "ASPM states overridden: %s%s%s%s%s%s\n",
+				 (override & PCIE_LINK_STATE_L0S) ? "L0s+, " : "",
+				 (override & PCIE_LINK_STATE_L1) ? "L1+, " : "",
+				 (override & PCIE_LINK_STATE_L1_1) ? "L1.1+, " : "",
+				 (override & PCIE_LINK_STATE_L1_2) ? "L1.2+, " : "",
+				 (override & PCIE_LINK_STATE_L1_1_PCIPM) ? "L1.1 PCI-PM+, " : "",
+				 (override & PCIE_LINK_STATE_L1_2_PCIPM) ? "L1.2 PCI-PM+" : "");
+	}
+}
+
 static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
 {
 	struct pci_dev *child = link->downstream, *parent = link->pdev;
@@ -868,6 +904,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
 	/* Save default state */
 	link->aspm_default = link->aspm_enabled;
 
+	pcie_aspm_override_default_link_state(link);
+
 	/* Setup initial capable state. Will be updated later */
 	link->aspm_capable = link->aspm_support;
 

-- 
2.48.1
Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Jon Hunter 2 weeks, 4 days ago
Hi Manivannan,

On 22/09/2025 17:16, Manivannan Sadhasivam via B4 Relay wrote:
> From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> 
> So far, the PCI subsystem has honored the ASPM and Clock PM states set by
> the BIOS (through LNKCTL) during device initialization, if it relies on the
> default state selected using:
> 
> * Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
> * cmdline: "pcie_aspm=off", or
> * FADT: ACPI_FADT_NO_ASPM
> 
> This was done conservatively to avoid issues with the buggy devices that
> advertise ASPM capabilities, but behave erratically if the ASPM states are
> enabled. So the PCI subsystem ended up trusting the BIOS to enable only the
> ASPM states that were known to work for the devices.
> 
> But this turned out to be a problem for devicetree platforms, especially
> the ARM based devicetree platforms powering Embedded and *some* Compute
> devices as they tend to run without any standard BIOS. So the ASPM states
> on these platforms were left disabled during boot and the PCI subsystem
> never bothered to enable them, unless the user has forcefully enabled the
> ASPM states through Kconfig, cmdline, and sysfs or the device drivers
> themselves, enabling the ASPM states through pci_enable_link_state() APIs.
> 
> This caused runtime power issues on those platforms. So a couple of
> approaches were tried to mitigate this BIOS dependency without user
> intervention by enabling the ASPM states in the PCI controller drivers
> after device enumeration, and overriding the ASPM/Clock PM states
> by the PCI controller drivers through an API before enumeration.
> 
> But it has been concluded that none of these mitigations should really be
> required and the PCI subsystem should enable the ASPM states advertised by
> the devices without relying on BIOS or the PCI controller drivers. If any
> device is found to be misbehaving after enabling ASPM states that they
> advertised, then those devices should be quirked to disable the problematic
> ASPM/Clock PM states.
> 
> In an effort to do so, start by overriding the ASPM and Clock PM states set
> by the BIOS for devicetree platforms first. Separate helper functions are
> introduced to override the BIOS set states by enabling all of them if
> of_have_populated_dt() returns true. To aid debugging, print the overridden
> ASPM and Clock PM states as well.
> 
> In the future, these helpers could be extended to allow other platforms
> like VMD, newer ACPI systems with a cutoff year etc... to follow the path.
> 
> Link: https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
> Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> Link: https://patch.msgid.link/20250916-pci-dt-aspm-v1-1-778fe907c9ad@oss.qualcomm.com
> ---
>   drivers/pci/pcie/aspm.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 919a05b9764791c3cc469c9ada62ba5b2c405118..cda31150aec1b67b6a48b60569222ea3d1c3d41f 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -15,6 +15,7 @@
>   #include <linux/math.h>
>   #include <linux/module.h>
>   #include <linux/moduleparam.h>
> +#include <linux/of.h>
>   #include <linux/pci.h>
>   #include <linux/pci_regs.h>
>   #include <linux/errno.h>
> @@ -235,13 +236,15 @@ struct pcie_link_state {
>   	u32 aspm_support:7;		/* Supported ASPM state */
>   	u32 aspm_enabled:7;		/* Enabled ASPM state */
>   	u32 aspm_capable:7;		/* Capable ASPM state with latency */
> -	u32 aspm_default:7;		/* Default ASPM state by BIOS */
> +	u32 aspm_default:7;		/* Default ASPM state by BIOS or
> +					   override */
>   	u32 aspm_disable:7;		/* Disabled ASPM state */
>   
>   	/* Clock PM state */
>   	u32 clkpm_capable:1;		/* Clock PM capable? */
>   	u32 clkpm_enabled:1;		/* Current Clock PM state */
> -	u32 clkpm_default:1;		/* Default Clock PM state by BIOS */
> +	u32 clkpm_default:1;		/* Default Clock PM state by BIOS or
> +					   override */
>   	u32 clkpm_disable:1;		/* Clock PM disabled */
>   };
>   
> @@ -373,6 +376,18 @@ static void pcie_set_clkpm(struct pcie_link_state *link, int enable)
>   	pcie_set_clkpm_nocheck(link, enable);
>   }
>   
> +static void pcie_clkpm_override_default_link_state(struct pcie_link_state *link,
> +						   int enabled)
> +{
> +	struct pci_dev *pdev = link->downstream;
> +
> +	/* Override the BIOS disabled Clock PM state for devicetree platforms */
> +	if (of_have_populated_dt() && !enabled) {
> +		link->clkpm_default = 1;
> +		pci_info(pdev, "Clock PM state overridden: ClockPM+\n");
> +	}
> +}
> +
>   static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
>   {
>   	int capable = 1, enabled = 1;
> @@ -395,6 +410,7 @@ static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
>   	}
>   	link->clkpm_enabled = enabled;
>   	link->clkpm_default = enabled;
> +	pcie_clkpm_override_default_link_state(link, enabled);
>   	link->clkpm_capable = capable;
>   	link->clkpm_disable = blacklist ? 1 : 0;
>   }
> @@ -788,6 +804,26 @@ static void aspm_l1ss_init(struct pcie_link_state *link)
>   		aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
>   }
>   
> +static void pcie_aspm_override_default_link_state(struct pcie_link_state *link)
> +{
> +	struct pci_dev *pdev = link->downstream;
> +	u32 override;
> +
> +	/* Override the BIOS disabled ASPM states for devicetree platforms */
> +	if (of_have_populated_dt()) {
> +		link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
> +		override = link->aspm_default & ~link->aspm_enabled;
> +		if (override)
> +			pci_info(pdev, "ASPM states overridden: %s%s%s%s%s%s\n",
> +				 (override & PCIE_LINK_STATE_L0S) ? "L0s+, " : "",
> +				 (override & PCIE_LINK_STATE_L1) ? "L1+, " : "",
> +				 (override & PCIE_LINK_STATE_L1_1) ? "L1.1+, " : "",
> +				 (override & PCIE_LINK_STATE_L1_2) ? "L1.2+, " : "",
> +				 (override & PCIE_LINK_STATE_L1_1_PCIPM) ? "L1.1 PCI-PM+, " : "",
> +				 (override & PCIE_LINK_STATE_L1_2_PCIPM) ? "L1.2 PCI-PM+" : "");
> +	}
> +}
> +
>   static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
>   {
>   	struct pci_dev *child = link->downstream, *parent = link->pdev;
> @@ -868,6 +904,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
>   	/* Save default state */
>   	link->aspm_default = link->aspm_enabled;
>   
> +	pcie_aspm_override_default_link_state(link);
> +
>   	/* Setup initial capable state. Will be updated later */
>   	link->aspm_capable = link->aspm_support;


Since this commit was added in Linux v6.18, I have been observing 
suspend test failures on some of our boards. The suspend test suspends 
the devices for 20 secs, and before this change the board would resume in 
~27 secs (including the 20 sec sleep). After this change the board 
takes over 80 secs to resume, and this triggered a failure.

Looking at the logs, I can see it is the NVMe device on the board that 
is having an issue, and I see the reset failing ...

  [  945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
   flow control rx/tx
  [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
   0 timeout, reset controller
  [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
  [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
  [ 1003.050481] OOM killer enabled.
  [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19

 From the above timestamps, the delay is coming from the NVMe. I see this 
issue on several boards with different NVMe devices, and I can work around 
it by disabling ASPM L0s/L1 for these devices ...

  DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
  DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
  DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
  DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);

I am curious if you have seen any similar issues?

Other PCIe devices seem to be OK (like the Realtek r8169), but just the 
NVMe is having issues. So I am trying to figure out the best way to 
resolve this.

Thanks
Jon

-- 
nvpublic
Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Bjorn Helgaas 2 weeks, 4 days ago
[+cc NVMe folks]

On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
> ...

> Since this commit was added in Linux v6.18, I have been observing a suspend
> test failures on some of our boards. The suspend test suspends the devices
> for 20 secs and before this change the board would resume in about ~27 secs
> (including the 20 sec sleep). After this change the board would take over 80
> secs to resume and this triggered a failure.
> 
> Looking at the logs, I can see it is the NVMe device on the board that is
> having an issue, and I see the reset failing ...
> 
>  [  945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
>   flow control rx/tx
>  [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
>   0 timeout, reset controller
>  [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
>  [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
>  [ 1003.050481] OOM killer enabled.
>  [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
> 
> From the above timestamps the delay is coming from the NVMe. I see this
> issue on several boards with different NVMe devices and I can workaround
> this by disabling ASPM L0/L1 for these devices ...
> 
>  DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
>  DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
>  DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
>  DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
> 
> I am curious if you have seen any similar issues?
> 
> Other PCIe devices seem to be OK (like the realtek r8169) but just
> the NVMe is having issues. So I am trying to figure out the best way
> to resolve this?

For context, "this commit" refers to f3ac2ff14834, modified by
df5192d9bb0e:

  f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
  df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")

The fact that this suspend issue only affects NVMe reminds me of the
code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
enabled because of some NVMe expectation:

  dw_pcie_suspend_noirq()
  {
    ...
    /*
     * If L1SS is supported, then do not put the link into L2 as some
     * devices such as NVMe expect low resume latency.
     */
    if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
      return 0;
    ...

That suggests there's some NVMe/ASPM interaction that the PCI core
doesn't understand yet.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-designware-host.c?id=v6.18#n1146
Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Manivannan Sadhasivam 2 weeks, 4 days ago
On Thu, Jan 22, 2026 at 09:29:03AM -0600, Bjorn Helgaas wrote:
> [+cc NVMe folks]
> 
> On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
> > ...
> 
> > Since this commit was added in Linux v6.18, I have been observing a suspend
> > test failures on some of our boards. The suspend test suspends the devices
> > for 20 secs and before this change the board would resume in about ~27 secs
> > (including the 20 sec sleep). After this change the board would take over 80
> > secs to resume and this triggered a failure.
> > 
> > Looking at the logs, I can see it is the NVMe device on the board that is
> > having an issue, and I see the reset failing ...
> > 
> >  [  945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
> >   flow control rx/tx
> >  [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
> >   0 timeout, reset controller
> >  [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
> >  [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
> >  [ 1003.050481] OOM killer enabled.
> >  [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
> > 
> > From the above timestamps the delay is coming from the NVMe. I see this
> > issue on several boards with different NVMe devices and I can workaround
> > this by disabling ASPM L0/L1 for these devices ...
> > 
> >  DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
> >  DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
> >  DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
> >  DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
> > 
> > I am curious if you have seen any similar issues?
> > 
> > Other PCIe devices seem to be OK (like the realtek r8169) but just
> > the NVMe is having issues. So I am trying to figure out the best way
> > to resolve this?
> 
> For context, "this commit" refers to f3ac2ff14834, modified by
> df5192d9bb0e:
> 
>   f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
>   df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
> 
> The fact that this suspend issue only affects NVMe reminds me of the
> code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
> enabled because of some NVMe expectation:
> 
>   dw_pcie_suspend_noirq()
>   {
>     ...
>     /*
>      * If L1SS is supported, then do not put the link into L2 as some
>      * devices such as NVMe expect low resume latency.
>      */
>     if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
>       return 0;
>     ...
> 
> That suggests there's some NVMe/ASPM interaction that the PCI core
> doesn't understand yet.
> 

We have this check in place since the NVMe driver keeps the device in D0 and
expects the link to be in L1ss on platforms that do not pass the below checks:

        if (pm_suspend_via_firmware() || !ctrl->npss ||
            !pcie_aspm_enabled(pdev) ||
            (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))

Since the majority of the DWC platforms do not pass the above checks, we don't
transition the device to D3Cold or the link to L2/L3 in dw_pcie_suspend_noirq()
if the link is in L1ss. Though I think we should be checking for the D0 state
instead of L1ss here.

I think what is going on here is that before commits f3ac2ff14834 and
df5192d9bb0e, the !pcie_aspm_enabled() check was passing, as ASPM was not
enabled for the device (and upstream port). After those commits, this check no
longer passes, so the NVMe driver does not shut down the controller and expects
the link to stay in L0/L1ss. But the Tegra controller driver initiates the
L2/L3 transition, and also turns off the device. So all the NVMe context is
lost during suspend, and while resuming, the NVMe driver gets confused due to
the lost context.

Jon, could you please try the below hack and see if it fixes the issue?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0e4caeab739c..4b8d261117f5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
         * state (which may not be possible if the link is up).
         */
        if (pm_suspend_via_firmware() || !ctrl->npss ||
-           !pcie_aspm_enabled(pdev) ||
+           pcie_aspm_enabled(pdev) ||
            (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
                return nvme_disable_prepare_reset(ndev, true);
 
This will confirm whether the issue is due to Tegra controller driver breaking
the NVMe driver assumption or not.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Jon Hunter 2 weeks, 4 days ago
On 22/01/2026 17:01, Manivannan Sadhasivam wrote:
> On Thu, Jan 22, 2026 at 09:29:03AM -0600, Bjorn Helgaas wrote:
>> [+cc NVMe folks]
>>
>> On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
>>> ...
>>
>>> Since this commit was added in Linux v6.18, I have been observing a suspend
>>> test failures on some of our boards. The suspend test suspends the devices
>>> for 20 secs and before this change the board would resume in about ~27 secs
>>> (including the 20 sec sleep). After this change the board would take over 80
>>> secs to resume and this triggered a failure.
>>>
>>> Looking at the logs, I can see it is the NVMe device on the board that is
>>> having an issue, and I see the reset failing ...
>>>
>>>   [  945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
>>>    flow control rx/tx
>>>   [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
>>>    0 timeout, reset controller
>>>   [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
>>>   [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
>>>   [ 1003.050481] OOM killer enabled.
>>>   [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
>>>
>>>  From the above timestamps the delay is coming from the NVMe. I see this
>>> issue on several boards with different NVMe devices and I can workaround
>>> this by disabling ASPM L0/L1 for these devices ...
>>>
>>>   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
>>>   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
>>>   DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
>>>   DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
>>>
>>> I am curious if you have seen any similar issues?
>>>
>>> Other PCIe devices seem to be OK (like the realtek r8169) but just
>>> the NVMe is having issues. So I am trying to figure out the best way
>>> to resolve this?
>>
>> For context, "this commit" refers to f3ac2ff14834, modified by
>> df5192d9bb0e:
>>
>>    f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
>>    df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
>>
>> The fact that this suspend issue only affects NVMe reminds me of the
>> code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
>> enabled because of some NVMe expectation:
>>
>>    dw_pcie_suspend_noirq()
>>    {
>>      ...
>>      /*
>>       * If L1SS is supported, then do not put the link into L2 as some
>>       * devices such as NVMe expect low resume latency.
>>       */
>>      if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
>>        return 0;
>>      ...
>>
>> That suggests there's some NVMe/ASPM interaction that the PCI core
>> doesn't understand yet.
>>
> 
> We have this check in place since NVMe driver keeps the device in D0 and expects
> the link to be in L1ss on platforms not passing below checks:
> 
>          if (pm_suspend_via_firmware() || !ctrl->npss ||
>              !pcie_aspm_enabled(pdev) ||
>              (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
> 
> Since the majority of the DWC platforms do not pass the above checks, we don't
> transition the device to D3Cold or link to L2/L3 in dw_pcie_suspend_noirq() if
> the link is in L1ss. Though I think we should be checking for D0 state instead
> of L1ss here.
> 
> I think what is going on here is that since before commits f3ac2ff14834 and
> df5192d9bb0e, !pcie_aspm_enabled() check was passing as ASPM was not enabled for
> the device (and upstream port) and after those commits, this check is not
> passing and the NVMe driver is not shutting down the controller and expects the
> link to be in L0/L1ss. But the Tegra controller driver initiates L2/L3
> transition, and also turns off the device. So all the NVMe context is lost
> during suspend and while resuming, the NVMe driver got confused due to lost
> context.
> 
> Jon, could you please try the below hack and see if it fixes the issue?
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 0e4caeab739c..4b8d261117f5 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
>           * state (which may not be possible if the link is up).
>           */
>          if (pm_suspend_via_firmware() || !ctrl->npss ||
> -           !pcie_aspm_enabled(pdev) ||
> +           pcie_aspm_enabled(pdev) ||
>              (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>                  return nvme_disable_prepare_reset(ndev, true);
>   
> This will confirm whether the issue is due to Tegra controller driver breaking
> the NVMe driver assumption or not.

Yes that appears to be working! I will test some more boards to confirm.

Cheers
Jon

-- 
nvpublic
Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Jon Hunter 2 weeks, 3 days ago
On 22/01/2026 19:14, Jon Hunter wrote:

...

>> I think what is going on here is that since before commits 
>> f3ac2ff14834 and
>> df5192d9bb0e, !pcie_aspm_enabled() check was passing as ASPM was not 
>> enabled for
>> the device (and upstream port) and after those commits, this check is not
>> passing and the NVMe driver is not shutting down the controller and 
>> expects the
>> link to be in L0/L1ss. But the Tegra controller driver initiates L2/L3
>> transition, and also turns off the device. So all the NVMe context is 
>> lost
>> during suspend and while resuming, the NVMe driver got confused due to 
>> lost
>> context.
>>
>> Jon, could you please try the below hack and see if it fixes the issue?
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 0e4caeab739c..4b8d261117f5 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
>>           * state (which may not be possible if the link is up).
>>           */
>>          if (pm_suspend_via_firmware() || !ctrl->npss ||
>> -           !pcie_aspm_enabled(pdev) ||
>> +           pcie_aspm_enabled(pdev) ||
>>              (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>>                  return nvme_disable_prepare_reset(ndev, true);
>> This will confirm whether the issue is due to Tegra controller driver 
>> breaking
>> the NVMe driver assumption or not.
> 
> Yes that appears to be working! I will test some more boards to confirm.

So yes with the above all boards appear to be working fine.

How is this usually coordinated between the NVMe driver and the host 
controller driver? It is not clear to me exactly where the problem is, 
and if the NVMe is not shutting down, what should be preventing the 
host controller from shutting down?

Thanks
Jon

-- 
nvpublic

Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Manivannan Sadhasivam 2 weeks, 3 days ago
+ Krishna

On Fri, Jan 23, 2026 at 10:55:28AM +0000, Jon Hunter wrote:
> 
> On 22/01/2026 19:14, Jon Hunter wrote:
> 
> ...
> 
> > > I think what is going on here is that since before commits
> > > f3ac2ff14834 and
> > > df5192d9bb0e, !pcie_aspm_enabled() check was passing as ASPM was not
> > > enabled for
> > > the device (and upstream port) and after those commits, this check is not
> > > passing and the NVMe driver is not shutting down the controller and
> > > expects the
> > > link to be in L0/L1ss. But the Tegra controller driver initiates L2/L3
> > > transition, and also turns off the device. So all the NVMe context
> > > is lost
> > > during suspend and while resuming, the NVMe driver got confused due
> > > to lost
> > > context.
> > > 
> > > Jon, could you please try the below hack and see if it fixes the issue?
> > > 
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > index 0e4caeab739c..4b8d261117f5 100644
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
> > >           * state (which may not be possible if the link is up).
> > >           */
> > >          if (pm_suspend_via_firmware() || !ctrl->npss ||
> > > -           !pcie_aspm_enabled(pdev) ||
> > > +           pcie_aspm_enabled(pdev) ||
> > >              (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
> > >                  return nvme_disable_prepare_reset(ndev, true);
> > > This will confirm whether the issue is due to Tegra controller
> > > driver breaking
> > > the NVMe driver assumption or not.
> > 
> > Yes that appears to be working! I will test some more boards to confirm.
> 
> So yes with the above all boards appear to be working fine.
> 
> How is this usually coordinated between the NVMe driver and Host controller
> driver? It is not clear to me exactly where the problem is and if the NVMe
> is not shutting down, then what should be preventing the Host controller
> from shutting down.
> 

Well, if the NVMe driver does not shut down the device, then it expects the
device to be in the APST state (an NVMe low power state, if supported) and to
retain all the context across the suspend/resume cycle.

But if the host controller powers down the device, then during resume the
device will start afresh and will have lost all the context (like queue info
etc.). So when the NVMe driver resumes, it expects the device to have retained
the context and tries to use the device as such. But that won't work, as the
device will be in an unconfigured state, and you'll see failures like the ones
you reported.

Apparently, most host controller drivers never cared about this because either
they were not tested with NVMe or they had not enabled ASPM before. So the NVMe
driver ended up shutting down the controller during suspend. But since we
started enabling ASPM by default in v6.18, this issue is being uncovered.

So to properly fix it, we need the controller drivers to perform the below
checks for all devices under the Root bus(ses) before initiating D3Cold:

1. Check if the device state is D3Hot. If it is not D3Hot, then the client
driver expects the device to stay in its current D-state, so D3Cold should
not be initiated.

2. Check if the device is wakeup capable. If it is, then check if it can support
wakeup from D3Cold (with WAKE#).

Only if both conditions are satisfied for all the devices under the Root busses,
then the host controller driver should initiate D3Cold sequence.

Krishna is going to post a patch that performs the above checks for the
pcie-designware-host driver. But since the above checks are platform agnostic,
we should introduce a helper and reuse it across other controllers as well.

Hope this clarifies.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Jon Hunter 2 weeks, 3 days ago
On 23/01/2026 13:56, Manivannan Sadhasivam wrote:
> + Krishna
> 
> On Fri, Jan 23, 2026 at 10:55:28AM +0000, Jon Hunter wrote:
>>
>> On 22/01/2026 19:14, Jon Hunter wrote:
>>
>> ...
>>
>>>> I think what is going on here is that since before commits
>>>> f3ac2ff14834 and
>>>> df5192d9bb0e, !pcie_aspm_enabled() check was passing as ASPM was not
>>>> enabled for
>>>> the device (and upstream port) and after those commits, this check is not
>>>> passing and the NVMe driver is not shutting down the controller and
>>>> expects the
>>>> link to be in L0/L1ss. But the Tegra controller driver initiates L2/L3
>>>> transition, and also turns off the device. So all the NVMe context
>>>> is lost
>>>> during suspend and while resuming, the NVMe driver got confused due
>>>> to lost
>>>> context.
>>>>
>>>> Jon, could you please try the below hack and see if it fixes the issue?
>>>>
>>>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>>> index 0e4caeab739c..4b8d261117f5 100644
>>>> --- a/drivers/nvme/host/pci.c
>>>> +++ b/drivers/nvme/host/pci.c
>>>> @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
>>>>            * state (which may not be possible if the link is up).
>>>>            */
>>>>           if (pm_suspend_via_firmware() || !ctrl->npss ||
>>>> -           !pcie_aspm_enabled(pdev) ||
>>>> +           pcie_aspm_enabled(pdev) ||
>>>>               (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>>>>                   return nvme_disable_prepare_reset(ndev, true);
>>>> This will confirm whether the issue is due to Tegra controller
>>>> driver breaking
>>>> the NVMe driver assumption or not.
>>>
>>> Yes that appears to be working! I will test some more boards to confirm.
>>
>> So yes with the above all boards appear to be working fine.
>>
>> How is this usually coordinated between the NVMe driver and Host controller
>> driver? It is not clear to me exactly where the problem is and if the NVMe
>> is not shutting down, then what should be preventing the Host controller
>> from shutting down.
>>
> 
> Well, if the NVMe driver is not shutting down the device, then it expects the
> device to be in the APST state (an NVMe low power state, if supported) and to
> retain all the context across the suspend/resume cycle.
> 
> But if the host controller powers down the device, then during resume the
> device will start afresh and will have lost all its context (queue info,
> etc.). So when the NVMe driver resumes, it will expect the device to have
> retained the context and will try to use it as such. But that won't work, as
> the device will be in an unconfigured state and you'll see failures like the
> ones you reported.
> 
> Apparently, most host controller drivers never cared about this because either
> they were not tested with NVMe or they had not enabled ASPM before. So the NVMe
> driver ended up shutting down the controller during suspend. But since we
> started enabling ASPM by default in v6.18, this issue is being uncovered.
> 
> So to properly fix it, we need the controller drivers to perform the below
> checks for all devices under the Root bus(ses) before initiating D3Cold:
> 
> 1. Check if the device state is D3Hot. If it is not D3Hot, then the client
> driver expects the device to stay in its current D-state, so D3Cold should
> not be initiated.
> 
> 2. Check if the device is wakeup capable. If it is, then check if it can support
> wakeup from D3Cold (with WAKE#).
> 
> Only if both conditions are satisfied for all the devices under the Root busses,
> then the host controller driver should initiate D3Cold sequence.
> 
> Krishna is going to post a patch that performs the above checks for the
> pcie-designware-host driver. But since the above checks are platform agnostic,
> we should introduce a helper and reuse it across other controllers as well.
> 
> Hope this clarifies.

Yes it does. I am happy to test any patches for this.

Jon
-- 
nvpublic

Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Manivannan Sadhasivam 2 weeks, 4 days ago
On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
> Hi Manivannan,
> 
> On 22/09/2025 17:16, Manivannan Sadhasivam via B4 Relay wrote:
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> > 
> > So far, the PCI subsystem has honored the ASPM and Clock PM states set by
> > the BIOS (through LNKCTL) during device initialization, if it relies on the
> > default state selected using:
> > 
> > * Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
> > * cmdline: "pcie_aspm=off", or
> > * FADT: ACPI_FADT_NO_ASPM
> > 
> > This was done conservatively to avoid issues with the buggy devices that
> > advertise ASPM capabilities, but behave erratically if the ASPM states are
> > enabled. So the PCI subsystem ended up trusting the BIOS to enable only the
> > ASPM states that were known to work for the devices.
> > 
> > But this turned out to be a problem for devicetree platforms, especially
> > the ARM based devicetree platforms powering Embedded and *some* Compute
> > devices as they tend to run without any standard BIOS. So the ASPM states
> > on these platforms were left disabled during boot and the PCI subsystem
> > never bothered to enable them, unless the user has forcefully enabled the
> > ASPM states through Kconfig, cmdline, and sysfs or the device drivers
> > themselves, enabling the ASPM states through pci_enable_link_state() APIs.
> > 
> > This caused runtime power issues on those platforms. So a couple of
> > approaches were tried to mitigate this BIOS dependency without user
> > intervention by enabling the ASPM states in the PCI controller drivers
> > after device enumeration, and overriding the ASPM/Clock PM states
> > by the PCI controller drivers through an API before enumeration.
> > 
> > But it has been concluded that none of these mitigations should really be
> > required and the PCI subsystem should enable the ASPM states advertised by
> > the devices without relying on BIOS or the PCI controller drivers. If any
> > device is found to be misbehaving after enabling ASPM states that they
> > advertised, then those devices should be quirked to disable the problematic
> > ASPM/Clock PM states.
> > 
> > In an effort to do so, start by overriding the ASPM and Clock PM states set
> > by the BIOS for devicetree platforms first. Separate helper functions are
> > introduced to override the BIOS set states by enabling all of them if
> > of_have_populated_dt() returns true. To aid debugging, print the overridden
> > ASPM and Clock PM states as well.
> > 
> > In the future, these helpers could be extended to allow other platforms
> > like VMD, newer ACPI systems with a cutoff year etc... to follow the path.
> > 
> > Link: https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
> > Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> > Link: https://patch.msgid.link/20250916-pci-dt-aspm-v1-1-778fe907c9ad@oss.qualcomm.com
> > ---
> >   drivers/pci/pcie/aspm.c | 42 ++++++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 40 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 919a05b9764791c3cc469c9ada62ba5b2c405118..cda31150aec1b67b6a48b60569222ea3d1c3d41f 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -15,6 +15,7 @@
> >   #include <linux/math.h>
> >   #include <linux/module.h>
> >   #include <linux/moduleparam.h>
> > +#include <linux/of.h>
> >   #include <linux/pci.h>
> >   #include <linux/pci_regs.h>
> >   #include <linux/errno.h>
> > @@ -235,13 +236,15 @@ struct pcie_link_state {
> >   	u32 aspm_support:7;		/* Supported ASPM state */
> >   	u32 aspm_enabled:7;		/* Enabled ASPM state */
> >   	u32 aspm_capable:7;		/* Capable ASPM state with latency */
> > -	u32 aspm_default:7;		/* Default ASPM state by BIOS */
> > +	u32 aspm_default:7;		/* Default ASPM state by BIOS or
> > +					   override */
> >   	u32 aspm_disable:7;		/* Disabled ASPM state */
> >   	/* Clock PM state */
> >   	u32 clkpm_capable:1;		/* Clock PM capable? */
> >   	u32 clkpm_enabled:1;		/* Current Clock PM state */
> > -	u32 clkpm_default:1;		/* Default Clock PM state by BIOS */
> > +	u32 clkpm_default:1;		/* Default Clock PM state by BIOS or
> > +					   override */
> >   	u32 clkpm_disable:1;		/* Clock PM disabled */
> >   };
> > @@ -373,6 +376,18 @@ static void pcie_set_clkpm(struct pcie_link_state *link, int enable)
> >   	pcie_set_clkpm_nocheck(link, enable);
> >   }
> > +static void pcie_clkpm_override_default_link_state(struct pcie_link_state *link,
> > +						   int enabled)
> > +{
> > +	struct pci_dev *pdev = link->downstream;
> > +
> > +	/* Override the BIOS disabled Clock PM state for devicetree platforms */
> > +	if (of_have_populated_dt() && !enabled) {
> > +		link->clkpm_default = 1;
> > +		pci_info(pdev, "Clock PM state overridden: ClockPM+\n");
> > +	}
> > +}
> > +
> >   static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
> >   {
> >   	int capable = 1, enabled = 1;
> > @@ -395,6 +410,7 @@ static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
> >   	}
> >   	link->clkpm_enabled = enabled;
> >   	link->clkpm_default = enabled;
> > +	pcie_clkpm_override_default_link_state(link, enabled);
> >   	link->clkpm_capable = capable;
> >   	link->clkpm_disable = blacklist ? 1 : 0;
> >   }
> > @@ -788,6 +804,26 @@ static void aspm_l1ss_init(struct pcie_link_state *link)
> >   		aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> >   }
> > +static void pcie_aspm_override_default_link_state(struct pcie_link_state *link)
> > +{
> > +	struct pci_dev *pdev = link->downstream;
> > +	u32 override;
> > +
> > +	/* Override the BIOS disabled ASPM states for devicetree platforms */
> > +	if (of_have_populated_dt()) {
> > +		link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
> > +		override = link->aspm_default & ~link->aspm_enabled;
> > +		if (override)
> > +			pci_info(pdev, "ASPM states overridden: %s%s%s%s%s%s\n",
> > +				 (override & PCIE_LINK_STATE_L0S) ? "L0s+, " : "",
> > +				 (override & PCIE_LINK_STATE_L1) ? "L1+, " : "",
> > +				 (override & PCIE_LINK_STATE_L1_1) ? "L1.1+, " : "",
> > +				 (override & PCIE_LINK_STATE_L1_2) ? "L1.2+, " : "",
> > +				 (override & PCIE_LINK_STATE_L1_1_PCIPM) ? "L1.1 PCI-PM+, " : "",
> > +				 (override & PCIE_LINK_STATE_L1_2_PCIPM) ? "L1.2 PCI-PM+" : "");
> > +	}
> > +}
> > +
> >   static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> >   {
> >   	struct pci_dev *child = link->downstream, *parent = link->pdev;
> > @@ -868,6 +904,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> >   	/* Save default state */
> >   	link->aspm_default = link->aspm_enabled;
> > +	pcie_aspm_override_default_link_state(link);
> > +
> >   	/* Setup initial capable state. Will be updated later */
> >   	link->aspm_capable = link->aspm_support;
> 
> 
> Since this commit was added in Linux v6.18, I have been observing suspend
> test failures on some of our boards. The suspend test suspends the devices
> for 20 secs and before this change the board would resume in about ~27 secs
> (including the 20 sec sleep). After this change the board would take over 80
> secs to resume and this triggered a failure.
> 
> Looking at the logs, I can see it is the NVMe device on the board that is
> having an issue, and I see the reset failing ...
> 
>  [  945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
>   flow control rx/tx
>  [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
>   0 timeout, reset controller
>  [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
>  [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
>  [ 1003.050481] OOM killer enabled.
>  [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
> 
> From the above timestamps the delay is coming from the NVMe. I see this
> issue on several boards with different NVMe devices and I can workaround
> this by disabling ASPM L0/L1 for these devices ...
> 
>  DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
>  DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
>  DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
>  DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
> 
> I am curious if you have seen any similar issues?
> 

Marek reported a similar issue on the ARM Juno board [1], on which one of the
switch downstream ports failed to come up while *entering* system suspend. But I was
clueless as to why the device fails to function only while entering system
suspend and not during runtime. I suspect something is going wrong in the
suspend path.

In your case, it looks like the device is failing while resuming from suspend. Did
you see any error logs during suspend as well?

> Other PCIe devices seem to be OK (like the realtek r8169) but just the NVMe
> is having issues. So I am trying to figure out the best way to resolve this?
> 

First let's try to isolate the issue to L0s or L1. Can you try disabling L0s
first, then L1?
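For reference, when the kernel's ASPM sysfs support is available, the
individual ASPM states can be toggled per device without rebuilding, via the
`link/` attributes (`l0s_aspm`, `l1_aspm`, and the `l1_1_aspm`/`l1_2_aspm`
substates). A sketch, with the NVMe BDF `0004:01:00.0` taken from the logs in
this thread (adjust per board; the loop body is illustrative):

```shell
# BDF taken from the logs in this thread; adjust per board.
DEV=0004:01:00.0
LINK=/sys/bus/pci/devices/$DEV/link

# Step 1: disable only L0s, then re-run the suspend test.
# Step 2: re-enable L0s (echo 1) and disable l1_aspm instead.
for state in l0s_aspm; do
	if [ -w "$LINK/$state" ]; then
		echo 0 > "$LINK/$state" && echo "disabled $state on $DEV"
	else
		echo "no writable $LINK/$state (device absent or ASPM sysfs disabled)"
	fi
done
```

Disabling `l1_aspm` should also gate the L1 substates, so if L1 turns out to
be the culprit, the `l1_1_aspm`/`l1_2_aspm` files could narrow it down further.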

I will also inspect the suspend/resume path in the meantime.

- Mani

[1] https://lore.kernel.org/linux-pci/cae5cb24-a8b0-4088-bacd-14368f32bdc5@samsung.com/

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Jon Hunter 2 weeks, 4 days ago
On 22/01/2026 13:17, Manivannan Sadhasivam wrote:

...

>> Since this commit was added in Linux v6.18, I have been observing suspend
>> test failures on some of our boards. The suspend test suspends the devices
>> for 20 secs and before this change the board would resume in about ~27 secs
>> (including the 20 sec sleep). After this change the board would take over 80
>> secs to resume and this triggered a failure.
>>
>> Looking at the logs, I can see it is the NVMe device on the board that is
>> having an issue, and I see the reset failing ...
>>
>>   [  945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
>>    flow control rx/tx
>>   [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
>>    0 timeout, reset controller
>>   [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
>>   [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
>>   [ 1003.050481] OOM killer enabled.
>>   [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
>>
>>  From the above timestamps the delay is coming from the NVMe. I see this
>> issue on several boards with different NVMe devices and I can workaround
>> this by disabling ASPM L0/L1 for these devices ...
>>
>>   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
>>   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
>>   DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
>>   DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
>>
>> I am curious if you have seen any similar issues?
>>
> 
> Marek reported a similar issue on the ARM Juno board [1], on which one of the
> switch downstream ports failed to come up while *entering* system suspend. But I was
> clueless as to why the device fails to function only while entering system
> suspend and not during runtime. I suspect something is going wrong in the
> suspend path.
> 
> In your case, looks like the device is failing while resuming from suspend. Did
> you see any error log during suspend as well?

I don't see any errors on entering suspend, just on resuming from suspend.
One other thing that I notice: on resuming in a good case I see ...

  tegra194-pcie 141e0000.pcie: Link didn't transition to L2 state

In a bad case I see ...

  tegra194-pcie 141e0000.pcie: Link didn't transition to L2 state
  tegra194-pcie 14160000.pcie: Link didn't transition to L2 state
  tegra194-pcie 14160000.pcie: Link didn't go to detect state

It appears that this is related because ...

  tegra194-pcie 14160000.pcie: PCI host bridge to bus 0004:00
  ...
  nvme nvme0: pci function 0004:01:00.0

>> Other PCIe devices seem to be OK (like the realtek r8169) but just the NVMe
>> is having issues. So I am trying to figure out the best way to resolve this?
>>
> 
> First let's try to isolate the issue to L0s or L1. Can you try disabling L0s
> first, then L1?

Yes I will try this today.

Jon

-- 
nvpublic
Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Posted by Manivannan Sadhasivam 2 weeks, 4 days ago
On Thu, Jan 22, 2026 at 01:43:41PM +0000, Jon Hunter wrote:
> 
> On 22/01/2026 13:17, Manivannan Sadhasivam wrote:
> 
> ...
> 
> > > Since this commit was added in Linux v6.18, I have been observing suspend
> > > test failures on some of our boards. The suspend test suspends the devices
> > > for 20 secs and before this change the board would resume in about ~27 secs
> > > (including the 20 sec sleep). After this change the board would take over 80
> > > secs to resume and this triggered a failure.
> > > 
> > > Looking at the logs, I can see it is the NVMe device on the board that is
> > > having an issue, and I see the reset failing ...
> > > 
> > >   [  945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
> > >    flow control rx/tx
> > >   [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
> > >    0 timeout, reset controller
> > >   [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
> > >   [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
> > >   [ 1003.050481] OOM killer enabled.
> > >   [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
> > > 
> > >  From the above timestamps the delay is coming from the NVMe. I see this
> > > issue on several boards with different NVMe devices and I can workaround
> > > this by disabling ASPM L0/L1 for these devices ...
> > > 
> > >   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
> > >   DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
> > >   DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
> > >   DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
> > > 
> > > I am curious if you have seen any similar issues?
> > > 
> > 
> > Marek reported a similar issue on ARM Juno board [1] on which one of the switch
> > downstream port failed to come up while *entering* system suspend. But I was
> > clueless as to why the device fails to function only while entering system
> > suspend and not during runtime. I suspect something is going wrong in the
> > suspend path.
> > 
> > In your case, looks like the device is failing while resuming from suspend. Did
> > you see any error log during suspend as well?
> 
> I don't see any errors on entering suspend, just on resuming from suspend. One
> other thing that I notice: on resuming in a good case I see ...
> 
>  tegra194-pcie 141e0000.pcie: Link didn't transition to L2 state
> 
> In a bad case I see ...
> 
>  tegra194-pcie 141e0000.pcie: Link didn't transition to L2 state
>  tegra194-pcie 14160000.pcie: Link didn't transition to L2 state
>  tegra194-pcie 14160000.pcie: Link didn't go to detect state
> 

But this error print is coming from:

	tegra_pcie_dw_suspend_noirq()
		\__ tegra_pcie_dw_pme_turnoff()

This function broadcasts the PME_Turn_Off message and expects a PME_TO_Ack
response from the device. But in both the working and non-working cases, the
response is not being received. Then the driver *deasserts* PERST# without
having asserted it first (which is weird) and then restarts the LTSSM,
expecting it to be in the Detect state. And that is not happening in the
non-working case. I don't know why; maybe the device crashed?

Can you also try skipping the suspend/resume handlers to isolate the issue with
PME_Turn_Off?

diff --git drivers/pci/controller/dwc/pcie-tegra194.c drivers/pci/controller/dwc/pcie-tegra194.c
index 0ddeef70726d..a5bb122a9477 100644
--- drivers/pci/controller/dwc/pcie-tegra194.c
+++ drivers/pci/controller/dwc/pcie-tegra194.c
@@ -2490,7 +2490,7 @@ static struct platform_driver tegra_pcie_dw_driver = {
        .shutdown = tegra_pcie_dw_shutdown,
        .driver = {
                .name   = "tegra194-pcie",
-               .pm = &tegra_pcie_dw_pm_ops,
+//             .pm = &tegra_pcie_dw_pm_ops,
                .of_match_table = tegra_pcie_dw_of_match,
        },
 };

- Mani

-- 
மணிவண்ணன் சதாசிவம்