[PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms

Manivannan Sadhasivam via B4 Relay posted 2 patches 4 months, 2 weeks ago
drivers/pci/controller/dwc/pcie-qcom.c | 32 --------------------------
drivers/pci/pcie/aspm.c                | 42 ++++++++++++++++++++++++++++++++--
2 files changed, 40 insertions(+), 34 deletions(-)
Posted by Manivannan Sadhasivam via B4 Relay 4 months, 2 weeks ago
Hi,

This series is one of the 'let's bite the bullet' kind: we have decided to
enable all ASPM and Clock PM states by default on devicetree platforms [1].
Devicetree platforms were chosen because the impact should be minimal compared
to ACPI platforms, so they seemed ideal for testing the waters.

Problem Statement
=================

Historically, the PCI subsystem relied on the BIOS to enable ASPM and Clock PM
states for PCI devices before kernel boot whenever the default states were
selected using:

* Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
* cmdline: "pcie_aspm=off", or
* FADT: ACPI_FADT_NO_ASPM

This was done to avoid enabling ASPM for buggy devices that are known to have
issues with ASPM (even though they advertise the ASPM capability). But a BIOS
does not exist on most non-x86 platforms. For instance, the majority of embedded
and compute ARM platforms using devicetree have a bootloader, which is nowhere
near the standard BIOS used on x86 platforms. These bootloaders don't touch PCIe
at all unless they boot from PCIe storage, and even then there is no guarantee
that the ASPM states will get enabled. Another example is Intel's VMD domain,
which is not configured by the BIOS at all. This series does not enable
ASPM/Clock PM for the VMD domain, though; I hope that will be done similarly in
future patches.

Solution
========

To avoid relying on the BIOS, it was agreed [2] that the PCI subsystem has to
enable ASPM and Clock PM states based on device capability. Any devices that
misbehave should be quirked accordingly.

The first patch of this series introduces two helper functions that enable all
ASPM and Clock PM states if of_have_populated_dt() is true. The second patch
drops the custom ASPM enablement code from the pcie-qcom driver, as it is no
longer needed.
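
As a rough mental model of the behaviour described above (this is not the
actual aspm.c code; every name below is invented for illustration, and the
state bits are a simplified stand-in for the kernel's internal flags), the
default-state selection could be sketched in plain userspace C like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state bits for illustration; not the kernel's definitions. */
enum {
	SKETCH_L0S   = 1u << 0,
	SKETCH_L1    = 1u << 1,
	SKETCH_L1_1  = 1u << 2,
	SKETCH_L1_2  = 1u << 3,
	SKETCH_CLKPM = 1u << 4,
};

/*
 * Pick the states to enable by default.
 *
 * bios_enabled: states the firmware left enabled before boot
 * supported:    states both ends of the link advertise
 * dt_populated: stand-in for the result of of_have_populated_dt()
 */
static uint32_t sketch_default_states(uint32_t bios_enabled,
				      uint32_t supported,
				      bool dt_populated)
{
	/* Devicetree platform: ignore firmware, enable what is supported. */
	if (dt_populated)
		return supported;

	/* Otherwise keep whatever the firmware chose. */
	return bios_enabled & supported;
}
```

With dt_populated set, a link that supports L0s and L1 ends up with both
enabled even if the bootloader enabled nothing; without it, the
firmware's choice is kept.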

Testing
=======

This series was tested on a Lenovo ThinkPad T14s based on the Snapdragon X1 SoC.
All supported ASPM states get enabled for both the NVMe and WLAN devices by
default.

[1] https://lore.kernel.org/linux-pci/a47sg5ahflhvzyzqnfxvpk3dw4clkhqlhznjxzwqpf4nyjx5dk@bcghz5o6zolk
[2] https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas

Changes in v2:

- Used of_have_populated_dt() instead of CONFIG_OF to identify devicetree
  platforms
- Renamed the override helpers and changed the override print
- Moved setting the default state back to the original place and only kept the
  override in helpers

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
Manivannan Sadhasivam (2):
      PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
      PCI: qcom: Remove the custom ASPM enablement code

 drivers/pci/controller/dwc/pcie-qcom.c | 32 --------------------------
 drivers/pci/pcie/aspm.c                | 42 ++++++++++++++++++++++++++++++++--
 2 files changed, 40 insertions(+), 34 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250916-pci-dt-aspm-8b3a7e8d2cf1

Best regards,
-- 
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
Posted by Dmitry Baryshkov 3 months ago
On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> Hi,
> 
> This series is one of the 'let's bite the bullet' kind, where we have decided to
> enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> reason why devicetree platforms were chosen because, it will be of minimal
> impact compared to the ACPI platforms. So seemed ideal to test the waters.
> 
> This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> default.
> 
> [1] https://lore.kernel.org/linux-pci/a47sg5ahflhvzyzqnfxvpk3dw4clkhqlhznjxzwqpf4nyjx5dk@bcghz5o6zolk
> [2] https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
> 
> Changes in v2:
> 
> - Used of_have_populated_dt() instead of CONFIG_OF to identify devicetree
>   platforms
> - Renamed the override helpers and changed the override print
> - Moved setting the default state back to the original place and only kept the
>   override in helpers

The series breaks the DRM CI on DB820C board (apq8096, PCIe network
card, NFS root). The board resets randomly after some time ([1]).

Note:

- Reverting just the second patch is not enough ([2])

- Reverting the second patch and picking up df5192d9bb0e ("PCI/ASPM:
  Enable only L0s and L1 for devicetree platforms") is also not enough
  ([3])

- Only reverting both patches results in a working pipeline ([4])


[1] https://gitlab.freedesktop.org/drm/msm/-/jobs/87321332

[2] https://gitlab.freedesktop.org/drm/msm/-/jobs/87476851

[3] https://gitlab.freedesktop.org/drm/msm/-/jobs/87482677

[4] https://gitlab.freedesktop.org/drm/msm/-/jobs/87481381

> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> Manivannan Sadhasivam (2):
>       PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
>       PCI: qcom: Remove the custom ASPM enablement code
> 
>  drivers/pci/controller/dwc/pcie-qcom.c | 32 --------------------------
>  drivers/pci/pcie/aspm.c                | 42 ++++++++++++++++++++++++++++++++--
>  2 files changed, 40 insertions(+), 34 deletions(-)
> ---
> base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
> change-id: 20250916-pci-dt-aspm-8b3a7e8d2cf1
> 
> Best regards,
> -- 
> Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> 
> 

-- 
With best wishes
Dmitry
Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
Posted by Val Packett 2 months, 3 weeks ago
On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
> On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
>> Hi,
>>
>> This series is one of the 'let's bite the bullet' kind, where we have decided to
>> enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
>> reason why devicetree platforms were chosen because, it will be of minimal
>> impact compared to the ACPI platforms. So seemed ideal to test the waters.
>>
>> This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
>> supported ASPM states are getting enabled for both the NVMe and WLAN devices by
>> default.
>> [..]
> The series breaks the DRM CI on DB820C board (apq8096, PCIe network
> card, NFS root). The board resets randomly after some time ([1]).

Is that reset.. due to the watchdog resetting a hard-frozen system?

Me and a bunch of other people in the #aarch64-laptops irc/matrix room 
have been experiencing these random hard freezes with ASPM enabled for 
the NVMe SSD, on Hamoa (and Purwa too I think) devices.

Totally unpredictable, could be after 4 minutes or 4 days of uptime. 
Panic-indicator LED not blinking, no reaction to magic SysRq, display 
image frozen, just a complete hang until the watchdog does the reset.

I have confirmed with a modified (to accept args) enable-aspm.sh 
script[1] that disabling ASPM *only* for the SSD, while keeping it *on* 
for the WiFi adapter, is enough to keep the system stable (got to about 
a month of uptime in that state).

If you have reproduced the same issue on an entirely different SoC, it's 
probably a general driver issue.

Please, please help us debug this using your internal secret debug 
equipment :)


[1]: https://gist.github.com/valpackett/8a6207b44364de6b32652f4041fe680f

Thanks,
~val
Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
Posted by Bjorn Helgaas 2 months, 3 weeks ago
On Tue, Nov 11, 2025 at 03:51:03AM -0300, Val Packett wrote:
> On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
> > On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > Hi,
> > > 
> > > This series is one of the 'let's bite the bullet' kind, where we have decided to
> > > enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> > > reason why devicetree platforms were chosen because, it will be of minimal
> > > impact compared to the ACPI platforms. So seemed ideal to test the waters.
> > > 
> > > This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> > > supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> > > default.
> > > [..]
> > The series breaks the DRM CI on DB820C board (apq8096, PCIe network
> > card, NFS root). The board resets randomly after some time ([1]).
>
> Is that reset.. due to the watchdog resetting a hard-frozen system?
> 
> Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
> been experiencing these random hard freezes with ASPM enabled for the NVMe
> SSD, on Hamoa (and Purwa too I think) devices.

I don't know what controllers are in Hamoa and Purwa or what the IDs
of the root ports and endpoints are.  Can you collect the Vendor and
Device IDs (from dmesg log or "lspci -n")?  If we figure out that some
are broken, we might be able to add quirks to avoid any broken ASPM
states.

> I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
> that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
> adapter, is enough to keep the system stable (got to about a month of uptime
> in that state).
> 
> If you have reproduced the same issue on an entirely different SoC, it's
> probably a general driver issue.
> 
> Please, please help us debug this using your internal secret debug equipment
> :)
> 
> 
> [1]: https://gist.github.com/valpackett/8a6207b44364de6b32652f4041fe680f

Can you use "echo 1 > /sys/bus/pci/devices/.../link/l0s_aspm" and
similar (see Documentation/ABI/testing/sysfs-bus-pci) to do this
tuning instead of poking with setpci?  If so, it might be easier.
There are ordering requirements that aspm.c tries to observe via the
sysfs interface.

enable-aspm.sh might observe them also (I didn't look that carefully),
but if aspm.c gets them wrong, they're wrong for everybody, so we'd
like to know about that.
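
For anyone who wants to script the sysfs route, here is a minimal
userspace sketch. The helper names are made up for this example and the
device address passed in is whatever "lspci -D" reports for the endpoint.
The attribute names other than l0s_aspm (l1_aspm, l1_1_aspm, l1_2_aspm,
l1_1_pcipm, l1_2_pcipm, clkpm) are the ones listed in
Documentation/ABI/testing/sysfs-bus-pci; a given attribute may be absent
if the link does not support that state. Writing needs root.

```c
#include <stdio.h>

/*
 * Build "/sys/bus/pci/devices/<bdf>/link/<attr>" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int aspm_attr_path(char *buf, size_t len,
			  const char *bdf, const char *attr)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/link/%s",
			 bdf, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to one link attribute; returns 0 on success. */
static int aspm_set(const char *bdf, const char *attr, int enable)
{
	char path[256];
	FILE *f;
	int ret;

	if (aspm_attr_path(path, sizeof(path), bdf, attr))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;	/* attribute missing or no permission */

	ret = (fputs(enable ? "1" : "0", f) == EOF) ? -1 : 0;
	if (fclose(f) == EOF)	/* the kernel reports write errors here */
		ret = -1;

	return ret;
}
```

Calling aspm_set() with the SSD's address and "l1_aspm" and 0 would be
the sysfs equivalent of disabling L1 for that one device while leaving
the WiFi adapter alone.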

Bjorn
Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
Posted by Manivannan Sadhasivam 2 months, 3 weeks ago
On Tue, Nov 11, 2025 at 03:51:03AM -0300, Val Packett wrote:
> 
> On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
> > On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > Hi,
> > > 
> > > This series is one of the 'let's bite the bullet' kind, where we have decided to
> > > enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> > > reason why devicetree platforms were chosen because, it will be of minimal
> > > impact compared to the ACPI platforms. So seemed ideal to test the waters.
> > > 
> > > This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> > > supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> > > default.
> > > [..]
> > The series breaks the DRM CI on DB820C board (apq8096, PCIe network
> > card, NFS root). The board resets randomly after some time ([1]).
> 
> Is that reset.. due to the watchdog resetting a hard-frozen system?
> 
> Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
> been experiencing these random hard freezes with ASPM enabled for the NVMe
> SSD, on Hamoa (and Purwa too I think) devices.
> 

Interesting! ASPM has been tested and found to work on Hamoa and other Qcom
chipsets, except Makena based chipsets, which don't support L0s due to
incorrect PHY settings. APQ8096 might be an exception since it is a really old
target; I'm digging internally for information on its ASPM support.

> Totally unpredictable, could be after 4 minutes or 4 days of uptime.
> Panic-indicator LED not blinking, no reaction to magic SysRq, display image
> frozen, just a complete hang until the watchdog does the reset.
> 

I have a KIOXIA SSD on my T14s. I do see some random hangs, but I thought those
predated the ASPM enablement, as I saw them earlier as well. But even before
this series, we had ASPM enabled for SSDs on Qcom targets (or rather, for
devices that get enumerated during the initial bus scan), so it might be that
the SSD doesn't support ASPM well enough.

But I'm clueless as to why it results in a hang. What I know about ARM
platforms is that we get SError aborts and other crazy bus/NOC issues if the
device doesn't respond to a PCIe read request. So the hang could be due to one
of those issues.

> I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
> that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
> adapter, is enough to keep the system stable (got to about a month of uptime
> in that state).
> 

So this confirms that the controller supports it, and the device (SSD) might be
at fault here.

> If you have reproduced the same issue on an entirely different SoC, it's
> probably a general driver issue.
> 
> Please, please help us debug this using your internal secret debug equipment
> :)
> 

Starting from v6.18-rc3, we only enable L0s and L1 by default on all devicetree
platforms. Are you seeing the hangs post -rc3 also? If so, could you please
share the SSD model by doing 'lspci -nn'?

Apologies for the inconvenience!

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
Posted by Val Packett 2 months, 3 weeks ago
On 11/11/25 4:19 AM, Manivannan Sadhasivam wrote:
> On Tue, Nov 11, 2025 at 03:51:03AM -0300, Val Packett wrote:
>> On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
>>> On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
>>>> Hi,
>>>>
>>>> This series is one of the 'let's bite the bullet' kind, where we have decided to
>>>> enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
>>>> reason why devicetree platforms were chosen because, it will be of minimal
>>>> impact compared to the ACPI platforms. So seemed ideal to test the waters.
>>>>
>>>> This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
>>>> supported ASPM states are getting enabled for both the NVMe and WLAN devices by
>>>> default.
>>>> [..]
>>> The series breaks the DRM CI on DB820C board (apq8096, PCIe network
>>> card, NFS root). The board resets randomly after some time ([1]).
>> Is that reset.. due to the watchdog resetting a hard-frozen system?
>>
>> Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
>> been experiencing these random hard freezes with ASPM enabled for the NVMe
>> SSD, on Hamoa (and Purwa too I think) devices.
>>
> Interesting! ASPM is tested and found to be working on Hamoa and other Qcom
> chipsets also, except Makena based chipsets that doesn't support L0s due to
> incorrect PHY settings. APQ8096 might be an exception since it is a really old
> target and I'm digging up internally regarding the ASPM support.
>
>> Totally unpredictable, could be after 4 minutes or 4 days of uptime.
>> Panic-indicator LED not blinking, no reaction to magic SysRq, display image
>> frozen, just a complete hang until the watchdog does the reset.
>>
> I have KIOXIA SSD on my T14s. I do see some random hang, but I thought those
> predate the ASPM enablement as I saw them earlier as well. But even before this
> series, we had ASPM enabled for SSDs on Qcom targets (or devices that gets
> enumerated during initial bus scan), so it might be that the SSD doesn't support
> ASPM well enough.

I certainly remember that ASPM *was* enabled by default when I first got 
this laptop, via the custom way that predates this series.

Actually that custom enablement code getting removed was how I 
discovered it was ASPM related!

I pulled linux-next once and suddenly the system became stable!.. and 
then I noticed +2W of battery drain..

> But I'm clueless on why it results in a hang. What I know on ARM platforms is
> that we get SError aborts and other crazy bus/NOC issues if the device doesn't
> respond to the PCIe read request. So the hang could be due to one of those
> issues.

Could the kernel be making requests before the device fully resumed from 
a sleep state?

>> I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
>> that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
>> adapter, is enough to keep the system stable (got to about a month of uptime
>> in that state).
>>
> So this confirms that the controller supports it, and the device (SSD) might be
> of fault here.
>
>> If you have reproduced the same issue on an entirely different SoC, it's
>> probably a general driver issue.
>>
>> Please, please help us debug this using your internal secret debug equipment
>> :)
>>
> Starting from v6.18-rc3, we only enable L0s and L1 by default on all devicetree
> platforms. Are you seeing the hangs post -rc3 also? If so, could you please
> share the SSD model by doing 'lspci -nn'?

Yes, still seeing them on 6.18.0-rc4-next-20251107. At least with 
pcie_aspm=force (have been using that recently, so likely all my testing 
"post -rc3" was with force on.. but others have been testing without it)

I'm currently using the stock drive: Sandisk Corp PC SN740 NVMe SSD 
(DRAM-less) [15b7:5015] (rev 01)

Though for a couple months I've used a 3rd party one, an SK Hynix BC901 
[1c5c:1d59]

And other users have different other models and still have the same issue.

// Every time something PCIe related is posted to the mailing lists I've 
been wondering if it could solve this :D
"Program correct T_POWER_ON value for L1.2 exit timing" didn't help. 
Testing "Remove DPC Extended Capability" now..


~val

Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
Posted by Manivannan Sadhasivam 2 months, 3 weeks ago
On Tue, Nov 11, 2025 at 04:40:01AM -0300, Val Packett wrote:
> 
> On 11/11/25 4:19 AM, Manivannan Sadhasivam wrote:
> > On Tue, Nov 11, 2025 at 03:51:03AM -0300, Val Packett wrote:
> > > On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
> > > > On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > > > Hi,
> > > > > 
> > > > > This series is one of the 'let's bite the bullet' kind, where we have decided to
> > > > > enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> > > > > reason why devicetree platforms were chosen because, it will be of minimal
> > > > > impact compared to the ACPI platforms. So seemed ideal to test the waters.
> > > > > 
> > > > > This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> > > > > supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> > > > > default.
> > > > > [..]
> > > > The series breaks the DRM CI on DB820C board (apq8096, PCIe network
> > > > card, NFS root). The board resets randomly after some time ([1]).
> > > Is that reset.. due to the watchdog resetting a hard-frozen system?
> > > 
> > > Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
> > > been experiencing these random hard freezes with ASPM enabled for the NVMe
> > > SSD, on Hamoa (and Purwa too I think) devices.
> > > 
> > Interesting! ASPM is tested and found to be working on Hamoa and other Qcom
> > chipsets also, except Makena based chipsets that doesn't support L0s due to
> > incorrect PHY settings. APQ8096 might be an exception since it is a really old
> > target and I'm digging up internally regarding the ASPM support.
> > 
> > > Totally unpredictable, could be after 4 minutes or 4 days of uptime.
> > > Panic-indicator LED not blinking, no reaction to magic SysRq, display image
> > > frozen, just a complete hang until the watchdog does the reset.
> > > 
> > I have KIOXIA SSD on my T14s. I do see some random hang, but I thought those
> > predate the ASPM enablement as I saw them earlier as well. But even before this
> > series, we had ASPM enabled for SSDs on Qcom targets (or devices that gets
> > enumerated during initial bus scan), so it might be that the SSD doesn't support
> > ASPM well enough.
> 
> I certainly remember that ASPM *was* enabled by default when I first got
> this laptop, via the custom way that predates this series.
> 
> Actually that custom enablement code getting removed was how I discovered it
> was ASPM related!
> 
> I pulled linux-next once and suddenly the system became stable!.. and then I
> noticed +2W of battery drain..
> 

Because we only enable L0s and L1 by default, not L1ss.

> > But I'm clueless on why it results in a hang. What I know on ARM platforms is
> > that we get SError aborts and other crazy bus/NOC issues if the device doesn't
> > respond to the PCIe read request. So the hang could be due to one of those
> > issues.
> 
> Could the kernel be making requests before the device fully resumed from a
> sleep state?
> 

The kernel has no visibility into the PCIe link ASPM states, as the transitions
happen autonomously in hardware once enabled. So once the kernel issues a PCIe
read TLP, the link is supposed to transition to L0 and the device should
respond. But if the link doesn't come up for any reason, the read results in a
completion timeout and weird things happen on the host.

> > > I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
> > > that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
> > > adapter, is enough to keep the system stable (got to about a month of uptime
> > > in that state).
> > > 
> > So this confirms that the controller supports it, and the device (SSD) might be
> > of fault here.
> > 
> > > If you have reproduced the same issue on an entirely different SoC, it's
> > > probably a general driver issue.
> > > 
> > > Please, please help us debug this using your internal secret debug equipment
> > > :)
> > > 
> > Starting from v6.18-rc3, we only enable L0s and L1 by default on all devicetree
> > platforms. Are you seeing the hangs post -rc3 also? If so, could you please
> > share the SSD model by doing 'lspci -nn'?
> 
> Yes, still seeing them on 6.18.0-rc4-next-20251107. At least with
> pcie_aspm=force (have been using that recently, so likely all my testing
> "post -rc3" was with force on.. but others have been testing without it)
> 

pcie_aspm=force will forcefully enable all the ASPM states. So it will result in
the same crash if L1ss is not supported properly by the endpoint.

> I'm currently using the stock drive: Sandisk Corp PC SN740 NVMe SSD
> (DRAM-less) [15b7:5015] (rev 01)
> 

I suspect an L1ss issue with this SSD, since you said above that next/master
works fine until you pass 'pcie_aspm=force'. Could you try the diff below with
that cmdline option?

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44e780718953..ba48f8184b68 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2525,6 +2525,16 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
  */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
 
+static void quirk_disable_aspm_l1ss(struct pci_dev *dev)
+{
+       pci_info(dev, "Disabling ASPM L1ss\n");
+       pci_disable_link_state(dev, PCIE_LINK_STATE_L1_1 |
+                               PCIE_LINK_STATE_L1_2 |
+                               PCIE_LINK_STATE_L1_1_PCIPM |
+                               PCIE_LINK_STATE_L1_2_PCIPM);
+}
+DECLARE_PCI_FIXUP_FINAL(0x15b7, 0x5015, quirk_disable_aspm_l1ss);
+
 /*
  * Remove ASPM L0s and L1 support from cached copy of Link Capabilities so
  * aspm.c won't try to enable them.

> Though for a couple months I've used a 3rd party one, an SK Hynix BC901
> [1c5c:1d59]
> 
> And other users have different other models and still have the same issue.
> 
> // Every time something PCIe related is posted to the mailing lists I've
> been wondering if it could solve this :D
> "Program correct T_POWER_ON value for L1.2 exit timing" didn't help. Testing
> "Remove DPC Extended Capability" now..
>

You could've reported this issue to the linux-pci list.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
Posted by Val Packett 2 months, 3 weeks ago
On 11/11/25 7:06 AM, Manivannan Sadhasivam wrote:
> On Tue, Nov 11, 2025 at 04:40:01AM -0300, Val Packett wrote:
>> On 11/11/25 4:19 AM, Manivannan Sadhasivam wrote:
>>> [..]
>>>> Totally unpredictable, could be after 4 minutes or 4 days of uptime.
>>>> Panic-indicator LED not blinking, no reaction to magic SysRq, display image
>>>> frozen, just a complete hang until the watchdog does the reset.
>>> I have KIOXIA SSD on my T14s. I do see some random hang, but I thought those
>>> predate the ASPM enablement as I saw them earlier as well. But even before this
>>> series, we had ASPM enabled for SSDs on Qcom targets (or devices that gets
>>> enumerated during initial bus scan), so it might be that the SSD doesn't support
>>> ASPM well enough.
>> I certainly remember that ASPM *was* enabled by default when I first got
>> this laptop, via the custom way that predates this series.
>>
>> Actually that custom enablement code getting removed was how I discovered it
>> was ASPM related!
>>
>> I pulled linux-next once and suddenly the system became stable!.. and then I
>> noticed +2W of battery drain..
> Because, we only enable L0s and L1 by default and not L1ss.

Back in that short time period between the old code getting removed and 
this series landing, the default behavior was no ASPM at all, I'm pretty 
sure.

Again, with the SK hynix SSD I used back then, I *definitely* saw the 
issue with this series in and no args applied.

> [..]
>> I'm currently using the stock drive: Sandisk Corp PC SN740 NVMe SSD
>> (DRAM-less) [15b7:5015] (rev 01)
> I'm suspecting the L1ss issue with this SSD since you said above that
> next/master works fine until you pass 'pcie_aspm=force'. Could you try the below
> diff with that cmdline option?

I did *not* say that it works fine with no arg!

I said that I've only tested this stock WD SSD with 'force' so far, and 
don't have any data on *this* SSD without 'force' yet.

Now testing with this drive and no arg:

                 LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
                         ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-

                 L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                            T_CommonMode=0us LTR1.2_Threshold=156672ns

Let's see how it goes.
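
For readers following along, the lspci fields above map onto register
bits as sketched below. This is a small userspace decoder written for
illustration only: the function names are hypothetical, while the bit
positions follow the PCIe Link Control and L1 PM Substates Control 1
register layouts (the same ones mirrored in include/uapi/linux/pci_regs.h).
The raw register values are not shown in the output above, so any value
fed to these helpers is a reconstruction. For the threshold, 153 with the
1024 ns scale is the only encoding that fits the 10-bit value field and
gives 156672 ns.

```c
#include <stdint.h>

/* Link Control: ASPM Control is bits 1:0, Clock PM enable is bit 8. */
#define LNKCTL_ASPMC		0x0003u
#define LNKCTL_CLKREQ_EN	0x0100u

/* L1 PM Substates Control 1: the four enable bits are bits 3:0. */
#define L1SS_CTL1_PCIPM_L1_2	0x00000001u
#define L1SS_CTL1_PCIPM_L1_1	0x00000002u
#define L1SS_CTL1_ASPM_L1_2	0x00000004u
#define L1SS_CTL1_ASPM_L1_1	0x00000008u
#define L1SS_CTL1_L1SS_MASK	0x0000000fu

/* Name the ASPM Control field the way lspci prints it. */
static const char *lnkctl_aspm_name(uint16_t lnkctl)
{
	switch (lnkctl & LNKCTL_ASPMC) {
	case 1:  return "L0s";
	case 2:  return "L1";
	case 3:  return "L0s L1";
	default: return "Disabled";
	}
}

/* Non-zero if Clock PM (CLKREQ#) is enabled. */
static int lnkctl_clkpm_enabled(uint16_t lnkctl)
{
	return (lnkctl & LNKCTL_CLKREQ_EN) != 0;
}

/* Non-zero if any of the four L1 substates is enabled. */
static int l1ss_any_enabled(uint32_t ctl1)
{
	return (ctl1 & L1SS_CTL1_L1SS_MASK) != 0;
}

/*
 * LTR L1.2 threshold in ns: 10-bit value in bits 25:16, 3-bit scale in
 * bits 31:29. Each scale step multiplies by 32 (1, 32, 1024, ... ns).
 * Scale encodings above 5 are reserved and reported as 0 here.
 */
static uint64_t ltr_l12_threshold_ns(uint32_t ctl1)
{
	uint32_t value = (ctl1 >> 16) & 0x3ffu;
	uint32_t scale = (ctl1 >> 29) & 0x7u;

	if (scale > 5)
		return 0;

	return (uint64_t)value << (5 * scale);
}
```

Read this way, the output shows plain L1 enabled with Clock PM and all
four L1 substates off, which matches the "only L0s and L1 by default"
behaviour mentioned earlier in the thread (L0s presumably not being
supported on this link).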

But it sounds very odd that all the SSDs would be to blame and not the 
controller.. Other platforms don't seem to be having this issue. Don't 
Intel and AMD enable L1ss by default?

~val

Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
Posted by Val Packett 2 months, 3 weeks ago
On 11/11/25 2:29 PM, Val Packett wrote:
> [..]
>
> Now testing with this drive and no arg:
>
>                 LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
>                         ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
>
>                 L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>                            T_CommonMode=0us LTR1.2_Threshold=156672ns
>
> Let's see how it goes.

Update: close to 2 days in, went AFK to eat and came back to a gdm login 
prompt once again. (This is the stock WD drive and no force.)

This does not seem to be related to L1ss.


~val

Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
Posted by Bjorn Helgaas 4 months, 2 weeks ago
On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> Hi,
> 
> This series is one of the 'let's bite the bullet' kind, where we have decided to
> enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> reason why devicetree platforms were chosen because, it will be of minimal
> impact compared to the ACPI platforms. So seemed ideal to test the waters.
> 
> Problem Statement
> =================
> 
> Historically, PCI subsystem relied on the BIOS to enable ASPM and Clock PM
> states for PCI devices before the kernel boot if the default states are
> selected using:
> 
> * Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
> * cmdline: "pcie_aspm=off", or
> * FADT: ACPI_FADT_NO_ASPM
> 
> This was done to avoid enabling ASPM for the buggy devices that are known to
> create issues with ASPM (even though they advertise the ASPM capability). But
> BIOS is not at all a thing on most of the non-x86 platforms. For instance, the
> majority of the Embedded and Compute ARM based platforms using devicetree have
> something called bootloader, which is not anyway near the standard BIOS used in
> x86 based platforms. And these bootloaders wouldn't touch PCIe at all, unless
> they boot using PCIe storage, even then there would be no guarantee that the
> ASPM states will get enabled. Another example is the Intel's VMD domain that is
> not at all configured by the BIOS. But, this series is not enabling ASPM/Clock
> PM for VMD domain. I hope it will be done similarly in the future patches.
> 
> Solution
> ========
> 
> So to avoid relying on BIOS, it was agreed [2] that the PCI subsystem has to
> enable ASPM and Clock PM states based on the device capability. If any devices
> misbehave, then they should be quirked accordingly.
> 
> First patch of this series introduces two helper functions to enable all ASPM
> and Clock PM states if of_have_populated_dt() is true. Second patch drops the
> custom ASPM enablement code from the pcie-qcom driver as it is no longer needed.
> 
> Testing
> =======
> 
> This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> default.
> 
> [1] https://lore.kernel.org/linux-pci/a47sg5ahflhvzyzqnfxvpk3dw4clkhqlhznjxzwqpf4nyjx5dk@bcghz5o6zolk
> [2] https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
> 
> Changes in v2:
> 
> - Used of_have_populated_dt() instead of CONFIG_OF to identify devicetree
>   platforms
> - Renamed the override helpers and changed the override print
> - Moved setting the default state back to the original place and only kept the
>   override in helpers
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> Manivannan Sadhasivam (2):
>       PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
>       PCI: qcom: Remove the custom ASPM enablement code
> 
>  drivers/pci/controller/dwc/pcie-qcom.c | 32 --------------------------
>  drivers/pci/pcie/aspm.c                | 42 ++++++++++++++++++++++++++++++++--
>  2 files changed, 40 insertions(+), 34 deletions(-)

I tentatively put this on pci/aspm and included it in pci/next.

I think it's too late in the cycle to include this for v6.18, so I'll
probably defer it until v6.19, but maybe we can start getting a little
more testing.