Hi,
This series is one of the 'let's bite the bullet' kind: we have decided to
enable all ASPM and Clock PM states by default on devicetree platforms [1].
Devicetree platforms were chosen because the impact there will be minimal
compared to ACPI platforms, so they seemed ideal for testing the waters.
Problem Statement
=================
Historically, the PCI subsystem has relied on the BIOS to enable ASPM and Clock PM
states for PCI devices before the kernel boots, if the default states are
selected using:
* Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
* cmdline: "pcie_aspm=off", or
* FADT: ACPI_FADT_NO_ASPM
This was done to avoid enabling ASPM for buggy devices that are known to
have issues with ASPM (even though they advertise the ASPM capability). But
a BIOS does not exist on most non-x86 platforms. For instance, the
majority of the embedded and compute ARM-based platforms using devicetree have
a bootloader, which is nowhere near the standard BIOS used on
x86-based platforms. These bootloaders don't touch PCIe at all unless
they boot from PCIe storage, and even then there is no guarantee that the
ASPM states will get enabled. Another example is Intel's VMD domain, which is
not configured by the BIOS at all. However, this series does not enable ASPM/Clock
PM for the VMD domain. I hope that will be done similarly in future patches.
Solution
========
So to avoid relying on the BIOS, it was agreed [2] that the PCI subsystem has to
enable ASPM and Clock PM states based on the device capability. If any devices
misbehave, they should be quirked accordingly.
The first patch of this series introduces two helper functions to enable all ASPM
and Clock PM states if of_have_populated_dt() is true. The second patch drops the
custom ASPM enablement code from the pcie-qcom driver as it is no longer needed.
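The decision described above can be modelled in a few lines of userspace C. This is only a sketch of the idea, not the code in the patch: the state bits, the function names default_link_states() and overridden_states(), and the dt_platform flag (a stand-in for of_have_populated_dt()) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative state bits for this sketch; NOT the kernel's PCIE_LINK_STATE_* values. */
#define ST_L0S		(1u << 0)
#define ST_L1		(1u << 1)
#define ST_L1SS		(1u << 2)
#define ST_CLKPM	(1u << 3)

/*
 * Pick the default link states.
 *   supported:   states that the link advertises
 *   fw_enabled:  states already enabled by firmware/bootloader
 *   dt_platform: hypothetical stand-in for of_have_populated_dt()
 */
uint32_t default_link_states(uint32_t supported, uint32_t fw_enabled,
			     bool dt_platform)
{
	/* Devicetree: do not rely on firmware, take everything advertised. */
	if (dt_platform)
		return supported;

	/* Otherwise keep whatever firmware left enabled. */
	return fw_enabled & supported;
}

/* States that the override turns on beyond what firmware had enabled. */
uint32_t overridden_states(uint32_t defaults, uint32_t fw_enabled)
{
	return defaults & ~fw_enabled;
}
```

In this model a misbehaving device would be handled by clearing bits from `supported` before the call, which mirrors the "quirk it" policy stated above.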
Testing
=======
This series was tested on a Lenovo ThinkPad T14s based on the Snapdragon X1 SoC. All
supported ASPM states get enabled for both the NVMe and WLAN devices by
default.
[1] https://lore.kernel.org/linux-pci/a47sg5ahflhvzyzqnfxvpk3dw4clkhqlhznjxzwqpf4nyjx5dk@bcghz5o6zolk
[2] https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
Changes in v2:
- Used of_have_populated_dt() instead of CONFIG_OF to identify devicetree
platforms
- Renamed the override helpers and changed the override print
- Moved setting the default state back to the original place and only kept the
override in helpers
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
Manivannan Sadhasivam (2):
PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
PCI: qcom: Remove the custom ASPM enablement code
drivers/pci/controller/dwc/pcie-qcom.c | 32 --------------------------
drivers/pci/pcie/aspm.c | 42 ++++++++++++++++++++++++++++++++--
2 files changed, 40 insertions(+), 34 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250916-pci-dt-aspm-8b3a7e8d2cf1
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> Hi,
>
> This series is one of the 'let's bite the bullet' kind, where we have decided to
> enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> reason why devicetree platforms were chosen because, it will be of minimal
> impact compared to the ACPI platforms. So seemed ideal to test the waters.
>
> This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> default.
>
> [1] https://lore.kernel.org/linux-pci/a47sg5ahflhvzyzqnfxvpk3dw4clkhqlhznjxzwqpf4nyjx5dk@bcghz5o6zolk
> [2] https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
>
> Changes in v2:
>
> - Used of_have_populated_dt() instead of CONFIG_OF to identify devicetree
> platforms
> - Renamed the override helpers and changed the override print
> - Moved setting the default state back to the original place and only kept the
> override in helpers
The series breaks the DRM CI on DB820C board (apq8096, PCIe network
card, NFS root). The board resets randomly after some time ([1]).
Note:
- Reverting just the second patch is not enough ([2])
- Reverting the second patch and picking up df5192d9bb0e ("PCI/ASPM:
Enable only L0s and L1 for devicetree platforms") is also not enough
([3])
- Only reverting both patches results in a working pipeline ([4])
[1] https://gitlab.freedesktop.org/drm/msm/-/jobs/87321332
[2] https://gitlab.freedesktop.org/drm/msm/-/jobs/87476851
[3] https://gitlab.freedesktop.org/drm/msm/-/jobs/87482677
[4] https://gitlab.freedesktop.org/drm/msm/-/jobs/87481381
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> Manivannan Sadhasivam (2):
> PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
> PCI: qcom: Remove the custom ASPM enablement code
>
> drivers/pci/controller/dwc/pcie-qcom.c | 32 --------------------------
> drivers/pci/pcie/aspm.c | 42 ++++++++++++++++++++++++++++++++--
> 2 files changed, 40 insertions(+), 34 deletions(-)
> ---
> base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
> change-id: 20250916-pci-dt-aspm-8b3a7e8d2cf1
>
> Best regards,
> --
> Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
>
>
--
With best wishes
Dmitry
On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
> On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
>> Hi,
>>
>> This series is one of the 'let's bite the bullet' kind, where we have decided to
>> enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
>> reason why devicetree platforms were chosen because, it will be of minimal
>> impact compared to the ACPI platforms. So seemed ideal to test the waters.
>>
>> This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
>> supported ASPM states are getting enabled for both the NVMe and WLAN devices by
>> default.
>> [..]
> The series breaks the DRM CI on DB820C board (apq8096, PCIe network
> card, NFS root). The board resets randomly after some time ([1]).

Is that reset.. due to the watchdog resetting a hard-frozen system?

Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
been experiencing these random hard freezes with ASPM enabled for the NVMe
SSD, on Hamoa (and Purwa too I think) devices.

Totally unpredictable, could be after 4 minutes or 4 days of uptime.
Panic-indicator LED not blinking, no reaction to magic SysRq, display image
frozen, just a complete hang until the watchdog does the reset.

I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
adapter, is enough to keep the system stable (got to about a month of uptime
in that state).

If you have reproduced the same issue on an entirely different SoC, it's
probably a general driver issue.

Please, please help us debug this using your internal secret debug equipment
:)

[1]: https://gist.github.com/valpackett/8a6207b44364de6b32652f4041fe680f

Thanks,
~val
On Tue, Nov 11, 2025 at 03:51:03AM -0300, Val Packett wrote:
> On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
> > On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > Hi,
> > >
> > > This series is one of the 'let's bite the bullet' kind, where we have decided to
> > > enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> > > reason why devicetree platforms were chosen because, it will be of minimal
> > > impact compared to the ACPI platforms. So seemed ideal to test the waters.
> > >
> > > This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> > > supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> > > default.
> > > [..]
> > The series breaks the DRM CI on DB820C board (apq8096, PCIe network
> > card, NFS root). The board resets randomly after some time ([1]).
>
> Is that reset.. due to the watchdog resetting a hard-frozen system?
>
> Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
> been experiencing these random hard freezes with ASPM enabled for the NVMe
> SSD, on Hamoa (and Purwa too I think) devices.

I don't know what controllers are in Hamoa and Purwa or what the IDs
of the root ports and endpoints are. Can you collect the Vendor and
Device IDs (from dmesg log or "lspci -n")? If we figure out that some
are broken, we might be able to add quirks to avoid any broken ASPM
states.

> I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
> that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
> adapter, is enough to keep the system stable (got to about a month of uptime
> in that state).
>
> If you have reproduced the same issue on an entirely different SoC, it's
> probably a general driver issue.
>
> Please, please help us debug this using your internal secret debug equipment
> :)
>
>
> [1]: https://gist.github.com/valpackett/8a6207b44364de6b32652f4041fe680f

Can you use "echo 1 > /sys/bus/pci/devices/.../link/l0s_aspm" and
similar (see Documentation/ABI/testing/sysfs-bus-pci) to do this
tuning instead of poking with setpci? If so, it might be easier.

There are ordering requirements that aspm.c tries to observe via the
sysfs interface. enable-aspm.sh might observe them also (I didn't
look that carefully), but if aspm.c gets them wrong, they're wrong for
everybody, so we'd like to know about that.

Bjorn
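The sysfs tuning suggested above amounts to writing "1" or "0" to a per-state attribute in the device's link/ directory. A minimal C sketch of that, assuming the attribute names listed in Documentation/ABI/testing/sysfs-bus-pci (l0s_aspm, l1_aspm, l1_1_aspm, l1_2_aspm and so on); the helper names aspm_attr_path() and aspm_set() and the device address used in the usage note are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/bus/pci/devices/<bdf>/link/<attr>" into buf.
 * Returns 0 on success, -1 if the result does not fit in len bytes.
 */
int aspm_attr_path(char *buf, size_t len, const char *bdf, const char *attr)
{
	int n;

	assert(buf && bdf && attr);
	n = snprintf(buf, len, "/sys/bus/pci/devices/%s/link/%s", bdf, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Enable (non-zero) or disable (0) one ASPM state through sysfs.
 * Needs root and a kernel that exposes the link/ attributes.
 * Returns 0 on success, -1 on any error.
 */
int aspm_set(const char *bdf, const char *attr, int enable)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (aspm_attr_path(path, sizeof(path), bdf, attr))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(enable ? "1" : "0", f) == EOF)
		ret = -1;
	/* The write may only be reported at close time, so check it too. */
	if (fclose(f) == EOF)
		ret = -1;

	return ret;
}
```

For example, aspm_set("0000:01:00.0", "l1_2_aspm", 0) would correspond to "echo 0 > /sys/bus/pci/devices/0000:01:00.0/link/l1_2_aspm". Going through sysfs rather than setpci leaves the ordering of the register writes to aspm.c.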
On Tue, Nov 11, 2025 at 03:51:03AM -0300, Val Packett wrote:
>
> On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
> > On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > Hi,
> > >
> > > This series is one of the 'let's bite the bullet' kind, where we have decided to
> > > enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> > > reason why devicetree platforms were chosen because, it will be of minimal
> > > impact compared to the ACPI platforms. So seemed ideal to test the waters.
> > >
> > > This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> > > supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> > > default.
> > > [..]
> > The series breaks the DRM CI on DB820C board (apq8096, PCIe network
> > card, NFS root). The board resets randomly after some time ([1]).
>
> Is that reset.. due to the watchdog resetting a hard-frozen system?
>
> Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
> been experiencing these random hard freezes with ASPM enabled for the NVMe
> SSD, on Hamoa (and Purwa too I think) devices.
>

Interesting! ASPM has been tested and found to work on Hamoa and other Qcom
chipsets, except for Makena-based chipsets, which don't support L0s due to
incorrect PHY settings. APQ8096 might be an exception since it is a really old
target, and I'm digging internally into its ASPM support.

> Totally unpredictable, could be after 4 minutes or 4 days of uptime.
> Panic-indicator LED not blinking, no reaction to magic SysRq, display image
> frozen, just a complete hang until the watchdog does the reset.
>

I have a KIOXIA SSD in my T14s. I do see some random hangs, but I thought those
predate the ASPM enablement, as I saw them earlier as well. But even before this
series, we had ASPM enabled for SSDs on Qcom targets (or for devices that get
enumerated during the initial bus scan), so it might be that the SSD doesn't
support ASPM well enough.

But I'm clueless as to why it results in a hang. What I know about ARM platforms
is that we get SError aborts and other crazy bus/NOC issues if the device doesn't
respond to a PCIe read request. So the hang could be due to one of those
issues.

> I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
> that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
> adapter, is enough to keep the system stable (got to about a month of uptime
> in that state).
>

So this confirms that the controller supports it, and the device (SSD) might be
at fault here.

> If you have reproduced the same issue on an entirely different SoC, it's
> probably a general driver issue.
>
> Please, please help us debug this using your internal secret debug equipment
> :)
>

Starting from v6.18-rc3, we only enable L0s and L1 by default on all devicetree
platforms. Are you seeing the hangs post -rc3 also? If so, could you please
share the SSD model by doing 'lspci -nn'?

Apologies for the inconvenience!

- Mani
--
மணிவண்ணன் சதாசிவம்
On 11/11/25 4:19 AM, Manivannan Sadhasivam wrote:
> On Tue, Nov 11, 2025 at 03:51:03AM -0300, Val Packett wrote:
>> On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
>>> On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
>>>> Hi,
>>>>
>>>> This series is one of the 'let's bite the bullet' kind, where we have decided to
>>>> enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
>>>> reason why devicetree platforms were chosen because, it will be of minimal
>>>> impact compared to the ACPI platforms. So seemed ideal to test the waters.
>>>>
>>>> This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
>>>> supported ASPM states are getting enabled for both the NVMe and WLAN devices by
>>>> default.
>>>> [..]
>>> The series breaks the DRM CI on DB820C board (apq8096, PCIe network
>>> card, NFS root). The board resets randomly after some time ([1]).
>> Is that reset.. due to the watchdog resetting a hard-frozen system?
>>
>> Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
>> been experiencing these random hard freezes with ASPM enabled for the NVMe
>> SSD, on Hamoa (and Purwa too I think) devices.
>>
> Interesting! ASPM is tested and found to be working on Hamoa and other Qcom
> chipsets also, except Makena based chipsets that doesn't support L0s due to
> incorrect PHY settings. APQ8096 might be an exception since it is a really old
> target and I'm digging up internally regarding the ASPM support.
>
>> Totally unpredictable, could be after 4 minutes or 4 days of uptime.
>> Panic-indicator LED not blinking, no reaction to magic SysRq, display image
>> frozen, just a complete hang until the watchdog does the reset.
>>
> I have KIOXIA SSD on my T14s. I do see some random hang, but I thought those
> predate the ASPM enablement as I saw them earlier as well. But even before this
> series, we had ASPM enabled for SSDs on Qcom targets (or devices that gets
> enumerated during initial bus scan), so it might be that the SSD doesn't support
> ASPM well enough.

I certainly remember that ASPM *was* enabled by default when I first got
this laptop, via the custom way that predates this series.

Actually that custom enablement code getting removed was how I discovered it
was ASPM related!

I pulled linux-next once and suddenly the system became stable!.. and then I
noticed +2W of battery drain..

> But I'm clueless on why it results in a hang. What I know on ARM platforms is
> that we get SError aborts and other crazy bus/NOC issues if the device doesn't
> respond to the PCIe read request. So the hang could be due to one of those
> issues.

Could the kernel be making requests before the device fully resumed from a
sleep state?

>> I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
>> that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
>> adapter, is enough to keep the system stable (got to about a month of uptime
>> in that state).
>>
> So this confirms that the controller supports it, and the device (SSD) might be
> of fault here.
>
>> If you have reproduced the same issue on an entirely different SoC, it's
>> probably a general driver issue.
>>
>> Please, please help us debug this using your internal secret debug equipment
>> :)
>>
> Starting from v6.18-rc3, we only enable L0s and L1 by default on all devicetree
> platforms. Are you seeing the hangs post -rc3 also? If so, could you please
> share the SSD model by doing 'lspci -nn'?

Yes, still seeing them on 6.18.0-rc4-next-20251107. At least with
pcie_aspm=force (have been using that recently, so likely all my testing
"post -rc3" was with force on.. but others have been testing without it)

I'm currently using the stock drive: Sandisk Corp PC SN740 NVMe SSD
(DRAM-less) [15b7:5015] (rev 01)

Though for a couple months I've used a 3rd party one, an SK Hynix BC901
[1c5c:1d59]

And other users have different other models and still have the same issue.

// Every time something PCIe related is posted to the mailing lists I've
been wondering if it could solve this :D
"Program correct T_POWER_ON value for L1.2 exit timing" didn't help. Testing
"Remove DPC Extended Capability" now..

~val
On Tue, Nov 11, 2025 at 04:40:01AM -0300, Val Packett wrote:
>
> On 11/11/25 4:19 AM, Manivannan Sadhasivam wrote:
> > On Tue, Nov 11, 2025 at 03:51:03AM -0300, Val Packett wrote:
> > > On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
> > > > On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > > > Hi,
> > > > >
> > > > > This series is one of the 'let's bite the bullet' kind, where we have decided to
> > > > > enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> > > > > reason why devicetree platforms were chosen because, it will be of minimal
> > > > > impact compared to the ACPI platforms. So seemed ideal to test the waters.
> > > > >
> > > > > This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> > > > > supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> > > > > default.
> > > > > [..]
> > > > The series breaks the DRM CI on DB820C board (apq8096, PCIe network
> > > > card, NFS root). The board resets randomly after some time ([1]).
> > > Is that reset.. due to the watchdog resetting a hard-frozen system?
> > >
> > > Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
> > > been experiencing these random hard freezes with ASPM enabled for the NVMe
> > > SSD, on Hamoa (and Purwa too I think) devices.
> > >
> > Interesting! ASPM is tested and found to be working on Hamoa and other Qcom
> > chipsets also, except Makena based chipsets that doesn't support L0s due to
> > incorrect PHY settings. APQ8096 might be an exception since it is a really old
> > target and I'm digging up internally regarding the ASPM support.
> >
> > > Totally unpredictable, could be after 4 minutes or 4 days of uptime.
> > > Panic-indicator LED not blinking, no reaction to magic SysRq, display image
> > > frozen, just a complete hang until the watchdog does the reset.
> > >
> > I have KIOXIA SSD on my T14s. I do see some random hang, but I thought those
> > predate the ASPM enablement as I saw them earlier as well. But even before this
> > series, we had ASPM enabled for SSDs on Qcom targets (or devices that gets
> > enumerated during initial bus scan), so it might be that the SSD doesn't support
> > ASPM well enough.
>
> I certainly remember that ASPM *was* enabled by default when I first got
> this laptop, via the custom way that predates this series.
>
> Actually that custom enablement code getting removed was how I discovered it
> was ASPM related!
>
> I pulled linux-next once and suddenly the system became stable!.. and then I
> noticed +2W of battery drain..
>
Because we only enable L0s and L1 by default, and not L1ss.
> > But I'm clueless on why it results in a hang. What I know on ARM platforms is
> > that we get SError aborts and other crazy bus/NOC issues if the device doesn't
> > respond to the PCIe read request. So the hang could be due to one of those
> > issues.
>
> Could the kernel be making requests before the device fully resumed from a
> sleep state?
>
The kernel has no visibility into the PCIe link ASPM states, as the transitions
happen autonomously in hardware once enabled. So once the kernel issues a PCIe
read TLP, the link is supposed to transition to L0 and the device should respond.
But if the link doesn't come up for any reason, the read results in a completion
timeout and weird things happen on the host.
> > > I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
> > > that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
> > > adapter, is enough to keep the system stable (got to about a month of uptime
> > > in that state).
> > >
> > So this confirms that the controller supports it, and the device (SSD) might be
> > of fault here.
> >
> > > If you have reproduced the same issue on an entirely different SoC, it's
> > > probably a general driver issue.
> > >
> > > Please, please help us debug this using your internal secret debug equipment
> > > :)
> > >
> > Starting from v6.18-rc3, we only enable L0s and L1 by default on all devicetree
> > platforms. Are you seeing the hangs post -rc3 also? If so, could you please
> > share the SSD model by doing 'lspci -nn'?
>
> Yes, still seeing them on 6.18.0-rc4-next-20251107. At least with
> pcie_aspm=force (have been using that recently, so likely all my testing
> "post -rc3" was with force on.. but others have been testing without it)
>
pcie_aspm=force will forcefully enable all the ASPM states. So it will result in
the same crash if L1ss is not supported properly by the endpoint.
> I'm currently using the stock drive: Sandisk Corp PC SN740 NVMe SSD
> (DRAM-less) [15b7:5015] (rev 01)
>
I'm suspecting the L1ss issue with this SSD since you said above that
next/master works fine until you pass 'pcie_aspm=force'. Could you try the below
diff with that cmdline option?
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44e780718953..ba48f8184b68 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2525,6 +2525,16 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
  */
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
 
+static void quirk_disable_aspm_l1ss(struct pci_dev *dev)
+{
+	pci_info(dev, "Disabling ASPM L1ss\n");
+	pci_disable_link_state(dev, PCIE_LINK_STATE_L1_1 |
+				    PCIE_LINK_STATE_L1_2 |
+				    PCIE_LINK_STATE_L1_1_PCIPM |
+				    PCIE_LINK_STATE_L1_2_PCIPM);
+}
+DECLARE_PCI_FIXUP_FINAL(0x15b7, 0x5015, quirk_disable_aspm_l1ss);
+
 /*
  * Remove ASPM L0s and L1 support from cached copy of Link Capabilities so
  * aspm.c won't try to enable them.
> Though for a couple months I've used a 3rd party one, an SK Hynix BC901
> [1c5c:1d59]
>
> And other users have different other models and still have the same issue.
>
> // Every time something PCIe related is posted to the mailing lists I've
> been wondering if it could solve this :D
> "Program correct T_POWER_ON value for L1.2 exit timing" didn't help. Testing
> "Remove DPC Extended Capability" now..
>
You could've reported this issue to linux-pci list.
- Mani
--
மணிவண்ணன் சதாசிவம்
On 11/11/25 7:06 AM, Manivannan Sadhasivam wrote:
> On Tue, Nov 11, 2025 at 04:40:01AM -0300, Val Packett wrote:
>> On 11/11/25 4:19 AM, Manivannan Sadhasivam wrote:
>>> [..]
>>>> Totally unpredictable, could be after 4 minutes or 4 days of uptime.
>>>> Panic-indicator LED not blinking, no reaction to magic SysRq, display image
>>>> frozen, just a complete hang until the watchdog does the reset.
>>> I have KIOXIA SSD on my T14s. I do see some random hang, but I thought those
>>> predate the ASPM enablement as I saw them earlier as well. But even before this
>>> series, we had ASPM enabled for SSDs on Qcom targets (or devices that gets
>>> enumerated during initial bus scan), so it might be that the SSD doesn't support
>>> ASPM well enough.
>> I certainly remember that ASPM *was* enabled by default when I first got
>> this laptop, via the custom way that predates this series.
>>
>> Actually that custom enablement code getting removed was how I discovered it
>> was ASPM related!
>>
>> I pulled linux-next once and suddenly the system became stable!.. and then I
>> noticed +2W of battery drain..
> Because, we only enable L0s and L1 by default and not L1ss.

Back in that short time period between the old code getting removed and
this series landing, the default behavior was no ASPM at all, I'm pretty
sure. Again, with the SK hynix SSD I used back then, I *definitely* saw
the issue with this series in and no args applied.

> [..]
>> I'm currently using the stock drive: Sandisk Corp PC SN740 NVMe SSD
>> (DRAM-less) [15b7:5015] (rev 01)
> I'm suspecting the L1ss issue with this SSD since you said above that
> next/master works fine until you pass 'pcie_aspm=force'. Could you try the below
> diff with that cmdline option?

I did *not* say that it works fine with no arg! I said that I've only
tested this stock WD SSD with 'force' so far, and don't have any data on
*this* SSD without 'force' yet.

Now testing with this drive and no arg:

LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
        ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-

L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
           T_CommonMode=0us LTR1.2_Threshold=156672ns

Let's see how it goes.

But it sounds very odd that all the SSDs would be to blame and not the
controller.. Other platforms don't seem to be having this issue. Don't
Intel and AMD enable L1ss by default?

~val
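The lspci lines above are a decoded view of two registers. As a rough sketch of how that decoding works, assuming the PCIe layout where the ASPM Control field is bits 1:0 of Link Control and the four L1 substate enables are bits 3:0 of L1 PM Substates Control 1: the names lnkctl_aspm_name() and l1ss_any_enabled() are hypothetical, and the value 0x00c2 in the usage note is only an illustrative value consistent with the flags shown (ASPM L1, CommClk+, ExtSynch+), not one read from this hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASPM Control field: bits 1:0 of the PCIe Link Control register. */
#define LNKCTL_ASPM_MASK	0x0003u

/* Enable bits in the L1 PM Substates Control 1 register. */
#define L1SS_CTL1_PCIPM_L1_2	0x1u
#define L1SS_CTL1_PCIPM_L1_1	0x2u
#define L1SS_CTL1_ASPM_L1_2	0x4u
#define L1SS_CTL1_ASPM_L1_1	0x8u
#define L1SS_CTL1_ALL		(L1SS_CTL1_PCIPM_L1_2 | L1SS_CTL1_PCIPM_L1_1 | \
				 L1SS_CTL1_ASPM_L1_2 | L1SS_CTL1_ASPM_L1_1)

/* Name the ASPM states enabled in a Link Control value. */
const char *lnkctl_aspm_name(uint16_t lnkctl)
{
	static const char *const names[] = {
		"Disabled", "L0s", "L1", "L0s L1"
	};

	return names[lnkctl & LNKCTL_ASPM_MASK];
}

/*
 * Return non-zero if any L1 substate (ASPM or PCI-PM) is enabled.
 * Upper bits (thresholds, T_CommonMode) are ignored by the mask.
 */
int l1ss_any_enabled(uint32_t l1ss_ctl1)
{
	return (l1ss_ctl1 & L1SS_CTL1_ALL) != 0;
}
```

With these helpers, lnkctl_aspm_name(0x00c2) gives "L1", and l1ss_any_enabled() returns 0 for a control value whose low four bits are clear, which matches a configuration with L1 on and every L1 substate off.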
On 11/11/25 2:29 PM, Val Packett wrote:
> [..]
>
> Now testing with this drive and no arg:
>
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk+
>         ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
>
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
>            T_CommonMode=0us LTR1.2_Threshold=156672ns
>
> Let's see how it goes.

Update: close to 2 days in, went AFK to eat and came back to a gdm login
prompt once again. (This is the stock WD drive and no force.)

This does not seem to be related to L1ss.

~val
On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> Hi,
>
> This series is one of the 'let's bite the bullet' kind, where we have decided to
> enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> reason why devicetree platforms were chosen because, it will be of minimal
> impact compared to the ACPI platforms. So seemed ideal to test the waters.
>
> Problem Statement
> =================
>
> Historically, PCI subsystem relied on the BIOS to enable ASPM and Clock PM
> states for PCI devices before the kernel boot if the default states are
> selected using:
>
> * Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
> * cmdline: "pcie_aspm=off", or
> * FADT: ACPI_FADT_NO_ASPM
>
> This was done to avoid enabling ASPM for the buggy devices that are known to
> create issues with ASPM (even though they advertise the ASPM capability). But
> BIOS is not at all a thing on most of the non-x86 platforms. For instance, the
> majority of the Embedded and Compute ARM based platforms using devicetree have
> something called bootloader, which is not anyway near the standard BIOS used in
> x86 based platforms. And these bootloaders wouldn't touch PCIe at all, unless
> they boot using PCIe storage, even then there would be no guarantee that the
> ASPM states will get enabled. Another example is the Intel's VMD domain that is
> not at all configured by the BIOS. But, this series is not enabling ASPM/Clock
> PM for VMD domain. I hope it will be done similarly in the future patches.
>
> Solution
> ========
>
> So to avoid relying on BIOS, it was agreed [2] that the PCI subsystem has to
> enable ASPM and Clock PM states based on the device capability. If any devices
> misbehave, then they should be quirked accordingly.
>
> First patch of this series introduces two helper functions to enable all ASPM
> and Clock PM states if of_have_populated_dt() is true. Second patch drops the
> custom ASPM enablement code from the pcie-qcom driver as it is no longer needed.
>
> Testing
> =======
>
> This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> default.
>
> [1] https://lore.kernel.org/linux-pci/a47sg5ahflhvzyzqnfxvpk3dw4clkhqlhznjxzwqpf4nyjx5dk@bcghz5o6zolk
> [2] https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
>
> Changes in v2:
>
> - Used of_have_populated_dt() instead of CONFIG_OF to identify devicetree
> platforms
> - Renamed the override helpers and changed the override print
> - Moved setting the default state back to the original place and only kept the
> override in helpers
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> Manivannan Sadhasivam (2):
> PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
> PCI: qcom: Remove the custom ASPM enablement code
>
> drivers/pci/controller/dwc/pcie-qcom.c | 32 --------------------------
> drivers/pci/pcie/aspm.c | 42 ++++++++++++++++++++++++++++++++--
> 2 files changed, 40 insertions(+), 34 deletions(-)

I tentatively put this on pci/aspm and included it in pci/next. I
think it's too late in the cycle to include this for v6.18, so I'll
probably defer it until v6.19, but maybe we can start getting a little
more testing.