[PATCH] PCI/ASPM: Don't reconfigure ASPM entering low-power state

Carlos Bilbao (Lambda) posted 1 patch 1 month, 2 weeks ago
[PATCH] PCI/ASPM: Don't reconfigure ASPM entering low-power state
Posted by Carlos Bilbao (Lambda) 1 month, 2 weeks ago
From: Carlos Bilbao <carlos.bilbao@kernel.org>

Reconfiguring ASPM when a device transitions to a low-power state can enable
L1.1/L1.2 substates on the PCIe link at a time when the device is sleeping
and may be unable to exit them. ASPM should be reconfigured on D0 entry
(resume), not on the way down.

pci_set_low_power_state() calls pcie_aspm_pm_state_change() after writing
D3hot to PCI_PM_CTRL. pcie_aspm_pm_state_change() resets link->aspm_capable
to link->aspm_support and then calls pcie_config_aspm_path(), which can
enable ASPM L1.1/L1.2 substates on the PCIe link. If the device cannot
recover the link from L1.2 while in D3hot, subsequent config space reads
return 0xFFFF ("device inaccessible") and pci_power_up() fails with the message
"Unable to change power state from D3hot to D0, device inaccessible".
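To make this failure sequence concrete, here is a minimal userspace C model of it. It is a sketch, not kernel code: every model_* name, the MODEL_ASPM_* bit values, and the rule that a device in D3hot cannot leave L1.2 are hypothetical simplifications of the behavior described above, not the real pcie_aspm_pm_state_change(), pcie_config_aspm_path(), or pci_power_up() implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit values for illustration only; the kernel's real
 * ASPM_STATE_* masks in drivers/pci/pcie/aspm.c are not reproduced here. */
#define MODEL_ASPM_L1   (1u << 0)
#define MODEL_ASPM_L1_1 (1u << 1)
#define MODEL_ASPM_L1_2 (1u << 2)

struct model_link {
	uint32_t aspm_support; /* states both link partners advertise */
	uint32_t aspm_capable; /* support minus states ruled out earlier */
	uint32_t aspm_enabled; /* states currently programmed on the link */
};

struct model_dev {
	bool in_d3hot;
	bool can_exit_l1_2_in_d3hot; /* false models the reported H100 case */
	struct model_link *link;
};

/* Simplified stand-in for pcie_aspm_pm_state_change() as described above:
 * reset capable to support, then enable whatever capable states the policy
 * allows (the pcie_config_aspm_path() step). */
static void model_pm_state_change(struct model_link *link, uint32_t policy)
{
	assert(link != NULL);
	link->aspm_capable = link->aspm_support;
	link->aspm_enabled = link->aspm_capable & policy;
}

/* Stand-in for a 16-bit config read of PCI_PM_CTRL: all ones if the link
 * is stuck in L1.2 while the device is in D3hot. */
static uint16_t model_read_pm_ctrl(const struct model_dev *dev)
{
	bool stuck = dev->in_d3hot && !dev->can_exit_l1_2_in_d3hot &&
		     (dev->link->aspm_enabled & MODEL_ASPM_L1_2);

	if (stuck)
		return 0xFFFF;
	/* PowerState field: 3 = D3hot, 0 = D0 */
	return dev->in_d3hot ? 0x0003 : 0x0000;
}

/* reconfigure_aspm = true mirrors the pre-patch pci_set_low_power_state(). */
static void model_set_low_power_state(struct model_dev *dev, uint32_t policy,
				      bool reconfigure_aspm)
{
	dev->in_d3hot = true; /* "write D3hot to PCI_PM_CTRL" */
	if (reconfigure_aspm)
		model_pm_state_change(dev->link, policy);
}

/* Roughly mirrors the pci_power_up() check: an all-ones read means the
 * device is inaccessible. Returns 0 on success, -1 on failure. */
static int model_power_up(struct model_dev *dev)
{
	if (model_read_pm_ctrl(dev) == 0xFFFF) {
		fprintf(stderr, "Unable to change power state from D3hot to D0, device inaccessible\n");
		return -1;
	}
	dev->in_d3hot = false;
	return 0;
}

/* One suspend/resume attempt, starting with L1.2 supported but not
 * capable (i.e. ruled out before the transition). */
static int model_scenario(bool reconfigure_on_low_power)
{
	uint32_t policy = MODEL_ASPM_L1 | MODEL_ASPM_L1_1 | MODEL_ASPM_L1_2;
	struct model_link link = {
		.aspm_support = policy,
		.aspm_capable = MODEL_ASPM_L1,
		.aspm_enabled = MODEL_ASPM_L1,
	};
	struct model_dev dev = {
		.in_d3hot = false,
		.can_exit_l1_2_in_d3hot = false,
		.link = &link,
	};

	model_set_low_power_state(&dev, policy, reconfigure_on_low_power);
	return model_power_up(&dev);
}
```

In this model, model_scenario(true) returns -1 because the reset re-enables L1.2 while the device sleeps, and model_scenario(false) returns 0 because the link keeps the states it had before D3hot entry.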

This was observed on NVIDIA H100 SXM5 GPUs bound to vfio-pci when Linux
runtime PM suspends them to D3hot: the GPU becomes permanently inaccessible
and disappears from the PCIe bus.

The call to pcie_aspm_pm_state_change() in pci_set_low_power_state() was
restored by commit f93e71aea6c6 ("Revert "PCI/ASPM: Remove
pcie_aspm_pm_state_change()""), which reverted commit 08d0cc5f3426
("PCI/ASPM: Remove pcie_aspm_pm_state_change()"). The revert was necessary
because the removal broke suspend/resume on certain platforms that required
ASPM to be reconfigured on D0 entry. However, the revert restored the call
in both pci_set_full_power_state() (D0 entry) and pci_set_low_power_state()
(low-power entry).

Only the D0-entry call is needed to fix the suspend/resume regression. The
low-power-entry call is harmful: reconfiguring ASPM immediately after
putting a device into D3hot can enable link substates that the device or
platform cannot exit while the device is sleeping.

Remove the pcie_aspm_pm_state_change() call from pci_set_low_power_state().
ASPM will still be reconfigured correctly when the device returns to D0 via
pci_set_full_power_state().

Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
Link: https://lore.kernel.org/r/20240102232550.1751655-1-helgaas@kernel.org
Signed-off-by: Carlos Bilbao (Lambda) <carlos.bilbao@kernel.org>
---
 drivers/pci/pci.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b2ccb8e122f2..8b47887019f9 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1542,9 +1542,6 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
 				     pci_power_name(dev->current_state),
 				     pci_power_name(state));
 
-	if (dev->bus->self)
-		pcie_aspm_pm_state_change(dev->bus->self, locked);
-
 	return 0;
 }
 
-- 
2.50.1 (Apple Git-155)
Re: [PATCH] PCI/ASPM: Don't reconfigure ASPM entering low-power state
Posted by Bjorn Helgaas 1 month, 1 week ago
On Mon, Apr 27, 2026 at 09:01:04PM -0700, Carlos Bilbao (Lambda) wrote:
> From: Carlos Bilbao <carlos.bilbao@kernel.org>
> 
> Reconfiguring ASPM when a device transitions to a low-power state can enable
> L1.1/L1.2 substates on the PCIe link at a time when the device is sleeping
> and may be unable to exit them. ASPM should be reconfigured on D0 entry
> (resume), not on the way down.
> 
> pci_set_low_power_state() calls pcie_aspm_pm_state_change() after writing
> D3hot to PCI_PM_CTRL. pcie_aspm_pm_state_change() resets link->aspm_capable
> to link->aspm_support and then calls pcie_config_aspm_path(), which can
> enable ASPM L1.1/L1.2 substates on the PCIe link. If the device cannot
> recover the link from L1.2 while in D3hot, subsequent config space reads
> return 0xFFFF ("device inaccessible") and pci_power_up() fails with the message
> "Unable to change power state from D3hot to D0, device inaccessible".
> 
> This was observed on NVIDIA H100 SXM5 GPUs bound to vfio-pci when Linux
> runtime PM suspends them to D3hot: the GPU becomes permanently inaccessible
> and disappears from the PCIe bus.
> 
> The call to pcie_aspm_pm_state_change() in pci_set_low_power_state() was
> restored by commit f93e71aea6c6 ("Revert "PCI/ASPM: Remove
> pcie_aspm_pm_state_change()""), which reverted commit 08d0cc5f3426
> ("PCI/ASPM: Remove pcie_aspm_pm_state_change()"). The revert was necessary
> because the removal broke suspend/resume on certain platforms that required
> ASPM to be reconfigured on D0 entry. However, the revert restored the call
> in both pci_set_full_power_state() (D0 entry) and pci_set_low_power_state()
> (low-power entry).
> 
> Only the D0-entry call is needed to fix the suspend/resume regression. The
> low-power-entry call is harmful: reconfiguring ASPM immediately after
> putting a device into D3hot can enable link substates that the device or
> platform cannot exit while the device is sleeping.
> 
> Remove the pcie_aspm_pm_state_change() call from pci_set_low_power_state().
> ASPM will still be reconfigured correctly when the device returns to D0 via
> pci_set_full_power_state().
> 
> Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
> Link: https://lore.kernel.org/r/20240102232550.1751655-1-helgaas@kernel.org
> Signed-off-by: Carlos Bilbao (Lambda) <carlos.bilbao@kernel.org>

Applied to pci/aspm for v7.2, thanks!

Re: [PATCH] PCI/ASPM: Don't reconfigure ASPM entering low-power state
Posted by Bjorn Helgaas 1 month, 1 week ago
On Mon, Apr 27, 2026 at 09:01:04PM -0700, Carlos Bilbao (Lambda) wrote:
> From: Carlos Bilbao <carlos.bilbao@kernel.org>
> 
> Reconfiguring ASPM when a device transitions to a low-power state can enable
> L1.1/L1.2 substates on the PCIe link at a time when the device is sleeping
> and may be unable to exit them. ASPM should be reconfigured on D0 entry
> (resume), not on the way down.
> 
> pci_set_low_power_state() calls pcie_aspm_pm_state_change() after writing
> D3hot to PCI_PM_CTRL. pcie_aspm_pm_state_change() resets link->aspm_capable
> to link->aspm_support and then calls pcie_config_aspm_path(), which can
> enable ASPM L1.1/L1.2 substates on the PCIe link. If the device cannot
> recover the link from L1.2 while in D3hot, subsequent config space reads
> return 0xFFFF ("device inaccessible") and pci_power_up() fails with the message
> "Unable to change power state from D3hot to D0, device inaccessible".

Carlos, do you have a few lines of dmesg showing this issue that we
could quote to help people match the issue with this fix?
Re: [PATCH] PCI/ASPM: Don't reconfigure ASPM entering low-power state
Posted by Carlos Bilbao 1 month, 1 week ago
Hey Bjorn.

On 5/6/26 11:10, Bjorn Helgaas wrote:
> On Mon, Apr 27, 2026 at 09:01:04PM -0700, Carlos Bilbao (Lambda) wrote:
>> From: Carlos Bilbao <carlos.bilbao@kernel.org>
>>
>> Reconfiguring ASPM when a device transitions to a low-power state can enable
>> L1.1/L1.2 substates on the PCIe link at a time when the device is sleeping
>> and may be unable to exit them. ASPM should be reconfigured on D0 entry
>> (resume), not on the way down.
>>
>> pci_set_low_power_state() calls pcie_aspm_pm_state_change() after writing
>> D3hot to PCI_PM_CTRL. pcie_aspm_pm_state_change() resets link->aspm_capable
>> to link->aspm_support and then calls pcie_config_aspm_path(), which can
>> enable ASPM L1.1/L1.2 substates on the PCIe link. If the device cannot
>> recover the link from L1.2 while in D3hot, subsequent config space reads
>> return 0xFFFF ("device inaccessible") and pci_power_up() fails with the message
>> "Unable to change power state from D3hot to D0, device inaccessible".
> Carlos, do you have a few lines of dmesg showing this issue that we
> could quote to help people match the issue with this fix?


Thank you for reviewing this. I only have the error message:

[160459.607156] vfio-pci 0000:5d:00.0: Unable to change power state from D3cold to D0, device inaccessible
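Since the point of quoting the log is to let people match this failure to the fix, a small C sketch of a matcher for this message may help. It relies only on the message text quoted in this thread; match_power_up_failure() is a hypothetical helper, not an existing kernel or tooling API. It accepts any source state, since this log shows D3cold while the commit message describes D3hot.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `line` contains the pci_power_up() failure message quoted
 * in this thread, and copies the source state (e.g. "D3cold") into `from`,
 * truncating to fit. `from` may be NULL if the state is not needed. */
static bool match_power_up_failure(const char *line, char *from, size_t from_len)
{
	static const char prefix[] = "Unable to change power state from ";
	static const char suffix[] = " to D0, device inaccessible";
	const char *p = strstr(line, prefix);
	const char *end;
	size_t n;

	if (p == NULL)
		return false;
	p += sizeof(prefix) - 1;       /* skip past the fixed prefix */
	end = strstr(p, suffix);
	if (end == NULL)
		return false;

	n = (size_t)(end - p);         /* length of the state name */
	if (from != NULL && from_len > 0) {
		if (n >= from_len)
			n = from_len - 1;
		memcpy(from, p, n);
		from[n] = '\0';
	}
	return true;
}
```

Feeding it the log line above yields a match with "D3cold" as the source state; an unrelated vfio-pci line does not match.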


Thanks,

Carlos
Re: [PATCH] PCI/ASPM: Don't reconfigure ASPM entering low-power state
Posted by Bjorn Helgaas 1 month, 1 week ago
[+cc Kai-Heng, Michael, David, Mani]

On Mon, Apr 27, 2026 at 09:01:04PM -0700, Carlos Bilbao (Lambda) wrote:
> From: Carlos Bilbao <carlos.bilbao@kernel.org>
> 
> Reconfiguring ASPM when a device transitions to a low-power state can enable
> L1.1/L1.2 substates on the PCIe link at a time when the device is sleeping
> and may be unable to exit them. ASPM should be reconfigured on D0 entry
> (resume), not on the way down.
> 
> pci_set_low_power_state() calls pcie_aspm_pm_state_change() after writing
> D3hot to PCI_PM_CTRL. pcie_aspm_pm_state_change() resets link->aspm_capable
> to link->aspm_support and then calls pcie_config_aspm_path(), which can
> enable ASPM L1.1/L1.2 substates on the PCIe link. If the device cannot
> recover the link from L1.2 while in D3hot, subsequent config space reads
> return 0xFFFF ("device inaccessible") and pci_power_up() fails with the message
> "Unable to change power state from D3hot to D0, device inaccessible".
> 
> This was observed on NVIDIA H100 SXM5 GPUs bound to vfio-pci when Linux
> runtime PM suspends them to D3hot: the GPU becomes permanently inaccessible
> and disappears from the PCIe bus.
> 
> The call to pcie_aspm_pm_state_change() in pci_set_low_power_state() was
> restored by commit f93e71aea6c6 ("Revert "PCI/ASPM: Remove
> pcie_aspm_pm_state_change()""), which reverted commit 08d0cc5f3426
> ("PCI/ASPM: Remove pcie_aspm_pm_state_change()"). The revert was necessary
> because the removal broke suspend/resume on certain platforms that required
> ASPM to be reconfigured on D0 entry. However, the revert restored the call
> in both pci_set_full_power_state() (D0 entry) and pci_set_low_power_state()
> (low-power entry).
> 
> Only the D0-entry call is needed to fix the suspend/resume regression. The
> low-power-entry call is harmful: reconfiguring ASPM immediately after
> putting a device into D3hot can enable link substates that the device or
> platform cannot exit while the device is sleeping.
> 
> Remove the pcie_aspm_pm_state_change() call from pci_set_low_power_state().
> ASPM will still be reconfigured correctly when the device returns to D0 via
> pci_set_full_power_state().

Sounds right to me.  I don't know why we would want to touch ASPM
during suspend to D3hot.  I could imagine disabling ASPM states
*before* that transition, but enabling new states at that point sounds
wrong.
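The rule suggested here (ASPM states may be disabled before entering D3hot, but not newly enabled on the way down) can be written as a small check on enabled-state bitmasks. This is a hypothetical sketch: model_aspm_change_ok(), model_check_aspm_change(), and the plain uint32_t masks are illustrative and do not correspond to any existing kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each bit of a mask is one ASPM state (hypothetical encoding). On the way
 * into a low-power state, the new enabled mask must be a subset of the old
 * one: states may be turned off but not newly turned on. On D0 entry the
 * link is re-evaluated, so any mask is accepted here. */
static bool model_aspm_change_ok(uint32_t old_enabled, uint32_t new_enabled,
				 bool entering_low_power)
{
	if (!entering_low_power)
		return true;
	return (new_enabled & ~old_enabled) == 0; /* no newly set bits */
}

/* Debug-style guard a caller could place around a reconfiguration. */
static void model_check_aspm_change(uint32_t old_enabled, uint32_t new_enabled,
				    bool entering_low_power)
{
	assert(model_aspm_change_ok(old_enabled, new_enabled, entering_low_power));
}
```

Dropping from L1|L1.1|L1.2 to L1 alone before D3hot passes this check, while re-enabling L1.1/L1.2 after writing D3hot, as the pre-patch code could, fails it.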

Any comments, Kai-Heng and Michael?

I know your regression report was long ago, Michael
(https://lore.kernel.org/all/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de/),
so it's likely not practical for you to test this change, but I would hate
to have your system break again.

> Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
> Link: https://lore.kernel.org/r/20240102232550.1751655-1-helgaas@kernel.org
> Signed-off-by: Carlos Bilbao (Lambda) <carlos.bilbao@kernel.org>