This reverts commit 7294863a6f01248d72b61d38478978d638641bee.
This commit was erroneously applied again after commit 0ab5d711ec74
("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
removed it, leading to very hard-to-debug crashes on systems with two
AMD GPUs of which only one supports ASPM.
Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/
Link: https://github.com/acpica/acpica/issues/1060
Fixes: 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
Signed-off-by: Bert Karwatzki <spasswolf@web.de>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index d6d0a6e34c6b..95d26f086d54 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2405,9 +2405,6 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 		return -ENODEV;
 	}
 
-	if (amdgpu_aspm == -1 && !pcie_aspm_enabled(pdev))
-		amdgpu_aspm = 0;
-
 	if (amdgpu_virtual_display ||
 	    amdgpu_device_asic_has_dc_support(pdev, flags & AMD_ASIC_MASK))
 		supports_atomic = true;
--
2.47.3
On 1/31/26 6:24 PM, Bert Karwatzki wrote:
> This reverts commit 7294863a6f01248d72b61d38478978d638641bee.
>
> This commit was erroneously applied again after commit 0ab5d711ec74
> ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
> removed it, leading to very hard to debug crashes, when used with a system with two
> AMD GPUs of which only one supports ASPM.
>
> Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/
> Link: https://github.com/acpica/acpica/issues/1060
> Fixes: 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
>
> Signed-off-by: Bert Karwatzki <spasswolf@web.de>
> ---
Amazing detective work, thanks so much.
This added the code initially:
cba07cce39ace drm/amd: Check if ASPM is enabled from PCIe subsystem
This effectively removed it:
0ab5d711ec74d drm/amd: Refactor `amdgpu_aspm` to be evaluated per device
This was the accidental re-apply:
7294863a6f012 drm/amd: Check if ASPM is enabled from PCIe subsystem
It looks like this was right on the edge of 5.17-rc6 and 5.18-rc1.
I think drm-fixes-2022-02-25 and amd-drm-next-5.18-2022-02-25 ended up
with different content.
Nonetheless, this is the correct change and I've applied it to
amd-staging-drm-next.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index d6d0a6e34c6b..95d26f086d54 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2405,9 +2405,6 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
> return -ENODEV;
> }
>
> - if (amdgpu_aspm == -1 && !pcie_aspm_enabled(pdev))
> - amdgpu_aspm = 0;
> -
> if (amdgpu_virtual_display ||
> amdgpu_device_asic_has_dc_support(pdev, flags & AMD_ASIC_MASK))
> supports_atomic = true;
On 2/2/26 15:25, Mario Limonciello wrote:
> On 1/31/26 6:24 PM, Bert Karwatzki wrote:
>> This reverts commit 7294863a6f01248d72b61d38478978d638641bee.
>>
>> This commit was erroneously applied again after commit 0ab5d711ec74
>> ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
>> removed it, leading to very hard to debug crashes, when used with a system with two
>> AMD GPUs of which only one supports ASPM.
>>
>> Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/
>> Link: https://github.com/acpica/acpica/issues/1060
>> Fixes: 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
>>
>> Signed-off-by: Bert Karwatzki <spasswolf@web.de>
>> ---
>
> Amazing detective work, thanks so much.
>
> This added the code initially:
> cba07cce39ace drm/amd: Check if ASPM is enabled from PCIe subsystem
>
> This effectively removed it:
> 0ab5d711ec74d drm/amd: Refactor `amdgpu_aspm` to be evaluated per device
>
> This was the accidental re-apply:
> 7294863a6f012 drm/amd: Check if ASPM is enabled from PCIe subsystem
>
> It looks like this was right on the edge of 5.17-rc6 and 5.18-rc1.
> I think drm-fixes-2022-02-25 and amd-drm-next-5.18-2022-02-25 ended up with different content.
>
> Nonetheless, this is the correct change and I've applied it to amd-staging-drm-next.
>
> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
There is just one major question left: Why is disabling ASPM causing problems?
I mean we had tons of problems with ASPM before, but only by accidentally enabling it and never accidentally disabling it.
IIRC we even suggested disabling ASPM as a possible workaround.
Thanks,
Christian.
>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ---
>> 1 file changed, 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index d6d0a6e34c6b..95d26f086d54 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -2405,9 +2405,6 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>> return -ENODEV;
>> }
>> - if (amdgpu_aspm == -1 && !pcie_aspm_enabled(pdev))
>> - amdgpu_aspm = 0;
>> -
>> if (amdgpu_virtual_display ||
>> amdgpu_device_asic_has_dc_support(pdev, flags & AMD_ASIC_MASK))
>> supports_atomic = true;
>
On 2/2/26 8:35 AM, Christian König wrote:
> On 2/2/26 15:25, Mario Limonciello wrote:
>> On 1/31/26 6:24 PM, Bert Karwatzki wrote:
>>> This reverts commit 7294863a6f01248d72b61d38478978d638641bee.
>>>
>>> This commit was erroneously applied again after commit 0ab5d711ec74
>>> ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
>>> removed it, leading to very hard to debug crashes, when used with a system with two
>>> AMD GPUs of which only one supports ASPM.
>>>
>>> Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/
>>> Link: https://github.com/acpica/acpica/issues/1060
>>> Fixes: 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
>>>
>>> Signed-off-by: Bert Karwatzki <spasswolf@web.de>
>>> ---
>>
>> Amazing detective work, thanks so much.
>>
>> This added the code initially:
>> cba07cce39ace drm/amd: Check if ASPM is enabled from PCIe subsystem
>>
>> This effectively removed it:
>> 0ab5d711ec74d drm/amd: Refactor `amdgpu_aspm` to be evaluated per device
>>
>> This was the accidental re-apply:
>> 7294863a6f012 drm/amd: Check if ASPM is enabled from PCIe subsystem
>>
>> It looks like this was right on the edge of 5.17-rc6 and 5.18-rc1.
>> I think drm-fixes-2022-02-25 and amd-drm-next-5.18-2022-02-25 ended up with different content.
>>
>> Nonetheless, this is the correct change and I've applied it to amd-staging-drm-next.
>>
>> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> There is just one major question left: Why is disabling ASPM causing problems?
>
My theory is that it's a mismatch between the PCIe core and amdgpu. I.e.,
if the PCIe core thinks ASPM is enabled but amdgpu thinks it is disabled,
we can hit some corner scenarios.
> I mean we had tons of problems with ASPM before, but only by accidentally enabling it and never accidentally disabling it.
>
> IIRC we even suggested to disable ASPM as possible workaround.
>
> Thanks,
> Christian.
>
>>
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ---
>>> 1 file changed, 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index d6d0a6e34c6b..95d26f086d54 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -2405,9 +2405,6 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>> return -ENODEV;
>>> }
>>> - if (amdgpu_aspm == -1 && !pcie_aspm_enabled(pdev))
>>> - amdgpu_aspm = 0;
>>> -
>>> if (amdgpu_virtual_display ||
>>> amdgpu_device_asic_has_dc_support(pdev, flags & AMD_ASIC_MASK))
>>> supports_atomic = true;
>>
>
On Monday, 02.02.2026 at 10:11 -0600, Mario Limonciello wrote:
> On 2/2/26 8:35 AM, Christian König wrote:
> > On 2/2/26 15:25, Mario Limonciello wrote:
> > > On 1/31/26 6:24 PM, Bert Karwatzki wrote:
> > > > This reverts commit 7294863a6f01248d72b61d38478978d638641bee.
> > > >
> > > > This commit was erroneously applied again after commit 0ab5d711ec74
> > > > ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
> > > > removed it, leading to very hard to debug crashes, when used with a system with two
> > > > AMD GPUs of which only one supports ASPM.
> > > >
> > > > Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/
> > > > Link: https://github.com/acpica/acpica/issues/1060
> > > > Fixes: 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
> > > >
> > > > Signed-off-by: Bert Karwatzki <spasswolf@web.de>
> > > > ---
> > >
> > > Amazing detective work, thanks so much.
> > >
> > > This added the code initially:
> > > cba07cce39ace drm/amd: Check if ASPM is enabled from PCIe subsystem
> > >
> > > This effectively removed it:
> > > 0ab5d711ec74d drm/amd: Refactor `amdgpu_aspm` to be evaluated per device
> > >
> > > This was the accidental re-apply:
> > > 7294863a6f012 drm/amd: Check if ASPM is enabled from PCIe subsystem
> > >
> > > It looks like this was right on the edge of 5.17-rc6 and 5.18-rc1.
> > > I think drm-fixes-2022-02-25 and amd-drm-next-5.18-2022-02-25 ended up with different content.
> > >
> > > Nonetheless, this is the correct change and I've applied it to amd-staging-drm-next.
> > >
> > > Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
> >
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> >
> > There is just one major question left: Why is disabling ASPM causing problems?
> >
>
> My theory is that it's a mismatch of PCIe core and AMDGPU. IE if the
> PCIe core thinks it's enabled but amdgpu thinks it is disabled can hit
> some corner scenarios.
That's also my theory. In my case the discrete GPU is probed first:
[ 1.652505] [ T194] amdgpu 0000:03:00.0: enabling device (0000 -> 0002)
[ 1.658662] [ T194] amdgpu 0000:03:00.0: amdgpu: initializing kernel modesetting (DIMGREY_CAVEFISH 0x1002:0x73FF 0x1462:0x1313 0xC3).
[ 1.665045] [ T194] amdgpu 0000:03:00.0: amdgpu: register mmio base: 0xFCA00000
[ 1.671399] [ T194] amdgpu 0000:03:00.0: amdgpu: register mmio size: 1048576
[ 1.681596] [ T194] amdgpu 0000:03:00.0: amdgpu: detected ip block number 0 <common_v1_0_0> (nv_common)
Then the built-in GPU is probed and sets amdgpu_aspm = 0:
[ 4.883191] [ T194] amdgpu 0000:08:00.0: enabling device (0006 -> 0007)
[ 4.890078] [ T194] amdgpu 0000:08:00.0: amdgpu: initializing kernel modesetting (RENOIR 0x1002:0x1638 0x1462:0x1313 0xC5).
[ 4.895907] [ T194] amdgpu 0000:08:00.0: amdgpu: register mmio base: 0xFC900000
[ 4.901640] [ T194] amdgpu 0000:08:00.0: amdgpu: register mmio size: 524288
[ 4.909833] [ T194] amdgpu 0000:08:00.0: amdgpu: detected ip block number 0 <common_v2_0_0> (soc15_common)
I'm going to monitor calls to amdgpu_device_should_use_aspm() to check whether it is called during
the suspend/resume cycle and gives the wrong answer (i.e. false when ASPM is actually enabled).
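
The probe-order effect described above can be sketched in plain userspace C.
This is only a rough model under stated assumptions, not the driver code:
struct fake_pci_dev, fake_pcie_aspm_enabled(), probe_with_global_override()
and should_use_aspm() are hypothetical stand-ins, and should_use_aspm() is a
heavily simplified take on what amdgpu_device_should_use_aspm() decides. The
only part taken from the patch is the two-line check that writes the
module-wide amdgpu_aspm variable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct fake_pci_dev {
	const char *name;
	bool link_aspm_enabled;	/* what the PCIe core would report */
};

/* Models the amdgpu_aspm module parameter: -1 = auto, 0 = off, 1 = on. */
static int amdgpu_aspm = -1;

/* Hypothetical stand-in for pcie_aspm_enabled(); reads the fake device. */
static bool fake_pcie_aspm_enabled(const struct fake_pci_dev *pdev)
{
	return pdev->link_aspm_enabled;
}

/*
 * Probe path containing the lines removed by the revert. The write goes to
 * the module-wide variable, so it affects every device, not just pdev.
 */
static void probe_with_global_override(const struct fake_pci_dev *pdev)
{
	if (amdgpu_aspm == -1 && !fake_pcie_aspm_enabled(pdev))
		amdgpu_aspm = 0;
}

/* Simplified per-device decision (assumption: the real check does more). */
static bool should_use_aspm(const struct fake_pci_dev *pdev)
{
	if (amdgpu_aspm == 0)
		return false;
	if (amdgpu_aspm == 1)
		return true;
	return fake_pcie_aspm_enabled(pdev);	/* auto: follow the PCIe core */
}
```

With a discrete GPU whose link has ASPM enabled and a built-in GPU whose link
does not, should_use_aspm() for the discrete GPU is true after its own probe
and flips to false once the built-in GPU has been probed, even though the
PCIe core still reports ASPM as enabled on the discrete GPU's link.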
Bert Karwatzki