[REGRESSION] drm/amd/display: Radeon 840M/860M: bisected suspend crash

Posted by ggo@tuxedocomputers.com 6 months ago
Hi,

I have discovered that two small form factor desktops with Ryzen AI 7
350 and Ryzen AI 5 340 crash when woken from suspend. When I trigger a
resume via a keyboard button, the LED on the USB mouse switches on, but
the display remains black. The kernel also no longer responds to Magic
SysRq keys in this state.

The problem affects all kernels after merge b50753547453 (v6.11.0).
However, this merge only adds PCI_DEVICE_ID_AMD_1AH_M60H_ROOT with commit
59c34008d, which is necessary to trigger this bug on a Ryzen AI CPU.
I cherry-picked this commit and continued bisecting, which finally led
me to commit f6098641d3e ("drm/amd/display: fix s2idle entry for DCN3.5+").

If I remove the code that this commit added, which has changed somewhat
in the meantime, suspend works without any problems. See the following
patch.

Regards,
Georg


diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index d3100f641ac6..76204ae70acc 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3121,9 +3121,6 @@ static int dm_suspend(struct amdgpu_ip_block *ip_block)

 	dc_set_power_state(dm->dc, DC_ACPI_CM_POWER_STATE_D3);

-	if (dm->dc->caps.ips_support && adev->in_s0ix)
-		dc_allow_idle_optimizations(dm->dc, true);
-
 	dc_dmub_srv_set_power_state(dm->dc->ctx->dmub_srv, DC_ACPI_CM_POWER_STATE_D3);

 	return 0;
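
The effect of the removed hunk can be modelled outside the kernel. What follows is a minimal standalone C sketch, not driver code: `model_dc`, `model_adev`, `model_suspend_allows_idle` and `model_self_check` are hypothetical names invented for illustration, and only the condition `ips_support && in_s0ix` is taken from the patch. It shows on which suspend paths the call to `dc_allow_idle_optimizations()` is reached with and without the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two flags tested by the removed hunk. */
struct model_dc {
	bool ips_support;	/* models dm->dc->caps.ips_support */
};

struct model_adev {
	bool in_s0ix;		/* models adev->in_s0ix */
};

/*
 * Returns true when this model of dm_suspend() would reach
 * dc_allow_idle_optimizations(dm->dc, true).
 * hunk_present == false models a kernel with the patch applied.
 */
static bool model_suspend_allows_idle(const struct model_dc *dc,
				      const struct model_adev *adev,
				      bool hunk_present)
{
	if (!hunk_present)
		return false;

	return dc->ips_support && adev->in_s0ix;
}

/* Walks all four flag combinations and checks both variants. */
static void model_self_check(void)
{
	for (int ips = 0; ips <= 1; ips++) {
		for (int s0ix = 0; s0ix <= 1; s0ix++) {
			struct model_dc dc = { .ips_support = ips };
			struct model_adev adev = { .in_s0ix = s0ix };

			/* Unpatched: only the IPS + s0ix case reaches the call. */
			assert(model_suspend_allows_idle(&dc, &adev, true) == (ips && s0ix));
			/* Patched: the call is never reached. */
			assert(!model_suspend_allows_idle(&dc, &adev, false));
		}
	}
}
```

Calling `model_self_check()` from a small test harness exercises all four combinations. The sketch says nothing about why the call leads to the crash; it only shows that the patch changes behaviour on exactly one path, the s0ix suspend of hardware that reports IPS support.
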
Re: [REGRESSION] drm/amd/display: Radeon 840M/860M: bisected suspend crash
Posted by Alex Hung 5 months, 4 weeks ago
Hi,

Thanks for reporting. Can you please create a bug at
https://gitlab.freedesktop.org/drm/amd/-/issues/ for issue tracking and
log collection?

Re: [REGRESSION] drm/amd/display: Radeon 840M/860M: bisected suspend crash
Posted by Mario Limonciello 5 months, 4 weeks ago

Your patch basically blocks hardware sleep, so I wouldn't call it a
solution.

If you haven't already, please use
https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/
to triage this issue.  It will flag the most common problems that are
hard to diagnose without prior knowledge.

If that doesn't flag anything, please reproduce on a mainline kernel 
(6.15.y or 6.16-rcX) and then file a bug as Alex suggested.  Attach the 
report you generated from the tool there.
Re: [REGRESSION] drm/amd/display: Radeon 840M/860M: bisected suspend crash
Posted by Georg Gottleuber 5 months, 3 weeks ago

On 18.06.25 at 04:16, Mario Limonciello wrote:
> That patch you did is basically blocking hardware sleep.  I wouldn't 
> call it a solution.
> 
> If you haven't already; please use 
> https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/ 
> to triage this issue.  It will flag the most common things that are hard 
> to diagnose without knowledge.
> 
> If that doesn't flag anything, please reproduce on a mainline kernel 
> (6.15.y or 6.16-rcX) and then file a bug as Alex suggested.  Attach the 
> report you generated from the tool there.
> 

I tested with the newest mainline kernel, 6.16.0-rc2, and the bug
behaves differently there. The system now always resumes (with GUI), but
the NVMe is always disconnected. If I apply the patch, resume works
(including NVMe).

Created an issue:
https://gitlab.freedesktop.org/drm/amd/-/issues/4344 (with report)

Regards,
Georg