drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

[RFC PATCH] drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

Posted by Tiezhu Yang 3 weeks, 3 days ago

Currently, there are many Call Traces when booting the kernel on LoongArch.
Here are the trimmed kernel log messages:

  amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110
  amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
  amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
  amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
  ------------[ cut here ]------------
  WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9
  ...
  Call Trace:
  [<90000000047a8524>] show_stack+0x64/0x190
  [<90000000047a0614>] dump_stack_lvl+0x6c/0x9c
  [<90000000047cef34>] __warn+0xa4/0x1b0
  [<90000000060a4884>] __report_bug+0xa4/0x1d0
  [<90000000060a4a88>] report_bug+0x38/0xd0
  [<90000000060df330>] do_bp+0x260/0x410
  [<90000000047a6bc0>] handle_bp+0x120/0x1c0
  [<ffff8000028bff40>] amdgpu_irq_put+0xb0/0x140 [amdgpu]
  [<ffff8000027b1a8c>] amdgpu_fence_driver_hw_fini+0x12c/0x180 [amdgpu]
  [<ffff800002f2c04c>] amdgpu_device_fini_hw+0xf0/0x3fc [amdgpu]
  [<ffff80000279e2ac>] amdgpu_driver_load_kms+0x7c/0xa0 [amdgpu]
  [<ffff800002791128>] amdgpu_pci_probe+0x298/0x810 [amdgpu]
  [<90000000054d04a4>] local_pci_probe+0x44/0xc0
  [<90000000047f4ab0>] work_for_cpu_fn+0x20/0x40
  [<90000000047f93e0>] process_one_work+0x170/0x4e0
  [<90000000047fa14c>] worker_thread+0x3ac/0x4e0
  [<9000000004806824>] kthread+0x154/0x170
  [<90000000060df5b4>] ret_from_kernel_thread+0x24/0xd0
  [<90000000047a62a4>] ret_from_kernel_thread_asm+0xc/0x88

  ---[ end trace 0000000000000000 ]---
  amdgpu 0000:07:00.0: probe with driver amdgpu failed with error -110
  amdgpu 0000:07:00.0: amdgpu: amdgpu: ttm finalized

This happens because amdgpu_irq_enabled() returns false in amdgpu_irq_put(),
so the WARN_ON() condition is true.

To avoid the unnecessary Call Traces, remove the amdgpu_irq_enabled() check
and only check atomic_read(&src->enabled_types[type]), for three reasons:

(1) The aim of the check is to prevent the refcount from dropping below 0; it
was added in commit 1fa8d710573f ("drm/amdgpu: Fix desktop freezed after
gpu-reset").
(2) There are already many useful failure messages in the log before the Call
Trace, so there is no need to WARN_ON().
(3) The following checks in amdgpu_irq_put() are the same as the checks in
amdgpu_irq_enabled(), so there is no need to repeat them:

	if (!adev->irq.installed)
		return -ENOENT;

	if (type >= src->num_types)
		return -EINVAL;

	if (!src->enabled_types || !src->funcs->set)
		return -EINVAL;

Cc: stable@vger.kernel.org
Fixes: 1fa8d710573f ("drm/amdgpu: Fix desktop freezed after gpu-reset")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 8112ffc85995..0d00a8d841b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -636,7 +636,7 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
 	if (!src->enabled_types || !src->funcs->set)
 		return -EINVAL;
 
-	if (WARN_ON(!amdgpu_irq_enabled(adev, src, type)))
+	if (!atomic_read(&src->enabled_types[type]))
 		return -EINVAL;
 
 	if (atomic_dec_and_test(&src->enabled_types[type]))
-- 
2.42.0
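
For readers following the argument, the guard under discussion can be modelled in user space. This is only a minimal sketch, not the driver code: `toy_irq_src`, `toy_irq_get()`, `toy_irq_put()` and `toy_warnings` are hypothetical names, and C11 atomics stand in for the kernel's atomic counters. It shows just what the check protects against: a put without a matching get is rejected instead of driving the count below zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one interrupt type's reference count. */
struct toy_irq_src {
	atomic_int enabled;	/* number of outstanding "get" calls */
};

/* Counts how often the guard fired; stands in for a WARN_ON() report. */
static int toy_warnings;

static void toy_irq_get(struct toy_irq_src *src)
{
	assert(src != NULL);
	atomic_fetch_add(&src->enabled, 1);
}

/*
 * Drop one reference. An unbalanced put is counted and rejected, so
 * the count never goes negative. As in the diff above, the check and
 * the decrement are two separate steps.
 */
static int toy_irq_put(struct toy_irq_src *src)
{
	assert(src != NULL);

	if (atomic_load(&src->enabled) == 0) {
		toy_warnings++;
		return -EINVAL;
	}

	atomic_fetch_sub(&src->enabled, 1);
	return 0;
}
```

Calling `toy_irq_put()` on a freshly initialised source returns -EINVAL and bumps `toy_warnings`; after one `toy_irq_get()`, a single put succeeds and the next one is rejected again.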

Re: [RFC PATCH] drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

Posted by Christian König 3 weeks, 3 days ago

On 1/15/26 02:28, Tiezhu Yang wrote:
> Currently, there are many Call Traces when booting kernel on LoongArch,
> here are the trimmed kernel log messages:
> 
>   amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110
>   amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
>   amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
>   amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
>   ------------[ cut here ]------------
>   WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9
>   ...
>   Call Trace:
>   [<90000000047a8524>] show_stack+0x64/0x190
>   [<90000000047a0614>] dump_stack_lvl+0x6c/0x9c
>   [<90000000047cef34>] __warn+0xa4/0x1b0
>   [<90000000060a4884>] __report_bug+0xa4/0x1d0
>   [<90000000060a4a88>] report_bug+0x38/0xd0
>   [<90000000060df330>] do_bp+0x260/0x410
>   [<90000000047a6bc0>] handle_bp+0x120/0x1c0
>   [<ffff8000028bff40>] amdgpu_irq_put+0xb0/0x140 [amdgpu]
>   [<ffff8000027b1a8c>] amdgpu_fence_driver_hw_fini+0x12c/0x180 [amdgpu]
>   [<ffff800002f2c04c>] amdgpu_device_fini_hw+0xf0/0x3fc [amdgpu]
>   [<ffff80000279e2ac>] amdgpu_driver_load_kms+0x7c/0xa0 [amdgpu]
>   [<ffff800002791128>] amdgpu_pci_probe+0x298/0x810 [amdgpu]
>   [<90000000054d04a4>] local_pci_probe+0x44/0xc0
>   [<90000000047f4ab0>] work_for_cpu_fn+0x20/0x40
>   [<90000000047f93e0>] process_one_work+0x170/0x4e0
>   [<90000000047fa14c>] worker_thread+0x3ac/0x4e0
>   [<9000000004806824>] kthread+0x154/0x170
>   [<90000000060df5b4>] ret_from_kernel_thread+0x24/0xd0
>   [<90000000047a62a4>] ret_from_kernel_thread_asm+0xc/0x88
> 
>   ---[ end trace 0000000000000000 ]---
>   amdgpu 0000:07:00.0: probe with driver amdgpu failed with error -110
>   amdgpu 0000:07:00.0: amdgpu: amdgpu: ttm finalized
> 
> This is because amdgpu_irq_enabled() is false in amdgpu_irq_put(), then
> the condition of WARN_ON() is true.
> 
> In order to avoid the unnecessary Call Traces, it can remove the check of
> amdgpu_irq_enabled() and only check atomic_read(&src->enabled_types[type]
> for three reasons:
> 
> (1) The aim is to prevent refcount from being less than 0, it was added in
> commit 1fa8d710573f ("drm/amdgpu: Fix desktop freezed after gpu-reset").
> (2) There are already many useful failed log before the Call Trace, there
> is no need to WARN_ON().

Well, I completely disagree. The call trace here is absolutely intentional.

That you get a lot of other backtraces because the driver doesn't initialize at all isn't a good rationale for removing this one.

Regards,
Christian.

> (3) The following checks in amdgpu_irq_put() are same with the checks in
> amdgpu_irq_enabled(), there is no need to do the redundant operations.
> 
> 	if (!adev->irq.installed)
> 		return -ENOENT;
> 
> 	if (type >= src->num_types)
> 		return -EINVAL;
> 
> 	if (!src->enabled_types || !src->funcs->set)
> 		return -EINVAL;
> 
> Cc: stable@vger.kernel.org
> Fixes: 1fa8d710573f ("drm/amdgpu: Fix desktop freezed after gpu-reset")
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 8112ffc85995..0d00a8d841b5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -636,7 +636,7 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
>  	if (!src->enabled_types || !src->funcs->set)
>  		return -EINVAL;
>  
> -	if (WARN_ON(!amdgpu_irq_enabled(adev, src, type)))
> +	if (!atomic_read(&src->enabled_types[type]))
>  		return -EINVAL;
>  
>  	if (atomic_dec_and_test(&src->enabled_types[type]))

Re: [RFC PATCH] drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

Posted by Tiezhu Yang 3 weeks, 2 days ago

On 2026/1/15 9:47 PM, Christian König wrote:
> On 1/15/26 02:28, Tiezhu Yang wrote:
>> Currently, there are many Call Traces when booting kernel on LoongArch,
>> here are the trimmed kernel log messages:
>>
>>    amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110
>>    amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
>>    amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
>>    amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
>>    ------------[ cut here ]------------
>>    WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9
>>    ...
>>    Call Trace:
>>    [<90000000047a8524>] show_stack+0x64/0x190
>>    [<90000000047a0614>] dump_stack_lvl+0x6c/0x9c
>>    [<90000000047cef34>] __warn+0xa4/0x1b0
>>    [<90000000060a4884>] __report_bug+0xa4/0x1d0
>>    [<90000000060a4a88>] report_bug+0x38/0xd0
>>    [<90000000060df330>] do_bp+0x260/0x410
>>    [<90000000047a6bc0>] handle_bp+0x120/0x1c0
>>    [<ffff8000028bff40>] amdgpu_irq_put+0xb0/0x140 [amdgpu]
>>    [<ffff8000027b1a8c>] amdgpu_fence_driver_hw_fini+0x12c/0x180 [amdgpu]
>>    [<ffff800002f2c04c>] amdgpu_device_fini_hw+0xf0/0x3fc [amdgpu]
>>    [<ffff80000279e2ac>] amdgpu_driver_load_kms+0x7c/0xa0 [amdgpu]
>>    [<ffff800002791128>] amdgpu_pci_probe+0x298/0x810 [amdgpu]
>>    [<90000000054d04a4>] local_pci_probe+0x44/0xc0
>>    [<90000000047f4ab0>] work_for_cpu_fn+0x20/0x40
>>    [<90000000047f93e0>] process_one_work+0x170/0x4e0
>>    [<90000000047fa14c>] worker_thread+0x3ac/0x4e0
>>    [<9000000004806824>] kthread+0x154/0x170
>>    [<90000000060df5b4>] ret_from_kernel_thread+0x24/0xd0
>>    [<90000000047a62a4>] ret_from_kernel_thread_asm+0xc/0x88
>>
>>    ---[ end trace 0000000000000000 ]---
>>    amdgpu 0000:07:00.0: probe with driver amdgpu failed with error -110
>>    amdgpu 0000:07:00.0: amdgpu: amdgpu: ttm finalized
>>
>> This is because amdgpu_irq_enabled() is false in amdgpu_irq_put(), then
>> the condition of WARN_ON() is true.
>>
>> In order to avoid the unnecessary Call Traces, it can remove the check of
>> amdgpu_irq_enabled() and only check atomic_read(&src->enabled_types[type]
>> for three reasons:
>>
>> (1) The aim is to prevent refcount from being less than 0, it was added in
>> commit 1fa8d710573f ("drm/amdgpu: Fix desktop freezed after gpu-reset").
>> (2) There are already many useful failed log before the Call Trace, there
>> is no need to WARN_ON().
> 
> Well completely disagree. The call trace here is absolutely intentional.

If so, since the call trace is the same each time, would it be enough to
use WARN_ON_ONCE() here?

> That you get a lot of other backtraces because the driver doesn't initialize at all isn't a good rational to remove this one here.
> 
> Regards,
> Christian.
> 
>> (3) The following checks in amdgpu_irq_put() are same with the checks in
>> amdgpu_irq_enabled(), there is no need to do the redundant operations.
>>
>> 	if (!adev->irq.installed)
>> 		return -ENOENT;
>>
>> 	if (type >= src->num_types)
>> 		return -EINVAL;
>>
>> 	if (!src->enabled_types || !src->funcs->set)
>> 		return -EINVAL;

Is it reasonable to only check atomic_read(&src->enabled_types[type])?

That is to say, does it make sense to make the following change?

----->8-----
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 8112ffc85995..d10d6fcc525e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -636,7 +636,7 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
         if (!src->enabled_types || !src->funcs->set)
                 return -EINVAL;

-       if (WARN_ON(!amdgpu_irq_enabled(adev, src, type)))
+       if (WARN_ON_ONCE(!atomic_read(&src->enabled_types[type])))
                 return -EINVAL;

         if (atomic_dec_and_test(&src->enabled_types[type]))

Thanks,
Tiezhu
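
The difference between the two macros being weighed here can be sketched in user space. This is an illustrative model only: `toy_warn_on()`, `toy_warn_on_once()` and `toy_reports` are hypothetical names, and a counter stands in for the call trace a real kernel would print. Both helpers return the condition, so the caller's error path is the same either way; only the number of reports differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of reports emitted; a real kernel would print a call trace. */
static int toy_reports;

/* Models WARN_ON(): report every time the condition holds. */
static bool toy_warn_on(bool cond)
{
	if (cond)
		toy_reports++;
	return cond;
}

/*
 * Models WARN_ON_ONCE(): report only the first time the condition
 * holds at a given call site. The caller supplies the per-site flag.
 */
static bool toy_warn_on_once(bool cond, bool *already_reported)
{
	assert(already_reported != NULL);

	if (cond && !*already_reported) {
		*already_reported = true;
		toy_reports++;
	}
	return cond;
}
```

With three failing calls, the first helper produces three reports and the second produces one, which is the trade-off under discussion: less log noise against the chance of hiding repeated or distinct failures behind a single report.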

Re: [RFC PATCH] drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

Posted by Christian König 3 weeks, 2 days ago

On 1/16/26 02:20, Tiezhu Yang wrote:
> On 2026/1/15 9:47 PM, Christian König wrote:
>> On 1/15/26 02:28, Tiezhu Yang wrote:
>>> Currently, there are many Call Traces when booting kernel on LoongArch,
>>> here are the trimmed kernel log messages:
>>>
>>>    amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110
>>>    amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
>>>    amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
>>>    amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
>>>    ------------[ cut here ]------------
>>>    WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9
>>>    ...
>>>    Call Trace:
>>>    [<90000000047a8524>] show_stack+0x64/0x190
>>>    [<90000000047a0614>] dump_stack_lvl+0x6c/0x9c
>>>    [<90000000047cef34>] __warn+0xa4/0x1b0
>>>    [<90000000060a4884>] __report_bug+0xa4/0x1d0
>>>    [<90000000060a4a88>] report_bug+0x38/0xd0
>>>    [<90000000060df330>] do_bp+0x260/0x410
>>>    [<90000000047a6bc0>] handle_bp+0x120/0x1c0
>>>    [<ffff8000028bff40>] amdgpu_irq_put+0xb0/0x140 [amdgpu]
>>>    [<ffff8000027b1a8c>] amdgpu_fence_driver_hw_fini+0x12c/0x180 [amdgpu]
>>>    [<ffff800002f2c04c>] amdgpu_device_fini_hw+0xf0/0x3fc [amdgpu]
>>>    [<ffff80000279e2ac>] amdgpu_driver_load_kms+0x7c/0xa0 [amdgpu]
>>>    [<ffff800002791128>] amdgpu_pci_probe+0x298/0x810 [amdgpu]
>>>    [<90000000054d04a4>] local_pci_probe+0x44/0xc0
>>>    [<90000000047f4ab0>] work_for_cpu_fn+0x20/0x40
>>>    [<90000000047f93e0>] process_one_work+0x170/0x4e0
>>>    [<90000000047fa14c>] worker_thread+0x3ac/0x4e0
>>>    [<9000000004806824>] kthread+0x154/0x170
>>>    [<90000000060df5b4>] ret_from_kernel_thread+0x24/0xd0
>>>    [<90000000047a62a4>] ret_from_kernel_thread_asm+0xc/0x88
>>>
>>>    ---[ end trace 0000000000000000 ]---
>>>    amdgpu 0000:07:00.0: probe with driver amdgpu failed with error -110
>>>    amdgpu 0000:07:00.0: amdgpu: amdgpu: ttm finalized
>>>
>>> This is because amdgpu_irq_enabled() is false in amdgpu_irq_put(), then
>>> the condition of WARN_ON() is true.
>>>
>>> In order to avoid the unnecessary Call Traces, it can remove the check of
>>> amdgpu_irq_enabled() and only check atomic_read(&src->enabled_types[type]
>>> for three reasons:
>>>
>>> (1) The aim is to prevent refcount from being less than 0, it was added in
>>> commit 1fa8d710573f ("drm/amdgpu: Fix desktop freezed after gpu-reset").
>>> (2) There are already many useful failed log before the Call Trace, there
>>> is no need to WARN_ON().
>>
>> Well completely disagree. The call trace here is absolutely intentional.
> 
> If so, since the call trace is same, is it enough to use WARN_ON_ONCE()
> here?

I also don't see a justification for that.

>> That you get a lot of other backtraces because the driver doesn't initialize at all isn't a good rational to remove this one here.
>>
>> Regards,
>> Christian.
>>
>>> (3) The following checks in amdgpu_irq_put() are same with the checks in
>>> amdgpu_irq_enabled(), there is no need to do the redundant operations.
>>>
>>>     if (!adev->irq.installed)
>>>         return -ENOENT;
>>>
>>>     if (type >= src->num_types)
>>>         return -EINVAL;
>>>
>>>     if (!src->enabled_types || !src->funcs->set)
>>>         return -EINVAL;
> 
> Is this reasonable? Only check atomic_read(&src->enabled_types[type]?

No, absolutely not. Those are two completely different things.

> That is to say, does it make sense to do the following change?

The warning can basically only be triggered by two conditions:
1. A fatal problem while loading the driver and the error handling is not 100% clean.
2. A driver coding error.

And we really need to catch all of those, so there is no real rationale for limiting the warning.

I mean, when you run into any of those, they should potentially be fixed at some point.

Regards,
Christian.

> 
> ----->8-----
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 8112ffc85995..d10d6fcc525e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -636,7 +636,7 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
>         if (!src->enabled_types || !src->funcs->set)
>                 return -EINVAL;
> 
> -       if (WARN_ON(!amdgpu_irq_enabled(adev, src, type)))
> +       if (WARN_ON_ONCE(!atomic_read(&src->enabled_types[type])))
>                 return -EINVAL;
> 
>         if (atomic_dec_and_test(&src->enabled_types[type]))
> 
> Thanks,
> Tiezhu
>

Re: [RFC PATCH] drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

Posted by Tiezhu Yang 2 weeks, 6 days ago

On 2026/1/16 6:03 PM, Christian König wrote:
> On 1/16/26 02:20, Tiezhu Yang wrote:
>> On 2026/1/15 9:47 PM, Christian König wrote:
>>> On 1/15/26 02:28, Tiezhu Yang wrote:
>>>> Currently, there are many Call Traces when booting kernel on LoongArch,
>>>> here are the trimmed kernel log messages:
>>>>
>>>>     amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110
>>>>     amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
>>>>     amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
>>>>     amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
>>>>     ------------[ cut here ]------------
>>>>     WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9
>>>>     ...
>>>>     Call Trace:

...

> The warning can basically only be triggered by two conditions:
> 1. A fatal problem while loading the driver and the error handling is not 100% clean.
> 2. A driver coding error.
> 
> And we really need to catch all of those, so there is no real rational to limit the warning.
> 
> I mean when you run into any of those they should potentially be fixed at some point.

I made the following change and it fixes the problem. Given that I am
not familiar with the amdgpu driver, could you please check it? If it is OK,
I will send a formal patch later.

----->8-----
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 8112ffc85995..ac19565e7c45 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -138,6 +138,9 @@ void amdgpu_irq_disable_all(struct amdgpu_device *adev)
                         if (!src || !src->funcs->set || !src->num_types)
                                 continue;

+                       kfree(src->enabled_types);
+                       src->enabled_types = NULL;
+
                         for (k = 0; k < src->num_types; ++k) {
                                 r = src->funcs->set(adev, src, k,
                                                     AMDGPU_IRQ_STATE_DISABLE);

Thanks,
Tiezhu
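
Why clearing the pointer silences the trace can be sketched in user space. This is a simplified model under stated assumptions, not the driver: `toy_src`, `toy_src_init()`, `toy_src_disable()`, `toy_src_put()` and `toy_splats` are hypothetical names, plain ints replace the atomic counters, and the order of checks follows the amdgpu_irq_put() context quoted earlier in the thread. Whether releasing the array in amdgpu_irq_disable_all() is correct for every caller is not settled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified source: one counter per interrupt type. */
struct toy_src {
	unsigned int num_types;
	int *enabled_types;	/* NULL once the counters are released */
};

static int toy_splats;	/* stands in for WARN_ON() call traces */

static int toy_src_init(struct toy_src *src, unsigned int num_types)
{
	assert(src != NULL);
	src->num_types = num_types;
	src->enabled_types = calloc(num_types, sizeof(*src->enabled_types));
	return src->enabled_types ? 0 : -ENOMEM;
}

/* Mirrors the proposal: free the counters and clear the pointer. */
static void toy_src_disable(struct toy_src *src)
{
	assert(src != NULL);
	free(src->enabled_types);	/* free(NULL) is a no-op */
	src->enabled_types = NULL;
}

static int toy_src_put(struct toy_src *src, unsigned int type)
{
	assert(src != NULL);

	if (type >= src->num_types)
		return -EINVAL;

	if (!src->enabled_types)	/* early exit, no warning */
		return -EINVAL;

	if (src->enabled_types[type] == 0) {
		toy_splats++;		/* unbalanced put is reported */
		return -EINVAL;
	}

	src->enabled_types[type]--;
	return 0;
}
```

In this model an unbalanced put before `toy_src_disable()` is counted in `toy_splats`, while the same put afterwards returns -EINVAL at the NULL check and is not counted. Note that this also means a genuine imbalance after the disable step would no longer be reported.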

Re: [RFC PATCH] drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

Posted by Christian König 2 weeks, 6 days ago

On 1/19/26 09:53, Tiezhu Yang wrote:
> On 2026/1/16 6:03 PM, Christian König wrote:
>> On 1/16/26 02:20, Tiezhu Yang wrote:
>>> On 2026/1/15 9:47 PM, Christian König wrote:
>>>> On 1/15/26 02:28, Tiezhu Yang wrote:
>>>>> Currently, there are many Call Traces when booting kernel on LoongArch,
>>>>> here are the trimmed kernel log messages:
>>>>>
>>>>>     amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110
>>>>>     amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
>>>>>     amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
>>>>>     amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
>>>>>     ------------[ cut here ]------------
>>>>>     WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9
>>>>>     ...
>>>>>     Call Trace:
> 
> ...
> 
>> The warning can basically only be triggered by two conditions:
>> 1. A fatal problem while loading the driver and the error handling is not 100% clean.
>> 2. A driver coding error.
>>
>> And we really need to catch all of those, so there is no real rational to limit the warning.
>>
>> I mean when you run into any of those they should potentially be fixed at some point.
> 
> I did the following change and it can fix the problem, given that I am
> not familiar with amdgpu driver, could you please check it? If it is OK,
> I will send a formal patch later.
> 
> ----->8-----
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 8112ffc85995..ac19565e7c45 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -138,6 +138,9 @@ void amdgpu_irq_disable_all(struct amdgpu_device *adev)
>                         if (!src || !src->funcs->set || !src->num_types)
>                                 continue;
> 
> +                       kfree(src->enabled_types);
> +                       src->enabled_types = NULL;
> +

Mhm, I need to double-check that, but offhand it doesn't look like such a bad idea.

Christian.

>                         for (k = 0; k < src->num_types; ++k) {
>                                 r = src->funcs->set(adev, src, k,
> 
> AMDGPU_IRQ_STATE_DISABLE);
> 
> Thanks,
> Tiezhu
>

Re: [RFC PATCH] drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

Posted by Tiezhu Yang 2 weeks, 3 days ago

On 2026/1/19 6:45 PM, Christian König wrote:
> On 1/19/26 09:53, Tiezhu Yang wrote:
>> On 2026/1/16 6:03 PM, Christian König wrote:
>>> On 1/16/26 02:20, Tiezhu Yang wrote:
>>>> On 2026/1/15 9:47 PM, Christian König wrote:
>>>>> On 1/15/26 02:28, Tiezhu Yang wrote:
>>>>>> Currently, there are many Call Traces when booting kernel on LoongArch,
>>>>>> here are the trimmed kernel log messages:
>>>>>>
>>>>>>      amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110
>>>>>>      amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
>>>>>>      amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
>>>>>>      amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
>>>>>>      ------------[ cut here ]------------
>>>>>>      WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9
>>>>>>      ...
>>>>>>      Call Trace:
>>
>> ...
>>
>>> The warning can basically only be triggered by two conditions:
>>> 1. A fatal problem while loading the driver and the error handling is not 100% clean.
>>> 2. A driver coding error.
>>>
>>> And we really need to catch all of those, so there is no real rational to limit the warning.
>>>
>>> I mean when you run into any of those they should potentially be fixed at some point.
>>
>> I did the following change and it can fix the problem, given that I am
>> not familiar with amdgpu driver, could you please check it? If it is OK,
>> I will send a formal patch later.
>>
>> ----->8-----
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index 8112ffc85995..ac19565e7c45 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -138,6 +138,9 @@ void amdgpu_irq_disable_all(struct amdgpu_device *adev)
>>                          if (!src || !src->funcs->set || !src->num_types)
>>                                  continue;
>>
>> +                       kfree(src->enabled_types);
>> +                       src->enabled_types = NULL;
>> +
> 
> Mhm, I need to double check that but that looks like not such a bad idea of hand.

Please let me know whether the above change is OK. If so, I will send a
patch ASAP, before the coming merge window.

Thanks,
Tiezhu

Re: [RFC PATCH] drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

Posted by Christian König 2 weeks, 3 days ago

On 1/22/26 10:37, Tiezhu Yang wrote:
> On 2026/1/19 6:45 PM, Christian König wrote:
>> On 1/19/26 09:53, Tiezhu Yang wrote:
>>> On 2026/1/16 6:03 PM, Christian König wrote:
>>>> On 1/16/26 02:20, Tiezhu Yang wrote:
>>>>> On 2026/1/15 9:47 PM, Christian König wrote:
>>>>>> On 1/15/26 02:28, Tiezhu Yang wrote:
>>>>>>> Currently, there are many Call Traces when booting kernel on LoongArch,
>>>>>>> here are the trimmed kernel log messages:
>>>>>>>
>>>>>>>      amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110
>>>>>>>      amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
>>>>>>>      amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
>>>>>>>      amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
>>>>>>>      ------------[ cut here ]------------
>>>>>>>      WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9
>>>>>>>      ...
>>>>>>>      Call Trace:
>>>
>>> ...
>>>
>>>> The warning can basically only be triggered by two conditions:
>>>> 1. A fatal problem while loading the driver and the error handling is not 100% clean.
>>>> 2. A driver coding error.
>>>>
>>>> And we really need to catch all of those, so there is no real rational to limit the warning.
>>>>
>>>> I mean when you run into any of those they should potentially be fixed at some point.
>>>
>>> I did the following change and it can fix the problem, given that I am
>>> not familiar with amdgpu driver, could you please check it? If it is OK,
>>> I will send a formal patch later.
>>>
>>> ----->8-----
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> index 8112ffc85995..ac19565e7c45 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>> @@ -138,6 +138,9 @@ void amdgpu_irq_disable_all(struct amdgpu_device *adev)
>>>                          if (!src || !src->funcs->set || !src->num_types)
>>>                                  continue;
>>>
>>> +                       kfree(src->enabled_types);
>>> +                       src->enabled_types = NULL;
>>> +
>>
>> Mhm, I need to double check that but that looks like not such a bad idea of hand.
> 
> Please let me know whether the above change is OK, if yes I will send a
> patch ASAP before the coming merge window.

Well, this has the absolute lowest priority. So don't expect anything in the near term.

Christian.

> 
> Thanks,
> Tiezhu
>

Re: [RFC PATCH] drm/amdgpu: Avoid unnecessary Call Traces in amdgpu_irq_put()

Posted by Alex Deucher 3 weeks, 2 days ago

On Fri, Jan 16, 2026 at 5:42 AM Christian König
<christian.koenig@amd.com> wrote:
>
> On 1/16/26 02:20, Tiezhu Yang wrote:
> > On 2026/1/15 9:47 PM, Christian König wrote:
> >> On 1/15/26 02:28, Tiezhu Yang wrote:
> >>> Currently, there are many Call Traces when booting kernel on LoongArch,
> >>> here are the trimmed kernel log messages:
> >>>
> >>>    amdgpu 0000:07:00.0: amdgpu: hw_init of IP block <gfx_v6_0> failed -110
> >>>    amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
> >>>    amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
> >>>    amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
> >>>    ------------[ cut here ]------------
> >>>    WARNING: drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 at amdgpu_irq_put+0xb0/0x140 [amdgpu], CPU#0: kworker/0:0/9
> >>>    ...
> >>>    Call Trace:
> >>>    [<90000000047a8524>] show_stack+0x64/0x190
> >>>    [<90000000047a0614>] dump_stack_lvl+0x6c/0x9c
> >>>    [<90000000047cef34>] __warn+0xa4/0x1b0
> >>>    [<90000000060a4884>] __report_bug+0xa4/0x1d0
> >>>    [<90000000060a4a88>] report_bug+0x38/0xd0
> >>>    [<90000000060df330>] do_bp+0x260/0x410
> >>>    [<90000000047a6bc0>] handle_bp+0x120/0x1c0
> >>>    [<ffff8000028bff40>] amdgpu_irq_put+0xb0/0x140 [amdgpu]
> >>>    [<ffff8000027b1a8c>] amdgpu_fence_driver_hw_fini+0x12c/0x180 [amdgpu]
> >>>    [<ffff800002f2c04c>] amdgpu_device_fini_hw+0xf0/0x3fc [amdgpu]
> >>>    [<ffff80000279e2ac>] amdgpu_driver_load_kms+0x7c/0xa0 [amdgpu]
> >>>    [<ffff800002791128>] amdgpu_pci_probe+0x298/0x810 [amdgpu]
> >>>    [<90000000054d04a4>] local_pci_probe+0x44/0xc0
> >>>    [<90000000047f4ab0>] work_for_cpu_fn+0x20/0x40
> >>>    [<90000000047f93e0>] process_one_work+0x170/0x4e0
> >>>    [<90000000047fa14c>] worker_thread+0x3ac/0x4e0
> >>>    [<9000000004806824>] kthread+0x154/0x170
> >>>    [<90000000060df5b4>] ret_from_kernel_thread+0x24/0xd0
> >>>    [<90000000047a62a4>] ret_from_kernel_thread_asm+0xc/0x88
> >>>
> >>>    ---[ end trace 0000000000000000 ]---
> >>>    amdgpu 0000:07:00.0: probe with driver amdgpu failed with error -110
> >>>    amdgpu 0000:07:00.0: amdgpu: amdgpu: ttm finalized
> >>>
> >>> This is because amdgpu_irq_enabled() is false in amdgpu_irq_put(), then
> >>> the condition of WARN_ON() is true.
> >>>
> >>> In order to avoid the unnecessary Call Traces, it can remove the check of
> >>> amdgpu_irq_enabled() and only check atomic_read(&src->enabled_types[type]
> >>> for three reasons:
> >>>
> >>> (1) The aim is to prevent refcount from being less than 0, it was added in
> >>> commit 1fa8d710573f ("drm/amdgpu: Fix desktop freezed after gpu-reset").
> >>> (2) There are already many useful failed log before the Call Trace, there
> >>> is no need to WARN_ON().
> >>
> >> Well completely disagree. The call trace here is absolutely intentional.
> >
> > If so, since the call trace is same, is it enough to use WARN_ON_ONCE()
> > here?
>
> I also don't see a justification for that.
>
> >> That you get a lot of other backtraces because the driver doesn't initialize at all isn't a good rational to remove this one here.
> >>
> >> Regards,
> >> Christian.
> >>
> >>> (3) The following checks in amdgpu_irq_put() are same with the checks in
> >>> amdgpu_irq_enabled(), there is no need to do the redundant operations.
> >>>
> >>>     if (!adev->irq.installed)
> >>>         return -ENOENT;
> >>>
> >>>     if (type >= src->num_types)
> >>>         return -EINVAL;
> >>>
> >>>     if (!src->enabled_types || !src->funcs->set)
> >>>         return -EINVAL;
> >
> > Is this reasonable? Only check atomic_read(&src->enabled_types[type]?
>
> No, absolutely not. That are two completely different things.
>
> > That is to say, does it make sense to do the following change?
>
> The warning can basically only be triggered by two conditions:
> 1. A fatal problem while loading the driver and the error handling is not 100% clean.
> 2. A driver coding error.
>
> And we really need to catch all of those, so there is no real rational to limit the warning.
>
> I mean when you run into any of those they should potentially be fixed at some point.

The problem is there are a lot of error paths which don't get this
right that no one has fixed for years.  I'm not sure if there is much
value anymore; most people just ignore them.

Alex

>
> Regards,
> Christian.
>
> >
> > ----->8-----
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> > index 8112ffc85995..d10d6fcc525e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> > @@ -636,7 +636,7 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
> >         if (!src->enabled_types || !src->funcs->set)
> >                 return -EINVAL;
> >
> > -       if (WARN_ON(!amdgpu_irq_enabled(adev, src, type)))
> > +       if (WARN_ON_ONCE(!atomic_read(&src->enabled_types[type])))
> >                 return -EINVAL;
> >
> >         if (atomic_dec_and_test(&src->enabled_types[type]))
> >
> > Thanks,
> > Tiezhu
> >
>