[PATCH 2/6] amd/amdkfd: Ignore return code of dma_fence_signal()

Philipp Stanner posted 6 patches 5 days, 9 hours ago
There is a newer version of this series
[PATCH 2/6] amd/amdkfd: Ignore return code of dma_fence_signal()
Posted by Philipp Stanner 5 days, 9 hours ago
The return code of dma_fence_signal() is not really useful as there is
nothing reasonable to do if a fence was already signaled. That return
code shall be removed from the kernel.

Ignore dma_fence_signal()'s return code.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index ddfe30c13e9d..950fafa4b3c3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
 static int signal_eviction_fence(struct kfd_process *p)
 {
 	struct dma_fence *ef;
-	int ret;
 
 	rcu_read_lock();
 	ef = dma_fence_get_rcu_safe(&p->ef);
@@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
 	if (!ef)
 		return -EINVAL;
 
-	ret = dma_fence_signal(ef);
+	dma_fence_signal(ef);
 	dma_fence_put(ef);
 
-	return ret;
+	return 0;
 }
 
 static void evict_process_worker(struct work_struct *work)
-- 
2.49.0

Re: [PATCH 2/6] amd/amdkfd: Ignore return code of dma_fence_signal()
Posted by Kuehling, Felix 5 days, 1 hour ago
On 2025-11-26 08:19, Philipp Stanner wrote:
> The return code of dma_fence_signal() is not really useful as there is
> nothing reasonable to do if a fence was already signaled. That return
> code shall be removed from the kernel.
>
> Ignore dma_fence_signal()'s return code.

I think this is not correct. Looking at the comment in 
evict_process_worker, we use the return value to decide a race 
conditions where multiple threads are trying to signal the eviction 
fence. Only one of them should schedule the restore work. And the other 
ones need to increment the reference count to keep evictions balanced.

Regards,
   Felix


>
> Suggested-by: Christian König <christian.koenig@amd.com>
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index ddfe30c13e9d..950fafa4b3c3 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>   static int signal_eviction_fence(struct kfd_process *p)
>   {
>   	struct dma_fence *ef;
> -	int ret;
>   
>   	rcu_read_lock();
>   	ef = dma_fence_get_rcu_safe(&p->ef);
> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
>   	if (!ef)
>   		return -EINVAL;
>   
> -	ret = dma_fence_signal(ef);
> +	dma_fence_signal(ef);
>   	dma_fence_put(ef);
>   
> -	return ret;
> +	return 0;
>   }
>   
>   static void evict_process_worker(struct work_struct *work)
Re: [PATCH 2/6] amd/amdkfd: Ignore return code of dma_fence_signal()
Posted by Philipp Stanner 4 days, 12 hours ago
On Wed, 2025-11-26 at 16:24 -0500, Kuehling, Felix wrote:
> 
> On 2025-11-26 08:19, Philipp Stanner wrote:
> > The return code of dma_fence_signal() is not really useful as there is
> > nothing reasonable to do if a fence was already signaled. That return
> > code shall be removed from the kernel.
> > 
> > Ignore dma_fence_signal()'s return code.
> 
> I think this is not correct. Looking at the comment in 
> evict_process_worker, we use the return value to decide a race 
> conditions where multiple threads are trying to signal the eviction 
> fence. Only one of them should schedule the restore work. And the other 
> ones need to increment the reference count to keep evictions balanced.

Thank you for pointing that out. Seems then amdkfd is the only user who
actually relies on the feature. See below

> 
> Regards,
>    Felix
> 
> 
> > 
> > Suggested-by: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> > index ddfe30c13e9d..950fafa4b3c3 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> > @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
> >   static int signal_eviction_fence(struct kfd_process *p)
> >   {
> >   	struct dma_fence *ef;
> > -	int ret;
> >   
> >   	rcu_read_lock();
> >   	ef = dma_fence_get_rcu_safe(&p->ef);
> > @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
> >   	if (!ef)
> >   		return -EINVAL;
> >   
> > -	ret = dma_fence_signal(ef);
> > +	dma_fence_signal(ef);

The issue now is that dma_fence_signal()'s return code is actually non-
racy, because check + bit-set are protected by lock.

Christian's new spinlock series would add a lock function for fences:
https://lore.kernel.org/dri-devel/20251113145332.16805-5-christian.koenig@amd.com/


So I suppose this should work:

dma_fence_lock_irqsave(ef, flags);
if (dma_fence_test_signaled_flag(ef)) {
	dma_fence_unlock_irqrestore(ef, flags);
	return true;
}
dma_fence_signal_locked(ef);
dma_fence_unlock_irqrestore(ef, flags);

return false;


+ some cosmetic adjustments for the boolean of course.


Would that fly and be reasonable? @Felix, Christian.


P.
Re: [PATCH 2/6] amd/amdkfd: Ignore return code of dma_fence_signal()
Posted by Christian König 4 days, 12 hours ago
On 11/27/25 10:48, Philipp Stanner wrote:
> On Wed, 2025-11-26 at 16:24 -0500, Kuehling, Felix wrote:
>>
>> On 2025-11-26 08:19, Philipp Stanner wrote:
>>> The return code of dma_fence_signal() is not really useful as there is
>>> nothing reasonable to do if a fence was already signaled. That return
>>> code shall be removed from the kernel.
>>>
>>> Ignore dma_fence_signal()'s return code.
>>
>> I think this is not correct. Looking at the comment in 
>> evict_process_worker, we use the return value to decide a race 
>> conditions where multiple threads are trying to signal the eviction 
>> fence. Only one of them should schedule the restore work. And the other 
>> ones need to increment the reference count to keep evictions balanced.
> 
> Thank you for pointing that out. Seems then amdkfd is the only user who
> actually relies on the feature. See below
> 
>>
>> Regards,
>>    Felix
>>
>>
>>>
>>> Suggested-by: Christian König <christian.koenig@amd.com>
>>> Signed-off-by: Philipp Stanner <phasta@kernel.org>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 ++---
>>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index ddfe30c13e9d..950fafa4b3c3 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>>>   static int signal_eviction_fence(struct kfd_process *p)
>>>   {
>>>   	struct dma_fence *ef;
>>> -	int ret;
>>>   
>>>   	rcu_read_lock();
>>>   	ef = dma_fence_get_rcu_safe(&p->ef);
>>> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
>>>   	if (!ef)
>>>   		return -EINVAL;
>>>   
>>> -	ret = dma_fence_signal(ef);
>>> +	dma_fence_signal(ef);
> 
> The issue now is that dma_fence_signal()'s return code is actually non-
> racy, because check + bit-set are protected by lock.
> 
> Christian's new spinlock series would add a lock function for fences:
> https://lore.kernel.org/dri-devel/20251113145332.16805-5-christian.koenig@amd.com/
> 
> 
> So I suppose this should work:
> 
> dma_fence_lock_irqsave(ef, flags);
> if (dma_fence_test_signaled_flag(ef)) {
> 	dma_fence_unlock_irqrestore(ef, flags);
> 	return true;
> }
> dma_fence_signal_locked(ef);
> dma_fence_unlock_irqrestore(ef, flags);
> 
> return false;
> 
> 
> + some cosmetic adjustments for the boolean of course.
> 
> 
> Would that fly and be reasonable? @Felix, Christian.

I was just about to reply with the same idea when your mail arrived.

So yes looks totally reasonable to me.

Regards,
Christian.

> 
> 
> P.

Re: [PATCH 2/6] amd/amdkfd: Ignore return code of dma_fence_signal()
Posted by Kuehling, Felix 4 days, 7 hours ago
On 2025-11-27 04:55, Christian König wrote:
> On 11/27/25 10:48, Philipp Stanner wrote:
>> On Wed, 2025-11-26 at 16:24 -0500, Kuehling, Felix wrote:
>>> On 2025-11-26 08:19, Philipp Stanner wrote:
>>>> The return code of dma_fence_signal() is not really useful as there is
>>>> nothing reasonable to do if a fence was already signaled. That return
>>>> code shall be removed from the kernel.
>>>>
>>>> Ignore dma_fence_signal()'s return code.
>>> I think this is not correct. Looking at the comment in
>>> evict_process_worker, we use the return value to decide a race
>>> conditions where multiple threads are trying to signal the eviction
>>> fence. Only one of them should schedule the restore work. And the other
>>> ones need to increment the reference count to keep evictions balanced.
>> Thank you for pointing that out. Seems then amdkfd is the only user who
>> actually relies on the feature. See below
>>
>>> Regards,
>>>     Felix
>>>
>>>
>>>> Suggested-by: Christian König <christian.koenig@amd.com>
>>>> Signed-off-by: Philipp Stanner <phasta@kernel.org>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_process.c | 5 ++---
>>>>    1 file changed, 2 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> index ddfe30c13e9d..950fafa4b3c3 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>>> @@ -1986,7 +1986,6 @@ kfd_process_gpuid_from_node(struct kfd_process *p, struct kfd_node *node,
>>>>    static int signal_eviction_fence(struct kfd_process *p)
>>>>    {
>>>>    	struct dma_fence *ef;
>>>> -	int ret;
>>>>    
>>>>    	rcu_read_lock();
>>>>    	ef = dma_fence_get_rcu_safe(&p->ef);
>>>> @@ -1994,10 +1993,10 @@ static int signal_eviction_fence(struct kfd_process *p)
>>>>    	if (!ef)
>>>>    		return -EINVAL;
>>>>    
>>>> -	ret = dma_fence_signal(ef);
>>>> +	dma_fence_signal(ef);
>> The issue now is that dma_fence_signal()'s return code is actually non-
>> racy, because check + bit-set are protected by lock.
>>
>> Christian's new spinlock series would add a lock function for fences:
>> https://lore.kernel.org/dri-devel/20251113145332.16805-5-christian.koenig@amd.com/
>>
>>
>> So I suppose this should work:
>>
>> dma_fence_lock_irqsave(ef, flags);
>> if (dma_fence_test_signaled_flag(ef)) {
>> 	dma_fence_unlock_irqrestore(ef, flags);
>> 	return true;
>> }
>> dma_fence_signal_locked(ef);
>> dma_fence_unlock_irqrestore(ef, flags);
>>
>> return false;
>>
>>
>> + some cosmetic adjustments for the boolean of course.
>>
>>
>> Would that fly and be reasonable? @Felix, Christian.
> I was just about to reply with the same idea when your mail arrived.

I agree as well. The important feature is that we need to test and 
signal the fence atomically. It may even make sense to add a function 
for that "dma_fence_test_and_signal" that preserves the original 
behaviour of dma_fence_signal. ;)

Regards,
   Felix


>
> So yes looks totally reasonable to me.
>
> Regards,
> Christian.
>
>>
>> P.
Re: [PATCH 2/6] amd/amdkfd: Ignore return code of dma_fence_signal()
Posted by Philipp Stanner 4 days, 6 hours ago
On Thu, 2025-11-27 at 10:08 -0500, Kuehling, Felix wrote:
> On 2025-11-27 04:55, Christian König wrote:
> > On 11/27/25 10:48, Philipp Stanner wrote:
> > > 

[…]

> > > The issue now is that dma_fence_signal()'s return code is actually non-
> > > racy, because check + bit-set are protected by lock.
> > > 
> > > Christian's new spinlock series would add a lock function for fences:
> > > https://lore.kernel.org/dri-devel/20251113145332.16805-5-christian.koenig@amd.com/
> > > 
> > > 
> > > So I suppose this should work:
> > > 
> > > dma_fence_lock_irqsave(ef, flags);
> > > if (dma_fence_test_signaled_flag(ef)) {
> > > 	dma_fence_unlock_irqrestore(ef, flags);
> > > 	return true;
> > > }
> > > dma_fence_signal_locked(ef);
> > > dma_fence_unlock_irqrestore(ef, flags);
> > > 
> > > return false;
> > > 
> > > 
> > > + some cosmetic adjustments for the boolean of course.
> > > 
> > > 
> > > Would that fly and be reasonable? @Felix, Christian.
> > I was just about to reply with the same idea when your mail arrived.
> 
> I agree as well. The important feature is that we need to test and 
> signal the fence atomically. It may even make sense to add a function
> for that "dma_fence_test_and_signal" that preserves the original 
> behaviour of dma_fence_signal. ;)

Fine by me if the maintainer agrees

P.