[PATCH] dma-fence: Correct return of dma_fence_driver_name()

Philipp Stanner posted 1 patch 3 months, 2 weeks ago
drivers/dma-buf/dma-fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Philipp Stanner 3 months, 2 weeks ago
To decouple the dma_fence_ops lifetime from dma_fences lifetime RCU
support was added to said function, coupled with using the signaled bit
to detect whether the fence_ops might be gone already.

When implementing that a wrong string was set as a default return
parameter, indicating that every driver whose fence is already signalled
must be detached, which is frankly wrong.

Reported-by: Danilo Krummrich <dakr@kernel.org>
Fixes: 506aa8b02a8d ("dma-fence: Add safe access helpers and document the rules")
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
When this was merged, it sadly slipped by me. I think this entire RCU
mechanism was / is an overengineered idea.

If we look at who actually uses dma_fence_driver_name() and
dma_fence_timeline_name() – functions from which the largest share of
the fence_ops vs. fence lifetime issue stems from – we discover that
there is a single user:

i915.

Isn't that driver even deprecated?

I think the better thing to do is: remove these functions alltogether,
or at least deprecate them. Then the only lifetime issue left so solve
is the callback functions.

P.
---
 drivers/dma-buf/dma-fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 3f78c56b58dc..1875a0abebd3 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -1111,7 +1111,7 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
 	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		return fence->ops->get_driver_name(fence);
 	else
-		return "detached-driver";
+		return "driver-whose-fence-is-already-signalled";
 }
 EXPORT_SYMBOL(dma_fence_driver_name);
 
-- 
2.49.0

Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Tvrtko Ursulin 3 months, 2 weeks ago
On 24/10/2025 08:50, Philipp Stanner wrote:
> To decouple the dma_fence_ops lifetime from dma_fences lifetime RCU
> support was added to said function, coupled with using the signaled bit
> to detect whether the fence_ops might be gone already.
> 
> When implementing that a wrong string was set as a default return
> parameter, indicating that every driver whose fence is already signalled
> must be detached, which is frankly wrong.

Depends on how you look at it. After being signaled fence has to be 
detached from the driver. Ie. nothing belonging to this driver must be 
accessed via the fence.

I started with names and Christian has recently continued with ops.

> Reported-by: Danilo Krummrich <dakr@kernel.org>
> Fixes: 506aa8b02a8d ("dma-fence: Add safe access helpers and document the rules")
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ---
> When this was merged, it sadly slipped by me. I think this entire RCU
> mechanism was / is an overengineered idea.
> 
> If we look at who actually uses dma_fence_driver_name() and
> dma_fence_timeline_name() – functions from which the largest share of
> the fence_ops vs. fence lifetime issue stems from – we discover that
> there is a single user:
> 
> i915.

Not quite. The trigger event for fixing this was actually xe where use 
after free was achievable with a trivial set of userspace steps. See 
reproducer at:

https://lore.kernel.org/igt-dev/20250312131835.83983-1-tvrtko.ursulin@igalia.com/

Essentially any fence exporter whose fence can be exported either 
directly via sync file, or via syncobj to sync file export, and has 
state accessible via the fence ops which may be freed after the fence is 
signalled, or if the driver can be unbound from the device and unloaded, 
is vulnerable.

> Isn't that driver even deprecated?
Not exactly. It is not getting support for new hardware generations, 
while the new driver is not supporting old. There is a cut off point and 
an overlap of around one generation. Although I am not even sure this 
overlap is officially supported by Intel.

> I think the better thing to do is: remove these functions alltogether,
> or at least deprecate them. Then the only lifetime issue left so solve
> is the callback functions.

That would be nice, I also do not see much value in exporting names to 
userspace. But first more conversation around breaking the sync file ABI 
needs to happen. I think we had a little bit of it when changing the 
names of signalled fences and thinking was existing tools which look at 
the names will mostly survive it. Not sure if they would if unsignalled 
names would change.

> 
> P.
> ---
>   drivers/dma-buf/dma-fence.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 3f78c56b58dc..1875a0abebd3 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -1111,7 +1111,7 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>   	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   		return fence->ops->get_driver_name(fence);
>   	else
> -		return "detached-driver";
> +		return "driver-whose-fence-is-already-signalled";

IMHO unnecessarily verbose and whether or not changing it to anything 
different warrants a Fixes: tag is debatable.

Regards,

Tvrtko

>   }
>   EXPORT_SYMBOL(dma_fence_driver_name);
>   

Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Philipp Stanner 3 months, 2 weeks ago
On Fri, 2025-10-24 at 09:31 +0100, Tvrtko Ursulin wrote:
> 
> On 24/10/2025 08:50, Philipp Stanner wrote:
> > To decouple the dma_fence_ops lifetime from dma_fences lifetime RCU
> > support was added to said function, coupled with using the signaled bit
> > to detect whether the fence_ops might be gone already.
> > 
> > When implementing that a wrong string was set as a default return
> > parameter, indicating that every driver whose fence is already signalled
> > must be detached, which is frankly wrong.
> 
> Depends on how you look at it. After being signaled fence has to be 
> detached from the driver. Ie. nothing belonging to this driver must be 
> accessed via the fence.

Is that even documented btw? Many of the mysterious "dma fence rules"
are often only obtainable by asking Christian & Co

> 
> I started with names and Christian has recently continued with ops.
> 
> > Reported-by: Danilo Krummrich <dakr@kernel.org>
> > Fixes: 506aa8b02a8d ("dma-fence: Add safe access helpers and document the rules")
> > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > ---
> > When this was merged, it sadly slipped by me. I think this entire RCU
> > mechanism was / is an overengineered idea.
> > 
> > If we look at who actually uses dma_fence_driver_name() and
> > dma_fence_timeline_name() – functions from which the largest share of
> > the fence_ops vs. fence lifetime issue stems from – we discover that
> > there is a single user:
> > 
> > i915.
> 

[…]

> 
> 
> That would be nice, I also do not see much value in exporting names to 
> userspace. But first more conversation around breaking the sync file ABI 
> needs to happen. I think we had a little bit of it when changing the 
> names of signalled fences and thinking was existing tools which look at 
> the names will mostly survive it. Not sure if they would if unsignalled 
> names would change.

I mean, what you and Christian are addressing in recent weeks are real
problems, and I was / am about to write similar solutions for our Rust
dma_fence.

In the case of those names, however, I'll likely just not support that
in Rust, saving me from adding those RCU guards and delivering output
of questionable use to users.
(could ofc be added later by someone who really needs it…)

> 
> > 
> > P.
> > ---
> >   drivers/dma-buf/dma-fence.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 3f78c56b58dc..1875a0abebd3 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -1111,7 +1111,7 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
> >   	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> >   		return fence->ops->get_driver_name(fence);
> >   	else
> > -		return "detached-driver";
> > +		return "driver-whose-fence-is-already-signalled";
> 
> IMHO unnecessarily verbose and whether or not changing it to anything
> different warrants a Fixes: tag is debatable.

IMO the output is just wrong and confusing. It's easy to imagine that
some user starts wondering and searching why his driver has been
unloaded, opening support tickets and so on.

Could be less verbose, though. Dunno. I let the maintainer decide.

P.

> 
> Regards,
> 
> Tvrtko
> 
> >   }
> >   EXPORT_SYMBOL(dma_fence_driver_name);
> >   
> 
Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Tvrtko Ursulin 3 months, 2 weeks ago
On 24/10/2025 11:59, Philipp Stanner wrote:
> On Fri, 2025-10-24 at 09:31 +0100, Tvrtko Ursulin wrote:
>>
>> On 24/10/2025 08:50, Philipp Stanner wrote:
>>> To decouple the dma_fence_ops lifetime from dma_fences lifetime RCU
>>> support was added to said function, coupled with using the signaled bit
>>> to detect whether the fence_ops might be gone already.
>>>
>>> When implementing that a wrong string was set as a default return
>>> parameter, indicating that every driver whose fence is already signalled
>>> must be detached, which is frankly wrong.
>>
>> Depends on how you look at it. After being signaled fence has to be
>> detached from the driver. Ie. nothing belonging to this driver must be
>> accessed via the fence.
> 
> Is that even documented btw? Many of the mysterious "dma fence rules"
> are often only obtainable by asking Christian & Co

I tried to document it in the very patch which added it. Both in the 
commit message and in the large sticky-outy comments added to these helpers:

"""
  * dma_fence_driver_name - Access the driver name
  * @fence: the fence to query
  *
  * Returns a driver name backing the dma-fence implementation.
  *
  * IMPORTANT CONSIDERATION:
  * Dma-fence contract stipulates that access to driver provided data 
(data not
  * directly embedded into the object itself), such as the 
&dma_fence.lock and
  * memory potentially accessed by the &dma_fence.ops functions, is 
forbidden
  * after the fence has been signalled. Drivers are allowed to free that 
data,
  * and some do.
  *
  * To allow safe access drivers are mandated to guarantee a RCU grace 
period
  * between signalling the fence and freeing said data.
  *
  * As such access to the driver name is only valid inside a RCU locked 
section.
  * The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block 
guarded
  * by the &rcu_read_lock and &rcu_read_unlock pair.
"""

> 
>>
>> I started with names and Christian has recently continued with ops.
>>
>>> Reported-by: Danilo Krummrich <dakr@kernel.org>
>>> Fixes: 506aa8b02a8d ("dma-fence: Add safe access helpers and document the rules")
>>> Signed-off-by: Philipp Stanner <phasta@kernel.org>
>>> ---
>>> When this was merged, it sadly slipped by me. I think this entire RCU
>>> mechanism was / is an overengineered idea.
>>>
>>> If we look at who actually uses dma_fence_driver_name() and
>>> dma_fence_timeline_name() – functions from which the largest share of
>>> the fence_ops vs. fence lifetime issue stems from – we discover that
>>> there is a single user:
>>>
>>> i915.
>>
> 
> […]
> 
>>
>>
>> That would be nice, I also do not see much value in exporting names to
>> userspace. But first more conversation around breaking the sync file ABI
>> needs to happen. I think we had a little bit of it when changing the
>> names of signalled fences and thinking was existing tools which look at
>> the names will mostly survive it. Not sure if they would if unsignalled
>> names would change.
> 
> I mean, what you and Christian are addressing in recent weeks are real
> problems, and I was / am about to write similar solutions for our Rust
> dma_fence.
> 
> In the case of those names, however, I'll likely just not support that
> in Rust, saving me from adding those RCU guards and delivering output
> of questionable use to users.
> (could ofc be added later by someone who really needs it…)

Sounds like a good plan to start without the problematic parts whenever 
possible. More than that I cannot comment since I have no idea how rust 
stuff will work and interact with the existing uapi entry points such as 
the sync file.

>>> P.
>>> ---
>>>    drivers/dma-buf/dma-fence.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index 3f78c56b58dc..1875a0abebd3 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -1111,7 +1111,7 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>>>    	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>    		return fence->ops->get_driver_name(fence);
>>>    	else
>>> -		return "detached-driver";
>>> +		return "driver-whose-fence-is-already-signalled";
>>
>> IMHO unnecessarily verbose and whether or not changing it to anything
>> different warrants a Fixes: tag is debatable.
> 
> IMO the output is just wrong and confusing. It's easy to imagine that
> some user starts wondering and searching why his driver has been
> unloaded, opening support tickets and so on.
> 
> Could be less verbose, though. Dunno. I let the maintainer decide.

Driver and timeline usually come together so the signalled info is 
already there ie. "detached-driver signaled-timeline". For example in 
debugfs via dma_fence_describe().

So changing that to "driver-whose-fence-is-already-signalled 
signaled-timeline" still looks too much.

Also, the short name can be reduced from a verbose starting point 
similar to yours:

  "unknown-driver-is-detached-from-the-signaled-fence"
  "driver-detached-from-the-fence"
  "driver-detached"

Or keep "detached-driver" as good enough. Mea culpa for typing it up 
transposed. :)

Regards,

Tvrtko

> 
> P.
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>>    }
>>>    EXPORT_SYMBOL(dma_fence_driver_name);
>>>    
>>
> 

Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Danilo Krummrich 3 months, 2 weeks ago
On 10/24/25 1:31 PM, Tvrtko Ursulin wrote:
> Also, the short name can be reduced from a verbose starting point similar to yours:
> 
>  "unknown-driver-is-detached-from-the-signaled-fence"
>  "driver-detached-from-the-fence"
>  "driver-detached"
> 
> Or keep "detached-driver" as good enough. Mea culpa for typing it up transposed. :)

"detached-driver" is misleading, just because the fence is signaled it doesn't
mean that the driver is detached.
Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Tvrtko Ursulin 3 months, 2 weeks ago
On 24/10/2025 13:12, Danilo Krummrich wrote:
> On 10/24/25 1:31 PM, Tvrtko Ursulin wrote:
>> Also, the short name can be reduced from a verbose starting point similar to yours:
>>
>>   "unknown-driver-is-detached-from-the-signaled-fence"
>>   "driver-detached-from-the-fence"
>>   "driver-detached"
>>
>> Or keep "detached-driver" as good enough. Mea culpa for typing it up transposed. :)
> 
> "detached-driver" is misleading, just because the fence is signaled it doesn't
> mean that the driver is detached.

You trim too much of the quote making it unclear if you read the whole 
story.

If the driver isn't detached from the signalled fence then it is 
vulnerable to use after free.

Current state is that name and timeline helpers _are_ detached from the 
driver once the fence is signalled.

Fence ops and lock are still not and that is work in progress.

The names under discussion are printed together in debugfs (intended 
audience kernel developers). Currently, per fence, it should be along 
the lines of:

"kernel fence: detached-driver signaled-timeline seq 1234 signalled"

You can propose a better name but make it not too verbose and/or 
redundant considering the above.

Regards,

Tvrtko

Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Danilo Krummrich 3 months, 2 weeks ago
On 10/24/25 2:40 PM, Tvrtko Ursulin wrote:
> You trim too much of the quote making it unclear if you read the whole story.

I'm well aware of the context.
> If the driver isn't detached from the signalled fence then it is vulnerable to
> use after free.
When someone just reads "detached-driver" is creates the impression that the
driver is unbound from its device, since this is what this term is usually used for.

(And this is even the case you want to protect against, i.e. the string behind
the pointer returned by get_driver_name() has been freed.)

However, the condition that has changed when you print "driver-detached" is that
the fence has been signaled, independent of whether the driver has been detached
from the device.

Now, you can argue that you mean "driver has been detached from the fence",
which means something along the lines of "the driver has no business with the
fence anymore", but this is not what people think of when they read
"detached-driver".
Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Tvrtko Ursulin 3 months, 2 weeks ago
On 24/10/2025 14:13, Danilo Krummrich wrote:
> On 10/24/25 2:40 PM, Tvrtko Ursulin wrote:
>> You trim too much of the quote making it unclear if you read the whole story.
> 
> I'm well aware of the context.

Good to know. I am coming from the angle that netiquette, at least in 
the olden days, used to be that when you join an established thread you 
don't trim too much of the context. For the benefit of people joining 
the thread at that very point, especially when re-raising an argument 
which has already been discussed.

>> If the driver isn't detached from the signalled fence then it is vulnerable to
>> use after free.
> When someone just reads "detached-driver" is creates the impression that the
> driver is unbound from its device, since this is what this term is usually used for.
> 
> (And this is even the case you want to protect against, i.e. the string behind
> the pointer returned by get_driver_name() has been freed.)

One of the cases just to be clear. The driver getting unbound from the 
device is not *the* case.

In fact with xe the bug was exploitable by just closing the render node 
fd and then querying the fence. Hence detached in this context is more 
than unbound or unloaded.
> However, the condition that has changed when you print "driver-detached" is that
> the fence has been signaled, independent of whether the driver has been detached
> from the device.
> 
> Now, you can argue that you mean "driver has been detached from the fence",
> which means something along the lines of "the driver has no business with the
> fence anymore", but this is not what people think of when they read
> "detached-driver".Okay people. :)

How about "unknown-driver", would that satisfy you?

Regards,

Tvrtko
Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Danilo Krummrich 3 months, 2 weeks ago
On 10/24/25 3:37 PM, Tvrtko Ursulin wrote:
> Good to know. I am coming from the angle that netiquette, at least in the olden
> days, used to be that when you join an established thread you don't trim too
> much of the context. For the benefit of people joining the thread at that very
> point, especially when re-raising an argument which has already been discussed.

I see it the opposite way, leaving too much context wastes people's time
searching for the actual reply, see also [1].

If someone wants the full context, previously sent mails are always available.

[1] https://subspace.kernel.org/etiquette.html#trim-your-quotes-when-replying

>> Now, you can argue that you mean "driver has been detached from the fence",
>> which means something along the lines of "the driver has no business with the
>> fence anymore", but this is not what people think of when they read
>> "detached-driver".Okay people. :)

Not a big deal, but for you to note: Quite some of your replies I've received
recently add text to the quoted parts, in this case the "Okay people. :)".

> How about "unknown-driver", would that satisfy you?

Honestly, the most accurate thing to say would be "fence-signaled", because
that's the actual condition which causes the change.
Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Tvrtko Ursulin 3 months, 2 weeks ago
On 24/10/2025 15:17, Danilo Krummrich wrote:
> On 10/24/25 3:37 PM, Tvrtko Ursulin wrote:
>> Good to know. I am coming from the angle that netiquette, at least in the olden
>> days, used to be that when you join an established thread you don't trim too
>> much of the context. For the benefit of people joining the thread at that very
>> point, especially when re-raising an argument which has already been discussed.
> 
> I see it the opposite way, leaving too much context wastes people's time
> searching for the actual reply, see also [1].
> 
> If someone wants the full context, previously sent mails are always available.
> 
> [1] https://subspace.kernel.org/etiquette.html#trim-your-quotes-when-replying
> 
>>> Now, you can argue that you mean "driver has been detached from the fence",
>>> which means something along the lines of "the driver has no business with the
>>> fence anymore", but this is not what people think of when they read
>>> "detached-driver".Okay people. :)
> 
> Not a big deal, but for you to note: Quite some of your replies I've received
> recently add text to the quoted parts, in this case the "Okay people. :)".

Oh dear.. I can see it in the sent folder but it definitely wasn't 
composed like that. Thank you for pointing it out.

There has been something seriously wrong with Thunderbird's compose 
window for a few weeks now. First it would be inserting random newlines 
while editing, and then now after an update it jumps to the end of the 
whole message and add newlines there when pressing backspace on an empty 
line. Its infuriating. I see there is another update so I will try that.

>> How about "unknown-driver", would that satisfy you?
> 
> Honestly, the most accurate thing to say would be "fence-signaled", because
> that's the actual condition which causes the change.
Hm, ->get_driver_name() returning "fence-signaled" is not great, and 
debugfs output in the form of "kernel fence: fence-signaled 
timeline-signaled seq 1234 signaled" feels a bit redundant. :shrug:

Regards,

Tvrtko
Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Danilo Krummrich 3 months, 2 weeks ago
On 10/24/25 4:28 PM, Tvrtko Ursulin wrote:
> On 24/10/2025 15:17, Danilo Krummrich wrote:
>> On 10/24/25 3:37 PM, Tvrtko Ursulin wrote:
>>> How about "unknown-driver", would that satisfy you?
>>
>> Honestly, the most accurate thing to say would be "fence-signaled", because
>> that's the actual condition which causes the change.
> Hm, ->get_driver_name() returning "fence-signaled" is not great, and debugfs
> output in the form of "kernel fence: fence-signaled timeline-signaled seq 1234
> signaled" feels a bit redundant. :shrug:

Indeed, what about "retired-driver"? Still implying that it's not just unknown,
but that some process has finished. But without the ambiguity of "detached-driver".
Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Tvrtko Ursulin 3 months, 2 weeks ago
On 24/10/2025 15:36, Danilo Krummrich wrote:
> On 10/24/25 4:28 PM, Tvrtko Ursulin wrote:
>> On 24/10/2025 15:17, Danilo Krummrich wrote:
>>> On 10/24/25 3:37 PM, Tvrtko Ursulin wrote:
>>>> How about "unknown-driver", would that satisfy you?
>>>
>>> Honestly, the most accurate thing to say would be "fence-signaled", because
>>> that's the actual condition which causes the change.
>> Hm, ->get_driver_name() returning "fence-signaled" is not great, and debugfs
>> output in the form of "kernel fence: fence-signaled timeline-signaled seq 1234
>> signaled" feels a bit redundant. :shrug:
> 
> Indeed, what about "retired-driver"? Still implying that it's not just unknown,
> but that some process has finished. But without the ambiguity of "detached-driver".

Maybe "decoupled-driver"?

Regards,

Tvrtko
Re: [PATCH] dma-fence: Correct return of dma_fence_driver_name()
Posted by Danilo Krummrich 3 months, 2 weeks ago
On 10/24/25 6:06 PM, Tvrtko Ursulin wrote:
> Maybe "decoupled-driver"?
SGTM.