drivers/dma-buf/dma-fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
To decouple the dma_fence_ops lifetime from dma_fences lifetime RCU
support was added to said function, coupled with using the signaled bit
to detect whether the fence_ops might be gone already.
When implementing that a wrong string was set as a default return
parameter, indicating that every driver whose fence is already signalled
must be detached, which is frankly wrong.
Reported-by: Danilo Krummrich <dakr@kernel.org>
Fixes: 506aa8b02a8d ("dma-fence: Add safe access helpers and document the rules")
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
When this was merged, it sadly slipped by me. I think this entire RCU
mechanism was / is an overengineered idea.
If we look at who actually uses dma_fence_driver_name() and
dma_fence_timeline_name() – functions from which the largest share of
the fence_ops vs. fence lifetime issue stems from – we discover that
there is a single user:
i915.
Isn't that driver even deprecated?
I think the better thing to do is: remove these functions alltogether,
or at least deprecate them. Then the only lifetime issue left so solve
is the callback functions.
P.
---
drivers/dma-buf/dma-fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 3f78c56b58dc..1875a0abebd3 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -1111,7 +1111,7 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return fence->ops->get_driver_name(fence);
else
- return "detached-driver";
+ return "driver-whose-fence-is-already-signalled";
}
EXPORT_SYMBOL(dma_fence_driver_name);
--
2.49.0
On 24/10/2025 08:50, Philipp Stanner wrote:
> To decouple the dma_fence_ops lifetime from dma_fences lifetime RCU
> support was added to said function, coupled with using the signaled bit
> to detect whether the fence_ops might be gone already.
>
> When implementing that a wrong string was set as a default return
> parameter, indicating that every driver whose fence is already signalled
> must be detached, which is frankly wrong.
Depends on how you look at it. After being signaled fence has to be
detached from the driver. Ie. nothing belonging to this driver must be
accessed via the fence.
I started with names and Christian has recently continued with ops.
> Reported-by: Danilo Krummrich <dakr@kernel.org>
> Fixes: 506aa8b02a8d ("dma-fence: Add safe access helpers and document the rules")
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ---
> When this was merged, it sadly slipped by me. I think this entire RCU
> mechanism was / is an overengineered idea.
>
> If we look at who actually uses dma_fence_driver_name() and
> dma_fence_timeline_name() – functions from which the largest share of
> the fence_ops vs. fence lifetime issue stems from – we discover that
> there is a single user:
>
> i915.
Not quite. The trigger event for fixing this was actually xe where use
after free was achievable with a trivial set of userspace steps. See
reproducer at:
https://lore.kernel.org/igt-dev/20250312131835.83983-1-tvrtko.ursulin@igalia.com/
Essentially any fence exporter whose fence can be exported either
directly via sync file, or via syncobj to sync file export, and has
state accessible via the fence ops which may be freed after the fence is
signalled, or if the driver can be unbound from the device and unloaded,
is vulnerable.
> Isn't that driver even deprecated?
Not exactly. It is not getting support for new hardware generations,
while the new driver is not supporting old. There is a cut off point and
an overlap of around one generation. Although I am not even sure this
overlap is officially supported by Intel.
> I think the better thing to do is: remove these functions alltogether,
> or at least deprecate them. Then the only lifetime issue left so solve
> is the callback functions.
That would be nice, I also do not see much value in exporting names to
userspace. But first more conversation around breaking the sync file ABI
needs to happen. I think we had a little bit of it when changing the
names of signalled fences and thinking was existing tools which look at
the names will mostly survive it. Not sure if they would if unsignalled
names would change.
>
> P.
> ---
> drivers/dma-buf/dma-fence.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 3f78c56b58dc..1875a0abebd3 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -1111,7 +1111,7 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
> if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> return fence->ops->get_driver_name(fence);
> else
> - return "detached-driver";
> + return "driver-whose-fence-is-already-signalled";
IMHO unnecessarily verbose and whether or not changing it to anything
different warrants a Fixes: tag is debatable.
Regards,
Tvrtko
> }
> EXPORT_SYMBOL(dma_fence_driver_name);
>
On Fri, 2025-10-24 at 09:31 +0100, Tvrtko Ursulin wrote:
>
> On 24/10/2025 08:50, Philipp Stanner wrote:
> > To decouple the dma_fence_ops lifetime from dma_fences lifetime RCU
> > support was added to said function, coupled with using the signaled bit
> > to detect whether the fence_ops might be gone already.
> >
> > When implementing that a wrong string was set as a default return
> > parameter, indicating that every driver whose fence is already signalled
> > must be detached, which is frankly wrong.
>
> Depends on how you look at it. After being signaled fence has to be
> detached from the driver. Ie. nothing belonging to this driver must be
> accessed via the fence.
Is that even documented btw? Many of the mysterious "dma fence rules"
are often only obtainable by asking Christian & Co
>
> I started with names and Christian has recently continued with ops.
>
> > Reported-by: Danilo Krummrich <dakr@kernel.org>
> > Fixes: 506aa8b02a8d ("dma-fence: Add safe access helpers and document the rules")
> > Signed-off-by: Philipp Stanner <phasta@kernel.org>
> > ---
> > When this was merged, it sadly slipped by me. I think this entire RCU
> > mechanism was / is an overengineered idea.
> >
> > If we look at who actually uses dma_fence_driver_name() and
> > dma_fence_timeline_name() – functions from which the largest share of
> > the fence_ops vs. fence lifetime issue stems from – we discover that
> > there is a single user:
> >
> > i915.
>
[…]
>
>
> That would be nice, I also do not see much value in exporting names to
> userspace. But first more conversation around breaking the sync file ABI
> needs to happen. I think we had a little bit of it when changing the
> names of signalled fences and thinking was existing tools which look at
> the names will mostly survive it. Not sure if they would if unsignalled
> names would change.
I mean, what you and Christian are addressing in recent weeks are real
problems, and I was / am about to write similar solutions for our Rust
dma_fence.
In the case of those names, however, I'll likely just not support that
in Rust, saving me from adding those RCU guards and delivering output
of questionable use to users.
(could ofc be added later by someone who really needs it…)
>
> >
> > P.
> > ---
> > drivers/dma-buf/dma-fence.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 3f78c56b58dc..1875a0abebd3 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -1111,7 +1111,7 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
> > if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> > return fence->ops->get_driver_name(fence);
> > else
> > - return "detached-driver";
> > + return "driver-whose-fence-is-already-signalled";
>
> IMHO unnecessarily verbose and whether or not changing it to anything
> different warrants a Fixes: tag is debatable.
IMO the output is just wrong and confusing. It's easy to imagine that
some user starts wondering and searching why his driver has been
unloaded, opening support tickets and so on.
Could be less verbose, though. Dunno. I let the maintainer decide.
P.
>
> Regards,
>
> Tvrtko
>
> > }
> > EXPORT_SYMBOL(dma_fence_driver_name);
> >
>
On 24/10/2025 11:59, Philipp Stanner wrote:
> On Fri, 2025-10-24 at 09:31 +0100, Tvrtko Ursulin wrote:
>>
>> On 24/10/2025 08:50, Philipp Stanner wrote:
>>> To decouple the dma_fence_ops lifetime from dma_fences lifetime RCU
>>> support was added to said function, coupled with using the signaled bit
>>> to detect whether the fence_ops might be gone already.
>>>
>>> When implementing that a wrong string was set as a default return
>>> parameter, indicating that every driver whose fence is already signalled
>>> must be detached, which is frankly wrong.
>>
>> Depends on how you look at it. After being signaled fence has to be
>> detached from the driver. Ie. nothing belonging to this driver must be
>> accessed via the fence.
>
> Is that even documented btw? Many of the mysterious "dma fence rules"
> are often only obtainable by asking Christian & Co
I tried to document it in the very patch which added it. Both in the
commit message and in the large sticky-outy comments added to these helpers:
"""
* dma_fence_driver_name - Access the driver name
* @fence: the fence to query
*
* Returns a driver name backing the dma-fence implementation.
*
* IMPORTANT CONSIDERATION:
* Dma-fence contract stipulates that access to driver provided data
(data not
* directly embedded into the object itself), such as the
&dma_fence.lock and
* memory potentially accessed by the &dma_fence.ops functions, is
forbidden
* after the fence has been signalled. Drivers are allowed to free that
data,
* and some do.
*
* To allow safe access drivers are mandated to guarantee a RCU grace
period
* between signalling the fence and freeing said data.
*
* As such access to the driver name is only valid inside a RCU locked
section.
* The pointer MUST be both queried and USED ONLY WITHIN a SINGLE block
guarded
* by the &rcu_read_lock and &rcu_read_unlock pair.
"""
>
>>
>> I started with names and Christian has recently continued with ops.
>>
>>> Reported-by: Danilo Krummrich <dakr@kernel.org>
>>> Fixes: 506aa8b02a8d ("dma-fence: Add safe access helpers and document the rules")
>>> Signed-off-by: Philipp Stanner <phasta@kernel.org>
>>> ---
>>> When this was merged, it sadly slipped by me. I think this entire RCU
>>> mechanism was / is an overengineered idea.
>>>
>>> If we look at who actually uses dma_fence_driver_name() and
>>> dma_fence_timeline_name() – functions from which the largest share of
>>> the fence_ops vs. fence lifetime issue stems from – we discover that
>>> there is a single user:
>>>
>>> i915.
>>
>
> […]
>
>>
>>
>> That would be nice, I also do not see much value in exporting names to
>> userspace. But first more conversation around breaking the sync file ABI
>> needs to happen. I think we had a little bit of it when changing the
>> names of signalled fences and thinking was existing tools which look at
>> the names will mostly survive it. Not sure if they would if unsignalled
>> names would change.
>
> I mean, what you and Christian are addressing in recent weeks are real
> problems, and I was / am about to write similar solutions for our Rust
> dma_fence.
>
> In the case of those names, however, I'll likely just not support that
> in Rust, saving me from adding those RCU guards and delivering output
> of questionable use to users.
> (could ofc be added later by someone who really needs it…)
Sounds like a good plan to start without the problematic parts whenever
possible. More than that I cannot comment since I have no idea how rust
stuff will work and interact with the existing uapi entry points such as
the sync file.
>>> P.
>>> ---
>>> drivers/dma-buf/dma-fence.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index 3f78c56b58dc..1875a0abebd3 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -1111,7 +1111,7 @@ const char __rcu *dma_fence_driver_name(struct dma_fence *fence)
>>> if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>> return fence->ops->get_driver_name(fence);
>>> else
>>> - return "detached-driver";
>>> + return "driver-whose-fence-is-already-signalled";
>>
>> IMHO unnecessarily verbose and whether or not changing it to anything
>> different warrants a Fixes: tag is debatable.
>
> IMO the output is just wrong and confusing. It's easy to imagine that
> some user starts wondering and searching why his driver has been
> unloaded, opening support tickets and so on.
>
> Could be less verbose, though. Dunno. I let the maintainer decide.
Driver and timeline usually come together so the signalled info is
already there ie. "detached-driver signaled-timeline". For example in
debugfs via dma_fence_describe().
So changing that to "driver-whose-fence-is-already-signalled
signaled-timeline" still looks too much.
Also, the short name can be reduced from a verbose starting point
similar to yours:
"unknown-driver-is-detached-from-the-signaled-fence"
"driver-detached-from-the-fence"
"driver-detached"
Or keep "detached-driver" as good enough. Mea culpa for typing it up
transposed. :)
Regards,
Tvrtko
>
> P.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>> }
>>> EXPORT_SYMBOL(dma_fence_driver_name);
>>>
>>
>
On 10/24/25 1:31 PM, Tvrtko Ursulin wrote: > Also, the short name can be reduced from a verbose starting point similar to yours: > > "unknown-driver-is-detached-from-the-signaled-fence" > "driver-detached-from-the-fence" > "driver-detached" > > Or keep "detached-driver" as good enough. Mea culpa for typing it up transposed. :) "detached-driver" is misleading, just because the fence is signaled it doesn't mean that the driver is detached.
On 24/10/2025 13:12, Danilo Krummrich wrote: > On 10/24/25 1:31 PM, Tvrtko Ursulin wrote: >> Also, the short name can be reduced from a verbose starting point similar to yours: >> >> "unknown-driver-is-detached-from-the-signaled-fence" >> "driver-detached-from-the-fence" >> "driver-detached" >> >> Or keep "detached-driver" as good enough. Mea culpa for typing it up transposed. :) > > "detached-driver" is misleading, just because the fence is signaled it doesn't > mean that the driver is detached. You trim too much of the quote making it unclear if you read the whole story. If the driver isn't detached from the signalled fence then it is vulnerable to use after free. Current state is that name and timeline helpers _are_ detached from the driver once the fence is signalled. Fence ops and lock are still not and that is work in progress. The names under discussion are printed together in debugfs (intended audience kernel developers). Currently, per fence, it should be along the lines of: "kernel fence: detached-driver signaled-timeline seq 1234 signalled" You can propose a better name but make it not too verbose and/or redundant considering the above. Regards, Tvrtko
On 10/24/25 2:40 PM, Tvrtko Ursulin wrote: > You trim too much of the quote making it unclear if you read the whole story. I'm well aware of the context. > If the driver isn't detached from the signalled fence then it is vulnerable to > use after free. When someone just reads "detached-driver" is creates the impression that the driver is unbound from its device, since this is what this term is usually used for. (And this is even the case you want to protect against, i.e. the string behind the pointer returned by get_driver_name() has been freed.) However, the condition that has changed when you print "driver-detached" is that the fence has been signaled, independent of whether the driver has been detached from the device. Now, you can argue that you mean "driver has been detached from the fence", which means something along the lines of "the driver has no business with the fence anymore", but this is not what people think of when they read "detached-driver".
On 24/10/2025 14:13, Danilo Krummrich wrote: > On 10/24/25 2:40 PM, Tvrtko Ursulin wrote: >> You trim too much of the quote making it unclear if you read the whole story. > > I'm well aware of the context. Good to know. I am coming from the angle that netiquette, at least in the olden days, used to be that when you join an established thread you don't trim too much of the context. For the benefit of people joining the thread at that very point, especially when re-raising an argument which has already been discussed. >> If the driver isn't detached from the signalled fence then it is vulnerable to >> use after free. > When someone just reads "detached-driver" is creates the impression that the > driver is unbound from its device, since this is what this term is usually used for. > > (And this is even the case you want to protect against, i.e. the string behind > the pointer returned by get_driver_name() has been freed.) One of the cases just to be clear. The driver getting unbound from the device is not *the* case. In fact with xe the bug was exploitable by just closing the render node fd and then querying the fence. Hence detached in this context is more than unbound or unloaded. > However, the condition that has changed when you print "driver-detached" is that > the fence has been signaled, independent of whether the driver has been detached > from the device. > > Now, you can argue that you mean "driver has been detached from the fence", > which means something along the lines of "the driver has no business with the > fence anymore", but this is not what people think of when they read > "detached-driver".Okay people. :) How about "unknown-driver", would that satisfy you? Regards, Tvrtko
On 10/24/25 3:37 PM, Tvrtko Ursulin wrote: > Good to know. I am coming from the angle that netiquette, at least in the olden > days, used to be that when you join an established thread you don't trim too > much of the context. For the benefit of people joining the thread at that very > point, especially when re-raising an argument which has already been discussed. I see it the opposite way, leaving too much context wastes people's time searching for the actual reply, see also [1]. If someone wants the full context, previously sent mails are always available. [1] https://subspace.kernel.org/etiquette.html#trim-your-quotes-when-replying >> Now, you can argue that you mean "driver has been detached from the fence", >> which means something along the lines of "the driver has no business with the >> fence anymore", but this is not what people think of when they read >> "detached-driver".Okay people. :) Not a big deal, but for you to note: Quite some of your replies I've received recently add text to the quoted parts, in this case the "Okay people. :)". > How about "unknown-driver", would that satisfy you? Honestly, the most accurate thing to say would be "fence-signaled", because that's the actual condition which causes the change.
On 24/10/2025 15:17, Danilo Krummrich wrote: > On 10/24/25 3:37 PM, Tvrtko Ursulin wrote: >> Good to know. I am coming from the angle that netiquette, at least in the olden >> days, used to be that when you join an established thread you don't trim too >> much of the context. For the benefit of people joining the thread at that very >> point, especially when re-raising an argument which has already been discussed. > > I see it the opposite way, leaving too much context wastes people's time > searching for the actual reply, see also [1]. > > If someone wants the full context, previously sent mails are always available. > > [1] https://subspace.kernel.org/etiquette.html#trim-your-quotes-when-replying > >>> Now, you can argue that you mean "driver has been detached from the fence", >>> which means something along the lines of "the driver has no business with the >>> fence anymore", but this is not what people think of when they read >>> "detached-driver".Okay people. :) > > Not a big deal, but for you to note: Quite some of your replies I've received > recently add text to the quoted parts, in this case the "Okay people. :)". Oh dear.. I can see it in the sent folder but it definitely wasn't composed like that. Thank you for pointing it out. There has been something seriously wrong with Thunderbird's compose window for a few weeks now. First it would be inserting random newlines while editing, and then now after an update it jumps to the end of the whole message and add newlines there when pressing backspace on an empty line. Its infuriating. I see there is another update so I will try that. >> How about "unknown-driver", would that satisfy you? > > Honestly, the most accurate thing to say would be "fence-signaled", because > that's the actual condition which causes the change. Hm, ->get_driver_name() returning "fence-signaled" is not great, and debugfs output in the form of "kernel fence: fence-signaled timeline-signaled seq 1234 signaled" feels a bit redundant. :shrug: Regards, Tvrtko
On 10/24/25 4:28 PM, Tvrtko Ursulin wrote: > On 24/10/2025 15:17, Danilo Krummrich wrote: >> On 10/24/25 3:37 PM, Tvrtko Ursulin wrote: >>> How about "unknown-driver", would that satisfy you? >> >> Honestly, the most accurate thing to say would be "fence-signaled", because >> that's the actual condition which causes the change. > Hm, ->get_driver_name() returning "fence-signaled" is not great, and debugfs > output in the form of "kernel fence: fence-signaled timeline-signaled seq 1234 > signaled" feels a bit redundant. :shrug: Indeed, what about "retired-driver"? Still implying that it's not just unknown, but that some process has finished. But without the ambiguity of "detached-driver".
On 24/10/2025 15:36, Danilo Krummrich wrote: > On 10/24/25 4:28 PM, Tvrtko Ursulin wrote: >> On 24/10/2025 15:17, Danilo Krummrich wrote: >>> On 10/24/25 3:37 PM, Tvrtko Ursulin wrote: >>>> How about "unknown-driver", would that satisfy you? >>> >>> Honestly, the most accurate thing to say would be "fence-signaled", because >>> that's the actual condition which causes the change. >> Hm, ->get_driver_name() returning "fence-signaled" is not great, and debugfs >> output in the form of "kernel fence: fence-signaled timeline-signaled seq 1234 >> signaled" feels a bit redundant. :shrug: > > Indeed, what about "retired-driver"? Still implying that it's not just unknown, > but that some process has finished. But without the ambiguity of "detached-driver". Maybe "decoupled-driver"? Regards, Tvrtko
© 2016 - 2026 Red Hat, Inc.