There are two main ways that GPUVM might be used:

* staged mode, where VM_BIND ioctls update the GPUVM immediately so that
  the GPUVM reflects the state of the VM *including* staged changes that
  are not yet applied to the GPU's virtual address space.
* immediate mode, where the GPUVM state is updated during run_job(),
  i.e., in the DMA fence signalling critical path, to ensure that the
  GPUVM and the GPU's virtual address space have the same state at all
  times.

Currently, only Panthor uses GPUVM in immediate mode, but the Rust
drivers Tyr and Nova will also use GPUVM in immediate mode, so it is
worth supporting both staged and immediate mode well in GPUVM. To use
immediate mode, the GEM's gpuva list must be modified during the fence
signalling path, which means that it must be protected by a lock that is
fence signalling safe.
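
For illustration only, here is a minimal driver-side sketch of what that
could look like with the lock added by this patch. All my_* names are
invented; drm_gem_gpuva_set_lock(), drm_gpuva_link() and the new
gpuva.lock member are the existing GEM/GPUVM pieces being used:

#include <drm/drm_gem.h>
#include <drm/drm_gpuvm.h>

/* Hypothetical GEM init: tell lockdep that this object's gpuva list is
 * protected by the new per-object mutex rather than its dma_resv lock. */
static void my_gem_init_gpuva_lock(struct drm_gem_object *obj)
{
	drm_gem_gpuva_set_lock(obj, &obj->gpuva.lock);
}

/* Hypothetical immediate-mode bind step, running from run_job(), i.e. in
 * the DMA fence signalling critical path where dma_resv_lock() is not
 * allowed. The mutex remains fence signalling safe as long as nothing in
 * the critical section waits on fences or is otherwise not fence
 * signalling safe. */
static void my_vm_link_mapping(struct drm_gpuva *va,
			       struct drm_gpuvm_bo *vm_bo)
{
	struct drm_gem_object *obj = vm_bo->obj;

	mutex_lock(&obj->gpuva.lock);
	drm_gpuva_link(va, vm_bo);
	mutex_unlock(&obj->gpuva.lock);
}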

For this reason, a mutex intended for this purpose is added to struct
drm_gem_object. Adding it directly to the GEM object both makes it
easier to use GPUVM in immediate mode and makes it possible to take the
gpuva lock from core DRM code.

As a follow-up, another change that should probably be made to support
immediate mode is a mechanism to postpone cleanup of vm_bo objects, as
dropping a vm_bo object in the fence signalling path is problematic for
two reasons:

* When using DRM_GPUVM_RESV_PROTECTED, you cannot remove the vm_bo from
  the extobj/evicted lists during the fence signalling path.
* Dropping a vm_bo could lead to the GEM object getting destroyed.
  The requirement that GEM object cleanup is fence signalling safe is
  dubious and likely to be violated in practice.

Panthor already has its own custom implementation of postponing vm_bo
cleanup.
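
For illustration only, a rough sketch of what such postponed cleanup
could look like on the driver side. The my_* names are invented and this
is not the Panthor implementation; it simply moves the final
drm_gpuvm_bo_put() out of the fence signalling path into a worker
running in process context:

struct my_vm_bo {
	struct drm_gpuvm_bo base;
	struct llist_node stale_node;
};

struct my_vm {
	struct drm_gpuvm base;
	struct llist_head stale_vm_bos;
	struct work_struct cleanup_work;
};

/* Called from the fence signalling path instead of drm_gpuvm_bo_put().
 * Each queued node hands exactly one reference over to the worker; this
 * sketch assumes a given vm_bo is not queued again before the worker has
 * run. */
static void my_vm_bo_put_deferred(struct my_vm *vm, struct my_vm_bo *vm_bo)
{
	llist_add(&vm_bo->stale_node, &vm->stale_vm_bos);
	queue_work(system_wq, &vm->cleanup_work);
}

/* Process context: here it is fine to take the VM resv lock (needed with
 * DRM_GPUVM_RESV_PROTECTED to remove the vm_bo from the extobj/evicted
 * lists) and the GEM gpuva lock, and it is fine if dropping the last
 * vm_bo reference ends up destroying the GEM object. */
static void my_vm_cleanup_work(struct work_struct *work)
{
	struct my_vm *vm = container_of(work, struct my_vm, cleanup_work);
	struct llist_node *stale = llist_del_all(&vm->stale_vm_bos);
	struct my_vm_bo *vm_bo, *tmp;

	llist_for_each_entry_safe(vm_bo, tmp, stale, stale_node) {
		struct drm_gem_object *obj = vm_bo->base.obj;

		/* Hold the GEM object so the mutex stays valid across the
		 * final vm_bo reference drop. */
		drm_gem_object_get(obj);
		dma_resv_lock(drm_gpuvm_resv(&vm->base), NULL);
		mutex_lock(&obj->gpuva.lock);
		drm_gpuvm_bo_put(&vm_bo->base);
		mutex_unlock(&obj->gpuva.lock);
		dma_resv_unlock(drm_gpuvm_resv(&vm->base));
		drm_gem_object_put(obj);
	}
}
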
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
drivers/gpu/drm/drm_gem.c | 2 ++
include/drm/drm_gem.h | 4 +++-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 4a89b6acb6af39720451ac24033b89e144d282dc..8d25cc65707d5b44d931beb0207c9d08a3e2de5a 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -187,6 +187,7 @@ void drm_gem_private_object_init(struct drm_device *dev,
kref_init(&obj->refcount);
obj->handle_count = 0;
obj->size = size;
+ mutex_init(&obj->gpuva.lock);
dma_resv_init(&obj->_resv);
if (!obj->resv)
obj->resv = &obj->_resv;
@@ -210,6 +211,7 @@ void drm_gem_private_object_fini(struct drm_gem_object *obj)
WARN_ON(obj->dma_buf);
dma_resv_fini(&obj->_resv);
+ mutex_destroy(&obj->gpuva.lock);
}
EXPORT_SYMBOL(drm_gem_private_object_fini);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index d3a7b43e2c637b164eba5af7cc2fc8ef09d4f0a4..5934d8dc267a65aaf62d2d025869221cd110b325 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -403,11 +403,13 @@ struct drm_gem_object {
* Provides the list of GPU VAs attached to this GEM object.
*
* Drivers should lock list accesses with the GEMs &dma_resv lock
- * (&drm_gem_object.resv) or a custom lock if one is provided.
+ * (&drm_gem_object.resv) or a custom lock if one is provided. The
+ * mutex inside this struct may be used as the custom lock.
*/
struct {
struct list_head list;
+ struct mutex lock;
#ifdef CONFIG_LOCKDEP
struct lockdep_map *lock_dep_map;
#endif
--
2.51.0.rc2.233.g662b1ed5c5-goog
On Fri, 22 Aug 2025 09:28:24 +0000 Alice Ryhl <aliceryhl@google.com> wrote:

> Signed-off-by: Alice Ryhl <aliceryhl@google.com>

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

One minor thing below.

> @@ -403,11 +403,13 @@ struct drm_gem_object {
>  	 * Provides the list of GPU VAs attached to this GEM object.
>  	 *
>  	 * Drivers should lock list accesses with the GEMs &dma_resv lock
> -	 * (&drm_gem_object.resv) or a custom lock if one is provided.
> +	 * (&drm_gem_object.resv) or a custom lock if one is provided. The
> +	 * mutex inside this struct may be used as the custom lock.
>  	 */
>  	struct {
>  		struct list_head list;
>
> +		struct mutex lock;

Maybe it's time we start moving some bits of the gpuva field docs next
to the fields they describe:

	/**
	 * @gpuva: Fields used by GPUVM to manage mappings pointing to this GEM object.
	 */
	struct {
		/**
		 * @gpuva.list: list of GPU VAs attached to this GEM object.
		 *
		 * Drivers should lock list accesses with the GEMs &dma_resv lock
		 * (&drm_gem_object.resv) or &drm_gem_object.gpuva.lock if the
		 * list is being updated in places where the resv lock can't be
		 * acquired (fence signalling path).
		 */
		struct list_head list;

		/**
		 * @gpuva.lock: lock protecting access to &drm_gem_object.gpuva.list
		 * when the resv lock can't be used.
		 *
		 * Should only be used when the VM is being modified in a fence
		 * signalling path, otherwise you should use &drm_gem_object.resv to
		 * protect accesses to &drm_gem_object.gpuva.list.
		 */
		struct mutex lock;

		...
	};

>  #ifdef CONFIG_LOCKDEP
>  	struct lockdep_map *lock_dep_map;
>  #endif
On Fri, Aug 22, 2025 at 11:52:21AM +0200, Boris Brezillon wrote:
> Maybe it's time we start moving some bits of the gpuva field docs next
> to the fields they describe:
>
> 	/**
> 	 * @gpuva.list: list of GPU VAs attached to this GEM object.
> 	 *
> 	 * Drivers should lock list accesses with the GEMs &dma_resv lock
> 	 * (&drm_gem_object.resv) or &drm_gem_object.gpuva.lock if the
> 	 * list is being updated in places where the resv lock can't be
> 	 * acquired (fence signalling path).
> 	 */
> 	struct list_head list;

This isn't a new issue, but it's somewhat confusing to call it a list of
VAs when it's a list of vm_bos.

Alice
On Fri, 22 Aug 2025 10:57:26 +0000 Alice Ryhl <aliceryhl@google.com> wrote:

> This isn't a new issue, but it's somewhat confusing to call it a list of
> VAs when it's a list of vm_bos.

Yep, that's true.
On 8/22/25 12:57 PM, Alice Ryhl wrote:
> This isn't a new issue, but it's somewhat confusing to call it a list of
> VAs when it's a list of vm_bos.

Yes, I already suggested (don't remember where though) to change the name
of the anonymous struct accordingly. I think I forgot to rename it back
when I introduced struct drm_gpuvm_bo.

If you want, please add a patch for this in the next version. But it's
also fine to leave as is for your series of course. I can also fix it
up. :)