Hi Thomas, Srini,
Srini has been working on something similar for amdgpu for quite a while. We use an eventfd and a device-specific IOCTL instead of the watch_queue proposed here.
Srini, please take a look at this; it is basically the equivalent of our event notification approach for KFD/KGD unification. Maybe we can learn from it and/or end up with something common for both drivers.
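
For comparison, here is a rough user-space sketch of the generic eventfd pattern. It is not the amdgpu code: the device-specific registration IOCTL is left out entirely, and the helper names (notify_fd_create, notify_fd_signal, notify_fd_wait) are made up for illustration. The signal helper only stands in for the driver, which would signal the eventfd from kernel context.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create the eventfd that user space would hand to the driver.
 * The (omitted, hypothetical) registration IOCTL would take this fd. */
static int notify_fd_create(void)
{
	return eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
}

/* Stand-in for the driver side: adds 'count' to the eventfd counter.
 * In a real setup the kernel would do this, not user space. */
static int notify_fd_signal(int fd, uint64_t count)
{
	ssize_t n = write(fd, &count, sizeof(count));

	return n == (ssize_t)sizeof(count) ? 0 : -1;
}

/* Wait up to timeout_ms for a notification.
 * Returns the accumulated event count, 0 on timeout, -1 on error.
 * Reading a non-semaphore eventfd resets its counter to zero. */
static int64_t notify_fd_wait(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	uint64_t count;
	int ret;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;

	return (int64_t)count;
}
```

Note that the eventfd only carries a counter, so any event payload or filtering has to go through a separate channel, whereas watch_queue delivers typed records.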
Thanks,
Christian.
On 6/12/26 15:53, Thomas Hellström wrote:
> There is a need to inform user-space clients when a rebind worker
> has run out of memory so that it can react, adjust its working-set
> and restart the job. This patch series aims to start a discussion
> about the best way to accomplish this.
>
> The series builds on the core "general notification mechanism" or
> "watch_queue", and attaches a watch queue to each xe drm file.
>
> The watch_queue is extremely flexible and allows filtering out
> events of interest at the kernel level. There can be multiple
> listeners.
>
> Patch 1 Implements a restart IOCTL for rebind-workers
> paused on OOM.
> Patch 2 Adds fault-injection into the rebind worker for
> testing.
> Patch 3 Adds a DRM_XE_NOTIFY watch_type.
> Patch 4 Implements watch_queue event sending from within
> xe.
>
> igt series:
> Test-with: https://patchwork.freedesktop.org/series/168429/
>
> Compute UMD side is not available yet. Will be available before
> final review.
>
> Thomas Hellström (4):
> drm/xe: Add DRM_IOCTL_XE_VM_RESTART IOCTL
> drm/xe: Add fault injection for rebind worker -ENOSPC
> watch_queue: Add a DRM_XE_NOTIFY watch type and export init_watch()
> drm/xe: Add watch_queue-based device event notification
>
> MAINTAINERS | 1 +
> drivers/gpu/drm/xe/Kconfig | 1 +
> drivers/gpu/drm/xe/Makefile | 1 +
> drivers/gpu/drm/xe/xe_debugfs.c | 4 +-
> drivers/gpu/drm/xe/xe_device.c | 8 ++
> drivers/gpu/drm/xe/xe_device_types.h | 6 ++
> drivers/gpu/drm/xe/xe_vm.c | 135 ++++++++++++++++++++++++++-
> drivers/gpu/drm/xe/xe_vm.h | 13 ++-
> drivers/gpu/drm/xe/xe_vm_types.h | 3 +
> drivers/gpu/drm/xe/xe_watch_queue.c | 111 ++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_watch_queue.h | 20 ++++
> include/uapi/drm/xe_drm.h | 91 +++++++++++++++++-
> include/uapi/drm/xe_drm_events.h | 62 ++++++++++++
> include/uapi/linux/watch_queue.h | 3 +-
> kernel/watch_queue.c | 13 ++-
> 15 files changed, 462 insertions(+), 10 deletions(-)
> create mode 100644 drivers/gpu/drm/xe/xe_watch_queue.c
> create mode 100644 drivers/gpu/drm/xe/xe_watch_queue.h
> create mode 100644 include/uapi/drm/xe_drm_events.h
>