[RFC PATCH 0/4] Xe driver asynchronous notification mechanism

Thomas Hellström posted 4 patches 1 week, 5 days ago
There is a need to inform user-space clients when a rebind worker
has run out of memory, so that they can react, adjust their working
set and restart the job. This patch series aims to start a discussion
about the best way to accomplish this.
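
To make the intended client flow concrete, here is a minimal sketch of
the recovery path described above: shrink the working set, then ask the
kernel to restart the paused rebind worker. Every name in it
(sketch_vm_ops, sketch_handle_rebind_oom, the fake VM) is hypothetical
and not taken from the series; the restart hook only stands in for
whatever wrapper a UMD would put around the restart IOCTL.

```c
#include <stddef.h>

/* Hypothetical hooks a UMD might supply; none of these names come from the series. */
struct sketch_vm_ops {
	/* Evict or unbind buffers; returns the number of bytes released. */
	size_t (*shrink_working_set)(void *ctx, size_t want_bytes);
	/* Stand-in for a wrapper around the restart IOCTL; 0 on success. */
	int (*restart_vm)(void *ctx);
};

enum sketch_result {
	SKETCH_RESTARTED,
	SKETCH_NOTHING_TO_FREE,
	SKETCH_RESTART_FAILED,
};

/* React to an out-of-memory notification for one VM. */
static enum sketch_result
sketch_handle_rebind_oom(const struct sketch_vm_ops *ops, void *ctx,
			 size_t want_bytes)
{
	/* Restarting without freeing anything would just hit OOM again. */
	if (ops->shrink_working_set(ctx, want_bytes) == 0)
		return SKETCH_NOTHING_TO_FREE;
	if (ops->restart_vm(ctx) != 0)
		return SKETCH_RESTART_FAILED;
	return SKETCH_RESTARTED;
}

/* A fake VM used only to exercise the flow without a device. */
struct sketch_fake_vm {
	size_t resident;	/* bytes currently bound */
	int restarts;		/* how many times restart was requested */
};

static size_t sketch_fake_shrink(void *ctx, size_t want_bytes)
{
	struct sketch_fake_vm *vm = ctx;
	size_t freed = want_bytes < vm->resident ? want_bytes : vm->resident;

	vm->resident -= freed;
	return freed;
}

static int sketch_fake_restart(void *ctx)
{
	struct sketch_fake_vm *vm = ctx;

	vm->restarts++;
	return 0;
}

static const struct sketch_vm_ops sketch_fake_ops = {
	.shrink_working_set = sketch_fake_shrink,
	.restart_vm = sketch_fake_restart,
};
```

The point of the ordering is that a restart is only requested after
something has actually been released; what counts as "enough" is a
policy decision left to the client.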

The series builds on the core "general notification mechanism" or
"watch_queue", and attaches a watch queue to each xe drm file.

The watch_queue is extremely flexible and allows filtering out
events of interest at the kernel level. There can be multiple
listeners.
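
As an illustration of what type and subtype filtering amounts to, the
sketch below walks a buffer of notification records and counts those a
filter would let through. It runs in user space purely to show the
matching logic. The structs are local stand-ins modelled on my reading
of <linux/watch_queue.h> (record length in the low 7 bits of info, a
256-bit subtype bitmap); real code should include that header instead.
The type value used for xe is a placeholder, and the info_filter and
info_mask matching that the core filter also supports is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the record header; assumed layout, see the note above. */
struct sketch_notification {
	uint32_t type:24;	/* watch type, e.g. the proposed DRM_XE_NOTIFY */
	uint32_t subtype:8;	/* type-specific event code */
	uint32_t info;		/* low 7 bits: record length in bytes (assumed) */
};

#define SKETCH_INFO_LENGTH 0x0000007fu

/* Local stand-in for one per-type filter entry. */
struct sketch_type_filter {
	uint32_t type;
	uint32_t subtype_filter[8];	/* bitmap over the 256 possible subtypes */
};

/* Let records with this subtype through the filter. */
static void sketch_filter_allow(struct sketch_type_filter *f, uint8_t subtype)
{
	f->subtype_filter[subtype >> 5] |= 1u << (subtype & 31);
}

/* Returns 1 if the record passes the filter, 0 otherwise. */
static int sketch_filter_match(const struct sketch_type_filter *f,
			       const struct sketch_notification *n)
{
	if (n->type != f->type)
		return 0;
	return (f->subtype_filter[n->subtype >> 5] >> (n->subtype & 31)) & 1;
}

/*
 * Walk a buffer of back-to-back records and count the ones that pass.
 * Returns -1 if a record is truncated or reports an impossible length.
 */
static int sketch_count_matches(const unsigned char *buf, size_t len,
				const struct sketch_type_filter *f)
{
	size_t off = 0;
	int hits = 0;

	while (off < len) {
		struct sketch_notification n;
		size_t rec_len;

		if (len - off < sizeof(n))
			return -1;
		memcpy(&n, buf + off, sizeof(n));
		rec_len = n.info & SKETCH_INFO_LENGTH;
		if (rec_len < sizeof(n) || rec_len > len - off)
			return -1;
		hits += sketch_filter_match(f, &n);
		off += rec_len;
	}
	return hits;
}
```

With the filter installed in the kernel, non-matching records would
never reach the listener's buffer at all, which is the advantage over
filtering after the fact as this sketch does.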

Patch 1 Implements a restart IOCTL for rebind-workers
      paused on OOM.
Patch 2 Adds fault-injection into the rebind worker for
      testing.
Patch 3 Adds a DRM_XE_NOTIFY watch_type.
Patch 4 Implements watch_queue event sending from within
      xe.

igt series:
Test-with: https://patchwork.freedesktop.org/series/168429/

The compute UMD side is not available yet. It will be available before
final review.

Thomas Hellström (4):
  drm/xe: Add DRM_IOCTL_XE_VM_RESTART IOCTL
  drm/xe: Add fault injection for rebind worker -ENOSPC
  watch_queue: Add a DRM_XE_NOTIFY watch type and export init_watch()
  drm/xe: Add watch_queue-based device event notification

 MAINTAINERS                          |   1 +
 drivers/gpu/drm/xe/Kconfig           |   1 +
 drivers/gpu/drm/xe/Makefile          |   1 +
 drivers/gpu/drm/xe/xe_debugfs.c      |   4 +-
 drivers/gpu/drm/xe/xe_device.c       |   8 ++
 drivers/gpu/drm/xe/xe_device_types.h |   6 ++
 drivers/gpu/drm/xe/xe_vm.c           | 135 ++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm.h           |  13 ++-
 drivers/gpu/drm/xe/xe_vm_types.h     |   3 +
 drivers/gpu/drm/xe/xe_watch_queue.c  | 111 ++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_watch_queue.h  |  20 ++++
 include/uapi/drm/xe_drm.h            |  91 +++++++++++++++++-
 include/uapi/drm/xe_drm_events.h     |  62 ++++++++++++
 include/uapi/linux/watch_queue.h     |   3 +-
 kernel/watch_queue.c                 |  13 ++-
 15 files changed, 462 insertions(+), 10 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_watch_queue.c
 create mode 100644 drivers/gpu/drm/xe/xe_watch_queue.h
 create mode 100644 include/uapi/drm/xe_drm_events.h

-- 
2.54.0

Re: [RFC PATCH 0/4] Xe driver asynchronous notification mechanism
Posted by Christian König 1 week, 2 days ago
Hi Thomas, Srini,

Srini has been working on something similar for amdgpu for quite a while. We use eventfd and a device-specific IOCTL instead of the watch_queue proposed here.
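
For readers comparing the two approaches, a minimal sketch of plain eventfd signalling follows. It only shows the generic eventfd semantics (a 64-bit counter that accumulates writes and is reset by a read); it is not the amdgpu interface, and the helper names sketch_signal and sketch_wait are made up for the example. In a driver setup the kernel would be the one signalling, after user space registered the fd through some IOCTL.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Add one pending event to the eventfd counter. Returns 0 on success. */
static int sketch_signal(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Wait up to timeout_ms for events. Returns the number of events that
 * were pending (the counter is reset by the read), 0 on timeout and
 * -1 on error.
 */
static int64_t sketch_wait(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return (int64_t)count;
}
```

Note that an eventfd carries only a count, so the event payload has to be fetched separately, whereas a watch_queue record carries type, subtype and data in the notification itself.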

Srini, please take a look at this. It is basically the equivalent of our event notification approach for KFD/KGD unification. Maybe we could learn from it and/or have something common for both drivers.

Thanks,
Christian.

On 6/12/26 15:53, Thomas Hellström wrote:
> There is a need to inform user-space clients when a rebind worker
> has run out of memory, so that they can react, adjust their working
> set and restart the job. This patch series aims to start a discussion
> about the best way to accomplish this.
> 
> The series builds on the core "general notification mechanism" or
> "watch_queue", and attaches a watch queue to each xe drm file.
> 
> The watch_queue is extremely flexible and allows filtering out
> events of interest at the kernel level. There can be multiple
> listeners.
> 
> Patch 1 Implements a restart IOCTL for rebind-workers
>       paused on OOM.
> Patch 2 Adds fault-injection into the rebind worker for
>       testing.
> Patch 3 Adds a DRM_XE_NOTIFY watch_type.
> Patch 4 Implements watch_queue event sending from within
>       xe.
> 
> igt series:
> Test-with: https://patchwork.freedesktop.org/series/168429/
> 
> Compute UMD side is not available yet. Will be available before
> final review.
> 
> Thomas Hellström (4):
>   drm/xe: Add DRM_IOCTL_XE_VM_RESTART IOCTL
>   drm/xe: Add fault injection for rebind worker -ENOSPC
>   watch_queue: Add a DRM_XE_NOTIFY watch type and export init_watch()
>   drm/xe: Add watch_queue-based device event notification