[PATCH 00/15] accel/qda: Qualcomm DSP Accelerator driver

Ekansh Gupta via B4 Relay posted 15 patches 5 days, 20 hours ago
Documentation/accel/index.rst          |    1 +
Documentation/accel/qda/index.rst      |   13 +
Documentation/accel/qda/qda.rst        |  146 +++++
MAINTAINERS                            |    9 +
drivers/accel/Kconfig                  |    1 +
drivers/accel/Makefile                 |    2 +
drivers/accel/qda/Kconfig              |   34 +
drivers/accel/qda/Makefile             |   19 +
drivers/accel/qda/qda_cb.c             |  146 +++++
drivers/accel/qda/qda_cb.h             |   32 +
drivers/accel/qda/qda_compute_bus.c    |   68 ++
drivers/accel/qda/qda_drv.c            |  192 ++++++
drivers/accel/qda/qda_drv.h            |   91 +++
drivers/accel/qda/qda_fastrpc.c        | 1058 ++++++++++++++++++++++++++++++++
drivers/accel/qda/qda_fastrpc.h        |  390 ++++++++++++
drivers/accel/qda/qda_gem.c            |  177 ++++++
drivers/accel/qda/qda_gem.h            |   62 ++
drivers/accel/qda/qda_ioctl.c          |  296 +++++++++
drivers/accel/qda/qda_ioctl.h          |   19 +
drivers/accel/qda/qda_memory_dma.c     |  110 ++++
drivers/accel/qda/qda_memory_dma.h     |   17 +
drivers/accel/qda/qda_memory_manager.c |  380 ++++++++++++
drivers/accel/qda/qda_memory_manager.h |   75 +++
drivers/accel/qda/qda_prime.c          |  184 ++++++
drivers/accel/qda/qda_prime.h          |   18 +
drivers/accel/qda/qda_rpmsg.c          |  248 ++++++++
drivers/accel/qda/qda_rpmsg.h          |   30 +
drivers/iommu/iommu.c                  |    4 +
include/linux/qda_compute_bus.h        |   32 +
include/uapi/drm/qda_accel.h           |  229 +++++++
30 files changed, 4083 insertions(+)
This patch series introduces the Qualcomm DSP Accelerator (QDA) driver,
a DRM-based accelerator driver for Qualcomm DSPs. The driver provides a
standardized interface for offloading computational tasks to DSPs found
on Qualcomm SoCs, supporting all DSP domains.
 
The QDA driver implements the FastRPC protocol over the DRM accel
subsystem. It uses the same device-tree node structure as the existing
fastrpc driver in drivers/misc/. The approach for binding the QDA driver
to device-tree nodes while coexisting with the fastrpc driver is an open
item described below.

RFC thread: https://lore.kernel.org/dri-devel/20260224-qda-firstpost-v1-0-fe46a9c1a046@oss.qualcomm.com/T/
 
User-space staging branch
=========================
https://github.com/qualcomm/fastrpc/tree/accel/staging
 
Key Features
============
 
* Standard DRM accelerator interface via /dev/accel/accelN
* GEM-based buffer management with DMA-BUF import/export (PRIME)
* IOMMU-based memory isolation using per-process context banks
* FastRPC protocol implementation for DSP communication
* RPMsg transport layer for reliable message passing
* Support for all DSP domains (ADSP, CDSP, SDSP, GDSP)
* DRM IOCTL interface for DSP session management, buffer allocation,
  and remote procedure invocation
 
Architecture
============
 
1. DRM Accelerator Framework Integration
   The driver registers as a DRM accel device, exposing a standard
   /dev/accel/accelN character device node. This provides established
   DRM infrastructure for device management, file operations, and
   IOCTL dispatch.
 
2. Memory Management
   Buffers are managed as GEM objects with full PRIME support for
   DMA-BUF import/export. This enables seamless buffer sharing with
   other DRM drivers (GPU, camera, video) using standard kernel
   mechanisms.
 
3. IOMMU Context Bank Management
   IOMMU context banks (CBs) are represented as proper struct device
   instances on a custom virtual bus (qda-compute-cb). Each CB device
   is registered with the IOMMU subsystem and receives its own IOMMU
   domain, enabling per-session address space isolation. The custom
   bus was introduced for two reasons: IOMMU context banks are
   synthetic constructs rather than real platform devices, and a
   dedicated bus keeps CB device lifetime strictly subordinate to the
   parent QDA device.
   See also: https://lore.kernel.org/all/245d602f-3037-4ae3-9af9-d98f37258aae@oss.qualcomm.com/
 
4. Memory Manager Architecture
   A pluggable memory manager coordinates IOMMU device assignment and
   buffer allocation. The current implementation uses a DMA-coherent
   backend with SID-prefixed DMA addresses for DSP firmware
   compatibility.
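
As a rough sketch of what "SID-prefixed" can mean: the stream ID of the
context bank is placed in the upper bits of the address handed to the
DSP, above the IOVA. The bit position used here (32) is an assumption
borrowed from how the existing fastrpc driver has historically encoded
it; this cover letter does not state the layout QDA uses, and all names
below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: SID sits directly above a 32-bit IOVA. Not taken from QDA. */
#define EXAMPLE_SID_SHIFT 32
#define EXAMPLE_IOVA_MASK ((UINT64_C(1) << EXAMPLE_SID_SHIFT) - 1)

/* Combine a context bank stream ID and an IOVA into one 64-bit value. */
static inline uint64_t example_sid_prefix(uint32_t sid, uint64_t iova)
{
	return ((uint64_t)sid << EXAMPLE_SID_SHIFT) |
	       (iova & EXAMPLE_IOVA_MASK);
}

/* Recover the stream ID from a prefixed address. */
static inline uint32_t example_addr_sid(uint64_t addr)
{
	return (uint32_t)(addr >> EXAMPLE_SID_SHIFT);
}

/* Recover the plain IOVA from a prefixed address. */
static inline uint64_t example_addr_iova(uint64_t addr)
{
	return addr & EXAMPLE_IOVA_MASK;
}
```

With this layout, SID 0x3 and IOVA 0x10000000 combine to 0x310000000.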
 
5. Transport Layer
   RPMsg communication is handled in a dedicated transport layer
   (qda_rpmsg.c), separate from the core DRM driver logic.
 
6. Code Organization
   The driver is organized across multiple files (~4100 lines total):
   * qda_drv.c:            Core driver and DRM integration
   * qda_rpmsg.c:          RPMsg transport layer
   * qda_cb.c:             Context bank device management
   * qda_compute_bus.c:    Custom virtual bus for CB devices
   * qda_gem.c:            GEM object management
   * qda_prime.c:          DMA-BUF import (PRIME)
   * qda_memory_manager.c: IOMMU device registry and allocation
   * qda_memory_dma.c:     DMA-coherent allocation backend
   * qda_fastrpc.c:        FastRPC protocol implementation
   * qda_ioctl.c:          IOCTL dispatch
 
7. UAPI Design
   The driver exposes DRM-style IOCTLs defined in
   include/uapi/drm/qda_accel.h, following DRM UAPI conventions
   (__u32/__u64 types, C++ guard, GPL-2.0-only WITH Linux-syscall-note).
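
To make the conventions above concrete, here is a minimal sketch of a
header written in that style. The struct, its fields, and the EXAMPLE_*
names are invented for illustration and are not the contents of
qda_accel.h. The 'd' ioctl type and the 0x40 driver command base mirror
what DRM uses (DRM_IOCTL_BASE and DRM_COMMAND_BASE), redefined locally
so the fragment builds without DRM headers.

```c
/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

#if defined(__cplusplus)
extern "C" {
#endif

/*
 * Hypothetical query payload (NOT the real QDA layout).
 * Fixed-width types and explicit padding keep the layout identical
 * for 32-bit and 64-bit userspace.
 */
struct example_qda_query {
	__u32 param;	/* in: which property to query */
	__u32 pad;	/* must be zero; aligns 'value' to 8 bytes */
	__u64 value;	/* out: result */
};

/* Local mirrors of DRM's ioctl type and driver-private command base. */
#define EXAMPLE_DRM_IOCTL_BASE		'd'
#define EXAMPLE_DRM_COMMAND_BASE	0x40

/* Hypothetical driver-private command index. */
#define EXAMPLE_QDA_QUERY		0x00

#define EXAMPLE_IOCTL_QDA_QUERY					\
	_IOWR(EXAMPLE_DRM_IOCTL_BASE,				\
	      EXAMPLE_DRM_COMMAND_BASE + EXAMPLE_QDA_QUERY,	\
	      struct example_qda_query)

#if defined(__cplusplus)
}
#endif
```

The explicit pad field is the usual way to avoid implicit compiler
padding in a UAPI struct, which would otherwise make the ABI depend on
the target architecture.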
 
Patch Series Organization
=========================
 
Patch 01:      MAINTAINERS entry
Patch 02:      Driver documentation (Documentation/accel/qda/)
Patches 03-04: Core driver skeleton and compute bus
Patch 05:      iommu: Register qda-compute-cb bus with IOMMU subsystem
Patches 06-07: CB device enumeration and memory manager
Patch 08:      QUERY IOCTL and UAPI header
Patches 09-11: GEM buffer management and PRIME import
Patches 12-15: FastRPC protocol (invoke, session create/release,
               map/unmap)
 
Open Items
==========
 
1. Device-Tree Compatible String
   The QDA driver uses the same device-tree node structure and
   properties as the existing fastrpc driver in drivers/misc/. A
   mechanism is needed to allow the QDA driver to bind to its device
   node independently of the fastrpc driver.
 
   The intended coexistence model is: platforms that require the
   complete fastrpc feature set continue to use "qcom,fastrpc"; new
   platforms where a feature available only in QDA takes priority, or
   where QDA's current feature set is sufficient, use a QDA-specific
   compatible string. New feature development is directed toward QDA
   rather than the existing fastrpc driver. As QDA matures toward
   feature parity with fastrpc, platforms can adopt the QDA-specific
   compatible string exclusively.
 
   The options under consideration are:
 
   a) Add a new "qcom,qda" compatible string to the existing
      qcom,fastrpc.yaml binding, since the DT node structure and
      properties are identical. This avoids a separate binding file
      but adds a QDA-specific string to a fastrpc binding.
 
   b) Introduce a separate qcom,qda.yaml binding that references or
      inherits the fastrpc binding properties.
 
   Seeking guidance from DT binding maintainers on the preferred
   approach.
 
2. Privilege Level Management
   Currently, daemon processes and user processes have the same access
   level as both use the same accel device node. This needs to be
   addressed as daemons attach to privileged DSP protection domains
   and require higher privilege levels for system-level operations.
   Seeking guidance on the best approach: separate device nodes,
   capability-based checks, or DRM master/authentication mechanisms.
 
3. UAPI Compatibility Layer
   A compatibility layer is needed to facilitate migration of client
   applications from the existing FastRPC UAPI to the new QDA UAPI,
   ensuring a smooth transition for existing userspace code. Seeking
   guidance on the preferred implementation approach: in-kernel
   translation layer, userspace wrapper library, or hybrid solution.
 
   An initial evaluation of an in-kernel translation shim was
   performed, where legacy FastRPC device nodes (/dev/fastrpc-*) are
   exposed and requests are internally routed to the QDA accel driver.
   The goal was to keep the compatibility layer minimal, reuse existing
   QDA helper paths (attach, buffer allocation, mapping, etc.), and
   avoid duplication of GEM and buffer management logic.
 
   However, the following challenges were identified:
 
   a) Dependency on drm_file for QDA helpers
      QDA relies on GEM-backed allocations and per-client handle
      namespaces, which require a valid struct drm_file. Since GEM
      handles are scoped per drm_file, the compatibility layer cannot
      directly reuse QDA helper paths without establishing a proper
      drm_file context for each client.
 
   b) Lack of public API for drm_file creation
      Creating a drm_file directly (similar to mock_drm_getfile()-style
      approaches) is not feasible, as the required helpers
      (drm_file_alloc(), drm_file_free(), etc.) are internal to the DRM
      core and not exported. This prevents external drivers from safely
      constructing and managing drm_file instances.
 
   c) VFS-based open is not a viable solution
      Opening the underlying accel device (/dev/accel/accelN) from the
      compatibility driver via filp_open() does provide a valid
      drm_file, but it relies on userspace-visible device paths, is
      fragile in containerized or chroot environments, and offers no
      clean mapping between legacy device nodes and accel devices.
 
   d) Userspace proxy limitations (CUSE)
      A CUSE-based userspace proxy was evaluated. However, DMA-buf file
      descriptors passed by legacy applications cannot be directly
      reused in the CUSE daemon (file descriptors are process-specific),
      which breaks buffer sharing semantics.
 
   e) drm_client-based approaches do not match requirements
      drm_client APIs (used for fbdev emulation) rely on a shared
      drm_file and do not provide the per-client isolation required by
      FastRPC semantics.
 
   Due to the above constraints, it is currently unclear how to
   implement an in-kernel compatibility layer that correctly handles
   per-client drm_file contexts without relying on VFS paths or
   non-exported DRM internals.
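
The per-client handle scoping behind challenge (a) can be pictured with
a toy model. This is plain userspace C, not DRM code: each "client"
stands in for one drm_file and owns its own handle table, so the same
handle value resolves to different objects (or nothing) depending on
which client asks. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_HANDLES 8

/* Stand-in for a drm_file: one private handle table per client. */
struct toy_client {
	void *objs[TOY_MAX_HANDLES];	/* slot i holds handle i + 1 */
};

/*
 * Allocate the lowest free handle for obj in this client's table.
 * Returns a non-zero handle, or 0 if the table is full.
 */
static uint32_t toy_handle_create(struct toy_client *c, void *obj)
{
	for (uint32_t i = 0; i < TOY_MAX_HANDLES; i++) {
		if (!c->objs[i]) {
			c->objs[i] = obj;
			return i + 1;
		}
	}
	return 0;
}

/* Resolve a handle in this client's table only; NULL if not present. */
static void *toy_handle_lookup(const struct toy_client *c, uint32_t handle)
{
	if (handle == 0 || handle > TOY_MAX_HANDLES)
		return NULL;
	return c->objs[handle - 1];
}
```

Two clients that each create one buffer both receive handle 1, yet each
lookup returns that client's own buffer. A compatibility shim that
funnels every legacy client through one shared context would lose
exactly this separation.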
 
4. Documentation Improvements
   Add detailed IOCTL usage examples, document DSP firmware interface
   requirements, and create a migration guide from the existing FastRPC
   driver.
 
5. Per-Session Memory Allocation
   Develop a userspace API to support memory allocation on a per-session
   basis, enabling session-specific memory management.
 
6. Audio and Sensors PD Support
   The current series does not yet support the Audio PD or Sensors PD.
   These specialized protection domains require additional support for
   real-time constraints and power management.
 
Interface Compatibility
=======================
 
The QDA driver uses the same device-tree node structure and child node
layout (including "qcom,fastrpc-compute-cb" child nodes) as the
existing fastrpc driver. The underlying FastRPC protocol and DSP
firmware interface are compatible with the existing fastrpc driver,
ensuring that DSP firmware and libraries continue to work without
modification.
 
References
==========
 
Previous discussions on this migration:
- https://lkml.org/lkml/2024/6/24/479
- https://lkml.org/lkml/2024/6/21/1252
 
Testing
=======
 
The driver has been tested on Qualcomm platforms with:
- Basic FastRPC attach/release operations
- DSP process creation and initialization
- Memory mapping/unmapping operations
- Dynamic invocation with various buffer types
- GEM buffer allocation and mmap
- PRIME buffer import from other subsystems

Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
---
Ekansh Gupta (15):
      MAINTAINERS: Add entry for Qualcomm DSP Accelerator (QDA) driver
      accel/qda: Add QDA driver documentation
      accel/qda: Add initial QDA DRM accelerator driver
      accel/qda: Add compute bus for QDA context banks
      iommu: Add QDA compute context bank bus to iommu_buses
      accel/qda: Create compute context bank devices on QDA compute bus
      accel/qda: Add memory manager for CB devices
      accel/qda: Add QUERY IOCTL and QDA UAPI header
      accel/qda: Add DMA-backed GEM objects and memory manager integration
      accel/qda: Add GEM_CREATE and GEM_MMAP_OFFSET IOCTLs
      accel/qda: Add PRIME DMA-BUF import support
      accel/qda: Add FastRPC invocation support
      accel/qda: Add DSP process creation and release
      accel/qda: Add remote memory mapping to DSP address space
      accel/qda: Add remote memory unmap from DSP address space

---
base-commit: 80dd246accce631c328ea43294e53b2b2dd2aa32
change-id: 20260519-qda-series-78c2bf0ed78b

Best regards,
-- 
Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>