Add multi-core support to the RGA (Raster Graphic Accelerator) driver
for Rockchip SoCs. This works by scheduling the given work to multiple
identical RGA cores. Previously, additional identical cores were rejected
during probe with -ENODEV to avoid exposing multiple video devices for
identical cores, which would have broken the ABI once in-kernel
scheduling was added.
This series targets the RK3588 SoC, which has one RGA2-Enhance core
and two RGA3 cores (see [1] for an overview of the different RGA cores).
The slimmed-down RK3576 SoC also features two RGA2-Pro
(also described as RGA2.5) cores, which are currently not supported by
the driver. Testing was done on a Radxa Rock 5T SBC.
Scheduling is done only at the context level, so a single stream (which
uses only one mem2mem context) sees no performance gain. At least N
parallel streams are therefore necessary to utilize N cores. This avoids
the more complex buffer handling needed to preserve frame ordering when
one core is slightly faster than the other (e.g. due to memory transfer
timings or different clocks).
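
As a rough illustration of the context-level rule (a toy user-space
model, not code from this series; every toy_* name and the fixed core
count are assumptions made for the example), a context is only given a
core while it has no other frame in flight, so its frames cannot
overtake each other:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not driver code. */
#define TOY_NUM_CORES 2		/* e.g. the two RGA3 cores on RK3588 */
#define TOY_NO_CTX (-1)

struct toy_sched {
	/* Context currently running on each core, or TOY_NO_CTX if idle. */
	int core_ctx[TOY_NUM_CORES];
};

static void toy_sched_init(struct toy_sched *s)
{
	for (size_t i = 0; i < TOY_NUM_CORES; i++)
		s->core_ctx[i] = TOY_NO_CTX;
}

/*
 * Try to start the next frame of ctx_id. Returns the core index used, or
 * -1 if the context already has a frame in flight or no core is idle.
 */
static int toy_start_frame(struct toy_sched *s, int ctx_id)
{
	int idle = -1;

	assert(ctx_id != TOY_NO_CTX);
	for (size_t i = 0; i < TOY_NUM_CORES; i++) {
		if (s->core_ctx[i] == ctx_id)
			return -1;	/* one frame per context at a time */
		if (s->core_ctx[i] == TOY_NO_CTX && idle < 0)
			idle = (int)i;
	}
	if (idle >= 0)
		s->core_ctx[idle] = ctx_id;
	return idle;
}

/* Mark the frame on the given core as finished, freeing the core. */
static void toy_frame_done(struct toy_sched *s, int core)
{
	assert(core >= 0 && core < TOY_NUM_CORES);
	s->core_ctx[core] = TOY_NO_CTX;
}
```

In this model a single context never occupies more than one core, which
is why one stream alone does not get faster, while two contexts can keep
both cores busy.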
While the work is based on Detlev Casanova's multi-core series for the
rkvdec driver [2], it differs in two major aspects:
(1) It doesn't directly call v4l2_m2m_job_finish to mark the current job
as finished in the device_run callback. Detlev used this to trick the
m2m framework into directly scheduling the next job. This looked like a
dirty hack and had me running into some of its pitfalls (e.g. the
difference between the v4l2_m2m_buf_done and the newly introduced
v4l2_m2m_buf_done_manual function).
Instead, I've dropped the curr_ctx member of the v4l2_m2m_dev struct and
added a max_parallel_jobs member to specify the maximum number of
parallel jobs. A driver can set this limit with the newly introduced
v4l2_m2m_set_max_parallel_jobs function. The RGA driver uses it to set
its number of parallel jobs to its number of available cores. The m2m
framework then schedules the first N jobs on its job queue to the
device_run callback instead of only one.
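
The effect of the limit can be sketched with a small user-space model.
This is not the v4l2-mem2mem implementation: the toy_* names, the fixed
queue size and the array-based FIFO are assumptions made for the
example, and only the max_parallel_jobs idea is taken from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not the v4l2-mem2mem code. */
#define TOY_MAX_QUEUE 8

struct toy_m2m_dev {
	int job_queue[TOY_MAX_QUEUE];	/* context ids waiting to run, FIFO */
	size_t queued;			/* number of entries in job_queue */
	size_t running;			/* jobs currently handed to device_run */
	size_t max_parallel_jobs;	/* 1 models the old single-job behaviour */
};

/* Models a driver raising the limit, e.g. to its number of cores. */
static void toy_set_max_parallel_jobs(struct toy_m2m_dev *dev, size_t n)
{
	dev->max_parallel_jobs = n ? n : 1;
}

/* Append a context to the job queue; returns 0 on success, -1 if full. */
static int toy_queue_job(struct toy_m2m_dev *dev, int ctx_id)
{
	if (dev->queued == TOY_MAX_QUEUE)
		return -1;
	dev->job_queue[dev->queued++] = ctx_id;
	return 0;
}

/*
 * Hand queued jobs to "device_run" until the limit is reached. Writes the
 * started context ids to started[] and returns how many were started.
 */
static size_t toy_try_run(struct toy_m2m_dev *dev, int *started, size_t cap)
{
	size_t n = 0;

	while (dev->queued > 0 && dev->running < dev->max_parallel_jobs &&
	       n < cap) {
		started[n++] = dev->job_queue[0];
		/* Pop the head of the FIFO. */
		for (size_t i = 1; i < dev->queued; i++)
			dev->job_queue[i - 1] = dev->job_queue[i];
		dev->queued--;
		dev->running++;
	}
	assert(dev->running <= dev->max_parallel_jobs);
	return n;
}

/* Models a job finishing: frees one slot so the next job can start. */
static void toy_job_finish(struct toy_m2m_dev *dev)
{
	assert(dev->running > 0);
	dev->running--;
}
```

With the limit set to 3 and four queued contexts, the first call to
toy_try_run starts three jobs, and the fourth only starts after one of
them has finished.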
(2) Instead of attaching an identical RGA core on probe to the first
probed RGA core instance, use component helpers to add all cores as
components to a virtual platform device. This has the advantage of only
creating the video device after all cores have been probed successfully
and tearing it down if one core is removed (e.g. via sysfs), which
otherwise could lead to nasty memory bugs. The implementation is
based on the etnaviv GPU driver. As the virtual platform device
doesn't have an iommu, we still allocate all relevant drives on the first
core, which shares its iommu domain with all other cores.
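
The lifetime rule this gives can be pictured with a toy user-space
model. It does not use the kernel component API: the toy_* names and the
fixed core count are assumptions made for the example. The video device
exists only while every expected core is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not the kernel component helpers. */
#define TOY_EXPECTED_CORES 2

struct toy_master {
	bool core_present[TOY_EXPECTED_CORES];
	bool video_registered;	/* models the video device being exposed */
};

/* Expose the video device only while all expected cores are present. */
static void toy_master_update(struct toy_master *m)
{
	bool all = true;

	for (size_t i = 0; i < TOY_EXPECTED_CORES; i++)
		all = all && m->core_present[i];
	m->video_registered = all;
}

/* Models a core finishing its probe and registering with the master. */
static void toy_core_probe(struct toy_master *m, size_t core)
{
	assert(core < TOY_EXPECTED_CORES);
	m->core_present[core] = true;
	toy_master_update(m);
}

/* Models a core being removed, which tears the video device down. */
static void toy_core_remove(struct toy_master *m, size_t core)
{
	assert(core < TOY_EXPECTED_CORES);
	m->core_present[core] = false;
	toy_master_update(m);
}
```

Probing only one of the two cores leaves video_registered false, and
removing either core after both were probed clears it again.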
v4l2-compliance results:
v4l2-compliance 1.32.0, 64 bits, 64-bit time_t
...
Card type : rga2
...
Total for rockchip-rga device /dev/video0: 48, Succeeded: 48, Failed: 0, Warnings: 0
v4l2-compliance 1.32.0, 64 bits, 64-bit time_t
...
Card type : rga3
...
Total for rockchip-rga device /dev/video1: 48, Succeeded: 48, Failed: 0, Warnings: 0
The DTS and iommu changes at the end are picked out of other next trees
to provide an easy way to actually test the changes with an RGA3 on a
rk3588 SoC. They'll be dropped when they get into media/next.
Patches 1-3 address review comments from my last RGA3 patch series
Patch 4 is an additional driver cleanup
Patch 5 implements support for parallel jobs in the m2m framework
Patches 6-8 add multi-core preparations to the driver
Patches 9-13 rework the driver to use component helpers
Patch 14 puts all cores into the same iommu domain
Patch 15 enables the multi-core support
Patches 16-17 just pick patches required for testing
[1] https://codeberg.org/airockchip/librga/src/branch/main/docs/Rockchip_Developer_Guide_RGA_EN.md#design-index
[2] https://lore.kernel.org/linux-media/20260409-rkvdec-multicore-v1-0-62b316abf0f7@collabora.com/
Signed-off-by: Sven Püschel <s.pueschel@pengutronix.de>
---
Simon Xue (1):
iommu/rockchip: disable fetch dte time limit
Sven Püschel (16):
media: rockchip: rga: zero cmdbuf in shared code
media: rockchip: rga: add comment about pixel alignment for YUV formats
media: rockchip: rga: move early return into if condition in vidioc_enum_fmt
media: rockchip: rga: removed unused regmap member
media: v4l2-mem2mem: support running multiple jobs in parallel
media: rockchip: rga: move power handling to device_run
media: rockchip: rga: adjust get_version to return the version
media: rockchip: rga: add rga_core structure
media: rockchip: rga: use components to manage multiple cores
media: rockchip: rga: move rockchip_rga allocation to master probe
media: rockchip: rga: move video device to the master
media: rockchip: rga: move core initialization from bind to probe
media: rockchip: rga: bind all cores to the master
media: rockchip: rga: put all cores into first core iommu domain
media: rockchip: rga: schedule jobs to multiple cores
arm64: dts: rockchip: add rga3 dt nodes to rk3588
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 44 +++
drivers/iommu/rockchip-iommu.c | 8 +
drivers/media/platform/rockchip/rga/rga-buf.c | 16 +-
drivers/media/platform/rockchip/rga/rga-hw.c | 40 +-
drivers/media/platform/rockchip/rga/rga.c | 501 +++++++++++++++++++-------
drivers/media/platform/rockchip/rga/rga.h | 45 ++-
drivers/media/platform/rockchip/rga/rga3-hw.c | 32 +-
drivers/media/v4l2-core/v4l2-mem2mem.c | 89 +++--
include/media/v4l2-mem2mem.h | 3 +
9 files changed, 541 insertions(+), 237 deletions(-)
---
base-commit: 6a75e3d4f6428b90f398354212e3a2e0172851d6
change-id: 20260602-spu-rga3multicore-ae8c8caf01e9
Best regards,
--
Sven Püschel <s.pueschel@pengutronix.de>