[PATCH 4/5] accel/rocket: Skip CNA/Core S_POINTER initialization for standalone tasks

Ross Cawston posted 5 patches 1 month, 2 weeks ago
Posted by Ross Cawston 1 month, 2 weeks ago
Standalone DPU (element-wise) and PPU (pooling, etc.) tasks do not use
the CNA or Core blocks. Writing S_POINTER to those blocks re-arms them
with stale or uninitialized state, which corrupts the DPU/PPU output.

Honor the ROCKET_TASK_SKIP_CNA_CORE flag (added in the previous patch),
which lets userspace mark such tasks. When it is set, skip the CNA and
Core S_POINTER MMIO writes.

Also move the per-core extra bit (0x10000000 * core index, i.e. the
core index placed at bit 28) inside the same conditional, since it is
only needed when CNA/Core are actually used.

Signed-off-by: Ross Cawston <ross@r-sc.ca>
---
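The behaviour described above can be modelled outside the kernel with
a small userspace sketch. This is only an illustration, not the driver
code: struct sketch_regs stands in for the MMIO registers, the
SKETCH_TASK_SKIP_CNA_CORE value is a hypothetical placeholder (the real
ROCKET_TASK_SKIP_CNA_CORE value comes from the previous patch), and
pp_bits stands in for the OR of the PP_EN / EXECUTER_PP_EN / PP_MODE
field macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder; the real flag value is defined elsewhere. */
#define SKETCH_TASK_SKIP_CNA_CORE (1u << 0)

/* Mock of the two S_POINTER registers; stands in for MMIO. */
struct sketch_regs {
	uint32_t cna_s_pointer;
	uint32_t core_s_pointer;
	unsigned int writes;	/* number of S_POINTER writes performed */
};

/* Per-core bits: 0x10000000 * index, i.e. the index placed at bit 28. */
static inline uint32_t sketch_extra_bit(unsigned int core_index)
{
	assert(core_index < 16);	/* larger values overflow 32 bits */
	return UINT32_C(0x10000000) * core_index;
}

/*
 * Mirrors the conditional in the patch: write both S_POINTERs unless
 * the task is flagged as standalone. Returns true if the registers
 * were written, false if the writes were skipped.
 */
static inline bool sketch_init_s_pointers(struct sketch_regs *regs,
					  unsigned int core_index,
					  uint32_t task_flags,
					  uint32_t pp_bits)
{
	uint32_t value;

	if (task_flags & SKETCH_TASK_SKIP_CNA_CORE)
		return false;

	value = pp_bits | sketch_extra_bit(core_index);
	regs->cna_s_pointer = value;
	regs->core_s_pointer = value;
	regs->writes += 2;
	return true;
}
```

In this model, a flagged task leaves both mock registers exactly as the
previous task left them, which is the intended effect of skipping the
MMIO writes for standalone DPU/PPU work.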
 drivers/accel/rocket/rocket_job.c | 41 +++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index 34898084cc56..1dcc0c945f7f 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -116,7 +116,6 @@ rocket_copy_tasks(struct drm_device *dev,
 static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job)
 {
 	struct rocket_task *task;
-	unsigned int extra_bit;
 
 	/* Don't queue the job if a reset is in progress */
 	if (atomic_read(&core->reset.pending))
@@ -129,17 +128,35 @@ static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *jo
 
 	rocket_pc_writel(core, BASE_ADDRESS, 0x1);
 
-	 /* From rknpu, in the TRM this bit is marked as reserved */
-	extra_bit = 0x10000000 * core->index;
-	rocket_cna_writel(core, S_POINTER, CNA_S_POINTER_POINTER_PP_EN(1) |
-					   CNA_S_POINTER_EXECUTER_PP_EN(1) |
-					   CNA_S_POINTER_POINTER_PP_MODE(1) |
-					   extra_bit);
-
-	rocket_core_writel(core, S_POINTER, CORE_S_POINTER_POINTER_PP_EN(1) |
-					    CORE_S_POINTER_EXECUTER_PP_EN(1) |
-					    CORE_S_POINTER_POINTER_PP_MODE(1) |
-					    extra_bit);
+	/*
+	 * Initialize CNA and Core S_POINTER for ping-pong mode via MMIO.
+	 *
+	 * Each core needs a per-core extra_bit (bit 28 * core_index) which
+	 * the TRM marks as reserved but the BSP rknpu driver sets. Without
+	 * it, non-zero cores hang. This MUST be done via MMIO (not regcmd)
+	 * because userspace doesn't know which core the scheduler picks.
+	 *
+	 * For standalone DPU/PPU tasks (element-wise ops, pooling), CNA
+	 * and Core have no work. Writing their S_POINTERs would re-arm
+	 * them with stale state from the previous conv task, corrupting
+	 * the DPU/PPU output. Userspace signals this via the
+	 * ROCKET_TASK_SKIP_CNA_CORE flag.
+	 */
+	if (!(task->flags & ROCKET_TASK_SKIP_CNA_CORE)) {
+		unsigned int extra_bit = 0x10000000 * core->index;
+
+		rocket_cna_writel(core, S_POINTER,
+				  CNA_S_POINTER_POINTER_PP_EN(1) |
+				  CNA_S_POINTER_EXECUTER_PP_EN(1) |
+				  CNA_S_POINTER_POINTER_PP_MODE(1) |
+				  extra_bit);
+
+		rocket_core_writel(core, S_POINTER,
+				   CORE_S_POINTER_POINTER_PP_EN(1) |
+				   CORE_S_POINTER_EXECUTER_PP_EN(1) |
+				   CORE_S_POINTER_POINTER_PP_MODE(1) |
+				   extra_bit);
+	}
 
 	rocket_pc_writel(core, BASE_ADDRESS, task->regcmd);
 	rocket_pc_writel(core, REGISTER_AMOUNTS,

-- 
2.52.0