It is unclear why fence errors were set only for CS_INHERIT_FAULT.
The downstream driver also does not treat CS_INHERIT_FAULT specially.
Remove the check.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
---
drivers/gpu/drm/panthor/panthor_sched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index a2248f692a030..1a3b1c49f7d7b 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1399,7 +1399,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
 	fault = cs_iface->output->fault;
 	info = cs_iface->output->fault_info;
 
-	if (queue && CS_EXCEPTION_TYPE(fault) == DRM_PANTHOR_EXCEPTION_CS_INHERIT_FAULT) {
+	if (queue) {
 		u64 cs_extract = queue->iface.output->extract;
 		struct panthor_job *job;
 
--
2.50.0.rc2.696.g1fc2a0284f-goog
On Wed, 18 Jun 2025 07:55:49 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> It is unclear why fence errors were set only for CS_INHERIT_FAULT.
> The downstream driver also does not treat CS_INHERIT_FAULT specially.
> Remove the check.
>
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> ---
>  drivers/gpu/drm/panthor/panthor_sched.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index a2248f692a030..1a3b1c49f7d7b 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1399,7 +1399,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
>  	fault = cs_iface->output->fault;
>  	info = cs_iface->output->fault_info;
>
> -	if (queue && CS_EXCEPTION_TYPE(fault) == DRM_PANTHOR_EXCEPTION_CS_INHERIT_FAULT) {
> +	if (queue) {
>  		u64 cs_extract = queue->iface.output->extract;
>  		struct panthor_job *job;
>

Now that I look at the code, I think we should record the error when
the ERROR_BARRIER is executed instead of flagging all in-flight jobs as
faulty. One option would be to re-use the profiling buffer by adding an
error field to panthor_job_profiling_data, but we're going to lose 4
bytes per slot because of the 64-bit alignment we want for timestamps,
so maybe just create a separate buffer with N entries of:

struct panthor_job_status {
	u32 error;
};
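For reference, the padding cost can be checked with a couple of static
asserts; the structs below are simplified stand-ins for illustration,
not the actual panthor layout:

#include <assert.h>
#include <stdint.h>

/* Two naturally aligned 64-bit timestamps, as in the profiling slot. */
struct profiling_slot {
	uint64_t ts_before;	/* u64 in kernel terms */
	uint64_t ts_after;
};

/*
 * Adding a 32-bit error field bumps the slot from 16 to 24 bytes: the
 * compiler adds 4 bytes of tail padding so the timestamps stay 8-byte
 * aligned in an array of slots.
 */
struct profiling_slot_with_error {
	uint64_t ts_before;
	uint64_t ts_after;
	uint32_t error;
	/* 4 bytes of implicit tail padding here */
};

static_assert(sizeof(struct profiling_slot) == 16, "no padding");
static_assert(sizeof(struct profiling_slot_with_error) == 24,
	      "u32 error costs 8 bytes per slot, half of it padding");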
On Sun, Jun 22, 2025 at 11:32 PM Boris Brezillon
<boris.brezillon@collabora.com> wrote:
>
> On Wed, 18 Jun 2025 07:55:49 -0700
> Chia-I Wu <olvaffe@gmail.com> wrote:
>
> > It is unclear why fence errors were set only for CS_INHERIT_FAULT.
> > The downstream driver also does not treat CS_INHERIT_FAULT specially.
> > Remove the check.
> >
> > Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_sched.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > index a2248f692a030..1a3b1c49f7d7b 100644
> > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > @@ -1399,7 +1399,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
> >  	fault = cs_iface->output->fault;
> >  	info = cs_iface->output->fault_info;
> >
> > -	if (queue && CS_EXCEPTION_TYPE(fault) == DRM_PANTHOR_EXCEPTION_CS_INHERIT_FAULT) {
> > +	if (queue) {
> >  		u64 cs_extract = queue->iface.output->extract;
> >  		struct panthor_job *job;
> >
>
> Now that I look at the code, I think we should record the error when
> the ERROR_BARRIER is executed instead of flagging all in-flight jobs as
> faulty. One option would be to re-use the profiling buffer by adding an
> error field to panthor_job_profiling_data, but we're going to lose 4
> bytes per slot because of the 64-bit alignment we want for timestamps,
> so maybe just create a separate buffer with N entries of:
>
> struct panthor_job_status {
>         u32 error;
> };

The current error path uses cs_extract to mark exactly the offending
job as faulty. Innocent in-flight jobs do not seem to be affected.

I looked into emitting LOAD/STORE after SYNC_ADD64 to copy the error
to panthor_job_status. Besides the extra instructions and storage,
because group_sync_upd_work can be called before the LOAD/STORE
completes, it would need to check both panthor_job_status and
panthor_syncobj_64b. That would be a bit ugly as well.
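For context, the matching the fault handler does with cs_extract looks
roughly like this (paraphrased sketch, not a verbatim quote of
panthor_sched.c; exact field names may differ):

	u64 cs_extract = queue->iface.output->extract;
	struct panthor_job *job;

	spin_lock(&queue->fence_ctx.lock);
	list_for_each_entry(job, &queue->fence_ctx.in_flight_jobs, node) {
		/*
		 * Only the job whose ring buffer range contains the
		 * extract pointer at fault time gets its fence flagged;
		 * the other in-flight jobs are left untouched.
		 */
		if (cs_extract < job->ringbuf.start ||
		    cs_extract >= job->ringbuf.end)
			continue;

		dma_fence_set_error(job->done_fence, -EINVAL);
	}
	spin_unlock(&queue->fence_ctx.lock);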
On Tue, 8 Jul 2025 14:40:06 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Sun, Jun 22, 2025 at 11:32 PM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Wed, 18 Jun 2025 07:55:49 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> > > It is unclear why fence errors were set only for CS_INHERIT_FAULT.
> > > The downstream driver also does not treat CS_INHERIT_FAULT specially.
> > > Remove the check.
> > >
> > > Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_sched.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > > index a2248f692a030..1a3b1c49f7d7b 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > > @@ -1399,7 +1399,7 @@ cs_slot_process_fault_event_locked(struct panthor_device *ptdev,
> > >  	fault = cs_iface->output->fault;
> > >  	info = cs_iface->output->fault_info;
> > >
> > > -	if (queue && CS_EXCEPTION_TYPE(fault) == DRM_PANTHOR_EXCEPTION_CS_INHERIT_FAULT) {
> > > +	if (queue) {
> > >  		u64 cs_extract = queue->iface.output->extract;
> > >  		struct panthor_job *job;
> > >
> >
> > Now that I look at the code, I think we should record the error when
> > the ERROR_BARRIER is executed instead of flagging all in-flight jobs as
> > faulty. One option would be to re-use the profiling buffer by adding an
> > error field to panthor_job_profiling_data, but we're going to lose 4
> > bytes per slot because of the 64-bit alignment we want for timestamps,
> > so maybe just create a separate buffer with N entries of:
> >
> > struct panthor_job_status {
> >         u32 error;
> > };
>
> The current error path uses cs_extract to mark exactly the offending
> job as faulty. Innocent in-flight jobs do not seem to be affected.

My bad, I thought the faulty CS was automatically entering the recovery
substate (fetching all instructions and ignoring RUN_xxx ones), but it
turns out CS instruction fetching is stalled until the fault is
acknowledged, so we're good.

> I looked into emitting LOAD/STORE after SYNC_ADD64 to copy the error
> to panthor_job_status. Besides the extra instructions and storage,
> because group_sync_upd_work can be called before the LOAD/STORE
> completes, it would need to check both panthor_job_status and
> panthor_syncobj_64b. That would be a bit ugly as well.

Nah, I think you're right, I just had a wrong recollection of how
recovery mode works. The patch is

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>