[PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API

Onur Özkan posted 4 patches 3 weeks, 4 days ago
drivers/gpu/drm/tyr/driver.rs |  38 +++----
drivers/gpu/drm/tyr/reset.rs  | 180 ++++++++++++++++++++++++++++++++++
drivers/gpu/drm/tyr/tyr.rs    |   1 +
rust/helpers/workqueue.c      |   6 ++
rust/kernel/workqueue.rs      |  62 ++++++++++++
5 files changed, 260 insertions(+), 27 deletions(-)
create mode 100644 drivers/gpu/drm/tyr/reset.rs
[PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API
Posted by Onur Özkan 3 weeks, 4 days ago
This series adds GPU reset handling support for Tyr in a new module
drivers/gpu/drm/tyr/reset.rs which encapsulates the low-level reset
controller internals and exposes a ResetHandle API to the driver.

The reset module owns reset state, queueing and execution ordering
through OrderedQueue and handles duplicate/concurrent reset requests
with a pending flag.
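
A minimal sketch of how a pending flag can coalesce duplicate or concurrent
reset requests. It uses plain std Rust rather than the kernel crate, and
ResetState, try_schedule() and complete() are hypothetical names chosen for
illustration, not the API added by this series.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the reset controller state (not the Tyr code).
pub struct ResetState {
    pending: AtomicBool,
}

impl ResetState {
    pub const fn new() -> Self {
        Self {
            pending: AtomicBool::new(false),
        }
    }

    /// Returns true if the caller flipped the flag from false to true and is
    /// therefore the one that should queue the reset work. Returns false if a
    /// reset was already pending, so the request is coalesced into it.
    pub fn try_schedule(&self) -> bool {
        self.pending
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Called by the reset worker once the reset has finished, so that later
    /// requests can schedule a new reset.
    pub fn complete(&self) {
        self.pending.store(false, Ordering::Release);
    }
}
```

With this shape, only the first of several racing callers queues work; the
others return early until complete() is called.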

Apart from the reset module, the first 3 patches:

- Fix a potential stale reset-complete state bug by clearing the completed
  state before doing a soft reset.
- Add Work::disable_sync() (a wrapper around bindings::disable_work_sync).
- Add OrderedQueue support.
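
To illustrate the first item, here is a toy model of the stale-state problem
in plain std Rust. MockGpu, the RESET_COMPLETED bit position and all method
names are assumptions made for this sketch; no real register layout or driver
code is modeled.

```rust
/// Illustrative bit position only; not taken from any register definition.
const RESET_COMPLETED: u32 = 1 << 8;

/// Toy model of a raw interrupt status register.
pub struct MockGpu {
    irq_rawstat: u32,
}

impl MockGpu {
    pub fn new(irq_rawstat: u32) -> Self {
        Self { irq_rawstat }
    }

    /// Clears the given bits in the raw status.
    fn irq_clear(&mut self, mask: u32) {
        self.irq_rawstat &= !mask;
    }

    /// Reports whether the reset-completed bit is currently set.
    pub fn reset_completed(&self) -> bool {
        self.irq_rawstat & RESET_COMPLETED != 0
    }

    /// Models the hardware signalling that a reset has finished.
    pub fn hw_finish_reset(&mut self) {
        self.irq_rawstat |= RESET_COMPLETED;
    }

    /// Clears any leftover completion bit first, so a later wait cannot
    /// mistake a previous reset's completion for the new one. Issuing the
    /// actual reset command is out of scope for this model.
    pub fn begin_soft_reset(&mut self) {
        self.irq_clear(RESET_COMPLETED);
    }
}
```

Without the clear in begin_soft_reset(), a caller polling reset_completed()
right after requesting a reset would see the old bit and return too early.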

Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.

[1]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131

Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
---

Onur Özkan (4):
  drm/tyr: clear reset IRQ before soft reset
  rust: add Work::disable_sync
  rust: add ordered workqueue wrapper
  drm/tyr: add GPU reset handling

 drivers/gpu/drm/tyr/driver.rs |  38 +++----
 drivers/gpu/drm/tyr/reset.rs  | 180 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/tyr/tyr.rs    |   1 +
 rust/helpers/workqueue.c      |   6 ++
 rust/kernel/workqueue.rs      |  62 ++++++++++++
 5 files changed, 260 insertions(+), 27 deletions(-)
 create mode 100644 drivers/gpu/drm/tyr/reset.rs


base-commit: 0ccc0dac94bf2f5c6eb3e9e7f1014cd9dddf009f
-- 
2.51.2

Re: [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API
Posted by Onur Özkan 3 days, 22 hours ago
> This series adds GPU reset handling support for Tyr in a new module
> drivers/gpu/drm/tyr/reset.rs which encapsulates the low-level reset
> controller internals and exposes a ResetHandle API to the driver.
> 
> The reset module owns reset state, queueing and execution ordering
> through OrderedQueue and handles duplicate/concurrent reset requests
> with a pending flag.
> 
> Apart from the reset module, the first 3 patches:
> 
> - Fixes a potential reset-complete stale state bug by clearing completed
>   state before doing soft reset.
> - Adds Work::disable_sync() (wrapper of bindings::disable_work_sync).
> - Adds OrderedQueue support.
> 
> Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.
> 
> [1]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131
> 
> Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
> ---
> 
> Onur Özkan (4):
>   drm/tyr: clear reset IRQ before soft reset
>   rust: add Work::disable_sync
>   rust: add ordered workqueue wrapper
>   drm/tyr: add GPU reset handling
> 
>  drivers/gpu/drm/tyr/driver.rs |  38 +++----
>  drivers/gpu/drm/tyr/reset.rs  | 180 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/tyr/tyr.rs    |   1 +
>  rust/helpers/workqueue.c      |   6 ++
>  rust/kernel/workqueue.rs      |  62 ++++++++++++
>  5 files changed, 260 insertions(+), 27 deletions(-)
>  create mode 100644 drivers/gpu/drm/tyr/reset.rs
> 
> 
> base-commit: 0ccc0dac94bf2f5c6eb3e9e7f1014cd9dddf009f
> -- 
> 2.51.2
> 

Hi all,

Here is the current status of this work: I have 2 blockers that prevent me
from moving forward.

1- GPU unplug API

On the existing C side, reset failure handling eventually needs to unplug the
device, and that path is part of the broader reset flow in:

	- srctree/drivers/gpu/drm/panthor/panthor_device.c

This is part of [1] and as far as I understand, it is still work in progress. For Tyr,
I currently keep this as a placeholder (todo!("unplug the GPU")) in the reset path,
because I do not want to introduce temporary or partial unplug handling in this series
before the unplug design is settled.

[1]: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/29

2- Design decisions for reset handling

The second blocker is the design around how a Resettable implementer (Resettable being a generic
trait with pre_reset/post_reset hooks) should stop admitting new work, drain in-flight operations
and recover after reset.

My current understanding is that the cleanest approach is to keep reset.rs responsible only for
reset orchestration:

	- schedule reset work
	- call pre_reset() hooks
	- perform the hardware reset
	- call post_reset() hooks
	- propagate failure.

Then, each Resettable implementer should own its local recovery logic.
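
The split described above could be sketched roughly as follows, in plain std
Rust. The trait shape, ResetError, run_reset() and ToyScheduler are all
hypothetical; the Resettable trait in this series may look different, and
running the post_reset() hooks in reverse order is an assumption of this
sketch.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ResetError {
    PreReset,
    Hardware,
    PostReset,
}

/// Hypothetical hook trait: each implementer owns its local recovery logic.
pub trait Resettable {
    fn pre_reset(&mut self) -> Result<(), ResetError>;
    fn post_reset(&mut self) -> Result<(), ResetError>;
}

/// Orchestration only: call the pre hooks, reset the hardware, call the post
/// hooks in reverse order, and propagate the first failure to the caller.
pub fn run_reset(
    parts: &mut [&mut dyn Resettable],
    hw_reset: impl FnOnce() -> Result<(), ResetError>,
) -> Result<(), ResetError> {
    for p in parts.iter_mut() {
        p.pre_reset()?;
    }
    hw_reset()?;
    for p in parts.iter_mut().rev() {
        p.post_reset()?;
    }
    Ok(())
}

/// Toy implementer that stops admitting work while a reset is in progress.
pub struct ToyScheduler {
    pub accepting: bool,
}

impl Resettable for ToyScheduler {
    fn pre_reset(&mut self) -> Result<(), ResetError> {
        self.accepting = false;
        Ok(())
    }

    fn post_reset(&mut self) -> Result<(), ResetError> {
        self.accepting = true;
        Ok(())
    }
}
```

Here run_reset() knows nothing about what "stop" or "resume" means for
ToyScheduler. If the hardware reset fails, the error is returned and the
implementer stays in its stopped state.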

This is also how the existing C implementation is structured. The reset worker is centralized, but
recovery is implemented by the participating subsystems:

	- srctree/drivers/gpu/drm/panthor/panthor_sched.c
	- srctree/drivers/gpu/drm/panthor/panthor_fw.c
	- srctree/drivers/gpu/drm/panthor/panthor_mmu.c

More specifically, the existing C side has hooks such as:

	- panthor_sched_pre_reset() / panthor_sched_post_reset()
	- panthor_fw_pre_reset() / panthor_fw_post_reset()
	- panthor_mmu_pre_reset() / panthor_mmu_post_reset()

The reason I am leaning in the same direction for Tyr is that "stop new work", "drain" and "resume"
are not generic operations. They depend on the implementer.

Because of that, I think reset.rs should not have a global guard/checking API for all of this.

Comments and suggestions are very welcome.

Regards,
Onur
Re: [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API
Posted by Alice Ryhl 3 weeks, 4 days ago
On Fri, Mar 13, 2026 at 12:16:40PM +0300, Onur Özkan wrote:
> This series adds GPU reset handling support for Tyr in a new module
> drivers/gpu/drm/tyr/reset.rs which encapsulates the low-level reset
> controller internals and exposes a ResetHandle API to the driver.
> 
> The reset module owns reset state, queueing and execution ordering
> through OrderedQueue and handles duplicate/concurrent reset requests
> with a pending flag.
> 
> Apart from the reset module, the first 3 patches:
> 
> - Fixes a potential reset-complete stale state bug by clearing completed
>   state before doing soft reset.
> - Adds Work::disable_sync() (wrapper of bindings::disable_work_sync).
> - Adds OrderedQueue support.
> 
> Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.
> 
> [1]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131
> 
> Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
> ---
> 
> Onur Özkan (4):
>   drm/tyr: clear reset IRQ before soft reset
>   rust: add Work::disable_sync
>   rust: add ordered workqueue wrapper

I actually added ordered workqueue support here:
https://lore.kernel.org/all/20260312-create-workqueue-v4-0-ea39c351c38f@google.com/

Alice
Re: [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API
Posted by Onur Özkan 3 weeks, 4 days ago
On Fri, 13 Mar 2026 09:52:16 +0000
Alice Ryhl <aliceryhl@google.com> wrote:

> On Fri, Mar 13, 2026 at 12:16:40PM +0300, Onur Özkan wrote:
> > This series adds GPU reset handling support for Tyr in a new module
> > drivers/gpu/drm/tyr/reset.rs which encapsulates the low-level reset
> > controller internals and exposes a ResetHandle API to the driver.
> > 
> > The reset module owns reset state, queueing and execution ordering
> > through OrderedQueue and handles duplicate/concurrent reset requests
> > with a pending flag.
> > 
> > Apart from the reset module, the first 3 patches:
> > 
> > - Fixes a potential reset-complete stale state bug by clearing
> > completed state before doing soft reset.
> > - Adds Work::disable_sync() (wrapper of
> > bindings::disable_work_sync).
> > - Adds OrderedQueue support.
> > 
> > Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.
> > 
> > [1]:
> > https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131
> > 
> > Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
> > ---
> > 
> > Onur Özkan (4):
> >   drm/tyr: clear reset IRQ before soft reset
> >   rust: add Work::disable_sync
> >   rust: add ordered workqueue wrapper
> 
> I actually added ordered workqueue support here:
> https://lore.kernel.org/all/20260312-create-workqueue-v4-0-ea39c351c38f@google.com/
> 
> Alice

That's cool. I guess this will wait until your patch lands unless we
want to combine them into a single series.

- Onur
Re: [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API
Posted by Alice Ryhl 3 weeks, 4 days ago
On Fri, Mar 13, 2026 at 12:12 PM Onur Özkan <work@onurozkan.dev> wrote:
>
> On Fri, 13 Mar 2026 09:52:16 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > On Fri, Mar 13, 2026 at 12:16:40PM +0300, Onur Özkan wrote:
> > > This series adds GPU reset handling support for Tyr in a new module
> > > drivers/gpu/drm/tyr/reset.rs which encapsulates the low-level reset
> > > controller internals and exposes a ResetHandle API to the driver.
> > >
> > > The reset module owns reset state, queueing and execution ordering
> > > through OrderedQueue and handles duplicate/concurrent reset requests
> > > with a pending flag.
> > >
> > > Apart from the reset module, the first 3 patches:
> > >
> > > - Fixes a potential reset-complete stale state bug by clearing
> > > completed state before doing soft reset.
> > > - Adds Work::disable_sync() (wrapper of
> > > bindings::disable_work_sync).
> > > - Adds OrderedQueue support.
> > >
> > > Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.
> > >
> > > [1]:
> > > https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131
> > >
> > > Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
> > > ---
> > >
> > > Onur Özkan (4):
> > >   drm/tyr: clear reset IRQ before soft reset
> > >   rust: add Work::disable_sync
> > >   rust: add ordered workqueue wrapper
> >
> > I actually added ordered workqueue support here:
> > https://lore.kernel.org/all/20260312-create-workqueue-v4-0-ea39c351c38f@google.com/
> >
> > Alice
>
> That's cool. I guess this will wait until your patch lands unless we
> want to combine them into a single series.

You can just say in your cover letter that your series depends on mine.

Alice