From: Philipp Stanner
To: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl, Gary Guo,
    Benno Lossin, Christian König, Boris Brezillon, Daniel Almeida,
    Joel Fernandes
Cc: linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
    rust-for-linux@vger.kernel.org, Philipp Stanner
Subject: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
Date: Tue, 3 Feb 2026 09:14:02 +0100
Message-ID: <20260203081403.68733-5-phasta@kernel.org>
In-Reply-To: <20260203081403.68733-2-phasta@kernel.org>
References: <20260203081403.68733-2-phasta@kernel.org>

DRM jobqueue is a load balancer, dependency manager and timeout handler
for GPU drivers with firmware scheduling, i.e. drivers which spawn one
firmware ring for each userspace instance for running jobs on the
hardware.

This patch provides:
 - Jobs which the user can create and load with custom data.
 - Functionality to register dependencies (DmaFences) on jobs.
 - The actual Jobqueue, into which you can push jobs.

Jobqueue submits jobs to your driver through a provided driver callback.
It always submits jobs in order. It only submits jobs whose dependencies
have all been signalled.
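To illustrate the intended driver-side flow, here is a minimal sketch,
condensed from the rustdoc example further down in this patch (the
run_job body and the i32 fence values are purely illustrative):

    use kernel::drm::jq::{Job, Jobqueue};
    use kernel::sync::{Arc, DmaFence, DmaFenceCtx};
    use kernel::types::ARef;

    // Driver callback: push the job to the hardware and return the
    // hardware fence that signals once the hardware has finished it.
    fn run_job(job: &Pin<&mut Job<Arc<DmaFenceCtx>>>) -> ARef<DmaFence<i32>> {
        let fence = job.data.as_arc_borrow().new_fence(42 as i32).unwrap();
        // Stand-in for real hardware: complete the job immediately.
        fence.signal();
        fence
    }

    let fctx = DmaFenceCtx::new()?;
    let jq = Jobqueue::new(1_000_000, run_job)?;

    // Submit a job costing 1 credit; the returned done_fence signals
    // once the hardware and the jobqueue are done with it.
    let job1 = Job::new(1, fctx.clone())?;
    let fence1 = jq.submit_job(job1)?;

    // A job with a dependency only runs after that fence has signalled.
    let mut job2 = Job::new(1, fctx.clone())?;
    job2.add_dependency(fence1)?;
    let _fence2 = jq.submit_job(job2)?;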
Additionally, Jobqueue implements a credit count system so it can take
your hardware's queue depth into account. When creating a Jobqueue, you
provide the number of credits that are available for that queue. Each
job you submit has a specified credit cost which will be subtracted from
the Jobqueue's capacity. If the Jobqueue runs out of capacity, it will
still accept more jobs and run those once more capacity becomes
available through finishing jobs.

This code compiles and was tested and is judged to be ready for beta
testers. However, the code is still plastered with TODOs. Still missing
features are:
 - Timeout handling
 - Complete decoupling from DmaFences. Jobqueue shall in the future
   completely detach itself from all related DmaFences. This is
   currently incomplete. While data-UAF should be impossible, code-UAF
   through DmaFences could occur if the Jobqueue code were unloaded
   while unsignaled fences are still alive.

Signed-off-by: Philipp Stanner
---
 rust/kernel/drm/jq.rs  | 680 +++++++++++++++++++++++++++++++++++++++++
 rust/kernel/drm/mod.rs |   2 +
 2 files changed, 682 insertions(+)
 create mode 100644 rust/kernel/drm/jq.rs

diff --git a/rust/kernel/drm/jq.rs b/rust/kernel/drm/jq.rs
new file mode 100644
index 000000000000..fd5641f40a61
--- /dev/null
+++ b/rust/kernel/drm/jq.rs
@@ -0,0 +1,680 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright (C) 2025, 2026 Red Hat Inc.:
+//  - Philipp Stanner
+
+//! DrmJobqueue. A load balancer, dependency manager and timeout handler for
+//! GPU job submissions.
+
+use crate::{prelude::*, types::ARef};
+use core::sync::atomic::{AtomicU32, Ordering};
+use kernel::list::*;
+use kernel::revocable::Revocable;
+use kernel::sync::{
+    new_spinlock, Arc, DmaFence, DmaFenceCb, DmaFenceCbFunc, DmaFenceCtx, SpinLock,
+};
+use kernel::workqueue::{self, impl_has_work, new_work, Work, WorkItem};
+
+#[pin_data]
+struct Dependency {
+    #[pin]
+    links: ListLinks,
+    fence: ARef<DmaFence<i32>>,
+}
+
+impl Dependency {
+    fn new(fence: ARef<DmaFence<i32>>) -> Result<ListArc<Self>> {
+        ListArc::pin_init(
+            try_pin_init!(Self {
+                links <- ListLinks::new(),
+                fence,
+            }),
+            GFP_KERNEL,
+        )
+    }
+}
+
+impl_list_arc_safe! {
+    impl ListArcSafe<0> for Dependency { untracked; }
+}
+impl_list_item! {
+    impl ListItem<0> for Dependency { using ListLinks { self.links }; }
+}
+// Callback item for the dependency fences to wake / progress the jobqueue.
+struct DependencyWaker<T: Send> {
+    jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+    // Scary raw pointer! See justification at the unsafe block below.
+    //
+    // What would be the alternatives to the raw pointer? I can see two:
+    //   1. Refcount the jobs and have the dependency callbacks take a reference.
+    //      That would then, however, require guarding the jobs with a SpinLock.
+    //      That SpinLock would only exist, however, to satisfy the Rust compiler.
+    //      From a kernel-engineering perspective, that would be undesirable,
+    //      because the only thing within a job that might be accessed by multiple
+    //      CPUs in parallel is `Job::nr_of_deps`. It's certainly conceivable
+    //      that some userspace applications with a great many dependencies would
+    //      then suffer from lock contention, just to modify an integer.
+    //   2. Clever hackiness just to avoid an unsafe that's provably correct:
+    //      We could replace this raw pointer with an Arc<Job>, the Job
+    //      holding another reference. Would work. But is that worth it?
+    //      Share your opinion on-list :)
+    job: *const Job<T>,
+}
+
+impl<T: Send> DependencyWaker<T> {
+    fn new(jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>, job: *const Job<T>) -> Self {
+        Self { jobq, job }
+    }
+}
+
+impl<T: Send> DmaFenceCbFunc for DependencyWaker<T> {
+    fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>)
+    where
+        Self: Sized,
+    {
+        let jq_guard = cb.data.jobq.try_access();
+        if jq_guard.is_none() {
+            return;
+        }
+        let outer_jq = jq_guard.unwrap();
+
+        // SAFETY:
+        // `job` is only needed to modify the dependency counter within the job.
+        // That counter is atomic, so concurrent modifications are safe.
+        //
+        // As for the lifetime: Jobs that have pending dependencies are held by
+        // `InnerJobqueue::waiting_jobs`. As long as any of these dependency
+        // callbacks here are active, a job can by definition not move to the
+        // `InnerJobqueue::running_jobs` list and can, thus, not be freed.
+        //
+        // In case `Jobqueue` drops, the revocable-check above will guard against
+        // UAF. Moreover, jobqueue will deregister all of those dma_fence
+        // callbacks and thereby cleanly decouple itself. The dma_fences that
+        // these callbacks are registered on can, after all, outlive the jobqueue.
+        let job: &Job<T> = unsafe { &*cb.data.job };
+
+        let old_nr_of_deps = job.nr_of_deps.fetch_sub(1, Ordering::Relaxed);
+        // If the counter dropped to 0, a new job somewhere in the queue just got
+        // ready. Run all ready jobs.
+        if old_nr_of_deps == 1 {
+            let mut jq = outer_jq.lock();
+            jq.check_start_submit_worker(cb.data.jobq.clone());
+        }
+
+        // TODO remove the Dependency from the job's dep list, so that when
+        // `Jobqueue` gets dropped it won't try to deregister callbacks for
+        // already-signalled fences.
+    }
+}
+
+/// A jobqueue Job.
+///
+/// You can stuff your data in it. The job will be borrowed back to your driver
+/// once the time has come to run it.
+///
+/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
+/// You can set multiple [`DmaFence`]s as dependencies for a job. It will only
+/// get run once all dependency fences have been signaled.
+///
+/// Jobs cost credits. Jobs will only be run if there is enough capacity in
+/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
+/// credits, effectively disabling that mechanism.
+#[pin_data]
+pub struct Job<T> {
+    cost: u32,
+    #[pin]
+    pub data: T,
+    done_fence: Option<ARef<DmaFence<i32>>>,
+    hardware_fence: Option<ARef<DmaFence<i32>>>,
+    nr_of_deps: AtomicU32,
+    dependencies: List<Dependency>,
+}
+
+impl<T: Send> Job<T> {
+    /// Create a new job that can be submitted to [`Jobqueue`].
+    ///
+    /// Jobs contain driver data that will later be made available to the driver's
+    /// run_job() callback in which the job gets pushed to the GPU.
+    pub fn new(cost: u32, data: impl PinInit<T>) -> Result<Pin<KBox<Self>>> {
+        let job = pin_init!(Self {
+            cost,
+            data <- data,
+            done_fence: None,
+            hardware_fence: None,
+            nr_of_deps: AtomicU32::new(0),
+            dependencies <- List::<Dependency>::new(),
+        });
+
+        KBox::pin_init(job, GFP_KERNEL)
+    }
+
+    /// Add a callback to the job. When the job gets submitted, all added callbacks will be
+    /// registered on the [`DmaFence`] the jobqueue returns for that job.
+    // TODO is callback a good name? We could call it "consequences" for example.
+    pub fn add_callback() -> Result {
+        Ok(())
+    }
+
+    /// Add a [`DmaFence`] or a [`DoneFence`] as this job's dependency. The job
+    /// will only be executed after that dependency has been finished.
+    pub fn add_dependency(&mut self, fence: ARef<DmaFence<i32>>) -> Result {
+        let dependency = Dependency::new(fence)?;
+
+        self.dependencies.push_back(dependency);
+        self.nr_of_deps.fetch_add(1, Ordering::Relaxed);
+
+        Ok(())
+    }
+
+    /// Check if there are dependencies for this job. Register the jobqueue
+    /// waker if yes.
+    fn arm_deps(&mut self, jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>) {
+        let job_ptr = &raw const *self;
+        let mut cursor = self.dependencies.cursor_front();
+
+        while let Some(dep) = cursor.peek_next() {
+            let waker = DependencyWaker::new(jobq.clone(), job_ptr);
+            if dep.fence.register_callback(waker).is_err() {
+                // TODO precise error check
+                // The fence raced or was already signaled. But the hardware_fence
+                // waker is not yet registered. Thus, it's OK to just decrement
+                // the dependency count.
+                self.nr_of_deps.fetch_sub(1, Ordering::Relaxed);
+                // TODO this dependency must be removed from the list so that
+                // `Jobqueue::drop()` doesn't try to deregister the callback.
+            }
+
+            cursor.move_next();
+        }
+    }
+}
+
+#[pin_data]
+struct JobWrap<T> {
+    #[pin]
+    links: ListLinks,
+    inner: Pin<KBox<Job<T>>>,
+}
+
+impl<T> JobWrap<T> {
+    fn new(job: Pin<KBox<Job<T>>>) -> Result<ListArc<Self>> {
+        ListArc::pin_init(
+            try_pin_init!(Self {
+                links <- ListLinks::new(),
+                inner: job,
+            }),
+            GFP_KERNEL,
+        )
+    }
+}
+
+impl_list_arc_safe! {
+    impl{T: Send} ListArcSafe<0> for JobWrap<T> { untracked; }
+}
+impl_list_item! {
+    impl{T: Send} ListItem<0> for JobWrap<T> { using ListLinks { self.links }; }
+}
+
+struct InnerJobqueue<T> {
+    capacity: u32,
+    waiting_jobs: List<JobWrap<T>>,
+    running_jobs: List<JobWrap<T>>,
+    submit_worker_active: bool,
+    run_job: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+}
+
+// SAFETY: We use `List` with effectively a `UniqueArc`, so it can be `Send` when elements are `Send`.
+unsafe impl<T: Send> Send for InnerJobqueue<T> {}
+
+impl<T: Send> InnerJobqueue<T> {
+    fn new(capacity: u32, run_job: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>) -> Self {
+        let waiting_jobs = List::<JobWrap<T>>::new();
+        let running_jobs = List::<JobWrap<T>>::new();
+
+        Self {
+            capacity,
+            waiting_jobs,
+            running_jobs,
+            submit_worker_active: false,
+            run_job,
+        }
+    }
+
+    fn has_waiting_jobs(&self) -> bool {
+        !self.waiting_jobs.is_empty()
+    }
+
+    fn has_capacity_left(&self, cost: u32) -> bool {
+        let cost = cost as i64;
+        let capacity = self.capacity as i64;
+
+        if capacity - cost >= 0 {
+            return true;
+        }
+
+        false
+    }
+
+    fn check_start_submit_worker(&mut self, outer: Arc<Revocable<SpinLock<Self>>>) {
+        if self.submit_worker_active {
+            return;
+        }
+        self.submit_worker_active = true;
+
+        // TODO the work item should likely be moved into the JQ struct, since
+        // only ever 1 worker needs to run at a time. But if we do it that way,
+        // how can we store a reference to the JQ? We obviously can't store it
+        // in the JQ itself because circular dependency -> memory leak.
+        let submit_work = SubmitWorker::new(outer).unwrap(); // TODO error
+        let _ = workqueue::system().enqueue(submit_work); // TODO error
+    }
+}
+
+// Callback item for the hardware fences to wake / progress the jobqueue.
+struct HwFenceWaker<T: Send> {
+    jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+    // Another scary raw pointer!
+    // This one is necessary so that a) a job can be removed from `InnerJobqueue::running_jobs`,
+    // and b) its done_fence can be accessed and signaled.
+    //
+    // What would be the alternatives to this raw pointer? Two come to mind:
+    //   1. Refcount the job. Then the job would have to be locked to satisfy Rust.
+    //      Locking it is not necessary, however. See the below safety comment
+    //      for details.
+    //   2. Clever hacky tricks: We could assign a unique ID per job and store it
+    //      in this callback. Then, we could find the associated job via iterating
+    //      over `jobq.running_jobs`. So to access a job and signal its done_fence,
+    //      we'd have to do a list iteration, which is undesirable performance-wise.
+    //      Moreover, the unique ID generator would have to be stored in `Jobqueue`,
+    //      requiring us to generate jobs on the jobqueue object.
+    job: *const JobWrap<T>,
+}
+
+impl<T: Send> HwFenceWaker<T> {
+    fn new(jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>, job: *const JobWrap<T>) -> Self {
+        Self { jobq, job }
+    }
+}
+
+impl<T: Send> DmaFenceCbFunc for HwFenceWaker<T> {
+    fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>)
+    where
+        Self: Sized,
+    {
+        // This protects against deadlock. See Jobqueue's drop() for details.
+        let jq_guard = cb.data.jobq.try_access();
+        if jq_guard.is_none() {
+            // The JQ itself will signal all done_fences with an error when it drops.
+            return;
+        }
+        let jq_guard = jq_guard.unwrap();
+
+        let mut jobq = jq_guard.lock();
+
+        // SAFETY:
+        // We need the job to remove it from `InnerJobqueue::running_jobs` and to
+        // access its done_fence. There is always only one hardware_fence waker
+        // callback per job. It's the only party which will remove the job from
+        // the running_jobs list. This callback only exists once all Dependency
+        // callbacks have already run. As for the done_fence, the DmaFence
+        // implementation guarantees synchronization and correctness. Thus,
+        // unlocked access is safe.
+        //
+        // As for the lifetime: Only when this callback here has run will a job
+        // be removed from the running_jobs list and, thus, be dropped.
+        // `InnerJobqueue`, which owns running_jobs, can only drop once
+        // `Jobqueue` got dropped. The latter will deregister all hardware fence
+        // callbacks while dropping, thereby preventing UAF through dma_fence
+        // callbacks on jobs.
+        let job: &JobWrap<T> = unsafe { &*cb.data.job };
+
+        jobq.capacity += job.inner.cost;
+        let _ = job.inner.done_fence.as_ref().expect("done_fence not present").signal(); // TODO err
+
+        // SAFETY: This callback function gets registered only once per job,
+        // and the registering party (`run_all_ready_jobs()`) adds the job to
+        // the list.
+        //
+        // This is the only reference (incl. refcount) to this job. Thus, it
+        // may be removed only after all accesses above have been performed.
+        unsafe { jobq.running_jobs.remove(job) };
+
+        // Run more ready jobs if there's capacity.
+        jobq.check_start_submit_worker(cb.data.jobq.clone());
+    }
+}
+
+/// Push a job immediately.
+///
+/// Returns true if the hardware_fence raced, false otherwise.
+fn run_job<T: Send>(
+    driver_cb: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+    waker: HwFenceWaker<T>,
+    job: Pin<&mut Job<T>>,
+) -> bool {
+    let hardware_fence = driver_cb(&job);
+
+    // If a GPU is very fast (or is processing jobs synchronously or sth.) it
+    // could be that the hw_fence is already signaled. In case that happens, we
+    // signal the done_fence for userspace & Co. immediately.
+
+    // TODO catch the exact error (currently the backend only ever errors if it
+    // raced). But still, robustness, you know.
+    if hardware_fence.register_callback(waker).is_err() {
+        // TODO: Print into log in case of error.
+        let _ = job.done_fence.as_ref().expect("done_fence not present").signal();
+        return true;
+    }
+
+    *job.project().hardware_fence = Some(hardware_fence);
+
+    false
+}
+
+// Submits all ready jobs as long as there's capacity.
+fn run_all_ready_jobs<T: Send>(
+    jobq: &mut InnerJobqueue<T>,
+    outer_jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+    driver_cb: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+) {
+    let mut cursor = jobq.waiting_jobs.cursor_front();
+
+    while let Some(job) = cursor.peek_next() {
+        if job.inner.nr_of_deps.load(Ordering::Relaxed) > 0 {
+            return;
+        }
+
+        let cost = job.inner.cost as i64;
+        if jobq.capacity as i64 - cost < 0 {
+            return;
+        }
+
+        let runnable_job = job.remove();
+        // To obtain a mutable reference to the list element, we need to cast
+        // into a UniqueArc. unwrap() cannot fire because by the jobqueue design
+        // a job is only ever in the waiting_jobs OR running_jobs list.
+        let mut unique_job = Arc::<JobWrap<T>>::into_unique_or_drop(runnable_job.into_arc()).unwrap();
+        let job_ptr: *const JobWrap<T> = &raw const *unique_job;
+
+        let runnable_inner_job /* &mut Pin<KBox<Job<T>>> */ = unique_job.as_mut().project().inner;
+
+        let hw_fence_waker = HwFenceWaker::new(outer_jobq.clone(), job_ptr);
+        if !run_job(driver_cb, hw_fence_waker, runnable_inner_job.as_mut()) {
+            // run_job() didn't complete the job immediately (because the
+            // hw_fence did not race). Subtract the credits.
+            jobq.capacity -= cost as u32;
+        }
+
+        // We gave up our ownership above. And we couldn't clone the Arc, because
+        // we needed a UniqueArc for the mutable access. So turn it back now.
+        let running_job = ListArc::from(unique_job);
+        jobq.running_jobs.push_back(running_job);
+    }
+}
+
+#[pin_data]
+struct SubmitWorker<T: Send> {
+    jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+    #[pin]
+    work: Work<SubmitWorker<T>>,
+}
+
+impl<T: Send> SubmitWorker<T> {
+    fn new(
+        jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+    ) -> Result<Arc<Self>> {
+        Arc::pin_init(
+            pin_init!(Self {
+                jobq,
+                work <- new_work!("Jobqueue::SubmitWorker")}),
+            GFP_KERNEL,
+        )
+    }
+}
+
+impl_has_work! {
+    impl{T: Send} HasWork<Self> for SubmitWorker<T> { self.work }
+}
+
+impl<T: Send> WorkItem for SubmitWorker<T> {
+    type Pointer = Arc<SubmitWorker<T>>;
+
+    fn run(this: Arc<SubmitWorker<T>>) {
+        let outer_jobq_copy = this.jobq.clone();
+
+        let guard = this.jobq.try_access();
+        if guard.is_none() {
+            // Can never happen. JQ gets only revoked when it drops, and we hold
+            // a reference.
+            return;
+        }
+        let jobq = guard.unwrap();
+
+        let mut jobq = jobq.lock();
+        let run_job = jobq.run_job;
+
+        run_all_ready_jobs(&mut jobq, outer_jobq_copy, run_job);
+        jobq.submit_worker_active = false;
+    }
+}
+
+/// A job load balancer, dependency manager and timeout handler for GPUs.
+///
+/// The JQ allows you to submit [`Job`]s. It will run all jobs whose dependency
+/// fences have been signalled, as long as there's capacity. Running jobs happens
+/// by borrowing them back to your driver's run_job callback.
+///
+/// # Examples
+///
+/// ```
+/// use kernel::sync::{DmaFenceCtx, DmaFence, Arc};
+/// use kernel::drm::jq::{Job, Jobqueue};
+/// use kernel::types::{ARef};
+/// use kernel::time::{delay::fsleep, Delta};
+///
+/// let fctx = DmaFenceCtx::new()?;
+///
+/// fn run_job(job: &Pin<&mut Job<Arc<DmaFenceCtx>>>) -> ARef<DmaFence<i32>> {
+///     let fence = job.data.as_arc_borrow().new_fence(42 as i32).unwrap();
+///
+///     // Our GPU is so damn fast that it executes each job immediately!
+///     fence.signal();
+///     fence
+/// }
+///
+/// let jq1 = Jobqueue::new(1_000_000, run_job)?;
+/// let jq2 = Jobqueue::new(1_000_000, run_job)?;
+///
+/// let job1 = Job::new(1, fctx.clone())?;
+/// let job2 = Job::new(1, fctx.clone())?;
+///
+///
+/// // Test normal submission of jobs without dependencies.
+/// let fence1 = jq1.submit_job(job1)?;
+/// let fence2 = jq1.submit_job(job2)?;
+///
+/// fsleep(Delta::from_secs(1));
+/// assert_eq!(fence1.is_signaled(), true);
+/// assert_eq!(fence2.is_signaled(), true);
+///
+///
+/// // Test whether a job with a fulfilled dependency gets executed.
+/// let mut job3 = Job::new(1, fctx.clone())?;
+/// job3.add_dependency(fence1)?;
+///
+/// let fence3 = jq2.submit_job(job3)?;
+/// fsleep(Delta::from_secs(1));
+/// assert_eq!(fence3.is_signaled(), true);
+///
+///
+/// // Test whether a job with an unfulfilled dependency does not get executed.
+/// let unsignaled_fence = fctx.as_arc_borrow().new_fence(9001 as i32)?;
+///
+/// let mut job4 = Job::new(1, fctx.clone())?;
+/// job4.add_dependency(unsignaled_fence.clone())?;
+///
+/// let blocked_job_fence = jq2.submit_job(job4)?;
+/// fsleep(Delta::from_secs(1));
+/// assert_eq!(blocked_job_fence.is_signaled(), false);
+///
+///
+/// // Test whether job4 from above actually gets executed once its dep is met.
+/// unsignaled_fence.signal()?;
+/// fsleep(Delta::from_secs(1));
+/// assert_eq!(blocked_job_fence.is_signaled(), true);
+///
+/// Ok::<(), Error>(())
+/// ```
+pub struct Jobqueue<T: Send> {
+    inner: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+    fctx: Arc<DmaFenceCtx>, // TODO currently has a separate lock shared with fences
+    run_job: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+}
+
+impl<T: Send> Jobqueue<T> {
+    /// Create a new [`Jobqueue`] with `capacity` space for jobs. `run_job` is
+    /// your driver's callback which the jobqueue will call to push a submitted
+    /// job to the hardware.
+    ///
+    /// If you don't want to use the capacity mechanism, set it to any non-zero
+    /// value and instead set [`Job`]'s cost to 0.
+    pub fn new(
+        capacity: u32,
+        run_job: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+    ) -> Result<Self> {
+        if capacity == 0 {
+            return Err(EINVAL);
+        }
+
+        let inner = Arc::pin_init(
+            Revocable::new(new_spinlock!(InnerJobqueue::<T>::new(capacity, run_job))),
+            GFP_KERNEL,
+        )?;
+        let fctx = DmaFenceCtx::new()?;
+
+        Ok(Self {
+            inner,
+            fctx,
+            run_job,
+        })
+    }
+
+    /// Submit a job to the jobqueue.
+    ///
+    /// The jobqueue takes ownership over the job and later passes it back to the
+    /// driver by reference through the driver's run_job callback. Jobs are
+    /// passed back by reference instead of by value partially to allow for a job
+    /// resubmission mechanism to be added to [`Jobqueue`] later.
+    ///
+    /// Jobs get run and their done_fences get signalled in submission order.
+    ///
+    /// Returns the "done_fence" on success, which gets signalled once the
+    /// hardware has completed the job and once the jobqueue is done with a job.
+    // TODO: Return a DmaFence-wrapper that users cannot signal.
+    pub fn submit_job(&self, mut job: Pin<KBox<Job<T>>>) -> Result<ARef<DmaFence<i32>>> {
+        let job_cost = job.cost;
+        // TODO: It would be nice if the done_fence's seqno actually matches the
+        // submission order. To do that, however, we'd need to protect job
+        // creation with InnerJobqueue's spinlock. Is that worth it?
+        let done_fence = self.fctx.as_arc_borrow().new_fence(42 as i32)?;
+        *job.as_mut().project().done_fence = Some(done_fence.clone());
+
+        // TODO register job's callbacks on done_fence.
+
+        let guard = self.inner.try_access();
+        if guard.is_none() {
+            // Can never happen. JQ gets only revoked when it drops.
+            return Err(ENODEV);
+        }
+        let jobq = guard.unwrap();
+
+        let mut jobq = jobq.lock();
+
+        let had_waiting_jobs_already = jobq.has_waiting_jobs();
+
+        // Check if there are dependencies and, if yes, register rewake
+        // callbacks on their fences. Must be done under the JQ lock's protection
+        // since the callbacks will access JQ data.
+        // SAFETY: `job` was submitted perfectly validly above. We don't move
+        // the contents; arm_deps() merely iterates over the dependency-list.
+        // TODO: Supposedly this unsafe is unnecessary if you do some magic.
+        let pure_job = unsafe { Pin::into_inner_unchecked(job.as_mut()) };
+        pure_job.arm_deps(self.inner.clone());
+
+        let wrapped_job = JobWrap::new(job)?;
+        jobq.waiting_jobs.push_back(wrapped_job);
+
+        if had_waiting_jobs_already {
+            // Jobs waiting means that there is either currently no capacity
+            // for more jobs, or the jobqueue is blocked by a job with
+            // unfulfilled dependencies. Either the hardware fences' callbacks
+            // or those of the dependency fences will pull in more jobs once
+            // the conditions are met.
+            return Ok(done_fence);
+        } else if jobq.has_capacity_left(job_cost) {
+            // This is the first waiting job. Wake the submit_worker if necessary.
+            jobq.check_start_submit_worker(self.inner.clone());
+        }
+
+        // If the conditions for running now were not met, the callbacks registered
+        // on the already running jobs' hardware fences will check if there's space
+        // for the next job, guaranteeing progress.
+        //
+        // If no jobs were running, there was by definition still space and the
+        // job will get pushed by the worker.
+        //
+        // If a job couldn't be pushed because there were unfinished dependencies,
+        // then the hardware fences' callbacks mentioned above will detect that
+        // and not yet push the job.
+        //
+        // Each dependency's fence has its own callback which checks:
+        //  a) whether all other callbacks are fulfilled and if yes:
+        //  b) whether there are now enough credits available.
+        //
+        // If a) and b) are fulfilled, the job gets pushed.
+        //
+        // If there are no jobs currently running, credits must be available by
+        // definition.
+
+        Ok(done_fence)
+    }
+}
+
+impl<T: Send> Drop for Jobqueue<T> {
+    fn drop(&mut self) {
+        // The hardware and dependency fences might outlive the jobqueue.
+        // So fence callbacks could very well still call into job queue code,
+        // resulting in data UAF or, should the jobqueue code be unloaded,
+        // even code UAF.
+        //
+        // Thus, the jobqueue needs to be cleanly decoupled from those fences
+        // when it drops; in other words, it needs to deregister all its
+        // fence callbacks.
+        //
+        // This, however, could easily deadlock when a hw_fence signals:
+        //
+        // Step | Jobqueue step             | hw_fence step
+        // ------------------------------------------------------------------
+        //   1  | JQ starts drop            | fence signals
+        //   2  | JQ lock taken             | fence lock taken
+        //   3  | Tries to take fence lock  | Tries to take JQ lock
+        //   4  | ***DEADLOCK***            | ***DEADLOCK***
+        //
+        // In order to prevent deadlock, we first have to revoke access to the
+        // JQ so that all fence callbacks can't try to take the lock anymore,
+        // and then deregister all JQ callbacks on the fences.
+        self.inner.revoke();
+
+        /*
+        let guard = self.inner.lock();
+        for job in self.inner.waiting_jobs {
+            job.deregister_dep_fences();
+        }
+        for job in self.inner.running_jobs {
+            job.deregister_hw_fence();
+        }
+
+        TODO: signal all remaining done_fences with an error.
+        */
+    }
+}
diff --git a/rust/kernel/drm/mod.rs b/rust/kernel/drm/mod.rs
index 1b82b6945edf..803bed36231b 100644
--- a/rust/kernel/drm/mod.rs
+++ b/rust/kernel/drm/mod.rs
@@ -7,12 +7,14 @@
 pub mod file;
 pub mod gem;
 pub mod ioctl;
+pub mod jq;
 
 pub use self::device::Device;
 pub use self::driver::Driver;
 pub use self::driver::DriverInfo;
 pub use self::driver::Registration;
 pub use self::file::File;
+pub use self::jq::Jobqueue;
 
 pub(crate) mod private {
     pub trait Sealed {}
-- 
2.49.0