[v3] Creation of workqueues in Rust

[PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Alice Ryhl 1 month, 1 week ago

When a workqueue is shut down, delayed work that is pending but not
scheduled does not get properly cleaned up, so it's not safe to use
`enqueue_delayed` on a workqueue that might be destroyed. To fix this,
restrict `enqueue_delayed` to static queues.

Cc: stable@vger.kernel.org
Fixes: 7c098cd5eaae ("workqueue: rust: add delayed work items")
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/kernel/workqueue.rs | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
index 706e833e9702..1acd113c04ee 100644
--- a/rust/kernel/workqueue.rs
+++ b/rust/kernel/workqueue.rs
@@ -296,8 +296,15 @@ pub fn enqueue<W, const ID: u64>(&self, w: W) -> W::EnqueueOutput
     ///
     /// This may fail if the work item is already enqueued in a workqueue.
     ///
+    /// This is only valid for global workqueues (with static lifetimes) because those are the only
+    /// ones that outlive all possible delayed work items.
+    ///
     /// The work item will be submitted using `WORK_CPU_UNBOUND`.
-    pub fn enqueue_delayed<W, const ID: u64>(&self, w: W, delay: Jiffies) -> W::EnqueueOutput
+    pub fn enqueue_delayed<W, const ID: u64>(
+        &'static self,
+        w: W,
+        delay: Jiffies,
+    ) -> W::EnqueueOutput
     where
         W: RawDelayedWorkItem<ID> + Send + 'static,
     {

-- 
2.53.0.473.g4a7958ca14-goog
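
A minimal userspace sketch of what the `&'static self` receiver does to callers. The `Queue` type, its `name` field, the string return values and the `SYSTEM` static are all hypothetical stand-ins, not the kernel's types; only the receiver lifetimes mirror the diff above.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the kernel's `Queue`; not the real type.
pub struct Queue {
    name: &'static str,
}

impl Queue {
    /// Like the existing `enqueue`: any borrow of the queue is accepted.
    pub fn enqueue(&self, item: &str) -> String {
        format!("{}: {} queued", self.name, item)
    }

    /// Like the patched `enqueue_delayed`: the queue must be borrowed for
    /// `'static`, so only queues that are never destroyed qualify.
    pub fn enqueue_delayed(&'static self, item: &str, delay: Duration) -> String {
        format!("{}: {} queued after {} ms", self.name, item, delay.as_millis())
    }
}

/// A global queue, loosely analogous to a system workqueue (assumption).
pub static SYSTEM: Queue = Queue { name: "system" };

pub fn demo() -> (String, String) {
    // A queue with a bounded lifetime can still take immediate work.
    let local = Queue { name: "local" };
    let immediate = local.enqueue("a");

    // Uncommenting the next line is rejected by the borrow checker, because
    // `local` is dropped at the end of this function and is not `'static`:
    // local.enqueue_delayed("b", Duration::from_millis(10));

    let delayed = SYSTEM.enqueue_delayed("b", Duration::from_millis(10));
    (immediate, delayed)
}
```

The restriction is enforced entirely at compile time, so existing callers that pass a global queue are unaffected.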

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Tejun Heo 1 month, 1 week ago

On Fri, Feb 27, 2026 at 02:53:20PM +0000, Alice Ryhl wrote:
> When a workqueue is shut down, delayed work that is pending but not
> scheduled does not get properly cleaned up, so it's not safe to use
> `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> restrict `enqueue_delayed` to static queues.

C being C, we've been just chalking this up as "user error", but please feel
free to add per-workqueue percpu ref for pending delayed work items if
that'd help. That shouldn't be noticeably expensive and should help
straighten this out for rust hopefully.

Thanks.

-- 
tejun

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Alice Ryhl 1 month, 1 week ago

On Fri, Feb 27, 2026 at 07:09:07AM -1000, Tejun Heo wrote:
> On Fri, Feb 27, 2026 at 02:53:20PM +0000, Alice Ryhl wrote:
> > When a workqueue is shut down, delayed work that is pending but not
> > scheduled does not get properly cleaned up, so it's not safe to use
> > `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> > restrict `enqueue_delayed` to static queues.
> 
> C being C, we've been just chalking this up as "user error", but please feel
> free to add per-workqueue percpu ref for pending delayed work items if
> that'd help. That shouldn't be noticeably expensive and should help
> straighten this out for rust hopefully.

I had been thinking I would pick up this patch again:
https://lore.kernel.org/all/20250423-destroy-workqueue-flush-v1-1-3d74820780a5@google.com/

but it sounds like you're suggesting a different solution?

Alice

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Tejun Heo 1 month, 1 week ago

On Fri, Feb 27, 2026 at 07:01:18PM +0000, Alice Ryhl wrote:
> On Fri, Feb 27, 2026 at 07:09:07AM -1000, Tejun Heo wrote:
> > On Fri, Feb 27, 2026 at 02:53:20PM +0000, Alice Ryhl wrote:
> > > When a workqueue is shut down, delayed work that is pending but not
> > > scheduled does not get properly cleaned up, so it's not safe to use
> > > `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> > > restrict `enqueue_delayed` to static queues.
> > 
> > C being C, we've been just chalking this up as "user error", but please feel
> > free to add per-workqueue percpu ref for pending delayed work items if
> > that'd help. That shouldn't be noticeably expensive and should help
> > straighten this out for rust hopefully.
> 
> I had been thinking I would pick up this patch again:
> https://lore.kernel.org/all/20250423-destroy-workqueue-flush-v1-1-3d74820780a5@google.com/
> 
> but it sounds like you're suggesting a different solution?

I'm not remembering much context at this point, but if it *could* work,
percpu refcnt counting the number of delayed work items would be cheaper.
Again, I could easily be forgetting why we didn't do that in the first
place.

Thanks.

-- 
tejun

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Alice Ryhl 1 month, 1 week ago

On Fri, Feb 27, 2026 at 09:08:55AM -1000, Tejun Heo wrote:
> On Fri, Feb 27, 2026 at 07:01:18PM +0000, Alice Ryhl wrote:
> > On Fri, Feb 27, 2026 at 07:09:07AM -1000, Tejun Heo wrote:
> > > On Fri, Feb 27, 2026 at 02:53:20PM +0000, Alice Ryhl wrote:
> > > > When a workqueue is shut down, delayed work that is pending but not
> > > > scheduled does not get properly cleaned up, so it's not safe to use
> > > > `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> > > > restrict `enqueue_delayed` to static queues.
> > > 
> > > C being C, we've been just chalking this up as "user error", but please feel
> > > free to add per-workqueue percpu ref for pending delayed work items if
> > > that'd help. That shouldn't be noticeably expensive and should help
> > > straighten this out for rust hopefully.
> > 
> > I had been thinking I would pick up this patch again:
> > https://lore.kernel.org/all/20250423-destroy-workqueue-flush-v1-1-3d74820780a5@google.com/
> > 
> > but it sounds like you're suggesting a different solution?
> 
> I'm not remembering much context at this point, but if it *could* work,
> percpu refcnt counting the number of delayed work items would be cheaper.
> Again, I could easily be forgetting why we didn't do that in the first
> place.

I guess the question is, what does destroy_workqueue() do?

- Does it wait for the timers to finish?
- Does it immediately run the delayed works?
- Does it exit without waiting for timers?

It sounds like the refcount approach is the last solution, where
destroy_workqueue() just exits without waiting for timers, but then
keeping the workqueue alive until the timers elapse.

The main concern I can see is that this means that delayed work can run
after destroy_workqueue() is called. That may be a problem if
destroy_workqueue() is used to guard module unload (or device unbind).
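
A single-threaded toy model of the third option, assuming each pending delayed item holds one counted reference to the queue. `ToyWq`, `PendingTimer`, `queue_delayed` and `destroy` are hypothetical names, `Rc` stands in for the percpu ref, and calling `fire` by hand stands in for timer expiry.

```rust
use std::cell::Cell;
use std::rc::{Rc, Weak};

/// Hypothetical toy queue; only tracks whether destroy has been called.
pub struct ToyWq {
    destroyed: Cell<bool>,
}

/// A pending delayed item that keeps the queue alive through its reference.
pub struct PendingTimer {
    wq: Rc<ToyWq>,
}

impl PendingTimer {
    /// Simulates timer expiry. Returns true if the work ran after destroy.
    pub fn fire(self) -> bool {
        self.wq.destroyed.get()
        // `self.wq` is dropped here, releasing this item's reference.
    }
}

/// Arms a delayed item by taking an extra reference on the queue.
pub fn queue_delayed(wq: &Rc<ToyWq>) -> PendingTimer {
    PendingTimer { wq: Rc::clone(wq) }
}

/// Third option: mark the queue destroyed and return without waiting.
/// The caller's reference is dropped when `wq` goes out of scope.
pub fn destroy(wq: Rc<ToyWq>) {
    wq.destroyed.set(true);
}

pub fn demo() -> (bool, bool, bool) {
    let wq = Rc::new(ToyWq { destroyed: Cell::new(false) });
    let observer: Weak<ToyWq> = Rc::downgrade(&wq);

    let timer = queue_delayed(&wq);
    destroy(wq);

    // The pending item still pins the queue after destroy has returned.
    let alive_after_destroy = observer.upgrade().is_some();
    // The work then runs even though destroy has already completed.
    let ran_after_destroy = timer.fire();
    // Only now does the last reference go away.
    let freed_after_fire = observer.upgrade().is_none();

    (alive_after_destroy, ran_after_destroy, freed_after_fire)
}
```

In this model `demo()` returns `(true, true, true)`: the work runs after destroy has returned, which is the ordering that matters when destroy is expected to guard module unload.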

Alice

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Tejun Heo 1 month, 1 week ago

Hello,

On Fri, Feb 27, 2026 at 07:19:56PM +0000, Alice Ryhl wrote:
> I guess the question is, what does destroy_workqueue() do?
> 
> - Does it wait for the timers to finish?
> - Does it immediately run the delayed works?
> - Does it exit without waiting for timers?
> 
> It sounds like the refcount approach is the last solution, where
> destroy_workqueue() just exits without waiting for timers, but then
> keeping the workqueue alive until the timers elapse.
> 
> The main concern I can see is that this means that delayed work can run
> after destroy_workqueue() is called. That may be a problem if
> destroy_workqueue() is used to guard module unload (or device unbind).

delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
can shut it down and free all the supporting stuff while leaving zombie wq
struct which noops execution and let the whole thing go away when refs reach
zero?

Thanks.

-- 
tejun

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Alice Ryhl 1 month, 1 week ago

On Fri, Feb 27, 2026 at 09:24:34AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Fri, Feb 27, 2026 at 07:19:56PM +0000, Alice Ryhl wrote:
> > I guess the question is, what does destroy_workqueue() do?
> > 
> > - Does it wait for the timers to finish?
> > - Does it immediately run the delayed works?
> > - Does it exit without waiting for timers?
> > 
> > It sounds like the refcount approach is the last solution, where
> > destroy_workqueue() just exits without waiting for timers, but then
> > keeping the workqueue alive until the timers elapse.
> > 
> > The main concern I can see is that this means that delayed work can run
> > after destroy_workqueue() is called. That may be a problem if
> > destroy_workqueue() is used to guard module unload (or device unbind).
> 
> delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
> can shut it down and free all the supporting stuff while leaving zombie wq
> struct which noops execution and let the whole thing go away when refs reach
> zero?

But isn't that a problem for e.g. self-freeing work? If we don't run the
work, then its memory is just leaked.
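
A toy illustration of the leak, assuming "self-freeing work" means the work function is the only place the item's allocation is released. `SelfFreeingWork`, `ToyWq`, `timer_fired` and the `zombie` flag are hypothetical; the `freed` counter exists only to make the effect observable.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical work item that bumps a counter when its memory is released.
pub struct SelfFreeingWork {
    freed: Rc<Cell<usize>>,
}

impl Drop for SelfFreeingWork {
    fn drop(&mut self) {
        self.freed.set(self.freed.get() + 1);
    }
}

/// The work function: takes ownership back from the raw pointer and frees it.
///
/// # Safety
/// `ptr` must come from `Box::into_raw` and must not be used afterwards.
unsafe fn run(ptr: *mut SelfFreeingWork) {
    drop(unsafe { Box::from_raw(ptr) });
}

/// Hypothetical toy queue; `zombie` models a destroyed queue that noops.
pub struct ToyWq {
    zombie: bool,
}

impl ToyWq {
    /// Called when the delayed item's timer expires.
    pub fn timer_fired(&self, ptr: *mut SelfFreeingWork) {
        if self.zombie {
            // Execution is skipped, so nothing ever frees `ptr`.
            return;
        }
        // SAFETY: `ptr` was produced by `Box::into_raw` in `demo` and is
        // handed over exactly once.
        unsafe { run(ptr) }
    }
}

/// Returns how many items were freed on a live queue and on a zombie queue.
pub fn demo() -> (usize, usize) {
    let run_one = |zombie: bool| {
        let freed = Rc::new(Cell::new(0));
        let work = Box::new(SelfFreeingWork { freed: Rc::clone(&freed) });
        let ptr = Box::into_raw(work);
        ToyWq { zombie }.timer_fired(ptr);
        freed.get()
    };
    (run_one(false), run_one(true))
}
```

Here `demo()` returns `(1, 0)`: the live queue frees its item, while the zombie queue never does, so that allocation is leaked.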

Alice

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Tejun Heo 1 month, 1 week ago

On Fri, Feb 27, 2026 at 07:28:11PM +0000, Alice Ryhl wrote:
> > delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
> > can shut it down and free all the supporting stuff while leaving zombie wq
> > struct which noops execution and let the whole thing go away when refs reach
> > zero?
> 
> But isn't that a problem for e.g. self-freeing work? If we don't run the
> work, then its memory is just leaked.

Yeah, good point. Maybe we should just keep the whole thing up while
removing it from sysfs. Would that work?

Thanks.

-- 
tejun

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Alice Ryhl 1 month, 1 week ago

On Fri, Feb 27, 2026 at 8:46 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Fri, Feb 27, 2026 at 07:28:11PM +0000, Alice Ryhl wrote:
> > > delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
> > > can shut it down and free all the supporting stuff while leaving zombie wq
> > > struct which noops execution and let the whole thing go away when refs reach
> > > zero?
> >
> > But isn't that a problem for e.g. self-freeing work? If we don't run the
> > work, then its memory is just leaked.
>
> Yeah, good point. Maybe we should just keep the whole thing up while
> removing it from sysfs. Would that work?

We can but there are two variants of that:

If destroy_workqueue() waits for delayed work, then it may take a long time.

If destroy_workqueue() does not wait for delayed work, then I'm
worried about bugs resulting from module unload and similar.

Alice

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Tejun Heo 1 month, 1 week ago

On Fri, Feb 27, 2026 at 09:36:22PM +0100, Alice Ryhl wrote:
> On Fri, Feb 27, 2026 at 8:46 PM Tejun Heo <tj@kernel.org> wrote:
> >
> > On Fri, Feb 27, 2026 at 07:28:11PM +0000, Alice Ryhl wrote:
> > > > delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
> > > > can shut it down and free all the supporting stuff while leaving zombie wq
> > > > struct which noops execution and let the whole thing go away when refs reach
> > > > zero?
> > >
> > > But isn't that a problem for e.g. self-freeing work? If we don't run the
> > > work, then its memory is just leaked.
> >
> > Yeah, good point. Maybe we should just keep the whole thing up while
> > removing it from sysfs. Would that work?
> 
> We can but there are two variants of that:
> 
> If destroy_workqueue() waits for delayed work, then it may take a long time.
> 
> If destroy_workqueue() does not wait for delayed work, then I'm
> worried about bugs resulting from module unload and similar.

I see. Yeah, neither seems workable. We should be able to flush the delayed
work items. Maybe we can make that an optional feature so that Rust wrappers
can turn it on for safety.
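
A rough sketch of an opt-in flush, under the assumption that "flush" means running still-pending delayed items immediately at destroy time. The thread does not settle on that meaning, and `ToyWq`, `flush_delayed_on_destroy` and the closure-based items are hypothetical.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical toy queue holding delayed items whose timers have not fired.
pub struct ToyWq {
    flush_delayed_on_destroy: bool,
    pending: Vec<Box<dyn FnOnce()>>,
}

impl ToyWq {
    pub fn new(flush_delayed_on_destroy: bool) -> Self {
        ToyWq { flush_delayed_on_destroy, pending: Vec::new() }
    }

    /// Records a delayed item; in this toy the timer never fires by itself.
    pub fn queue_delayed(&mut self, work: impl FnOnce() + 'static) {
        self.pending.push(Box::new(work));
    }

    /// Destroys the queue and returns how many delayed items were left
    /// un-run. With the flag set, pending items are run first.
    pub fn destroy(mut self) -> usize {
        if self.flush_delayed_on_destroy {
            for work in std::mem::take(&mut self.pending) {
                work();
            }
        }
        // In this toy, leftovers are dropped with `self`; in the scenario
        // discussed in the thread they would still be armed.
        self.pending.len()
    }
}

/// Returns `(items run, items left)` without and then with the flag.
pub fn demo() -> [(u32, usize); 2] {
    let run = |flush: bool| {
        let ran = Rc::new(Cell::new(0u32));
        let mut wq = ToyWq::new(flush);
        for _ in 0..2 {
            let ran = Rc::clone(&ran);
            wq.queue_delayed(move || ran.set(ran.get() + 1));
        }
        let left = wq.destroy();
        (ran.get(), left)
    };
    [run(false), run(true)]
}
```

In this model `demo()` returns `[(0, 2), (2, 0)]`: with the flag set, destroy neither waits for timers nor leaves work to run later.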

Thanks.

-- 
tejun

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Gary Guo 1 month, 1 week ago

On Fri Feb 27, 2026 at 2:53 PM GMT, Alice Ryhl wrote:
> When a workqueue is shut down, delayed work that is pending but not
> scheduled does not get properly cleaned up, so it's not safe to use
> `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> restrict `enqueue_delayed` to static queues.
> 
> Cc: stable@vger.kernel.org
> Fixes: 7c098cd5eaae ("workqueue: rust: add delayed work items")
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
>  rust/kernel/workqueue.rs | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Re: [PATCH v3 1/2] rust: workqueue: restrict delayed work to global wqs

Posted by Danilo Krummrich 1 month, 1 week ago

On Fri Feb 27, 2026 at 3:53 PM CET, Alice Ryhl wrote:
> When a workqueue is shut down, delayed work that is pending but not
> scheduled does not get properly cleaned up, so it's not safe to use
> `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> restrict `enqueue_delayed` to static queues.

:(

Reviewed-by: Danilo Krummrich <dakr@kernel.org>

> Cc: stable@vger.kernel.org
> Fixes: 7c098cd5eaae ("workqueue: rust: add delayed work items")
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
>  rust/kernel/workqueue.rs | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
> index 706e833e9702..1acd113c04ee 100644
> --- a/rust/kernel/workqueue.rs
> +++ b/rust/kernel/workqueue.rs
> @@ -296,8 +296,15 @@ pub fn enqueue<W, const ID: u64>(&self, w: W) -> W::EnqueueOutput
>      ///
>      /// This may fail if the work item is already enqueued in a workqueue.
>      ///
> +    /// This is only valid for global workqueues (with static lifetimes) because those are the only
> +    /// ones that outlive all possible delayed work items.

We should probably add a FIXME comment pointing out that this should be fixed in
the C code.

Maybe also link your approach?

> +    ///
>      /// The work item will be submitted using `WORK_CPU_UNBOUND`.
> -    pub fn enqueue_delayed<W, const ID: u64>(&self, w: W, delay: Jiffies) -> W::EnqueueOutput
> +    pub fn enqueue_delayed<W, const ID: u64>(
> +        &'static self,
> +        w: W,
> +        delay: Jiffies,
> +    ) -> W::EnqueueOutput
>      where
>          W: RawDelayedWorkItem<ID> + Send + 'static,
>      {
>
> -- 
> 2.53.0.473.g4a7958ca14-goog