When a workqueue is shut down, delayed work that is pending but not
scheduled does not get properly cleaned up, so it's not safe to use
`enqueue_delayed` on a workqueue that might be destroyed. To fix this,
restrict `enqueue_delayed` to static queues.

Cc: stable@vger.kernel.org
Fixes: 7c098cd5eaae ("workqueue: rust: add delayed work items")
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
rust/kernel/workqueue.rs | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
index 706e833e9702..1acd113c04ee 100644
--- a/rust/kernel/workqueue.rs
+++ b/rust/kernel/workqueue.rs
@@ -296,8 +296,15 @@ pub fn enqueue<W, const ID: u64>(&self, w: W) -> W::EnqueueOutput
     ///
     /// This may fail if the work item is already enqueued in a workqueue.
     ///
+    /// This is only valid for global workqueues (with static lifetimes) because those are the only
+    /// ones that outlive all possible delayed work items.
+    ///
     /// The work item will be submitted using `WORK_CPU_UNBOUND`.
-    pub fn enqueue_delayed<W, const ID: u64>(&self, w: W, delay: Jiffies) -> W::EnqueueOutput
+    pub fn enqueue_delayed<W, const ID: u64>(
+        &'static self,
+        w: W,
+        delay: Jiffies,
+    ) -> W::EnqueueOutput
     where
         W: RawDelayedWorkItem<ID> + Send + 'static,
     {
--
2.53.0.473.g4a7958ca14-goog
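The effect of the `&'static self` receiver can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming a hypothetical `Queue` type, a hypothetical `SYSTEM_WQ` static and `std::time::Duration` in place of `Jiffies`; it is not the `kernel::workqueue` API. A queue stored in a `static` may call `enqueue_delayed`, while a stack-local queue is rejected at compile time, even though plain `enqueue` still accepts it.

```rust
use std::time::Duration;

/// Hypothetical stand-in for `kernel::workqueue::Queue`; this is NOT the
/// kernel type, just enough to show how the receiver type restricts callers.
pub struct Queue {
    name: &'static str,
}

impl Queue {
    /// Non-delayed enqueue: any borrow of the queue is accepted.
    pub fn enqueue(&self, item: &str) -> String {
        format!("{}: run {}", self.name, item)
    }

    /// Delayed enqueue: the receiver must be borrowed for `'static`,
    /// mirroring the `&'static self` change in the patch.
    pub fn enqueue_delayed(&'static self, item: &str, delay: Duration) -> String {
        format!("{}: run {} after {} ms", self.name, item, delay.as_millis())
    }
}

/// A queue stored in a `static` lives for the whole program, so
/// `&SYSTEM_WQ` has type `&'static Queue` (hypothetical name).
pub static SYSTEM_WQ: Queue = Queue { name: "system" };

/// Exercises both paths; the commented-out call would not compile.
pub fn demo() -> (String, String) {
    let delayed = SYSTEM_WQ.enqueue_delayed("w1", Duration::from_millis(10));

    let local = Queue { name: "local" };
    let immediate = local.enqueue("w2");
    // Rejected by the borrow checker: `local` does not live for `'static`.
    // local.enqueue_delayed("w3", Duration::from_millis(10));

    (delayed, immediate)
}
```

The restriction is enforced purely by the type system, so no runtime check is needed; the cost is that dynamically allocated queues lose access to delayed work until the C side handles teardown safely.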
On Fri, Feb 27, 2026 at 02:53:20PM +0000, Alice Ryhl wrote:
> When a workqueue is shut down, delayed work that is pending but not
> scheduled does not get properly cleaned up, so it's not safe to use
> `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> restricted `enqueue_delayed` to static queues.

C being C, we've been just chalking this up as "user error", but please feel
free to add per-workqueue percpu ref for pending delayed work items if
that'd help. That shouldn't be noticeably expensive and should help
straighten this out for rust hopefully.

Thanks.

--
tejun
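The per-workqueue refcount suggested above can be modelled roughly as follows. This is a userspace sketch, not the kernel's `percpu_ref` API: `Arc` stands in for the reference each armed timer would hold, an `AtomicUsize` for the pending-item count, and `new_wq`, `WqInner`, `PendingDelayed` and `fire` are hypothetical names. Dropping the owner's handle (the analogue of `destroy_workqueue()`) does not free the queue while a delayed item is still pending.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical model of a workqueue's shared state. A plain atomic counter
/// stands in for a per-workqueue percpu ref; this is not the kernel API.
pub struct WqInner {
    pending_delayed: AtomicUsize,
}

impl WqInner {
    /// Number of delayed items whose timers have not fired yet.
    pub fn pending(&self) -> usize {
        self.pending_delayed.load(Ordering::Relaxed)
    }
}

/// Create a queue; the returned `Arc` is the owner's handle.
pub fn new_wq() -> Arc<WqInner> {
    Arc::new(WqInner { pending_delayed: AtomicUsize::new(0) })
}

/// An armed delayed item. It holds its own reference to the queue, so the
/// queue cannot be freed while the "timer" is still pending.
pub struct PendingDelayed {
    wq: Arc<WqInner>,
    name: &'static str,
}

/// Arm a delayed item: bump the pending count and take a queue reference.
pub fn enqueue_delayed(wq: &Arc<WqInner>, name: &'static str) -> PendingDelayed {
    wq.pending_delayed.fetch_add(1, Ordering::Relaxed);
    PendingDelayed { wq: Arc::clone(wq), name }
}

impl PendingDelayed {
    /// Timer expiry: the item would be queued for execution here. Consuming
    /// `self` drops its queue reference when this function returns.
    pub fn fire(self) -> &'static str {
        self.wq.pending_delayed.fetch_sub(1, Ordering::Relaxed);
        self.name
    }
}
```

As Alice points out later in the thread, this corresponds to the "exit without waiting for timers" option: the delayed item still runs after the owner has let go of the queue, which matters if `destroy_workqueue()` is what guards module unload.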
On Fri, Feb 27, 2026 at 07:09:07AM -1000, Tejun Heo wrote:
> On Fri, Feb 27, 2026 at 02:53:20PM +0000, Alice Ryhl wrote:
> > When a workqueue is shut down, delayed work that is pending but not
> > scheduled does not get properly cleaned up, so it's not safe to use
> > `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> > restricted `enqueue_delayed` to static queues.
>
> C being C, we've been just chalking this up as "user error", but please feel
> free to add per-workqueue percpu ref for pending delayed work items if
> that'd help. That shouldn't be noticeably expensive and should help
> straighten this out for rust hopefully.

I had been thinking I would pick up this patch again:
https://lore.kernel.org/all/20250423-destroy-workqueue-flush-v1-1-3d74820780a5@google.com/

but it sounds like you're suggesting a different solution?

Alice
On Fri, Feb 27, 2026 at 07:01:18PM +0000, Alice Ryhl wrote:
> On Fri, Feb 27, 2026 at 07:09:07AM -1000, Tejun Heo wrote:
> > On Fri, Feb 27, 2026 at 02:53:20PM +0000, Alice Ryhl wrote:
> > > When a workqueue is shut down, delayed work that is pending but not
> > > scheduled does not get properly cleaned up, so it's not safe to use
> > > `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> > > restricted `enqueue_delayed` to static queues.
> >
> > C being C, we've been just chalking this up as "user error", but please feel
> > free to add per-workqueue percpu ref for pending delayed work items if
> > that'd help. That shouldn't be noticeably expensive and should help
> > straighten this out for rust hopefully.
>
> I had been thinking I would pick up this patch again:
> https://lore.kernel.org/all/20250423-destroy-workqueue-flush-v1-1-3d74820780a5@google.com/
>
> but it sounds like you're suggesting a different solution?

I'm not remembering much context at this point, but if it *could* work,
percpu refcnt counting the number of delayed work items would be cheaper.
Again, I could easily be forgetting why we didn't do that in the first
place.

Thanks.

--
tejun
On Fri, Feb 27, 2026 at 09:08:55AM -1000, Tejun Heo wrote:
> On Fri, Feb 27, 2026 at 07:01:18PM +0000, Alice Ryhl wrote:
> > On Fri, Feb 27, 2026 at 07:09:07AM -1000, Tejun Heo wrote:
> > > On Fri, Feb 27, 2026 at 02:53:20PM +0000, Alice Ryhl wrote:
> > > > When a workqueue is shut down, delayed work that is pending but not
> > > > scheduled does not get properly cleaned up, so it's not safe to use
> > > > `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> > > > restricted `enqueue_delayed` to static queues.
> > >
> > > C being C, we've been just chalking this up as "user error", but please feel
> > > free to add per-workqueue percpu ref for pending delayed work items if
> > > that'd help. That shouldn't be noticeably expensive and should help
> > > straighten this out for rust hopefully.
> >
> > I had been thinking I would pick up this patch again:
> > https://lore.kernel.org/all/20250423-destroy-workqueue-flush-v1-1-3d74820780a5@google.com/
> >
> > but it sounds like you're suggesting a different solution?
>
> I'm not remembering much context at this point, but if it *could* work,
> percpu refcnt counting the number of delayed work items would be cheaper.
> Again, I could easily be forgetting why we didn't do that in the first
> place.

I guess the question is, what does destroy_workqueue() do?

- Does it wait for the timers to finish?
- Does it immediately run the delayed works?
- Does it exit without waiting for timers?

It sounds like the refcount approach is the last solution, where
destroy_workqueue() just exits without waiting for timers, but then
keeping the workqueue alive until the timers elapse.

The main concern I can see is that this means that delayed work can run
after destroy_workqueue() is called. That may be a problem if
destroy_workqueue() is used to guard module unload (or device unbind).

Alice
Hello,

On Fri, Feb 27, 2026 at 07:19:56PM +0000, Alice Ryhl wrote:
> I guess the question is, what does destroy_workqueue() do?
>
> - Does it wait for the timers to finish?
> - Does it immediately run the delayed works?
> - Does it exit without waiting for timers?
>
> It sounds like the refcount approach is the last solution, where
> destroy_workqueue() just exits without waiting for timers, but then
> keeping the workqueue alive until the timers elapse.
>
> The main concern I can see is that this means that delayed work can run
> after destroy_workqueue() is called. That may be a problem if
> destroy_workqueue() is used to guard module unload (or device unbind).

delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
can shut it down and free all the supporting stuff while leaving zombie wq
struct which noops execution and let the whole thing go away when refs reach
zero?

Thanks.

--
tejun
On Fri, Feb 27, 2026 at 09:24:34AM -1000, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 27, 2026 at 07:19:56PM +0000, Alice Ryhl wrote:
> > I guess the question is, what does destroy_workqueue() do?
> >
> > - Does it wait for the timers to finish?
> > - Does it immediately run the delayed works?
> > - Does it exit without waiting for timers?
> >
> > It sounds like the refcount approach is the last solution, where
> > destroy_workqueue() just exits without waiting for timers, but then
> > keeping the workqueue alive until the timers elapse.
> >
> > The main concern I can see is that this means that delayed work can run
> > after destroy_workqueue() is called. That may be a problem if
> > destroy_workqueue() is used to guard module unload (or device unbind).
>
> delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
> can shut it down and free all the supporting stuff while leaving zombie wq
> struct which noops execution and let the whole thing go away when refs reach
> zero?

But isn't that a problem for e.g. self-freeing work? If we don't run the
work, then its memory is just leaked.

Alice
On Fri, Feb 27, 2026 at 07:28:11PM +0000, Alice Ryhl wrote:
> > delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
> > can shut it down and free all the supporting stuff while leaving zombie wq
> > struct which noops execution and let the whole thing go away when refs reach
> > zero?
>
> But isn't that a problem for e.g. self-freeing work? If we don't run the
> work, then its memory is just leaked.

Yeah, good point. Maybe we should just keep the whole thing up while
removing it from sysfs. Would that work?

Thanks.

--
tejun
On Fri, Feb 27, 2026 at 8:46 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Fri, Feb 27, 2026 at 07:28:11PM +0000, Alice Ryhl wrote:
> > > delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
> > > can shut it down and free all the supporting stuff while leaving zombie wq
> > > struct which noops execution and let the whole thing go away when refs reach
> > > zero?
> >
> > But isn't that a problem for e.g. self-freeing work? If we don't run the
> > work, then its memory is just leaked.
>
> Yeah, good point. Maybe we should just keep the whole thing up while
> removing it from sysfs. Would that work?

We can but there are two variants of that:

If destroy_workqueue() waits for delayed work, then it may take a long time.

If destroy_workqueue() does not wait for delayed work, then I'm
worried about bugs resulting from module unload and similar.

Alice
On Fri, Feb 27, 2026 at 09:36:22PM +0100, Alice Ryhl wrote:
> On Fri, Feb 27, 2026 at 8:46 PM Tejun Heo <tj@kernel.org> wrote:
> >
> > On Fri, Feb 27, 2026 at 07:28:11PM +0000, Alice Ryhl wrote:
> > > > delayed_work is just pointing to the wq pointer. On destroy_workqueue(), we
> > > > can shut it down and free all the supporting stuff while leaving zombie wq
> > > > struct which noops execution and let the whole thing go away when refs reach
> > > > zero?
> > >
> > > But isn't that a problem for e.g. self-freeing work? If we don't run the
> > > work, then its memory is just leaked.
> >
> > Yeah, good point. Maybe we should just keep the whole thing up while
> > removing it from sysfs. Would that work?
>
> We can but there are two variants of that:
>
> If destroy_workqueue() waits for delayed work, then it may take a long time.
>
> If destroy_workqueue() does not wait for delayed work, then I'm
> worried about bugs resulting from module unload and similar.

I see. Yeah, neither seems workable. We should be able to flush the delayed
work items. Maybe we can make that an optional feature so that rust wrappers
can turn it on for safety.

Thanks.

--
tejun
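Flushing pending delayed work as an opt-in part of teardown corresponds to the second option in Alice's list ("immediately run the delayed works"). A minimal userspace sketch, assuming a hypothetical `flush_delayed_on_destroy` flag that does not correspond to any existing `WQ_*` flag, and a hypothetical `Workqueue` type that is not the kernel API: when the flag is set, `destroy` runs every still-armed delayed item before returning, so none of them can outlive the queue.

```rust
/// Hypothetical userspace model of a workqueue; not the kernel API.
pub struct Workqueue {
    /// Hypothetical opt-in flag; not an existing `WQ_*` flag.
    flush_delayed_on_destroy: bool,
    /// Delayed items whose timers have not fired yet.
    pending_delayed: Vec<Box<dyn FnOnce()>>,
}

impl Workqueue {
    pub fn new(flush_delayed_on_destroy: bool) -> Self {
        Workqueue {
            flush_delayed_on_destroy,
            pending_delayed: Vec::new(),
        }
    }

    /// Arm a delayed item. In this model timers never fire on their own, so
    /// the item stays pending until `destroy` is called.
    pub fn enqueue_delayed(&mut self, work: Box<dyn FnOnce()>) {
        self.pending_delayed.push(work);
    }

    /// Tear the queue down and return `(ran, abandoned)`.
    ///
    /// With the flag set, every pending item runs now, as if its timer had
    /// just expired. Without it, the model simply drops the items unrun; in
    /// the kernel their timers could instead stay armed and later touch the
    /// freed workqueue.
    pub fn destroy(mut self) -> (usize, usize) {
        let pending = std::mem::take(&mut self.pending_delayed);
        let n = pending.len();
        if self.flush_delayed_on_destroy {
            for work in pending {
                work();
            }
            (n, 0)
        } else {
            drop(pending);
            (0, n)
        }
    }
}
```

Running the items early trades their requested delay for a bounded teardown; the alternative of waiting for each timer to expire is the "may take a long time" variant described above.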
On Fri Feb 27, 2026 at 2:53 PM GMT, Alice Ryhl wrote:
> When a workqueue is shut down, delayed work that is pending but not
> scheduled does not get properly cleaned up, so it's not safe to use
> `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> restricted `enqueue_delayed` to static queues.
>
> Cc: stable@vger.kernel.org
> Fixes: 7c098cd5eaae ("workqueue: rust: add delayed work items")
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>

Reviewed-by: Gary Guo <gary@garyguo.net>

> ---
> rust/kernel/workqueue.rs | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
On Fri Feb 27, 2026 at 3:53 PM CET, Alice Ryhl wrote:
> When a workqueue is shut down, delayed work that is pending but not
> scheduled does not get properly cleaned up, so it's not safe to use
> `enqueue_delayed` on a workqueue that might be destroyed. To fix this,
> restricted `enqueue_delayed` to static queues.

:(

Reviewed-by: Danilo Krummrich <dakr@kernel.org>

> Cc: stable@vger.kernel.org
> Fixes: 7c098cd5eaae ("workqueue: rust: add delayed work items")
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/workqueue.rs | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
> index 706e833e9702..1acd113c04ee 100644
> --- a/rust/kernel/workqueue.rs
> +++ b/rust/kernel/workqueue.rs
> @@ -296,8 +296,15 @@ pub fn enqueue<W, const ID: u64>(&self, w: W) -> W::EnqueueOutput
>      ///
>      /// This may fail if the work item is already enqueued in a workqueue.
>      ///
> +    /// This is only valid for global workqueues (with static lifetimes) because those are the only
> +    /// ones that outlive all possible delayed work items.

We should probably add a FIXME comment pointing out that this should be fixed in
the C code.

Maybe also link your approach?

> +    ///
>      /// The work item will be submitted using `WORK_CPU_UNBOUND`.
> -    pub fn enqueue_delayed<W, const ID: u64>(&self, w: W, delay: Jiffies) -> W::EnqueueOutput
> +    pub fn enqueue_delayed<W, const ID: u64>(
> +        &'static self,
> +        w: W,
> +        delay: Jiffies,
> +    ) -> W::EnqueueOutput
>      where
>          W: RawDelayedWorkItem<ID> + Send + 'static,
>      {
>
> --
> 2.53.0.473.g4a7958ca14-goog