[PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT

Jiayuan Chen posted 1 patch 1 week, 1 day ago
There is a newer version of this series
kernel/irq_work.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
[PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Jiayuan Chen 1 week, 1 day ago
On PREEMPT_RT, non-HARD irq_work runs in a per-CPU kthread, so
irq_work_sync() uses rcuwait (sleeping) to wait for BUSY==0.

After irq_work_single() clears BUSY via atomic_cmpxchg(), an
irq_work_sync() caller on another CPU that enters *after* BUSY is
cleared can observe BUSY==0 immediately (without sleeping), return,
and free the work. Meanwhile irq_work_single() still dereferences
@work for irq_work_is_hard() and rcuwait_wake_up(), causing a
use-after-free.

Note: if a sync waiter is actually sleeping, @work is still alive
(it can't be freed until the waiter returns), so there is no UAF in
that case. The UAF only occurs when sync checks BUSY==0 without
going through schedule().

Fix this by extracting and pinning the irq_work_sync waiter's
task_struct (if any) while BUSY is still set and @work is guaranteed
alive. After clearing BUSY, wake the pinned task directly without
touching @work.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 kernel/irq_work.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 73f7e1fd4ab4..b10b75d1cc09 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -200,6 +200,7 @@ bool irq_work_needs_cpu(void)
 
 void irq_work_single(void *arg)
 {
+	struct task_struct *waiter = NULL;
 	struct irq_work *work = arg;
 	int flags;
 
@@ -221,15 +222,37 @@ void irq_work_single(void *arg)
 	work->func(work);
 	lockdep_irq_work_exit(flags);
 
+	/*
+	 * Extract and pin the irq_work_sync() waiter before clearing
+	 * BUSY. Once BUSY is cleared, @work may be freed immediately
+	 * by a sync caller that observes BUSY==0 without sleeping, so
+	 * @work must not be dereferenced after the cmpxchg below.
+	 */
+	if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
+	    !arch_irq_work_has_interrupt()) {
+		rcu_read_lock();
+		waiter = rcu_dereference(work->irqwait.task);
+		if (waiter)
+			get_task_struct(waiter);
+		rcu_read_unlock();
+	}
+
 	/*
 	 * Clear the BUSY bit, if set, and return to the free state if no-one
 	 * else claimed it meanwhile.
 	 */
 	(void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);
 
-	if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
-	    !arch_irq_work_has_interrupt())
-		rcuwait_wake_up(&work->irqwait);
+	/*
+	 * @work must not be dereferenced past this point. Wake the
+	 * pinned waiter if one was sleeping; if none was sleeping,
+	 * either irq_work_sync() has not been called or it will
+	 * observe BUSY==0 on its own.
+	 */
+	if (waiter) {
+		wake_up_process(waiter);
+		put_task_struct(waiter);
+	}
 }
 
 static void irq_work_run_list(struct llist_head *list)
-- 
2.43.0
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Steven Rostedt 1 week, 1 day ago
On Wed, 25 Mar 2026 11:05:04 +0800
Jiayuan Chen <jiayuan.chen@linux.dev> wrote:

Hi Jiayuan,

Adding "v1" to your first patch doesn't show much confidence in the code.
As it implies there will definitely be a v2. It's kind of like introducing
your spouse as, "Hi, this is my first wife". ;-)

> On PREEMPT_RT, non-HARD irq_work runs in a per-CPU kthread, so
> irq_work_sync() uses rcuwait (sleeping) to wait for BUSY==0.
> 
> After irq_work_single() clears BUSY via atomic_cmpxchg(), an
> irq_work_sync() caller on another CPU that enters *after* BUSY is
> cleared can observe BUSY==0 immediately (without sleeping), return,
> and free the work. Meanwhile irq_work_single() still dereferences
> @work for irq_work_is_hard() and rcuwait_wake_up(), causing a
> use-after-free.
> 
> Note: if a sync waiter is actually sleeping, @work is still alive
> (it can't be freed until the waiter returns), so there is no UAF in
> that case. The UAF only occurs when sync checks BUSY==0 without
> going through schedule().
> 
> Fix this by extracting and pinning the irq_work_sync waiter's
> task_struct (if any) while BUSY is still set and @work is guaranteed
> alive. After clearing BUSY, wake the pinned task directly without
> touching @work.
> 
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
>  kernel/irq_work.c | 29 ++++++++++++++++++++++++++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 73f7e1fd4ab4..b10b75d1cc09 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -200,6 +200,7 @@ bool irq_work_needs_cpu(void)
>  
>  void irq_work_single(void *arg)
>  {
> +	struct task_struct *waiter = NULL;
>  	struct irq_work *work = arg;
>  	int flags;
>  
> @@ -221,15 +222,37 @@ void irq_work_single(void *arg)
>  	work->func(work);
>  	lockdep_irq_work_exit(flags);
>  
> +	/*
> +	 * Extract and pin the irq_work_sync() waiter before clearing
> +	 * BUSY. Once BUSY is cleared, @work may be freed immediately
> +	 * by a sync caller that observes BUSY==0 without sleeping, so
> +	 * @work must not be dereferenced after the cmpxchg below.
> +	 */
> +	if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
> +	    !arch_irq_work_has_interrupt()) {

Unrelated to this patch, but the above if conditional should be defined as
a macro as it is used in more than one place, and they do need to match.

Maybe something like:

#define is_irq_work_threaded(work)  ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) || \
				     !arch_irq_work_has_interrupt())

But again, that isn't related to this patch.

> +		rcu_read_lock();
> +		waiter = rcu_dereference(work->irqwait.task);

Hmm, is it proper to access the internals of an rcuwait type?

Paul, is there a more proper way to do this?

> +		if (waiter)
> +			get_task_struct(waiter);
> +		rcu_read_unlock();
> +	}
> +
>  	/*
>  	 * Clear the BUSY bit, if set, and return to the free state if no-one
>  	 * else claimed it meanwhile.
>  	 */
>  	(void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);
>  
> -	if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
> -	    !arch_irq_work_has_interrupt())
> -		rcuwait_wake_up(&work->irqwait);
> +	/*
> +	 * @work must not be dereferenced past this point. Wake the
> +	 * pinned waiter if one was sleeping; if none was sleeping,
> +	 * either irq_work_sync() has not been called or it will
> +	 * observe BUSY==0 on its own.
> +	 */
> +	if (waiter) {
> +		wake_up_process(waiter);

Yeah, this is open coding the rcuwait_wake_up().

I'm not sure we want to do this.

Perhaps RCU could provide a way to save the rcuwait?

	struct rcuwait = __RCUWAIT_INITIALIZER(rcuwait);

	[..]

		rcuwait_copy(&rcuwait, &work->irqwait);

	[..]

		rcuwait_wake_up(&rcuwait);

?

> +		put_task_struct(waiter);
> +	}

-- Steve

>  }
>  
>  static void irq_work_run_list(struct llist_head *list)
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Sebastian Andrzej Siewior 1 week, 1 day ago
On 2026-03-25 11:13:51 [-0400], Steven Rostedt wrote:
> Yeah, this is open coding the rcuwait_wake_up().
> 
> I'm not sure we want to do this.
> 
> Perhaps RCU could provide a way to save the rcuwait?
> 
> 	struct rcuwait = __RCUWAIT_INITIALIZER(rcuwait);
> 
> 	[..]
> 
> 		rcuwait_copy(&rcuwait, &work->irqwait);
> 
> 	[..]
> 
> 		rcuwait_wake_up(&rcuwait);
> 
> ?

Most irq-work aren't free()ed since they are static and remain around.
There is no task assigned if there is no active waiter.
Wouldn't it be easier to kfree_rcu() the struct using the irq-work?

> > +		put_task_struct(waiter);
> > +	}
> 
> -- Steve

Sebastian
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Steven Rostedt 1 week, 1 day ago
On Wed, 25 Mar 2026 16:38:26 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> Most irq-work aren't free()ed since they are static and remain around.
> There is no task assigned if there is no active waiter.
> Wouldn't it be easier to kfree_rcu() the struct using the irq-work?

I guess we should add some kind of helper then. Like tracepoints have.

   tracepoint_synchronize_unregister()

Perhaps have a:

   irq_work_synchronize_free();

Or something like that to let developers know that they just can't safely free a
structure that contains an irq_work?

-- Steve
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Sebastian Andrzej Siewior 1 week, 1 day ago
On 2026-03-25 11:53:15 [-0400], Steven Rostedt wrote:
> On Wed, 25 Mar 2026 16:38:26 +0100
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > Most irq-work aren't free()ed since they are static and remain around.
> > There is no task assigned if there is no active waiter.
> > Wouldn't it be easier to kfree_rcu() the struct using the irq-work?
> 
> I guess we should add some kind of helper then. Like tracepoints have.
> 
>    tracepoint_synchronize_unregister()
> 
> Perhaps have a:
> 
>    irq_work_synchronize_free();
> 
> Or something like that to let developers know that they just can't safely free a
> structure that contains an irq_work?

That sounds great.

> -- Steve

Sebastian
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Jiayuan Chen 1 week, 1 day ago
On 3/25/26 11:55 PM, Sebastian Andrzej Siewior wrote:
> On 2026-03-25 11:53:15 [-0400], Steven Rostedt wrote:
>> On Wed, 25 Mar 2026 16:38:26 +0100
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>>
>>> Most irq-work aren't free()ed since they are static and remain around.
>>> There is no task assigned if there is no active waiter.
>>> Wouldn't it be easier to kfree_rcu() the struct using the irq-work?
>> I guess we should add some kind of helper then. Like tracepoints have.
>>
>>     tracepoint_synchronize_unregister()
>>
>> Perhaps have a:
>>
>>     irq_work_synchronize_free();
>>
>> Or something like that to let developers know that they just can't safely free a
>> structure that contains an irq_work?
> That sounds great.
>
>> -- Steve
> Sebastian


Hi Steve, Sebastian,

Thanks for the review!

I came across this issue while working on the BPF side. In bpf_ringbuf,
the irq_work is embedded in struct bpf_ringbuf which is vmap'd — after
irq_work_sync(), the whole region is vunmap'd immediately 
(bpf_ringbuf_free).

Looking further, this pattern is actually widespread. Several other
subsystems embed irq_work in a dynamically allocated container and free
it right after irq_work_sync():

   - kernel/trace/ring_buffer.c:
   rb_free_cpu_buffer() syncs then kfree(cpu_buffer)
   ring_buffer_free() syncs then kfree(buffer)
   - drivers/gpu/drm/i915/gt/intel_breadcrumbs.c:
   intel_breadcrumbs_free() syncs then kfree(b)
   - kernel/sched/ext.c:
   scx_sched_free_rcu_work() syncs then kfree(sch)
   - kernel/irq/irq_sim.c:
   irq_domain_remove_sim() syncs then kfree(work_ctx)
   - drivers/iio/trigger/iio-trig-sysfs.c:
   iio_sysfs_trigger_destroy() syncs then kfree(t)
   - drivers/edac/igen6_edac.c:
   igen6_remove() syncs then kfree()


I agree that open-coding rcuwait internals is not ideal. I'd like to
check my understanding of the direction you're suggesting — would
something like the following be on the right track?

In irq_work_single(), just wrap the post-callback section with
rcu_read_lock to keep the work structure alive through an RCU grace
period:

'''
   lockdep_irq_work_enter(flags);
   work->func(work);
   lockdep_irq_work_exit(flags);

+   rcu_read_lock();

   (void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);

   if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
       !arch_irq_work_has_interrupt())
           rcuwait_wake_up(&work->irqwait);

+   rcu_read_unlock();
'''

Then provide a helper for callers that need to free:

void irq_work_synchronize_free(struct irq_work *work)
{
   irq_work_sync(work);
   synchronize_rcu();
}


Callers that free the containing structure would switch to
irq_work_synchronize_free(), or use kfree_rcu() if appropriate

Thanks,
Jiayuan





Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Sebastian Andrzej Siewior 1 week, 1 day ago
On 2026-03-26 00:34:58 [+0800], Jiayuan Chen wrote:
> In irq_work_single(), just wrap the post-callback section with
> rcu_read_lock to keep the work structure alive through an RCU grace
> period:
> 
> '''
>   lockdep_irq_work_enter(flags);
>   work->func(work);
>   lockdep_irq_work_exit(flags);
> 
> +   rcu_read_lock();
> 
>   (void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);
> 
>   if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
>       !arch_irq_work_has_interrupt())
>           rcuwait_wake_up(&work->irqwait);
> 
> +   rcu_read_unlock();
> '''

This shouldn't be needed. run_irq_workd() should get a guard(rcu)(),
the other callers run with disabled interrupts. This should include CPU
hotplug invocation but need to check.

> Then provide a helper for callers that need to free:
> 
> void irq_work_synchronize_free(struct irq_work *work)
> {
>   irq_work_sync(work);
>   synchronize_rcu();
> }

Why not just having the synchronize_rcu()?

> Callers that free the containing structure would switch to
> irq_work_synchronize_free(), or use kfree_rcu() if appropriate

If we provide the irq_work_synchronize_free() then using kfree_rcu()
would sort of open code irq_work_synchronize_free() since we couldn't
simply replace synchronize_rcu() with something else and update the
irq_work core side (we would also have to update all users). I guess
that was Steven's idea in providing a helper for synchronisation.

> Thanks,
> Jiayuan

Sebastian
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Steven Rostedt 1 week, 1 day ago
On Wed, 25 Mar 2026 18:05:39 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> Why not just having the synchronize_rcu()?
> 
> > Callers that free the containing structure would switch to
> > irq_work_synchronize_free(), or use kfree_rcu() if appropriate  
> 
> If we provide the irq_work_synchronize_free() then using kfree_rcu()
> would sort of open code irq_work_synchronize_free() since we couldn't
> simply replace synchronize_rcu() with something else and update the
> irq_work core side (we would also have to update all users). I guess
> that was Steven's idea in providing a helper for synchronisation.
> 

Yeah, the helper was just document that free work needs synchronization.

Perhaps Jiayuan's idea is better as it will not require modifying current
callers and does fix the issue.

But it would still need helper functions from RCU as I really do not think
it's a good idea to open code the rcuwait logic.

-- Steve
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Sebastian Andrzej Siewior 1 week, 1 day ago
On 2026-03-25 13:44:33 [-0400], Steven Rostedt wrote:
> On Wed, 25 Mar 2026 18:05:39 +0100
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > Why not just having the synchronize_rcu()?
> > 
> > > Callers that free the containing structure would switch to
> > > irq_work_synchronize_free(), or use kfree_rcu() if appropriate  
> > 
> > If we provide the irq_work_synchronize_free() then using kfree_rcu()
> > would sort of open code irq_work_synchronize_free() since we couldn't
> > simply replace synchronize_rcu() with something else and update the
> > irq_work core side (we would also have to update all users). I guess
> > that was Steven's idea in providing a helper for synchronisation.
> > 
> 
> Yeah, the helper was just document that free work needs synchronization.
> 
> Perhaps Jiayuan's idea is better as it will not require modifying current
> callers and does fix the issue.

Don't you need to replace irq_work_sync() with this new one?

> But it would still need helper functions from RCU as I really do not think
> it's a good idea to open code the rcuwait logic.

Why is rcuwait a concern?

> -- Steve

Sebastian
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Steven Rostedt 1 week, 1 day ago
On Wed, 25 Mar 2026 18:51:50 +0100
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> > Perhaps Jiayuan's idea is better as it will not require modifying current
> > callers and does fix the issue.  
> 
> Don't you need to replace irq_work_sync() with this new one?
> 
> > But it would still need helper functions from RCU as I really do not think
> > it's a good idea to open code the rcuwait logic.  
> 
> Why is rcuwait a concern?

Oh, I was talking about how the patch open coded rcuwait (which we shouldn't do).

Are you saying that if we stick a synchronize_rcu() in irq_work_sync() that
could work too?

-- Steve
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Sebastian Andrzej Siewior 1 week, 1 day ago
On 2026-03-25 13:55:40 [-0400], Steven Rostedt wrote:
> On Wed, 25 Mar 2026 18:51:50 +0100
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > > Perhaps Jiayuan's idea is better as it will not require modifying current
> > > callers and does fix the issue.  
> > 
> > Don't you need to replace irq_work_sync() with this new one?
> > 
> > > But it would still need helper functions from RCU as I really do not think
> > > it's a good idea to open code the rcuwait logic.  
> > 
> > Why is rcuwait a concern?
> 
> Oh, I was talking about how the patch open coded rcuwait (which we shouldn't do).
> 
> Are you saying that if we stick a synchronize_rcu() in irq_work_sync() that
> could work too?

I was thinking about your helper doing synchronize_rcu().
I haven't looked at irq_work_sync() but it would need solve the problem,
too. There shouldn't be any user of irq_work_sync() which does not
intend to free the object, why else should they wait, right? So it might
be even simpler.

> -- Steve

Sebastian
Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Jiayuan Chen 1 week ago
On 3/26/26 1:59 AM, Sebastian Andrzej Siewior wrote:
> On 2026-03-25 13:55:40 [-0400], Steven Rostedt wrote:
>> On Wed, 25 Mar 2026 18:51:50 +0100
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>>
>>>> Perhaps Jiayuan's idea is better as it will not require modifying current
>>>> callers and does fix the issue.
>>> Don't you need to replace irq_work_sync() with this new one?
>>>
>>>> But it would still need helper functions from RCU as I really do not think
>>>> it's a good idea to open code the rcuwait logic.
>>> Why is rcuwait a concern?
>> Oh, I was talking about how the patch open coded rcuwait (which we shouldn't do).
>>
>> Are you saying that if we stick a synchronize_rcu() in irq_work_sync() that
>> could work too?
> I was thinking about your helper doing synchronize_rcu().
> I haven't looked at irq_work_sync() but it would need solve the problem,
> too. There shouldn't be any user of irq_work_sync() which does not
> intend to free the object, why else should they wait, right? So it might
> be even simpler.
>
Combining your and Steven's suggestions, I think the simplest fix would be:

static void run_irq_workd(unsigned int cpu)
{
+   guard(rcu)();
     irq_work_run_list(this_cpu_ptr(&lazy_list));
}

void irq_work_sync(struct irq_work *work)
{
     lockdep_assert_irqs_enabled();
     might_sleep();

     if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
         !arch_irq_work_has_interrupt()) {
             rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
                                TASK_UNINTERRUPTIBLE);
   +             /*
   +              * Ensure run_irq_workd() / irq_work_single() is done
   +              * accessing @work before the caller can free it.
   +              */
   +             synchronize_rcu();
                 return;
     }

     while (irq_work_is_busy(work))
             cpu_relax();
}

Re: [PATCH v1] irq_work: Fix use-after-free in irq_work_single on PREEMPT_RT
Posted by Sebastian Andrzej Siewior 1 week ago
On 2026-03-26 10:27:10 [+0800], Jiayuan Chen wrote:
> Combining your and Steven's suggestions, I think the simplest fix would be:
> 
> static void run_irq_workd(unsigned int cpu)
> {
> +   guard(rcu)();
>     irq_work_run_list(this_cpu_ptr(&lazy_list));
> }
> 
> void irq_work_sync(struct irq_work *work)
> {
>     lockdep_assert_irqs_enabled();
>     might_sleep();
> 
>     if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
>         !arch_irq_work_has_interrupt()) {
>             rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
>                                TASK_UNINTERRUPTIBLE);
>   +             /*
>   +              * Ensure run_irq_workd() / irq_work_single() is done
>   +              * accessing @work before the caller can free it.

  Ensure irq_work_single() does not access @work after removing
  IRQ_WORK_BUSY. It is always accessed within a RCU-read section.

But, yes, this looks like a simple fix.

>   +              */
>   +             synchronize_rcu();
>                 return;
>     }

Sebastian