[PATCH v2] bpf: Fix use-after-free in __bpf_trace_run()

Qing Wang posted 1 patch 1 month, 1 week ago
kernel/bpf/syscall.c | 7 +++++++
1 file changed, 7 insertions(+)
Posted by Qing Wang 1 month, 1 week ago
syzbot reported a use-after-free in __bpf_trace_run():

BUG: KASAN: slab-use-after-free in __bpf_trace_run kernel/trace/bpf_trace.c:2075 [inline]
    -> struct bpf_prog *prog = link->link.prog;

The link (struct bpf_raw_tp_link) was freed before link->link.prog was
accessed.

The root cause: when bpf_probe_unregister() is called, tasks may already
have started walking the old tp_probes array (RCU read-side section)
before rcu_assign_pointer() updates tp->funcs, and they can still reach
the link through that old array. Without synchronization, bpf_link_free()
can free the link via call_rcu() after bpf_probe_unregister() while those
tasks are still running, leading to a use-after-free in __bpf_trace_run().

CPU 0 (free link)                    CPU 1 (enter old tp probe)
─────────────────                    ────────────────────────

                                     rcu_read_lock()
                                     old_funcs = tp->funcs
bpf_raw_tp_link_release()
bpf_probe_unregister()
rcu_assign_pointer(tp->funcs, new)
call_srcu/call_rcu_tasks_trace(old_tp)
...
call_rcu/call_rcu_tasks_trace(&link->rcu, ...)
(RCU grace period)
kfree(link)
                                     __bpf_trace_run(link, ...)
                                     access link->link.prog
                                     UAF!

Fix by calling tracepoint_synchronize_unregister() to ensure all
in-flight tracepoint callbacks have completed, so the link is no
longer reachable before it is freed.

The issue was introduced by commit d4dfc5700e86 ("bpf:
pass whole link instead of prog when triggering raw tracepoint"),
which changed tracepoint callbacks to receive bpf_raw_tp_link pointers
instead of bpf_prog pointers.

Prior to this commit, this issue did not occur because the bpf_prog was
directly used and protected by reference counting.

Fixes: d4dfc5700e86 ("bpf: pass whole link instead of prog when triggering raw tracepoint")
Reported-by: syzbot+b4c5ad098c821bf8d8bc@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b4c5ad098c821bf8d8bc
Tested-by: syzbot+b4c5ad098c821bf8d8bc@syzkaller.appspotmail.com
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
---
Changes in v2:
- Updated the commit message based on the bpf-ci AI review.
- Link to v1: https://lore.kernel.org/all/20260304070927.178464-1-wangqing7171@gmail.com/T/

 kernel/bpf/syscall.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0378e83b4099..dd491bc35027 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3783,6 +3783,13 @@ static void bpf_raw_tp_link_release(struct bpf_link *link)
 
 	bpf_probe_unregister(raw_tp->btp, raw_tp);
 	bpf_put_raw_tracepoint(raw_tp->btp);
+
+	/*
+	 * Wait for all in-flight tracepoint callbacks to complete so the
+	 * link is no longer reachable through tp_probes. This prevents
+	 * use-after-free in __bpf_trace_run() when a tracepoint fires.
+	 */
+	tracepoint_synchronize_unregister();
 }
 
 static void bpf_raw_tp_link_dealloc(struct bpf_link *link)
-- 
2.34.1

Re: [PATCH v2] bpf: Fix use-after-free in __bpf_trace_run()
Posted by Jordan Rife 1 month, 1 week ago
On Wed, Mar 04, 2026 at 05:23:45PM +0800, Qing Wang wrote:
> A use-after-free issue reported from syzbot exists in __bpf_trace_run().
> 
> BUG: KASAN: slab-use-after-free in __bpf_trace_run kernel/trace/bpf_trace.c:2075 [inline]
>     -> struct bpf_prog *prog = link->link.prog;
> 
> The link(struct bpf_raw_tp_link) was freed before accessing
> link->link.prog.
> 
> The root cause is that: When bpf_probe_unregister() is called, tasks may
> have already entered the old tp_probes array (RCU read-side section)
> before rcu_assign_pointer() updates tp->funcs. These tasks can access the
> link through the old array. Without synchronization, the link can be freed
> via call_rcu() after bpf_probe_unregister() in bpf_link_free(), leading to
> use-after-free in __bpf_trace_run().
> 
> CPU 0 (free link)                    CPU 1 (enter old tp probe)
> ─────────────────                    ────────────────────────
> 
>                                      rcu_read_lock()
>                                      old_funcs = tp->funcs
> bpf_raw_tp_link_release()
> bpf_probe_unregister()
> rcu_assign_pointer(tp->funcs, new)
> call_srcu/call_rcu_tasks_trace(old_tp)
> ...
> call_rcu/call_rcu_tasks_trace(&link->rcu, ...)

If CPU 1 is in an RCU read-side section, then call_rcu would wait for
the RCU GP anyway before freeing the link in question.
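
To make that point concrete, here is a minimal single-threaded sketch of
one grace-period domain. Everything named toy_* is hypothetical and not
kernel API. It also simplifies real RCU, which only waits for readers
that started before call_rcu(); this model waits for all readers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of ONE grace-period domain (hypothetical, not kernel API). */
struct toy_domain {
	int readers;              /* active read-side sections */
	void (*pending)(void *);  /* at most one deferred callback */
	void *pending_arg;
};

/* Run the deferred callback once no reader of this domain remains. */
static void toy_try_grace_period(struct toy_domain *d)
{
	if (d->readers == 0 && d->pending) {
		void (*cb)(void *) = d->pending;

		d->pending = NULL;
		cb(d->pending_arg);
	}
}

static void toy_read_lock(struct toy_domain *d)
{
	d->readers++;
}

static void toy_read_unlock(struct toy_domain *d)
{
	assert(d->readers > 0);
	d->readers--;
	toy_try_grace_period(d);
}

/* Stand-in for call_rcu(): defer cb until the domain has no readers. */
static void toy_call_rcu(struct toy_domain *d, void (*cb)(void *), void *arg)
{
	d->pending = cb;
	d->pending_arg = arg;
	toy_try_grace_period(d);
}

struct toy_link {
	bool freed;
};

static void toy_free_link(void *arg)
{
	((struct toy_link *)arg)->freed = true;
}

/* Reader and deferred free share one domain: the free must wait. */
static bool toy_same_domain_is_safe(void)
{
	struct toy_domain rcu = { 0 };
	struct toy_link link = { .freed = false };
	bool valid_in_reader;

	toy_read_lock(&rcu);                       /* CPU 1 enters */
	toy_call_rcu(&rcu, toy_free_link, &link);  /* CPU 0 queues the free */
	valid_in_reader = !link.freed;             /* CPU 1 uses the link */
	toy_read_unlock(&rcu);                     /* grace period may end */

	return valid_in_reader && link.freed;
}
```

In this model toy_same_domain_is_safe() returns true: the link is still
valid while the reader uses it and is freed only after the reader leaves.
The guarantee only holds when the reader and the deferred free use the
same domain.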

> (RCU grace period)
> kfree(link)
>                                      __bpf_trace_run(link, ...)
>                                      access link->link.prog
>                                      UAF!
> 
> Fix by calling tracepoint_synchronize_unregister() to ensure all
> in-flight tracepoint callbacks have completed, so the link is no
> longer reachable before it is freed.

It looks like tracepoint_synchronize_unregister() just calls
synchronize_rcu_tasks_trace() and synchronize_rcu(), but it should also
be sufficient to use call_rcu() or call_rcu_tasks_trace() to ensure that
the appropriate grace period elapses for that tracepoint. Is the extra
delay just masking the problem instead of fixing the root cause?

> The issue was introduced by commit d4dfc5700e86 ("bpf:
> pass whole link instead of prog when triggering raw tracepoint"),
> which changed tracepoint callbacks to receive bpf_raw_tp_link pointers
> instead of bpf_prog pointers.

Did you run a bisect?


Jordan
Re: [PATCH v2] bpf: Fix use-after-free in __bpf_trace_run()
Posted by Qing Wang 1 month, 1 week ago
On Thu, 05 Mar 2026 at 09:38, Jordan Rife <jrife@google.com> wrote:
> > A use-after-free issue reported from syzbot exists in __bpf_trace_run().
> > 
> > BUG: KASAN: slab-use-after-free in __bpf_trace_run kernel/trace/bpf_trace.c:2075 [inline]
> >     -> struct bpf_prog *prog = link->link.prog;
> > 
> > The link(struct bpf_raw_tp_link) was freed before accessing
> > link->link.prog.
> > 
> > The root cause is that: When bpf_probe_unregister() is called, tasks may
> > have already entered the old tp_probes array (RCU read-side section)
> > before rcu_assign_pointer() updates tp->funcs. These tasks can access the
> > link through the old array. Without synchronization, the link can be freed
> > via call_rcu() after bpf_probe_unregister() in bpf_link_free(), leading to
> > use-after-free in __bpf_trace_run().
> > 
> > CPU 0 (free link)                    CPU 1 (enter old tp probe)
> > ─────────────────                    ────────────────────────
> > 
> >                                      rcu_read_lock()
> >                                      old_funcs = tp->funcs
> > bpf_raw_tp_link_release()
> > bpf_probe_unregister()
> > rcu_assign_pointer(tp->funcs, new)
> > call_srcu/call_rcu_tasks_trace(old_tp)
> > ...
> > call_rcu/call_rcu_tasks_trace(&link->rcu, ...)
> 
> If CPU 1 is in an RCU read-side section, then call_rcu would wait for
> the RCU GP anyway before freeing the link in question.

Sorry, that was my mistake and it misled you: it should be
'srcu_read_lock(&tracepoint_srcu)' [0], not rcu_read_lock(). The reader on
CPU 1 is therefore only covered by the tracepoint SRCU grace period.

    [0]
    include/linux/tracepoint.h:279
    #define __DECLARE_TRACE(name, proto, args, cond, data_proto)			\
    	__DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), PARAMS(data_proto)) \
    	static inline void __do_trace_##name(proto)			\
    	{								\
    		TRACEPOINT_CHECK(name)					\
    		if (cond) {						\
    			guard(srcu_fast_notrace)(&tracepoint_srcu);	\   <----
    			__DO_TRACE_CALL(name, TP_ARGS(args));		\
    		}							\
    	}


> > (RCU grace period)
> > kfree(link)
> >                                      __bpf_trace_run(link, ...)
> >                                      access link->link.prog
> >                                      UAF!
> > 
> > Fix by calling tracepoint_synchronize_unregister() to ensure all
> > in-flight tracepoint callbacks have completed, so the link is no
> > longer reachable before it is freed.
> 
> It looks like tracepoint_synchronize_unregister() just calls
> synchronize_rcu_tasks_trace() and synchronize_rcu(), but it should also
> be sufficient to use call_rcu() or call_rcu_tasks_trace() to ensure that
> the appropriate grace period elapses for that tracepoint. Is the extra
> delay just masking the problem instead of fixing the root cause?

I think synchronize_srcu(&tracepoint_srcu) is enough to ensure that readers
still using the old tp_probes array have left their SRCU read-side section
before kfree(link). Whether to use tracepoint_synchronize_unregister()
instead needs further discussion.
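
As a rough illustration of the ordering argument, here is a toy
single-threaded replay of the race. All toy_* names are hypothetical and
nothing here is kernel code. It assumes the reader sits in the tracepoint
SRCU section while the deferred free only waits for plain RCU readers,
and it always runs the updater first whenever the updater can progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy replay state (hypothetical names, not kernel API). */
struct toy_state {
	int rcu_readers;    /* readers under rcu_read_lock() */
	int srcu_readers;   /* readers under the tracepoint SRCU guard */
	bool link_freed;
	bool stale_access;  /* reader touched the link after the free */
};

/* Updater (CPU 0). Returns false when it has to wait or is done. */
static bool toy_updater_step(struct toy_state *s, int *pc, bool sync_srcu)
{
	switch (*pc) {
	case 0: /* unregister: new readers no longer find the link */
		break;
	case 1: /* optional wait on the readers' own (SRCU) domain */
		if (sync_srcu && s->srcu_readers > 0)
			return false;
		break;
	case 2: /* deferred free: waits only for plain RCU readers */
		if (s->rcu_readers > 0)
			return false;
		s->link_freed = true;
		break;
	default:
		return false;
	}
	(*pc)++;
	return true;
}

/* Reader (CPU 1), already inside the SRCU section with the old array. */
static bool toy_reader_step(struct toy_state *s, int *pc)
{
	switch (*pc) {
	case 0: /* dereference link->link.prog */
		if (s->link_freed)
			s->stale_access = true;
		break;
	case 1: /* leave the SRCU read-side section */
		assert(s->srcu_readers > 0);
		s->srcu_readers--;
		break;
	default:
		return false;
	}
	(*pc)++;
	return true;
}

/* Worst-case schedule: the reader only runs when the updater cannot. */
static bool toy_replay_hits_uaf(bool sync_srcu)
{
	struct toy_state s = { .srcu_readers = 1 };
	int upc = 0, rpc = 0;

	for (;;) {
		if (toy_updater_step(&s, &upc, sync_srcu))
			continue;
		if (!toy_reader_step(&s, &rpc))
			break;
	}
	return s.stale_access;
}
```

In this model toy_replay_hits_uaf(false) reports the stale access, while
toy_replay_hits_uaf(true) does not, because the updater cannot reach the
free until the SRCU reader has left. It says nothing about which kernel
primitive is the right one to use here.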

> > The issue was introduced by commit d4dfc5700e86 ("bpf:
> > pass whole link instead of prog when triggering raw tracepoint"),
> > which changed tracepoint callbacks to receive bpf_raw_tp_link pointers
> > instead of bpf_prog pointers.
> 
> Did you run a bisect?

I'm trying to, but I haven't been able to reproduce the issue yet.

--
Qing