[v2] bpf: Fix use-after-free of sockmap

[PATCH bpf-next v2 1/3] bpf, sockmap: avoid using sk_socket after free

Posted by Jiayuan Chen 11 months, 2 weeks ago

Use RCU lock to protect sk_socket, preventing concurrent close and release
by another thread.

Because TCP/UDP are already within a relatively large critical section:
'''
ip_local_deliver_finish
  rcu_read_lock
  ip_protocol_deliver_rcu
      tcp_rcv/udp_rcv
  rcu_read_unlock
'''

Adding rcu_read_{un}lock() at the entrance and exit of sk_data_ready
will not increase performance overhead.

Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
Reported-by: syzbot+dd90a702f518e0eac072@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/6734c033.050a0220.2a2fcc.0015.GAE@google.com/
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/core/skmsg.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 0ddc4c718833..1b71ae1d1bf5 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1222,27 +1222,35 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 
 static void sk_psock_verdict_data_ready(struct sock *sk)
 {
-	struct socket *sock = sk->sk_socket;
+	struct socket *sock;
 	const struct proto_ops *ops;
 	int copied;
 
 	trace_sk_data_ready(sk);
 
+	/* We need RCU to prevent the sk_socket from being released.
+	 * Especially for Unix sockets, we are currently in the process
+	 * context and do not have RCU protection.
+	 */
+	rcu_read_lock();
+	sock = sk->sk_socket;
 	if (unlikely(!sock))
-		return;
+		goto unlock;
+
 	ops = READ_ONCE(sock->ops);
 	if (!ops || !ops->read_skb)
-		return;
+		goto unlock;
+
 	copied = ops->read_skb(sk, sk_psock_verdict_recv);
 	if (copied >= 0) {
 		struct sk_psock *psock;
 
-		rcu_read_lock();
 		psock = sk_psock(sk);
 		if (psock)
 			sk_psock_data_ready(sk, psock);
-		rcu_read_unlock();
 	}
+unlock:
+	rcu_read_unlock();
 }
 
 void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock)
-- 
2.47.1

Re: [PATCH bpf-next v2 1/3] bpf, sockmap: avoid using sk_socket after free

Posted by Michal Luczaj 11 months, 1 week ago

On 2/28/25 06:51, Jiayuan Chen wrote:
> ...
>  static void sk_psock_verdict_data_ready(struct sock *sk)
>  {
> -	struct socket *sock = sk->sk_socket;
> +	struct socket *sock;
>  	const struct proto_ops *ops;
>  	int copied;
>  
>  	trace_sk_data_ready(sk);
>  
> +	/* We need RCU to prevent the sk_socket from being released.
> +	 * Especially for Unix sockets, we are currently in the process
> +	 * context and do not have RCU protection.
> +	 */
> +	rcu_read_lock();
> +	sock = sk->sk_socket;
>  	if (unlikely(!sock))
> -		return;
> +		goto unlock;
> +
>  	ops = READ_ONCE(sock->ops);
>  	if (!ops || !ops->read_skb)
> -		return;
> +		goto unlock;
> +
>  	copied = ops->read_skb(sk, sk_psock_verdict_recv);
>  	if (copied >= 0) {
>  		struct sk_psock *psock;
>  
> -		rcu_read_lock();
>  		psock = sk_psock(sk);
>  		if (psock)
>  			sk_psock_data_ready(sk, psock);
> -		rcu_read_unlock();
>  	}
> +unlock:
> +	rcu_read_unlock();
>  }

Hi,

Doesn't sk_psock_handle_skb() (!ingress path) have the same `struct socket`
release race issue? Any plans on fixing that one, too?

BTW, lockdep (CONFIG_LOCKDEP=y) complains about calling AF_UNIX's
read_skb() under RCU read lock.

Thanks,
Michal

Re: [PATCH bpf-next v2 1/3] bpf, sockmap: avoid using sk_socket after free

Posted by Jiayuan Chen 11 months ago

March 7, 2025 at 5:45 PM, "Michal Luczaj" <mhal@rbox.co> wrote:

> 
> On 2/28/25 06:51, Jiayuan Chen wrote:
> 
> > 
> > ...
> > 
> >  static void sk_psock_verdict_data_ready(struct sock *sk)
> > 
> >  {
> > 
> >  - struct socket *sock = sk->sk_socket;
> > 
> >  + struct socket *sock;
> > 
> >  const struct proto_ops *ops;
> > 
> >  int copied;
> > 
> >  
> > 
> >  trace_sk_data_ready(sk);
> > 
> >  
> > 
> >  + /* We need RCU to prevent the sk_socket from being released.
> > 
> >  + * Especially for Unix sockets, we are currently in the process
> > 
> >  + * context and do not have RCU protection.
> > 
> >  + */
> > 
> >  + rcu_read_lock();
> > 
> >  + sock = sk->sk_socket;
> > 
> >  if (unlikely(!sock))
> > 
> >  - return;
> > 
> >  + goto unlock;
> > 
> >  +
> > 
> >  ops = READ_ONCE(sock->ops);
> > 
> >  if (!ops || !ops->read_skb)
> > 
> >  - return;
> > 
> >  + goto unlock;
> > 
> >  +
> > 
> >  copied = ops->read_skb(sk, sk_psock_verdict_recv);
> > 
> >  if (copied >= 0) {
> > 
> >  struct sk_psock *psock;
> > 
> >  
> > 
> >  - rcu_read_lock();
> > 
> >  psock = sk_psock(sk);
> > 
> >  if (psock)
> > 
> >  sk_psock_data_ready(sk, psock);
> > 
> >  - rcu_read_unlock();
> > 
> >  }
> > 
> >  +unlock:
> > 
> >  + rcu_read_unlock();
> > 
> >  }
> > 
> 
> Hi,
> 
> Doesn't sk_psock_handle_skb() (!ingress path) have the same `struct socket`
> 
> release race issue? Any plans on fixing that one, too?

Yes, the send path logic also has similar issues, and after some hacking,
I was able to reproduce it. Thanks for providing this information.
I can fix these together in the next revision of this patchset, anyway,
this patchset still needs further confirmation from the maintainer.

> 
> BTW, lockdep (CONFIG_LOCKDEP=y) complains about calling AF_UNIX's
> 
> read_skb() under RCU read lock.
> 
> Thanks,
> 
> Michal
>
My environment also has LOCKDEP enabled, but I didn't see similar
warnings.
Moreover, RCU assertions are typically written as:

WARN_ON_ONCE(!rcu_read_lock_held())

And when LOCKDEP is not enabled, rcu_read_lock_held() defaults to
returning 1. So, it's unlikely to trigger a warning due to an RCU lock
being held.

Could you provide more of the call stack?

Thanks.

Re: [PATCH bpf-next v2 1/3] bpf, sockmap: avoid using sk_socket after free

Posted by Michal Luczaj 11 months ago

On 3/10/25 12:36, Jiayuan Chen wrote:
> March 7, 2025 at 5:45 PM, "Michal Luczaj" <mhal@rbox.co> wrote:
> ...
>> BTW, lockdep (CONFIG_LOCKDEP=y) complains about calling AF_UNIX's
>> read_skb() under RCU read lock.
>>
> My environment also has LOCKDEP enabled, but I didn't see similar
> warnings.
> Moreover, RCU assertions are typically written as:
> 
> WARN_ON_ONCE(!rcu_read_lock_held())
> 
> And when LOCKDEP is not enabled, rcu_read_lock_held() defaults to
> returning 1. So, it's unlikely to trigger a warning due to an RCU lock
> being held.
> 
> Could you provide more of the call stack?

Sure, bpf-next with this series applied, test_progs -t sockmap_basic:

=============================
[ BUG: Invalid wait context ]
6.14.0-rc3+ #111 Tainted: G           OE
-----------------------------
test_progs/37755 is trying to lock:
ffff88810d9bc3c0 (&u->iolock){+.+.}-{4:4}, at: unix_stream_read_skb+0x30/0x120
other info that might help us debug this:
context-{5:5}
1 lock held by test_progs/37755:
 #0: ffffffff833700e0 (rcu_read_lock){....}-{1:3}, at: sk_psock_verdict_data_ready+0x3e/0x2a0
stack backtrace:
CPU: 13 UID: 0 PID: 37755 Comm: test_progs Tainted: G           OE      6.14.0-rc3+ #111
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
Call Trace:
 dump_stack_lvl+0x68/0x90
 lock_acquire+0xcf/0x2e0
 __mutex_lock+0x9c/0xcc0
 unix_stream_read_skb+0x30/0x120
 sk_psock_verdict_data_ready+0x8d/0x2a0
 unix_stream_sendmsg+0x232/0x640
 __sys_sendto+0x1cd/0x1e0
 __x64_sys_sendto+0x20/0x30
 do_syscall_64+0x93/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Re: [PATCH bpf-next v2 1/3] bpf, sockmap: avoid using sk_socket after free

Posted by Jiayuan Chen 11 months ago

March 10, 2025 at 9:08 PM, "Michal Luczaj" <mhal@rbox.co> wrote:



> 
> On 3/10/25 12:36, Jiayuan Chen wrote:
> 
> > 
> > March 7, 2025 at 5:45 PM, "Michal Luczaj" <mhal@rbox.co> wrote:
> > 
> >  ...
> > 
> > > 
> > > BTW, lockdep (CONFIG_LOCKDEP=y) complains about calling AF_UNIX's
> > > 
> > >  read_skb() under RCU read lock.
> > > 
> > 
> >  My environment also has LOCKDEP enabled, but I didn't see similar
> > 
> >  warnings.
> > 
> >  Moreover, RCU assertions are typically written as:
> > 
> >  
> > 
> >  WARN_ON_ONCE(!rcu_read_lock_held())
> > 
> >  
> > 
> >  And when LOCKDEP is not enabled, rcu_read_lock_held() defaults to
> > 
> >  returning 1. So, it's unlikely to trigger a warning due to an RCU lock
> > 
> >  being held.
> > 
> >  
> > 
> >  Could you provide more of the call stack?
> > 
> 
> Sure, bpf-next with this series applied, test_progs -t sockmap_basic:
> 
> =============================
> 
> [ BUG: Invalid wait context ]
> 
> 6.14.0-rc3+ #111 Tainted: G OE
> 
> -----------------------------
> 
> test_progs/37755 is trying to lock:
> 
> ffff88810d9bc3c0 (&u->iolock){+.+.}-{4:4}, at: unix_stream_read_skb+0x30/0x120
> 
> other info that might help us debug this:
> 
> context-{5:5}
> 
> 1 lock held by test_progs/37755:
> 
>  #0: ffffffff833700e0 (rcu_read_lock){....}-{1:3}, at: sk_psock_verdict_data_ready+0x3e/0x2a0
> 
> stack backtrace:
> 
> CPU: 13 UID: 0 PID: 37755 Comm: test_progs Tainted: G OE 6.14.0-rc3+ #111
> 
> Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> 
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
> 
> Call Trace:
> 
>  dump_stack_lvl+0x68/0x90
> 
>  lock_acquire+0xcf/0x2e0
> 
>  __mutex_lock+0x9c/0xcc0
> 
>  unix_stream_read_skb+0x30/0x120
> 
>  sk_psock_verdict_data_ready+0x8d/0x2a0
> 
>  unix_stream_sendmsg+0x232/0x640
> 
>  __sys_sendto+0x1cd/0x1e0
> 
>  __x64_sys_sendto+0x20/0x30
> 
>  do_syscall_64+0x93/0x180
> 
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
Thanks, I got this stack too after enabling CONFIG_PROVE_LOCKING.
It seems that we can't call sleepable lock such as mutex_lock under rcu-locked context.
I'm working on it.