From: Michal Luczaj
Date: Sat, 07 Feb 2026 15:34:56 +0100
Subject: [PATCH bpf v2 3/4] bpf, sockmap: Adapt for the af_unix-specific lock
Message-Id: <20260207-unix-proto-update-null-ptr-deref-v2-3-9f091330e7cd@rbox.co>
References: <20260207-unix-proto-update-null-ptr-deref-v2-0-9f091330e7cd@rbox.co>
In-Reply-To: <20260207-unix-proto-update-null-ptr-deref-v2-0-9f091330e7cd@rbox.co>
To: John Fastabend, Jakub Sitnicki, Kuniyuki Iwashima, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Daniel Borkmann,
 Willem de Bruijn, Cong Wang, Alexei Starovoitov, Yonghong Song,
 Andrii Nakryiko, Eduard Zingerman, Martin KaFai Lau, Song Liu,
 Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 Michal Luczaj
X-Mailer: b4 0.14.3

unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state,
TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`).
sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that
the socket is properly set up, which would include having a defined peer.
IOW, there's a window when unix_stream_bpf_update_proto() can be called on
a socket which still has unix_peer(sk) == NULL.

          T0 bpf                            T1 connect
          ------                            ----------

                                WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
sock_map_sk_state_allowed(sk)
...
sk_pair = unix_peer(sk)
sock_hold(sk_pair)
                                sock_hold(newsk)
                                smp_mb__after_atomic()
                                unix_peer(sk) = newsk

BUG: kernel NULL pointer dereference, address: 0000000000000080
RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
Call Trace:
  sock_map_link+0x564/0x8b0
  sock_map_update_common+0x6e/0x340
  sock_map_update_elem_sys+0x17d/0x240
  __sys_bpf+0x26db/0x3250
  __x64_sys_bpf+0x21/0x30
  do_syscall_64+0x6b/0x3a0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The initial idea was to move peer assignment _before_ the sk_state
update[1], but that involved an additional memory barrier, and changing the
hot path was rejected. Then a check during proto update was considered[2],
but a follow-up discussion[3] concluded the root cause is sockmap taking a
wrong lock.

Thus, teach sockmap about the af_unix-specific locking: instead of the
usual lock_sock() involving sock::sk_lock, af_unix protects critical
sections under unix_state_lock() operating on unix_sock::lock.
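
As an illustration only, the window above can be modelled in plain userspace
C. This is a minimal sketch, not kernel code: every model_* name is
hypothetical, the "lock" is a boolean flag rather than a spinlock, and it
assumes that unix_stream_connect() performs both the sk_state store and the
peer assignment inside one unix_state_lock() critical section, as the
paragraph above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_CLOSE, MODEL_ESTABLISHED };

/* Hypothetical stand-in for an af_unix socket; not the kernel's layout. */
struct model_sock {
	bool state_lock_held;		/* models unix_sock::lock */
	enum model_state state;		/* models sk->sk_state */
	struct model_sock *peer;	/* models unix_peer(sk) */
};

/* First half of connect: take the state lock, publish the new state. */
static void model_connect_begin(struct model_sock *sk)
{
	sk->state_lock_held = true;
	sk->state = MODEL_ESTABLISHED;
}

/* Second half of connect: assign the peer, then drop the state lock. */
static void model_connect_finish(struct model_sock *sk,
				 struct model_sock *newsk)
{
	sk->peer = newsk;
	sk->state_lock_held = false;
}

/*
 * Updater that ignores the state lock and trusts the state alone.
 * Returns true if it would go on to dereference a NULL peer.
 */
static bool model_update_unlocked_hits_null(const struct model_sock *sk)
{
	return sk->state == MODEL_ESTABLISHED && sk->peer == NULL;
}

/*
 * Updater that inspects the socket only while holding the state lock.
 * Returns -1 if the lock is busy (a real spinlock would spin here),
 * 0 if the state is not allowed, 1 if an established peer was seen.
 */
static int model_update_locked(struct model_sock *sk)
{
	int ret = 0;

	if (sk->state_lock_held)
		return -1;

	sk->state_lock_held = true;
	if (sk->state == MODEL_ESTABLISHED) {
		/* Holding the lock, state and peer are always consistent. */
		assert(sk->peer != NULL);
		ret = 1;
	}
	sk->state_lock_held = false;

	return ret;
}
```

Calling model_connect_begin() and then the unlocked updater reproduces the
"established but no peer" observation; the locked updater cannot run until
model_connect_finish() has assigned the peer.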

[1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
[2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/
[3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/

This patch also happens to fix a deadlock that may occur when
bpf_iter_unix_seq_show()'s lock_sock_fast() takes the fast path and the
iter prog attempts to update a sockmap, which ends up spinning at
sock_map_update_elem()'s bh_lock_sock():

WARNING: possible recursive locking detected
--------------------------------------------
test_progs/1393 is trying to acquire lock:
ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: sock_map_update_elem+0xdb/0x1f0

but task is already holding lock:
ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(slock-AF_UNIX);
  lock(slock-AF_UNIX);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by test_progs/1393:
 #0: ffff88814b59c790 (&p->lock){+.+.}-{4:4}, at: bpf_seq_read+0x59/0x10d0
 #1: ffff88811ec25fd8 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: bpf_seq_read+0x42c/0x10d0
 #2: ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0
 #3: ffffffff85a6a7c0 (rcu_read_lock){....}-{1:3}, at: bpf_iter_run_prog+0x51d/0xb00

Call Trace:
 dump_stack_lvl+0x5d/0x80
 print_deadlock_bug.cold+0xc0/0xce
 __lock_acquire+0x130f/0x2590
 lock_acquire+0x14e/0x2b0
 _raw_spin_lock+0x30/0x40
 sock_map_update_elem+0xdb/0x1f0
 bpf_prog_2d0075e5d9b721cd_dump_unix+0x55/0x4f4
 bpf_iter_run_prog+0x5b9/0xb00
 bpf_iter_unix_seq_show+0x1f7/0x2e0
 bpf_seq_read+0x42c/0x10d0
 vfs_read+0x171/0xb20
 ksys_read+0xff/0x200
 do_syscall_64+0x6b/0x3a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Suggested-by: Kuniyuki Iwashima
Suggested-by: Martin KaFai Lau
Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
Fixes: 2c860a43dd77 ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.")
Signed-off-by: Michal Luczaj
---
Keeping sparse annotations in sock_map_sk_{acquire,release}() required some
hackery I'm not proud of. Is there a better way?
---
 net/core/sock_map.c | 47 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index b6586d9590b7..0c638b1f363a 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 
 struct bpf_stab {
@@ -115,17 +116,49 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 }
 
 static void sock_map_sk_acquire(struct sock *sk)
-	__acquires(&sk->sk_lock.slock)
+	__acquires(sock_or_unix_lock)
 {
-	lock_sock(sk);
+	if (sk_is_unix(sk)) {
+		unix_state_lock(sk);
+		__release(sk); /* Silence sparse. */
+	} else {
+		lock_sock(sk);
+	}
+
 	rcu_read_lock();
 }
 
 static void sock_map_sk_release(struct sock *sk)
-	__releases(&sk->sk_lock.slock)
+	__releases(sock_or_unix_lock)
 {
 	rcu_read_unlock();
-	release_sock(sk);
+
+	if (sk_is_unix(sk)) {
+		unix_state_unlock(sk);
+		__acquire(sk); /* Silence sparse. */
+	} else {
+		release_sock(sk);
+	}
+}
+
+static inline void sock_map_sk_acquire_fast(struct sock *sk)
+{
+	local_bh_disable();
+
+	if (sk_is_unix(sk))
+		unix_state_lock(sk);
+	else
+		bh_lock_sock(sk);
+}
+
+static inline void sock_map_sk_release_fast(struct sock *sk)
+{
+	if (sk_is_unix(sk))
+		unix_state_unlock(sk);
+	else
+		bh_unlock_sock(sk);
+
+	local_bh_enable();
 }
 
 static void sock_map_add_link(struct sk_psock *psock,
@@ -604,16 +637,14 @@ static long sock_map_update_elem(struct bpf_map *map, void *key,
 	if (!sock_map_sk_is_suitable(sk))
 		return -EOPNOTSUPP;
 
-	local_bh_disable();
-	bh_lock_sock(sk);
+	sock_map_sk_acquire_fast(sk);
 	if (!sock_map_sk_state_allowed(sk))
 		ret = -EOPNOTSUPP;
 	else if (map->map_type == BPF_MAP_TYPE_SOCKMAP)
 		ret = sock_map_update_common(map, *(u32 *)key, sk, flags);
 	else
 		ret = sock_hash_update_common(map, key, sk, flags);
-	bh_unlock_sock(sk);
-	local_bh_enable();
+	sock_map_sk_release_fast(sk);
 	return ret;
 }
 
-- 
2.52.0
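
For readers following the sock_map_sk_{acquire,release}_fast() helpers in the
diff, the per-family lock selection and its effect on the recursive-locking
report can be sketched in userspace C. This is a simplified model under
stated assumptions: all toy_* names are hypothetical, locks are boolean
flags, a failed toy_try_acquire() stands for "would spin forever", and
local_bh_disable()/local_bh_enable() are not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_family { TOY_AF_INET, TOY_AF_UNIX };

/* Hypothetical stand-in for struct sock with its two candidate locks. */
struct toy_sock {
	enum toy_family family;
	bool slock_held;	/* models sk->sk_lock.slock (bh_lock_sock) */
	bool unix_lock_held;	/* models unix_sock::lock (unix_state_lock) */
};

/* Behaviour before the patch: always use the generic socket spinlock. */
static bool *toy_lock_old(struct toy_sock *sk)
{
	return &sk->slock_held;
}

/* Behaviour after the patch: af_unix sockets use their own state lock. */
static bool *toy_lock_new(struct toy_sock *sk)
{
	return sk->family == TOY_AF_UNIX ? &sk->unix_lock_held
					 : &sk->slock_held;
}

/* Returns false where a real spinlock would spin on an already held lock. */
static bool toy_try_acquire(bool *lock)
{
	if (*lock)
		return false;
	*lock = true;
	return true;
}

static void toy_release(bool *lock)
{
	assert(*lock);	/* releasing a lock that is not held is a bug */
	*lock = false;
}
```

With an af_unix socket whose slock_held is already true (standing in for
lock_sock_fast() on the iterator's fast path), toy_try_acquire(toy_lock_old(sk))
fails, which corresponds to the self-deadlock in the lockdep report, while
toy_try_acquire(toy_lock_new(sk)) succeeds because it targets a different lock.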