[PATCH] bpf: Fix suspicious RCU usage in LPM trie for sleepable programs

Breno Leitao posted 1 patch 2 months, 1 week ago
kernel/bpf/lpm_trie.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Posted by Breno Leitao 2 months, 1 week ago
trie_lookup_elem() uses rcu_dereference_check() with
rcu_read_lock_bh_held() as the lockdep condition. This is insufficient
when the lookup is called from a sleepable BPF program, which holds
rcu_read_lock_trace() (via __bpf_prog_enter_sleepable) instead of
rcu_read_lock_bh(). With CONFIG_PROVE_LOCKING enabled, this triggers the
following warning:

  WARNING: suspicious RCU usage
  kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by .../...:
   #0: ffffffff86ca5bd8 (rcu_tasks_trace_srcu_struct){....}-{0:0},
       at: __bpf_prog_enter_sleepable+0x26/0x280

  Call Trace:
   <TASK>
   dump_stack_lvl+0x69/0xa0
   lockdep_rcu_suspicious+0x13f/0x1d0
   trie_lookup_elem+0x99e/0x9d0
   bpf_prog_3980d36ecbef0e34_net_check_ip_pod+0x42a/0x510
   bpf_prog_57df4ce643736a70_enforce_security_socket_connect+0x3e9/0x69e
   bpf_trampoline_6442540179+0x60/0xf9
   security_socket_connect+0x25/0x80
   __sys_connect+0x15c/0x280
   __x64_sys_connect+0x76/0x80
   do_syscall_64+0xe6/0x930

Use bpf_rcu_lock_held() instead, which checks all three RCU flavors
(regular, bh, and trace) and is the canonical helper for BPF map
operations.

Fixes: 694cea395fded ("bpf: Allow RCU-protected lookups to happen from bh context")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
---
I've hacked together a reproducer for this issue; it can be found at
https://github.com/leitao/linux/commit/59c83f313face36107ef1e8392e27b1cf4887b70
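
For reviewers, here is a minimal userspace model of the two lockdep
conditions. It is only a sketch, not kernel code: struct rcu_ctx,
old_check() and new_check() are hypothetical names used for illustration.
It assumes that rcu_dereference_check() already accepts a regular
rcu_read_lock() section in addition to the condition passed to it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RCU read-side state that lockdep tracks. */
struct rcu_ctx {
	bool rcu;   /* inside rcu_read_lock() */
	bool bh;    /* inside rcu_read_lock_bh() */
	bool trace; /* inside rcu_read_lock_trace() (sleepable BPF programs) */
};

/* Model of the condition before the patch: regular RCU or the bh flavor. */
static inline bool old_check(const struct rcu_ctx *c)
{
	return c->rcu || c->bh;
}

/* Model of the condition after the patch: the trace flavor is accepted too. */
static inline bool new_check(const struct rcu_ctx *c)
{
	return c->rcu || c->bh || c->trace;
}
```

With only the trace flavor held, as in the splat above, old_check() is
false and new_check() is true, so the lockdep warning goes away.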
---
 kernel/bpf/lpm_trie.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 0f57608b385d4..ac36063cb7e62 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -246,7 +246,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 
 	/* Start walking the trie from the root node ... */
 
-	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
+	for (node = rcu_dereference_check(trie->root, bpf_rcu_lock_held());
 	     node;) {
 		unsigned int next_bit;
 		size_t matchlen;
@@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 		 */
 		next_bit = extract_bit(key->data, node->prefixlen);
 		node = rcu_dereference_check(node->child[next_bit],
-					     rcu_read_lock_bh_held());
+					     bpf_rcu_lock_held());
 	}
 
 	if (!found)

---
base-commit: 59c83f313face36107ef1e8392e27b1cf4887b70
change-id: 20260407-bpf_rcu-f6c40fc4f3c6

Best regards,
--  
Breno Leitao <leitao@debian.org>
Re: [PATCH] bpf: Fix suspicious RCU usage in LPM trie for sleepable programs
Posted by Alexei Starovoitov 2 months, 1 week ago
On Tue, Apr 7, 2026 at 3:48 AM Breno Leitao <leitao@debian.org> wrote:
>
> trie_lookup_elem() uses rcu_dereference_check() with
> rcu_read_lock_bh_held() as the lockdep condition. This is insufficient
> when the lookup is called from a sleepable BPF program, which holds
> rcu_read_lock_trace() (via __bpf_prog_enter_sleepable) instead of
> rcu_read_lock_bh(). With CONFIG_PROVE_LOCKING enabled, this triggers the
> following warning:
>
>   WARNING: suspicious RCU usage
>   kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!
>
>   rcu_scheduler_active = 2, debug_locks = 1
>   1 lock held by .../...:
>    #0: ffffffff86ca5bd8 (rcu_tasks_trace_srcu_struct){....}-{0:0},
>        at: __bpf_prog_enter_sleepable+0x26/0x280
>
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x69/0xa0
>    lockdep_rcu_suspicious+0x13f/0x1d0
>    trie_lookup_elem+0x99e/0x9d0
>    bpf_prog_3980d36ecbef0e34_net_check_ip_pod+0x42a/0x510
>    bpf_prog_57df4ce643736a70_enforce_security_socket_connect+0x3e9/0x69e
>    bpf_trampoline_6442540179+0x60/0xf9
>    security_socket_connect+0x25/0x80
>    __sys_connect+0x15c/0x280
>    __x64_sys_connect+0x76/0x80
>    do_syscall_64+0xe6/0x930
>
> Use bpf_rcu_lock_held() instead, which checks all three RCU flavors
> (regular, bh, and trace) and is the canonical helper for BPF map
> operations.
>
> Fixes: 694cea395fded ("bpf: Allow RCU-protected lookups to happen from bh context")
> Cc: stable@vger.kernel.org
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
> I've hacked together a reproducer for this issue; it can be found at
> https://github.com/leitao/linux/commit/59c83f313face36107ef1e8392e27b1cf4887b70
> ---
>  kernel/bpf/lpm_trie.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 0f57608b385d4..ac36063cb7e62 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -246,7 +246,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>
>         /* Start walking the trie from the root node ... */
>
> -       for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
> +       for (node = rcu_dereference_check(trie->root, bpf_rcu_lock_held());
>              node;) {
>                 unsigned int next_bit;
>                 size_t matchlen;
> @@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>                  */
>                 next_bit = extract_bit(key->data, node->prefixlen);
>                 node = rcu_dereference_check(node->child[next_bit],
> -                                            rcu_read_lock_bh_held());
> +                                            bpf_rcu_lock_held());

This is not a fix.
The issue is deeper than it looks. We discussed it before.