trie_lookup_elem() annotates its rcu_dereference_check() walks with
only rcu_read_lock_bh_held(). Because rcu_dereference_check(p, c)
resolves to "c || rcu_read_lock_held()", this passes for XDP/NAPI and
classic RCU readers but fails for sleepable BPF programs, which enter
via __bpf_prog_enter_sleepable() and hold only rcu_read_lock_trace().

A sleepable LSM hook that ends up doing bpf_map_lookup_elem() on an LPM
trie therefore triggers lockdep on debug kernels:

    =============================
    WARNING: suspicious RCU usage
    7.1.0-... Tainted: G E
    -----------------------------
    kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!
    1 lock held by net_tests/540:
    #0: (rcu_tasks_trace_srcu_struct){....}-{0:0},
    at: __bpf_prog_enter_sleepable+0x26/0x280
    Call Trace:
    dump_stack_lvl
    lockdep_rcu_suspicious
    trie_lookup_elem
    bpf_prog_..._enforce_security_socket_connect
    bpf_trampoline_...
    security_socket_connect
    __sys_connect
    do_syscall_64

This is lockdep-only -- no UAF, since Tasks Trace RCU does serialize
against the trie's reclaim path -- but it spams the console once per
distinct callsite on every debug kernel running a sleepable BPF LSM
that does map lookups on an LPM trie, which is increasingly common.

Other map types already use the bpf_rcu_lock_held() helper, which
accepts all three contexts (classic, BH, Tasks Trace). Use it here as
well, matching the established convention.

Fixes: 694cea395fde ("bpf: Allow RCU-protected lookups to happen from bh context")
Cc: stable@vger.kernel.org
Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
---
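Note (not part of the commit message): the predicate logic described
above can be sketched as a small userspace model. Everything in it is
hypothetical illustration, not kernel code: struct reader_ctx and the
helper names are invented here, and cond_after() only mirrors the commit
message's description of bpf_rcu_lock_held() (classic, BH, Tasks Trace),
not necessarily its exact in-tree definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: which read-side critical sections the caller holds. */
struct reader_ctx {
	bool rcu;	/* stands in for rcu_read_lock_held() */
	bool rcu_bh;	/* stands in for rcu_read_lock_bh_held() */
	bool rcu_trace;	/* stands in for rcu_read_lock_trace_held() */
};

/* Models rcu_dereference_check(p, c): lockdep accepts "c || rcu_read_lock_held()". */
static bool deref_check_passes(const struct reader_ctx *ctx, bool c)
{
	return c || ctx->rcu;
}

/* Condition passed before the patch: BH readers only. */
static bool cond_before(const struct reader_ctx *ctx)
{
	return ctx->rcu_bh;
}

/* Condition passed after the patch, as the commit message describes it. */
static bool cond_after(const struct reader_ctx *ctx)
{
	return ctx->rcu || ctx->rcu_bh || ctx->rcu_trace;
}

/* True when the modelled lookup would produce a lockdep splat. */
static bool splat_before(const struct reader_ctx *ctx)
{
	return !deref_check_passes(ctx, cond_before(ctx));
}

static bool splat_after(const struct reader_ctx *ctx)
{
	return !deref_check_passes(ctx, cond_after(ctx));
}

/* Walks the three reader contexts named in the commit message. */
static void model_selfcheck(void)
{
	const struct reader_ctx classic = { .rcu = true };
	const struct reader_ctx bh = { .rcu_bh = true };
	const struct reader_ctx sleepable = { .rcu_trace = true };
	const struct reader_ctx unlocked = { 0 };

	/* Classic and BH readers pass both before and after. */
	assert(!splat_before(&classic) && !splat_after(&classic));
	assert(!splat_before(&bh) && !splat_after(&bh));

	/* Sleepable programs hold only the Tasks Trace section. */
	assert(splat_before(&sleepable));
	assert(!splat_after(&sleepable));

	/* A caller holding nothing is still flagged after the change. */
	assert(splat_before(&unlocked) && splat_after(&unlocked));
}
```

Calling model_selfcheck() from a main() exercises all four cases. In
this model the only context whose outcome changes is the sleepable one;
a caller holding no read-side lock is still reported.
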
 kernel/bpf/lpm_trie.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 0f57608b385d..ac36063cb7e6 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -246,7 +246,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 
 	/* Start walking the trie from the root node ... */
 
-	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
+	for (node = rcu_dereference_check(trie->root, bpf_rcu_lock_held());
 	     node;) {
 		unsigned int next_bit;
 		size_t matchlen;
@@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 		 */
 		next_bit = extract_bit(key->data, node->prefixlen);
 		node = rcu_dereference_check(node->child[next_bit],
-					     rcu_read_lock_bh_held());
+					     bpf_rcu_lock_held());
 	}
 
 	if (!found)
--
2.53.0-Meta
On Fri May 29, 2026 at 1:42 PM EDT, Vlad Poenaru wrote:
> trie_lookup_elem() annotates its rcu_dereference_check() walks with
> only rcu_read_lock_bh_held(). Because rcu_dereference_check(p, c)
> resolves to "c || rcu_read_lock_held()", this passes for XDP/NAPI and
> classic RCU readers but fails for sleepable BPF programs, which enter
> via __bpf_prog_enter_sleepable() and hold only rcu_read_lock_trace().
>
> A sleepable LSM hook that ends up doing bpf_map_lookup_elem() on an LPM
> trie therefore triggers lockdep on debug kernels:
>
> =============================
> WARNING: suspicious RCU usage
> 7.1.0-... Tainted: G E
> -----------------------------
> kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!
> 1 lock held by net_tests/540:
> #0: (rcu_tasks_trace_srcu_struct){....}-{0:0},
> at: __bpf_prog_enter_sleepable+0x26/0x280
> Call Trace:
> dump_stack_lvl
> lockdep_rcu_suspicious
> trie_lookup_elem
> bpf_prog_..._enforce_security_socket_connect
> bpf_trampoline_...
> security_socket_connect
> __sys_connect
> do_syscall_64
>
> This is lockdep-only -- no UAF, since Tasks Trace RCU does serialize
> against the trie's reclaim path -- but it spams the console once per
> distinct callsite on every debug kernel running a sleepable BPF LSM
> that does map lookups on an LPM trie, which is increasingly common.
>
> Other map types already use the bpf_rcu_lock_held() helper, which
> accepts all three contexts (classic, BH, Tasks Trace). Use it here as
> well, matching the established convention.
>
> Fixes: 694cea395fde ("bpf: Allow RCU-protected lookups to happen from bh context")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vlad Poenaru <vlad.wing@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>