From: Kaitao Cheng <chengkaitao@kylinos.cn>

bpf_rb_root_free() detaches the root by copying the current rb_root_cached
and then replacing the live root with RB_ROOT_CACHED. It then walks the
copied root and drops each object contained in the tree.

This leaves the rb node state intact while dropping the object. If the
object is refcounted and survives the drop, its bpf_rb_node_kern still
contains an owner pointer to the freed root and stale rb tree linkage. If
a later bpf_rb_root allocation reuses the same address, bpf_rbtree_remove()
can incorrectly pass the owner check and call rb_erase_cached() on a node
whose rb pointers belong to the old tree.

Mirror the list draining behavior by marking nodes as busy while the root
is being detached, then clear the rb node and release the owner before
dropping the containing object. This makes surviving nodes unowned and
safe to reject from remove or accept for a later add.

Fixes: 9c395c1b99bd ("bpf: Add basic bpf_rb_{root,node} support")
Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
kernel/bpf/helpers.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9ca195104667..46e8eada463b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2307,22 +2307,30 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
 {
 	struct rb_root_cached orig_root, *root = rb_root;
 	struct rb_node *pos, *n;
-	void *obj;
 
 	BUILD_BUG_ON(sizeof(struct rb_root_cached) > sizeof(struct bpf_rb_root));
 	BUILD_BUG_ON(__alignof__(struct rb_root_cached) > __alignof__(struct bpf_rb_root));
 
 	__bpf_spin_lock_irqsave(spin_lock);
 	orig_root = *root;
+	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
+		struct bpf_rb_node_kern *node;
+
+		node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
+		WRITE_ONCE(node->owner, BPF_PTR_POISON);
+	}
 	*root = RB_ROOT_CACHED;
 	__bpf_spin_unlock_irqrestore(spin_lock);
 
 	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
-		obj = pos;
-		obj -= field->graph_root.node_offset;
+		struct bpf_rb_node_kern *node;
 
-
-		__bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
+		node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
+		RB_CLEAR_NODE(pos);
+		/* Ensure __bpf_rbtree_add() sees the node as unlinked. */
+		smp_store_release(&node->owner, NULL);
+		__bpf_obj_drop_impl((char *)pos - field->graph_root.node_offset,
+				    field->graph_root.value_rec, false);
 	}
 }
--
2.50.1 (Apple Git-155)
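
The owner transitions described in the commit message can be modelled in userspace. The following is a minimal sketch, not kernel code: every toy_* name is hypothetical, TOY_POISON stands in for BPF_PTR_POISON, and the two check helpers only approximate the owner checks that guard bpf_rbtree_remove() and __bpf_rbtree_add(). A single root variable reused after the drain stands in for a new bpf_rb_root allocated at the old address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these names exist in the kernel. */
struct toy_root {
	int unused;
};

struct toy_node {
	void *owner;	/* owning root, TOY_POISON while busy, NULL if unowned */
	bool linked;	/* models valid rb linkage (opposite of RB_EMPTY_NODE) */
};

static char toy_poison_target;
#define TOY_POISON ((void *)&toy_poison_target)

/* Assumed model of the owner check that guards removal. */
static bool toy_remove_allowed(const struct toy_node *node,
			       const struct toy_root *root)
{
	return node->owner == (const void *)root;
}

/* Assumed model of the check that guards a later add: unowned nodes only. */
static bool toy_add_allowed(const struct toy_node *node)
{
	return node->owner == NULL;
}

/* Pre-patch drain: the node keeps its owner pointer and its linkage. */
static void toy_drain_old(struct toy_node *node)
{
	(void)node;
}

/* Post-patch drain, phase 1 (lock held): mark the node busy. */
static void toy_drain_mark_busy(struct toy_node *node)
{
	node->owner = TOY_POISON;
}

/* Post-patch drain, phase 2 (lock dropped): unlink, then publish unowned. */
static void toy_drain_release(struct toy_node *node)
{
	node->linked = false;
	node->owner = NULL;
}
```

In this model, a node drained with toy_drain_old() still passes toy_remove_allowed() against a root that occupies the same address, which is the aliasing the patch closes. A node drained through both post-patch phases fails the remove check and passes the add check.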
On 5/31/26 10:58 PM, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>
> bpf_rb_root_free() detaches the root by copying the current rb_root_cached
> and then replacing the live root with RB_ROOT_CACHED. It then walks the
> copied root and drops each object contained in the tree.
>
> This leaves the rb node state intact while dropping the object. If the
> object is refcounted and survives the drop, its bpf_rb_node_kern still
> contains an owner pointer to the freed root and stale rb tree linkage. If
> a later bpf_rb_root allocation reuses the same address, bpf_rbtree_remove()
> can incorrectly pass the owner check and call rb_erase_cached() on a node
> whose rb pointers belong to the old tree.
>
> Mirror the list draining behavior by marking nodes as busy while the root
> is being detached, then clear the rb node and release the owner before
> dropping the containing object. This makes surviving nodes unowned and
> safe to reject from remove or accept for a later add.
>
> Fixes: 9c395c1b99bd ("bpf: Add basic bpf_rb_{root,node} support")
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
Please use [PATCH bpf] tag so CI can test it. Do we need a selftest?
LGTM with a few nits below.
Acked-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> kernel/bpf/helpers.c | 18 +++++++++++++-----
> 1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9ca195104667..46e8eada463b 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2307,22 +2307,30 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
> {
> struct rb_root_cached orig_root, *root = rb_root;
> struct rb_node *pos, *n;
> - void *obj;
>
> BUILD_BUG_ON(sizeof(struct rb_root_cached) > sizeof(struct bpf_rb_root));
> BUILD_BUG_ON(__alignof__(struct rb_root_cached) > __alignof__(struct bpf_rb_root));
>
> __bpf_spin_lock_irqsave(spin_lock);
> orig_root = *root;
> + bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
> + struct bpf_rb_node_kern *node;
Move 'struct bpf_rb_node_kern *node;' here and the one below to the declarations at the top of the function.
This will make the code simpler.
> +
> + node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
> + WRITE_ONCE(node->owner, BPF_PTR_POISON);
> + }
> *root = RB_ROOT_CACHED;
> __bpf_spin_unlock_irqrestore(spin_lock);
>
> bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
> - obj = pos;
> - obj -= field->graph_root.node_offset;
We can keep these two ...
> + struct bpf_rb_node_kern *node;
>
> -
> - __bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
> + node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
> + RB_CLEAR_NODE(pos);
> + /* Ensure __bpf_rbtree_add() sees the node as unlinked. */
> + smp_store_release(&node->owner, NULL);
> + __bpf_obj_drop_impl((char *)pos - field->graph_root.node_offset,
> + field->graph_root.value_rec, false);
and then __bpf_obj_drop_impl(...) will not change.
> }
> }
>
On 2026/6/2 02:06, Yonghong Song wrote:
>
>
> On 5/31/26 10:58 PM, Kaitao Cheng wrote:
>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>>
>> bpf_rb_root_free() detaches the root by copying the current rb_root_cached
>> and then replacing the live root with RB_ROOT_CACHED. It then walks the
>> copied root and drops each object contained in the tree.
>>
>> This leaves the rb node state intact while dropping the object. If the
>> object is refcounted and survives the drop, its bpf_rb_node_kern still
>> contains an owner pointer to the freed root and stale rb tree linkage. If
>> a later bpf_rb_root allocation reuses the same address, bpf_rbtree_remove()
>> can incorrectly pass the owner check and call rb_erase_cached() on a node
>> whose rb pointers belong to the old tree.
>>
>> Mirror the list draining behavior by marking nodes as busy while the root
>> is being detached, then clear the rb node and release the owner before
>> dropping the containing object. This makes surviving nodes unowned and
>> safe to reject from remove or accept for a later add.
>>
>> Fixes: 9c395c1b99bd ("bpf: Add basic bpf_rb_{root,node} support")
>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
>
> Please use [PATCH bpf] tag so CI can test it. Do we need a selftest?
The bug fixed by this patch has fairly strict reproduction conditions,
so it is difficult to find a stable reproducer.
I have addressed the other feedback. Thanks for your review.
Please see v2 for details:
https://lore.kernel.org/all/20260605094143.5509-1-kaitao.cheng@linux.dev/
> LGTM with a few nits below.
>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>
>> ---
>> kernel/bpf/helpers.c | 18 +++++++++++++-----
>> 1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 9ca195104667..46e8eada463b 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2307,22 +2307,30 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
>> {
>> struct rb_root_cached orig_root, *root = rb_root;
>> struct rb_node *pos, *n;
>> - void *obj;
>> BUILD_BUG_ON(sizeof(struct rb_root_cached) > sizeof(struct bpf_rb_root));
>> BUILD_BUG_ON(__alignof__(struct rb_root_cached) > __alignof__(struct bpf_rb_root));
>> __bpf_spin_lock_irqsave(spin_lock);
>> orig_root = *root;
>> + bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
>> + struct bpf_rb_node_kern *node;
>
> Move 'struct bpf_rb_node_kern *node;' here and the one below to the declarations at the top of the function.
> This will make the code simpler.
>
>> +
>> + node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
>> + WRITE_ONCE(node->owner, BPF_PTR_POISON);
>> + }
>> *root = RB_ROOT_CACHED;
>> __bpf_spin_unlock_irqrestore(spin_lock);
>> bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
>> - obj = pos;
>> - obj -= field->graph_root.node_offset;
>
> We can keep these two ...
>
>> + struct bpf_rb_node_kern *node;
>> -
>> - __bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
>> + node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
>> + RB_CLEAR_NODE(pos);
>> + /* Ensure __bpf_rbtree_add() sees the node as unlinked. */
>> + smp_store_release(&node->owner, NULL);
>> + __bpf_obj_drop_impl((char *)pos - field->graph_root.node_offset,
>> + field->graph_root.value_rec, false);
>
> and then __bpf_obj_drop_impl(...) will not change.
>
>> }
>> }
>>
>
--
Thanks
Kaitao Cheng
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9ca195104667..46e8eada463b 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2307,22 +2307,30 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
> {
> struct rb_root_cached orig_root, *root = rb_root;
> struct rb_node *pos, *n;
> - void *obj;
>
> BUILD_BUG_ON(sizeof(struct rb_root_cached) > sizeof(struct bpf_rb_root));
> BUILD_BUG_ON(__alignof__(struct rb_root_cached) > __alignof__(struct bpf_rb_root));
>
> __bpf_spin_lock_irqsave(spin_lock);
> orig_root = *root;
> + bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
> + struct bpf_rb_node_kern *node;
> +
> + node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
> + WRITE_ONCE(node->owner, BPF_PTR_POISON);
> + }
> *root = RB_ROOT_CACHED;
> __bpf_spin_unlock_irqrestore(spin_lock);
Since there is no verifier-enforced limit on the number of nodes in a BPF
rbtree, could this O(N) post-order traversal trigger latency spikes, RCU
stalls, or NMI watchdog lockups?
This loop runs inside the __bpf_spin_lock_irqsave() critical section, which
disables local interrupts.
Additionally, since rb_first_postorder() and rb_next_postorder() are exported
and traceable, if an fentry/fexit BPF program is attached to them and attempts
to acquire the same bpf_spin_lock, will it cause a hard deadlock?
(from review discussion at
https://lore.kernel.org/bpf/20260601061503.EFD881F00898@smtp.kernel.org/)
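
The cost this question is about can be illustrated with a toy two-phase drain. The following is a minimal userspace sketch under stated assumptions: a singly linked list replaces the rbtree, the toy_* names are hypothetical, and the lock is represented only by comments. It counts how many nodes are visited in each phase, showing that the first pass touches every node while the lock would be held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node states mirroring owner == root, POISON, NULL. */
enum toy_state { TOY_OWNED, TOY_BUSY, TOY_UNOWNED };

struct toy_item {
	struct toy_item *next;
	enum toy_state state;
};

struct toy_stats {
	size_t visited_locked;		/* nodes touched with the lock held */
	size_t visited_unlocked;	/* nodes touched after unlock */
};

static struct toy_stats toy_two_phase_drain(struct toy_item *head)
{
	struct toy_stats st = { 0, 0 };
	struct toy_item *it;

	/* Phase 1: stands in for the loop between lock and unlock. */
	for (it = head; it; it = it->next) {
		it->state = TOY_BUSY;
		st.visited_locked++;
	}

	/* The lock would be released here. */

	/* Phase 2: stands in for the clear-and-drop loop after unlock. */
	for (it = head; it; it = it->next) {
		it->state = TOY_UNOWNED;
		st.visited_unlocked++;
	}

	return st;
}
```

For a list of N items, visited_locked equals N, so in this model the time spent in the critical section grows linearly with the number of nodes. Whether that is a problem in the kernel depends on how large a BPF rbtree can realistically get, which the sketch does not address.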
[ ... ]
---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26738658508