[v2] bpf: Clear rb node linkage when freeing bpf_rb_root

[PATCH bpf-next v2] bpf: Clear rb node linkage when freeing bpf_rb_root

Posted by Kaitao Cheng 2 days, 19 hours ago

From: Kaitao Cheng <chengkaitao@kylinos.cn>

bpf_rb_root_free() detaches the root by copying the current rb_root_cached
and then replacing the live root with RB_ROOT_CACHED. It then walks the
copied root and drops each object contained in the tree.

This leaves the rb node state intact while dropping the object. If the
object is refcounted and survives the drop, its bpf_rb_node_kern still
contains an owner pointer to the freed root and stale rb tree linkage. If
a later bpf_rb_root allocation reuses the same address, bpf_rbtree_remove()
can incorrectly pass the owner check and call rb_erase_cached() on a node
whose rb pointers belong to the old tree.

Mirror the list draining behavior by marking nodes as busy while the root
is being detached, then clear the rb node and release the owner before
dropping the containing object. This makes surviving nodes unowned and
safe to reject from remove or accept for a later add.

Fixes: 9c395c1b99bd ("bpf: Add basic bpf_rb_{root,node} support")
Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
---
Changes in v2:
 - Use [PATCH bpf-next] tag
 - Minimized code churn

Link to v1:
https://lore.kernel.org/all/20260601055859.13888-1-kaitao.cheng@linux.dev/

---
 kernel/bpf/helpers.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 03004e4451f5..8ba2b8965caf 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2306,6 +2306,7 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
 		      struct bpf_spin_lock *spin_lock)
 {
 	struct rb_root_cached orig_root, *root = rb_root;
+	struct bpf_rb_node_kern *node;
 	struct rb_node *pos, *n;
 	void *obj;
 
@@ -2314,14 +2315,20 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
 
 	__bpf_spin_lock_irqsave(spin_lock);
 	orig_root = *root;
+	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
+		node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
+		WRITE_ONCE(node->owner, BPF_PTR_POISON);
+	}
 	*root = RB_ROOT_CACHED;
 	__bpf_spin_unlock_irqrestore(spin_lock);
 
 	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
 		obj = pos;
 		obj -= field->graph_root.node_offset;
-
-
+		node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
+		RB_CLEAR_NODE(pos);
+		/* Ensure __bpf_rbtree_add() sees the node as unlinked. */
+		smp_store_release(&node->owner, NULL);
 		__bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
 	}
 }
-- 
2.50.1 (Apple Git-155)

Re: [PATCH bpf-next v2] bpf: Clear rb node linkage when freeing bpf_rb_root

Posted by bot+bpf-ci@kernel.org 2 days, 18 hours ago

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 03004e4451f5..8ba2b8965caf 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2306,6 +2306,7 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
>  		      struct bpf_spin_lock *spin_lock)
>  {
>  	struct rb_root_cached orig_root, *root = rb_root;
> +	struct bpf_rb_node_kern *node;
>  	struct rb_node *pos, *n;
>  	void *obj;
>
> @@ -2314,14 +2315,20 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
>
>  	__bpf_spin_lock_irqsave(spin_lock);
>  	orig_root = *root;
> +	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
> +		node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
> +		WRITE_ONCE(node->owner, BPF_PTR_POISON);
> +	}

An earlier review of v1 raised a concern about this loop that does not
appear to have been resolved in this version.

This new bpf_rbtree_postorder_for_each_entry_safe() walk runs inside the
__bpf_spin_lock_irqsave() critical section, which disables local
interrupts. There is no verifier-enforced limit on the number of nodes
in a bpf rbtree.

Can this unbounded O(N) traversal cause latency spikes, RCU stalls, or
NMI watchdog lockups for a large tree while interrupts are disabled?

rb_first_postorder() and rb_next_postorder(), reached through
bpf_rbtree_postorder_for_each_entry_safe(), are exported and traceable.
If an fentry/fexit bpf program is attached to one of them and tries to
acquire the same bpf_spin_lock, can that deadlock while this loop holds
the lock?

The original discussion is here:

  https://lore.kernel.org/bpf/20260601061503.EFD881F00898@smtp.kernel.org/

The reply on the thread noted the per-iteration work is small but left
the deadlock question open. Would it help to move the owner poisoning
out of the locked section, or otherwise bound the work done under the
lock?

>  	*root = RB_ROOT_CACHED;
>  	__bpf_spin_unlock_irqrestore(spin_lock);
>
>  	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &orig_root.rb_root) {
>  		obj = pos;
>  		obj -= field->graph_root.node_offset;
> -
> -
> +		node = rb_entry(pos, struct bpf_rb_node_kern, rb_node);
> +		RB_CLEAR_NODE(pos);
> +		/* Ensure __bpf_rbtree_add() sees the node as unlinked. */
> +		smp_store_release(&node->owner, NULL);
>  		__bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
>  	}
>  }


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27008654905