[PATCH bpf-next v9 3/9] bpf: clear list node owner and unlink before drop

Posted by Chengkaitao 3 days, 22 hours ago
From: Kaitao Cheng <chengkaitao@kylinos.cn>

When draining a BPF list_head, clear each node's owner pointer while still
holding the spinlock, so concurrent readers always see a consistent owner.

Unlink each node with list_del_init() before calling __bpf_obj_drop_impl(),
so that users who still hold a refcounted reference to the node cannot
follow a stale next pointer to an already-freed neighbour.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 kernel/bpf/helpers.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 94fcd4ab39e9..8abb99712043 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2232,7 +2232,7 @@ EXPORT_SYMBOL_GPL(bpf_base_func_proto);
 void bpf_list_head_free(const struct btf_field *field, void *list_head,
 			struct bpf_spin_lock *spin_lock)
 {
-	struct list_head *head = list_head, *orig_head = list_head;
+	struct list_head *head = list_head, *orig_head = list_head, *pos;
 
 	BUILD_BUG_ON(sizeof(struct list_head) > sizeof(struct bpf_list_head));
 	BUILD_BUG_ON(__alignof__(struct list_head) > __alignof__(struct bpf_list_head));
@@ -2247,6 +2247,9 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	if (!head->next || list_empty(head))
 		goto unlock;
 	head = head->next;
+	/* Clear owner under spinlock to ensure the owner is always valid */
+	for (pos = head; pos != orig_head; pos = pos->next)
+		WRITE_ONCE(container_of(pos, struct bpf_list_node_kern, list_head)->owner, NULL);
 unlock:
 	INIT_LIST_HEAD(orig_head);
 	__bpf_spin_unlock_irqrestore(spin_lock);
@@ -2255,7 +2258,9 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 		void *obj = head;
 
 		obj -= field->graph_root.node_offset;
+		pos = head;
 		head = head->next;
+		list_del_init(pos);
 		/* The contained type can also have resources, including a
 		 * bpf_list_head which needs to be freed.
 		 */
-- 
2.50.1 (Apple Git-155)
Re: [PATCH bpf-next v9 3/9] bpf: clear list node owner and unlink before drop
Posted by bot+bpf-ci@kernel.org 3 days, 21 hours ago
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 94fcd4ab39e9..8abb99712043 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2247,6 +2247,9 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
>  	if (!head->next || list_empty(head))
>  		goto unlock;
>  	head = head->next;
> +	/* Clear owner under spinlock to ensure the owner is always valid */
> +	for (pos = head; pos != orig_head; pos = pos->next)
> +		WRITE_ONCE(container_of(pos, struct bpf_list_node_kern, list_head)->owner, NULL);
>  unlock:
>  	INIT_LIST_HEAD(orig_head);
>  	__bpf_spin_unlock_irqrestore(spin_lock);

Setting owner to NULL under lock A is good for concurrent readers
(bpf_list_front/back), but could it open a window for concurrent
writers?

After the lock is released, the drain loop below processes each node
without holding any lock. Meanwhile, if a BPF program on another CPU
holds a refcounted owning reference to one of these nodes (obtained
earlier via bpf_refcount_acquire), it can attempt __bpf_list_add to
a different list under a different lock B.

In __bpf_list_add():

    if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) {
        ...
    }
    list_add(n, prev);

Before this patch, owner pointed to the old list head (non-NULL), so
the cmpxchg would fail and the add would be rejected. With owner
cleared to NULL, the cmpxchg succeeds, and list_add() modifies the
node's next/prev to point into list B.

> @@ -2255,7 +2258,9 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
>  		void *obj = head;
>
>  		obj -= field->graph_root.node_offset;
> +		pos = head;
>  		head = head->next;
> +		list_del_init(pos);

If the concurrent list_add above has already modified head->next, does
this drain loop then follow into list B's chain? The termination
condition (head != orig_head) would never be satisfied by list B's
nodes, and the loop would call __bpf_obj_drop_impl on list B's nodes
without holding lock B.

Would it be safer to also call list_del_init() under the spinlock
(e.g., splicing into a local list), so that nodes are fully detached
from the chain before the lock is released? The existing comment notes
that draining outside the lock avoids deadlocks from tracing programs,
but the list_del_init operations themselves should be inexpensive
compared to __bpf_obj_drop_impl.

This commit also appears to be a bug fix (addressing stale owner
pointers after drain) but does not include a Fixes: tag. The concurrent
reader issue was introduced when bpf_list_front/back was added.

Suggested tag:

    Fixes: fb5b480205ba ("bpf: Add bpf_list_{front,back} kfunc")


---
AI reviewed your patch. Please fix the bug or reply by email explaining why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/23710972607