syzbot reported a data race on the `ref` field of `struct bpf_lru_node`:
https://syzkaller.appspot.com/bug?extid=ad4661d6ca888ce7fe11
This race arises when BPF programs read the `.ref` field of elements in a
map that uses LRU logic, potentially exposing unprotected internal state.
Accesses to `ref` are already wrapped with READ_ONCE() and WRITE_ONCE().
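For reference, the existing accessor looks roughly like this (simplified
from kernel/bpf/bpf_lru_list.h):

	static inline void bpf_lru_node_set_ref(struct bpf_lru_node *node)
	{
		/* ref only approximates access frequency, so plain
		 * READ_ONCE()/WRITE_ONCE() are used instead of a lock.
		 */
		if (!READ_ONCE(node->ref))
			WRITE_ONCE(node->ref, 1);
	}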
However, the BPF verifier currently allows unprivileged programs to
read this field via a BTF-enabled pointer, bypassing internal assumptions.
To mitigate this, the verifier is updated to disallow access
to the `.ref` field in `struct bpf_lru_node`.
This is done by checking both the base type and field name
in `check_ptr_to_btf_access()` and returning -EACCES if matched.
Reported-by: syzbot+ad4661d6ca888ce7fe11@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6847e661.a70a0220.27c366.005d.GAE@google.com/T/
Signed-off-by: Shankari Anand <shankari.ak0208@gmail.com>
---
kernel/bpf/verifier.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 169845710c7e..775ce454268c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7159,6 +7159,19 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
}
ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag, &field_name);
+
+ /* Block access to sensitive kernel-internal fields */
+ if (field_name && reg->btf && btf_is_kernel(reg->btf)) {
+ const struct btf_type *base_type = btf_type_by_id(reg->btf, reg->btf_id);
+ const char *type_name = btf_name_by_offset(reg->btf, base_type->name_off);
+
+ if (strcmp(type_name, "bpf_lru_node") == 0 &&
+ strcmp(field_name, "ref") == 0) {
+ verbose(env,
+ "access to field 'ref' in struct bpf_lru_node is not allowed\n");
+ return -EACCES;
+ }
+ }
}
if (ret < 0)
base-commit: 155a3c003e555a7300d156a5252c004c392ec6b0
--
2.34.1
On Tue, Jul 15, 2025 at 12:58 AM Shankari Anand <shankari.ak0208@gmail.com> wrote:
>
> syzbot reported a data race on the `ref` field of `struct bpf_lru_node`:
> https://syzkaller.appspot.com/bug?extid=ad4661d6ca888ce7fe11
>
[...]
>
> ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag, &field_name);
> +
> + /* Block access to sensitive kernel-internal fields */

This makes no sense.
Tracing bpf progs are allowed to read all kernel internal data fields.

Also you misread the kcsan report.

It says that 'read' comes from:

read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
 lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]

which is reading hash and key of htab_elem while
write side actually writes hash too:
*(u32 *)((void *)node + lru->hash_offset) = hash;

Martin,
is it really possible for these read/write to race ?

--
pw-bot: cr
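A minimal sketch of the interleaving described above (both sides simplified
from lookup_nulls_elem_raw() in kernel/bpf/hashtab.c and the pop path in
kernel/bpf/bpf_lru_list.c; this is an illustration, not the exact kernel
code):

	/* Reader, inside an RCU read-side section: walks the bucket
	 * with a plain read of each element's hash.
	 */
	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
		if (l->hash == hash && !memcmp(&l->key, key, key_size))
			return l;

	/* Writer: the element went back onto the LRU freelist without
	 * an intervening RCU grace period, so the pop path can rewrite
	 * its hash while the reader above is still traversing it.
	 */
	*(u32 *)((void *)node + lru->hash_offset) = hash;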
On 7/15/25 7:49 AM, Alexei Starovoitov wrote:
> Also you misread the kcsan report.
>
> It says that 'read' comes from:
>
> read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
>  lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
>
> which is reading hash and key of htab_elem while
> write side actually writes hash too:
> *(u32 *)((void *)node + lru->hash_offset) = hash;
>
> Martin,
> is it really possible for these read/write to race ?

I think it is possible. The elem in the lru's freelist currently does not
wait for a rcu gp before reuse. There is a chance that the rcu reader is
still reading the hash value that was put in the freelist, while the writer
is reusing and updating it.

I think the percpu_freelist used in the regular hashmap should have similar
behavior, so may be worth finding a common solution, such as waiting for a
rcu gp before reusing it.
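A hypothetical sketch of the "wait for a rcu gp before reuse" direction
Martin describes (the embedded rcu_head, the htab back-pointer, and the
function name are illustrative assumptions, not the current htab_elem
layout or API):

	/* Hypothetical: route freed elements through call_rcu() so any
	 * RCU reader that might still see the element finishes before
	 * it can be popped and have its hash/key rewritten.
	 */
	static void htab_lru_elem_reuse_rcu(struct rcu_head *head)
	{
		struct htab_elem *elem = container_of(head, struct htab_elem, rcu);
		struct bpf_htab *htab = elem->htab;	/* assumed back-pointer */

		bpf_lru_push_free(&htab->lru, &elem->lru_node);
	}

	/* delete path, instead of pushing straight to the freelist: */
	call_rcu(&elem->rcu, htab_lru_elem_reuse_rcu);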
Hello,

> Also you misread the kcsan report.
>
> It says that 'read' comes from:
>
> read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
>  lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
>
> which is reading hash and key of htab_elem while
> write side actually writes hash too:
> *(u32 *)((void *)node + lru->hash_offset) = hash;

Thanks for the clarification. I misattributed the race to the ref field, but
the KCSAN report indeed points to a data race between a reader,
lookup_nulls_elem_raw(), accessing the hash or key fields, and a writer,
bpf_lru_pop_free(), reinitializing and reusing the same element from the LRU
freelist without waiting for an RCU grace period.

> I think it is possible. The elem in the lru's freelist currently does not
> wait for a rcu gp before reuse. There is a chance that the rcu reader is
> still reading the hash value that was put in the freelist, while the writer
> is reusing and updating it.
>
> I think the percpu_freelist used in the regular hashmap should have similar
> behavior, so may be worth finding a common solution, such as waiting for a
> rcu gp before reusing it.

To resolve this, would it make sense to ensure that elements popped from the
free list are only reused after a grace period, similar to how other parts of
the kernel manage safe object reuse?

--
Regards,
Shankari
On 7/15/25 11:32 PM, Shankari Anand wrote:
>> I think the percpu_freelist used in the regular hashmap should have similar
>> behavior, so may be worth finding a common solution, such as waiting for a
>> rcu gp before reusing it.
>
> To resolve this, would it make sense to ensure that elements popped
> from the free list are only reused after a grace period, similar to
> how other parts of the kernel manage safe object reuse?

The reuse behavior has been there for a long time. It had been discussed
before. Please go back to those threads for the background and the direction
that it is going. This thread is a good start:
https://lore.kernel.org/bpf/20250204082848.13471-3-hotforest@gmail.com/