syzbot reported a data race on the `ref` field of `struct bpf_lru_node`:
https://syzkaller.appspot.com/bug?extid=ad4661d6ca888ce7fe11
This race arises when BPF programs read the `.ref` field of elements in a
map that uses LRU logic, potentially exposing unprotected internal state.
Accesses to `ref` are already wrapped with READ_ONCE() and WRITE_ONCE().
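For reference, the existing accessor looks roughly like this (simplified
from kernel/bpf/bpf_lru_list.h):

	static inline void bpf_lru_node_set_ref(struct bpf_lru_node *node)
	{
		/* ref only approximates access frequency, so plain
		 * READ_ONCE()/WRITE_ONCE() are used instead of a lock.
		 */
		if (!READ_ONCE(node->ref))
			WRITE_ONCE(node->ref, 1);
	}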
However, the BPF verifier currently allows unprivileged programs to
read this field via a BTF-enabled pointer, bypassing internal assumptions.
To mitigate this, the verifier is updated to disallow access
to the `.ref` field in `struct bpf_lru_node`.
This is done by checking both the base type and field name
in `check_ptr_to_btf_access()` and returning -EACCES if matched.
Reported-by: syzbot+ad4661d6ca888ce7fe11@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6847e661.a70a0220.27c366.005d.GAE@google.com/T/
Signed-off-by: Shankari Anand <shankari.ak0208@gmail.com>
---
kernel/bpf/verifier.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 169845710c7e..775ce454268c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7159,6 +7159,19 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
}
ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag, &field_name);
+
+ /* Block access to sensitive kernel-internal fields */
+ if (field_name && reg->btf && btf_is_kernel(reg->btf)) {
+ const struct btf_type *base_type = btf_type_by_id(reg->btf, reg->btf_id);
+ const char *type_name = btf_name_by_offset(reg->btf, base_type->name_off);
+
+ if (strcmp(type_name, "bpf_lru_node") == 0 &&
+ strcmp(field_name, "ref") == 0) {
+ verbose(env,
+ "access to field 'ref' in struct bpf_lru_node is not allowed\n");
+ return -EACCES;
+ }
+ }
}
if (ret < 0)
base-commit: 155a3c003e555a7300d156a5252c004c392ec6b0
--
2.34.1
On Tue, Jul 15, 2025 at 12:58 AM Shankari Anand <shankari.ak0208@gmail.com> wrote:
>
> syzbot reported a data race on the `ref` field of `struct bpf_lru_node`:
> https://syzkaller.appspot.com/bug?extid=ad4661d6ca888ce7fe11
>
[...]
>
> ret = btf_struct_access(&env->log, reg, off, size, atype, &btf_id, &flag, &field_name);
> +
> + /* Block access to sensitive kernel-internal fields */

This makes no sense.
Tracing bpf progs are allowed to read all kernel internal data fields.

Also you misread the kcsan report.

It says that 'read' comes from:

read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
 lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]

which is reading hash and key of htab_elem while
write side actually writes hash too:
*(u32 *)((void *)node + lru->hash_offset) = hash;

Martin,
is it really possible for these read/write to race ?

--
pw-bot: cr
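A minimal sketch of the interleaving described above (both sides simplified
from lookup_nulls_elem_raw() in kernel/bpf/hashtab.c and the pop path in
kernel/bpf/bpf_lru_list.c; this is an illustration, not the exact kernel
code):

	/* Reader, inside an RCU read-side section: walks the bucket
	 * with a plain read of each element's hash.
	 */
	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
		if (l->hash == hash && !memcmp(&l->key, key, key_size))
			return l;

	/* Writer: the element went back onto the LRU freelist without
	 * an intervening RCU grace period, so the pop path can rewrite
	 * its hash while the reader above is still traversing it.
	 */
	*(u32 *)((void *)node + lru->hash_offset) = hash;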
On 7/15/25 7:49 AM, Alexei Starovoitov wrote:
> Also you misread the kcsan report.
>
> It says that 'read' comes from:
>
> read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
>  lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
>
> which is reading hash and key of htab_elem while
> write side actually writes hash too:
> *(u32 *)((void *)node + lru->hash_offset) = hash;
>
> Martin,
> is it really possible for these read/write to race ?

I think it is possible. The elem in the lru's freelist currently does not
wait for a rcu gp before reuse. There is a chance that the rcu reader is
still reading the hash value that was put in the freelist, while the writer
is reusing and updating it.

I think the percpu_freelist used in the regular hashmap should have similar
behavior, so may be worth finding a common solution, such as waiting for a
rcu gp before reusing it.
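A hypothetical sketch of the "wait for a rcu gp before reuse" direction
Martin describes (the embedded rcu_head, the htab back-pointer, and the
function name are illustrative assumptions, not the current htab_elem
layout or API):

	/* Hypothetical: route freed elements through call_rcu() so any
	 * RCU reader that might still see the element finishes before
	 * it can be popped and have its hash/key rewritten.
	 */
	static void htab_lru_elem_reuse_rcu(struct rcu_head *head)
	{
		struct htab_elem *elem = container_of(head, struct htab_elem, rcu);
		struct bpf_htab *htab = elem->htab;	/* assumed back-pointer */

		bpf_lru_push_free(&htab->lru, &elem->lru_node);
	}

	/* delete path, instead of pushing straight to the freelist: */
	call_rcu(&elem->rcu, htab_lru_elem_reuse_rcu);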
Hello,

> Also you misread the kcsan report.
>
> It says that 'read' comes from:
>
> read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
>  lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
>
> which is reading hash and key of htab_elem while
> write side actually writes hash too:
> *(u32 *)((void *)node + lru->hash_offset) = hash;

Thanks for the clarification. I misattributed the race to the ref field, but
the KCSAN report indeed points to a data race between a reader,
lookup_nulls_elem_raw(), accessing the hash or key fields, and a writer,
bpf_lru_pop_free(), reinitializing and reusing the same element from the LRU
freelist without waiting for an RCU grace period.

> I think it is possible. The elem in the lru's freelist currently does not
> wait for a rcu gp before reuse. There is a chance that the rcu reader is
> still reading the hash value that was put in the freelist, while the writer
> is reusing and updating it.
>
> I think the percpu_freelist used in the regular hashmap should have similar
> behavior, so may be worth finding a common solution, such as waiting for a
> rcu gp before reusing it.

To resolve this, would it make sense to ensure that elements popped from the
free list are only reused after a grace period, similar to how other parts of
the kernel manage safe object reuse?

--
Regards,
Shankari
On 7/15/25 11:32 PM, Shankari Anand wrote:
>> I think the percpu_freelist used in the regular hashmap should have similar
>> behavior, so may be worth finding a common solution, such as waiting for a
>> rcu gp before reusing it.
>
> To resolve this, would it make sense to ensure that elements popped
> from the free list are only reused after a grace period, similar to
> how other parts of the kernel manage safe object reuse?

The reuse behavior has been there for a long time. It had been discussed
before. Please go back to those threads for the background and the direction
that it is going. This thread is a good start:
https://lore.kernel.org/bpf/20250204082848.13471-3-hotforest@gmail.com/