kernel/bpf/arena.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-)
The vmalloc_to_page() check and range_tree_clear() in arena_vm_fault()
are not protected by a common critical section. A concurrent
bpf_arena_free_pages() call on the same pgoff can return the physical
page to the allocator between these two operations. arena_vm_fault()
then inserts a stale or already-freed page into the user PTE, resulting
in a SIGSEGV on next access or a silent use-after-free.
Fix: acquire arena->lock before vmalloc_to_page() and hold it through
range_tree_clear(), making the check-and-claim atomic with respect to
concurrent allocators and free operations.
arena->lock is a mutex already used by arena_vm_open() and
arena_vm_close() for vma_list serialization. Reusing it here is
consistent with the existing locking model and avoids introducing a
new lock. arena_vm_fault() runs in page fault context with
mmap_read_lock held and is sleepable, so taking a mutex is safe.
The pte_none() check inside apply_range_set_cb() is not a sufficient
guard: it prevents double-mapping but does not prevent the range tree
desynchronization that occurs when the race is lost, leaving pgoff
marked free while the PTE remains populated.
Fixes: a7d032218a53 ("bpf: Introduce bpf_arena")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrii Kuchmenko <capyenglishlite@gmail.com>
---
kernel/bpf/arena.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index a1b2c3d..e4f5c6d 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -XXX,7 +XXX,7 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
struct bpf_map *map = vmf->vma->vm_file->private_data;
struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
struct page *page;
- long kbase, kaddr;
+ long kbase, kaddr;
int ret;
kbase = bpf_arena_get_kern_vm_start(arena);
@@ -XXX,12 +XXX,24 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
kbase = bpf_arena_get_kern_vm_start(arena);
kaddr = kbase + (u32)(vmf->address);
+ /*
+ * Acquire arena->lock before vmalloc_to_page() and hold it through
+ * range_tree_clear() to close the TOCTOU window.
+ *
+ * Without this lock, a concurrent bpf_arena_free_pages() on the
+ * same pgoff can run between vmalloc_to_page() returning NULL and
+ * range_tree_clear() completing:
+ *
+ * arena_vm_fault() bpf_arena_free_pages()
+ * vmalloc_to_page() = NULL
+ * [window] page freed, PTE zeroed in kern vma
+ * range_tree_clear(pgoff)
+ * alloc_page() + vm_insert_page() -> stale PTE in user vma
+ *
+ * The user VMA then holds a reference to a freed physical page.
+ * Next access produces SIGSEGV or silent use-after-free.
+ */
+ guard(mutex)(&arena->lock);
+
page = vmalloc_to_page((void *)kaddr);
if (page)
goto out;
-
ret = range_tree_clear(&arena->rt, vmf->pgoff, 1);
if (ret)
return VM_FAULT_SIGBUS;
--
2.39.0
© 2016 - 2026 Red Hat, Inc.