[PATCH] bpf: arena: fix TOCTOU race in arena_vm_fault()

Andrii Kuchmenko posted 1 patch 1 week ago
kernel/bpf/arena.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
[PATCH] bpf: arena: fix TOCTOU race in arena_vm_fault()
Posted by Andrii Kuchmenko 1 week ago
The vmalloc_to_page() check and range_tree_clear() in arena_vm_fault()
are not protected by a common critical section. A concurrent
bpf_arena_free_pages() call on the same pgoff can return the physical
page to the allocator between these two operations. arena_vm_fault()
then inserts a stale or already-freed page into the user PTE, resulting
in a SIGSEGV on next access or a silent use-after-free.

Fix: acquire arena->lock before vmalloc_to_page() and hold it through
range_tree_clear(), making the check-and-claim atomic with respect to
concurrent allocators and free operations.

arena->lock is a mutex already used by arena_vm_open() and
arena_vm_close() for vma_list serialization. Reusing it here is
consistent with the existing locking model and avoids introducing a
new lock. arena_vm_fault() runs in page fault context with
mmap_read_lock held and is sleepable, so taking a mutex is safe.

The pte_none() check inside apply_range_set_cb() is not a sufficient
guard: it prevents double-mapping but does not prevent the range tree
desynchronization that occurs when the race is lost, leaving pgoff
marked free while the PTE remains populated.

Fixes: a7d032218a53 ("bpf: Introduce bpf_arena")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrii Kuchmenko <capyenglishlite@gmail.com>
---
 kernel/bpf/arena.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index a1b2c3d..e4f5c6d 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -XXX,7 +XXX,7 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
 	struct bpf_map *map = vmf->vma->vm_file->private_data;
 	struct bpf_arena *arena = container_of(map, struct bpf_arena, map);
 	struct page *page;
-	long kbase, kaddr;
+	long kbase, kaddr;
 	int ret;
 
 	kbase = bpf_arena_get_kern_vm_start(arena);
@@ -XXX,12 +XXX,24 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
 	kbase = bpf_arena_get_kern_vm_start(arena);
 	kaddr = kbase + (u32)(vmf->address);
 
+	/*
+	 * Acquire arena->lock before vmalloc_to_page() and hold it through
+	 * range_tree_clear() to close the TOCTOU window.
+	 *
+	 * Without this lock, a concurrent bpf_arena_free_pages() on the
+	 * same pgoff can run between vmalloc_to_page() returning NULL and
+	 * range_tree_clear() completing:
+	 *
+	 *   arena_vm_fault()              bpf_arena_free_pages()
+	 *   vmalloc_to_page() = NULL
+	 *   [window]                      page freed, PTE zeroed in kern vma
+	 *   range_tree_clear(pgoff)
+	 *   alloc_page() + vm_insert_page() -> stale PTE in user vma
+	 *
+	 * The user VMA then holds a reference to a freed physical page.
+	 * Next access produces SIGSEGV or silent use-after-free.
+	 */
+	guard(mutex)(&arena->lock);
+
 	page = vmalloc_to_page((void *)kaddr);
 	if (page)
 		goto out;
-
 	ret = range_tree_clear(&arena->rt, vmf->pgoff, 1);
 	if (ret)
 		return VM_FAULT_SIGBUS;
-- 
2.39.0