The page-management kfuncs exposed by BPF arena -
bpf_arena_alloc_pages(), bpf_arena_free_pages() and
bpf_arena_reserve_pages() - are part of the BPF kfunc ABI but lack
rendered documentation. Their contracts (valid argument ranges,
sleepable-only context, and the set of error returns) are today only
discoverable by reading kernel/bpf/arena.c.
Add a kernel-doc comment block above each of the three kfuncs and
render them under a new "BPF arena kfuncs" subsection in
Documentation/bpf/kfuncs.rst, alongside the existing core kfunc
subsections.
No functional change.
Signed-off-by: Dhiraj Shah <find.dhiraj@gmail.com>
---
Changes in v2:
- Fix the return-value description for bpf_arena_alloc_pages(): the kfunc
returns a user-space virtual address (translated by the BPF JIT for
accesses from the BPF program), not a kernel pointer. Thanks to Alexei
Starovoitov, Emil Tsalapatis and the AI reviewers for catching this.
- Drop the "callable only from sleepable BPF programs" claims for
bpf_arena_alloc_pages() and bpf_arena_free_pages(): the verifier
rewrites these calls to their _non_sleepable variants when the calling
program is non-sleepable, so callers do not need to care about this
distinction. Thanks to Emil Tsalapatis.
- Tighten the prose in Documentation/bpf/kfuncs.rst accordingly.
v1: https://lore.kernel.org/bpf/20260521043553.199781-1-find.dhiraj@gmail.com/
Documentation/bpf/kfuncs.rst | 26 ++++++++++++++++
kernel/bpf/arena.c | 59 ++++++++++++++++++++++++++++++++++++
2 files changed, 85 insertions(+)
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 75e6c078e0e7..28b6b477012a 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -732,3 +732,29 @@ the verifier. bpf_cgroup_ancestor() can be used as follows:
BPF provides a set of kfuncs that can be used to query, allocate, mutate, and
destroy struct cpumask * objects. Please refer to :ref:`cpumasks-header-label`
for more details.
+
+4.4 BPF arena kfuncs
+--------------------
+
+A BPF arena (``BPF_MAP_TYPE_ARENA``) is a sparsely-populated shared memory
+region that a BPF program and a user-space process can both address. The
+following kfuncs allow a BPF program to allocate, free, and reserve pages
+within an arena:
+
+.. kernel-doc:: kernel/bpf/arena.c
+ :identifiers: bpf_arena_alloc_pages bpf_arena_free_pages bpf_arena_reserve_pages
+
+A typical pattern is to allocate one or more pages, write to them from BPF,
+and let user space access the same pages through its mapping of the arena:
+
+.. code-block:: c
+
+ void __arena *page;
+
+ page = bpf_arena_alloc_pages(&arena, NULL, 1, NUMA_NO_NODE, 0);
+ if (!page)
+ return -ENOMEM;
+
+ /* ... use the page from BPF; user space sees the same bytes ... */
+
+ bpf_arena_free_pages(&arena, page, 1);
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index 49a8f7b1beef..948a43159106 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -870,6 +870,31 @@ static void arena_free_irq(struct irq_work *iw)
__bpf_kfunc_start_defs();
+/**
+ * bpf_arena_alloc_pages() - Allocate pages within a BPF arena.
+ * @p__map: Pointer to a ``BPF_MAP_TYPE_ARENA`` map.
+ * @addr__ign: Page-aligned user-space address within the arena at which to
+ * place the allocation, or %NULL to let the kernel choose. When
+ * non-NULL the address must fall inside the arena's user VMA
+ * range; otherwise the allocation fails.
+ * @page_cnt: Number of pages to allocate. Must be non-zero and no greater
+ * than the arena's configured size in pages.
+ * @node_id: NUMA node hint for the backing pages, or %NUMA_NO_NODE.
+ * @flags: Reserved for future use; must be 0.
+ *
+ * Allocates @page_cnt pages and inserts them into the arena at the offset
+ * corresponding to @addr__ign (or at an arbitrary free offset when
+ * @addr__ign is %NULL). The pages become accessible to the BPF program
+ * immediately and to user space through the arena's mmap()ed region.
+ *
+ * Return:
+ * * The user-space virtual address of the start of the allocated region on
+ * success. The BPF JIT translates this address for accesses from the BPF
+ * program.
+ * * %NULL if @p__map is not an arena, @flags is non-zero, @page_cnt is zero
+ * or exceeds the arena size, @addr__ign is misaligned or outside the
+ * arena, @node_id is invalid, or the kernel is out of memory.
+ */
__bpf_kfunc void *bpf_arena_alloc_pages(void *p__map, void *addr__ign, u32 page_cnt,
int node_id, u64 flags)
{
@@ -893,6 +918,20 @@ void *bpf_arena_alloc_pages_non_sleepable(void *p__map, void *addr__ign, u32 pag
return (void *)arena_alloc_pages(arena, (long)addr__ign, page_cnt, node_id, false);
}
+
+/**
+ * bpf_arena_free_pages() - Free a range of pages within a BPF arena.
+ * @p__map: Pointer to a ``BPF_MAP_TYPE_ARENA`` map.
+ * @ptr__ign: User-space virtual address of the first page to free, as
+ * returned by bpf_arena_alloc_pages().
+ * @page_cnt: Number of pages to free.
+ *
+ * Releases the backing pages and unmaps them from any user-space mapping
+ * of the arena.
+ *
+ * The call is a no-op when @p__map is not an arena, when @page_cnt is zero,
+ * or when @ptr__ign is %NULL.
+ */
__bpf_kfunc void bpf_arena_free_pages(void *p__map, void *ptr__ign, u32 page_cnt)
{
struct bpf_map *map = p__map;
@@ -913,6 +952,26 @@ void bpf_arena_free_pages_non_sleepable(void *p__map, void *ptr__ign, u32 page_c
arena_free_pages(arena, (long)ptr__ign, page_cnt, false);
}
+/**
+ * bpf_arena_reserve_pages() - Reserve a page range within a BPF arena.
+ * @p__map: Pointer to a ``BPF_MAP_TYPE_ARENA`` map.
+ * @ptr__ign: Page-aligned user-space virtual address of the start of the
+ * range to reserve.
+ * @page_cnt: Number of pages to reserve. Zero is permitted and is a no-op.
+ *
+ * Marks @page_cnt pages starting at @ptr__ign as reserved so that subsequent
+ * bpf_arena_alloc_pages() calls will not place allocations in that range.
+ * No physical pages are allocated by this kfunc; the range is simply
+ * excluded from the arena's free space.
+ *
+ * Return:
+ * * 0 on success, or when @page_cnt is zero.
+ * * -EINVAL if @p__map is not an arena or the requested range falls outside
+ * the arena's user VMA.
+ * * -EBUSY if any page in the requested range is already allocated, or if
+ * contention on the arena's internal spinlock prevents the operation from
+ * completing.
+ */
__bpf_kfunc int bpf_arena_reserve_pages(void *p__map, void *ptr__ign, u32 page_cnt)
{
struct bpf_map *map = p__map;
--
2.43.0
> The page-management kfuncs exposed by BPF arena -
> bpf_arena_alloc_pages(), bpf_arena_free_pages() and
> bpf_arena_reserve_pages() - are part of the BPF kfunc ABI but lack
> rendered documentation. Their contracts (valid argument ranges,
> sleepable-only context, and the set of error returns) are today only
> discoverable by reading kernel/bpf/arena.c.
Is the description 'sleepable-only context' accurate here? The BPF
verifier supports calling bpf_arena_alloc_pages() and bpf_arena_free_pages()
in non-sleepable contexts by dynamically replacing the calls with their
non-sleepable variants. Looking at the BTF_ID_FLAGS definitions, they do not
include KF_SLEEPABLE. While the code documentation was fixed in v2 to reflect
this, the commit message text was not updated.
> diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> index 49a8f7b1beef..948a43159106 100644
> --- a/kernel/bpf/arena.c
> +++ b/kernel/bpf/arena.c
> @@ -870,6 +870,31 @@ static void arena_free_irq(struct irq_work *iw)
>
> __bpf_kfunc_start_defs();
>
> +/**
> + * bpf_arena_alloc_pages() - Allocate pages within a BPF arena.
> + * @p__map: Pointer to a ``BPF_MAP_TYPE_ARENA`` map.
> + * @addr__ign: Page-aligned user-space address within the arena at which to
> + * place the allocation, or %NULL to let the kernel choose. When
> + * non-NULL the address must fall inside the arena's user VMA
> + * range; otherwise the allocation fails.
> + * @page_cnt: Number of pages to allocate. Must be non-zero and no greater
> + * than the arena's configured size in pages.
> + * @node_id: NUMA node hint for the backing pages, or %NUMA_NO_NODE.
> + * @flags: Reserved for future use; must be 0.
> + *
> + * Allocates @page_cnt pages and inserts them into the arena at the offset
> + * corresponding to @addr__ign (or at an arbitrary free offset when
> + * @addr__ign is %NULL). The pages become accessible to the BPF program
> + * immediately and to user space through the arena's mmap()ed region.
> + *
> + * Return:
> + * * The user-space virtual address of the start of the allocated region on
> + * success. The BPF JIT translates this address for accesses from the BPF
> + * program.
> + * * %NULL if @p__map is not an arena, @flags is non-zero, @page_cnt is zero
> + * or exceeds the arena size, @addr__ign is misaligned or outside the
> + * arena, @node_id is invalid, or the kernel is out of memory.
> + */
Are there missing failure conditions in this documentation? It appears
bpf_arena_alloc_pages() can also return NULL if the requested address is
already allocated, or if there is no contiguous free space of the requested
size in the arena.
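To make the two extra failure cases concrete, here is a minimal user-space sketch. It is a toy model, not the kernel implementation: it tracks pages in a flat array rather than the range tree used in kernel/bpf/arena.c, and every name in it (toy_arena, toy_alloc, TOY_PAGES) is hypothetical. A return of -1 stands in for the kfunc's NULL, and a negative want stands in for a NULL @addr__ign.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 8 /* hypothetical arena size in pages */

/* Toy arena: one flag per page, true when the page is allocated. */
struct toy_arena {
	bool used[TOY_PAGES];
};

/* True if pages [start, start + cnt) lie inside the arena and are all free. */
bool toy_range_free(const struct toy_arena *a, size_t start, size_t cnt)
{
	if (cnt == 0 || cnt > TOY_PAGES || start > TOY_PAGES - cnt)
		return false;
	for (size_t i = start; i < start + cnt; i++)
		if (a->used[i])
			return false;
	return true;
}

/*
 * Returns the first page index of the allocation, or -1 on failure.
 * want < 0 means "let the allocator pick the first range that fits".
 */
long toy_alloc(struct toy_arena *a, long want, size_t cnt)
{
	long start = -1;

	if (want >= 0) {
		/* Fixed placement: fails if any page in the range is taken. */
		if (toy_range_free(a, (size_t)want, cnt))
			start = want;
	} else {
		/* First fit: fails if no contiguous run of cnt pages is free. */
		for (size_t i = 0; cnt > 0 && i + cnt <= TOY_PAGES; i++) {
			if (toy_range_free(a, i, cnt)) {
				start = (long)i;
				break;
			}
		}
	}
	if (start < 0)
		return -1;
	for (size_t i = 0; i < cnt; i++)
		a->used[(size_t)start + i] = true;
	return start;
}
```

In this model, a request at an occupied index fails even though other pages are free, and a request for three pages fails once the remaining free pages are no longer adjacent. Whether the kfunc documentation should list both cases is the question raised above.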
[ ... ]
> @@ -913,6 +952,26 @@ void bpf_arena_free_pages_non_sleepable(void *p__map, void *ptr__ign, u32 page_c
> arena_free_pages(arena, (long)ptr__ign, page_cnt, false);
> }
>
> +/**
> + * bpf_arena_reserve_pages() - Reserve a page range within a BPF arena.
> + * @p__map: Pointer to a ``BPF_MAP_TYPE_ARENA`` map.
> + * @ptr__ign: Page-aligned user-space virtual address of the start of the
> + * range to reserve.
> + * @page_cnt: Number of pages to reserve. Zero is permitted and is a no-op.
> + *
> + * Marks @page_cnt pages starting at @ptr__ign as reserved so that subsequent
> + * bpf_arena_alloc_pages() calls will not place allocations in that range.
> + * No physical pages are allocated by this kfunc; the range is simply
> + * excluded from the arena's free space.
> + *
> + * Return:
> + * * 0 on success, or when @page_cnt is zero.
> + * * -EINVAL if @p__map is not an arena or the requested range falls outside
> + * the arena's user VMA.
> + * * -EBUSY if any page in the requested range is already allocated, or if
> + * contention on the arena's internal spinlock prevents the operation from
> + * completing.
> + */
Does this documentation correctly describe the error returns? @ptr__ign is
documented as page-aligned and -EINVAL is the only validation error listed,
but arena_reserve_pages() in kernel/bpf/arena.c does not return -EINVAL for
a misaligned address:
    arena_reserve_pages() {
        ...
        if (uaddr & ~PAGE_MASK)
            return 0;
        ...
    }
The documentation implies that a misaligned address is a validation error
that returns -EINVAL, but the code silently returns 0 without reserving
anything.
By contrast, bpf_arena_alloc_pages() returns NULL for a misaligned address,
and its documentation says so. Should the code be changed to return
-EINVAL for misalignment to match the new documentation, or should the
documentation be updated to state that a misaligned address is a no-op that
returns 0?
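The two options for a misaligned address can be compared with a small user-space sketch. It assumes 4 KiB pages (the kernel's PAGE_SIZE is architecture dependent), and the struct and function names are hypothetical stand-ins, not kernel symbols. Only the alignment test itself is taken from the snippet quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical result: the return code and whether a range was reserved. */
struct reserve_result {
	int ret;
	bool reserved;
};

/* Same test as the quoted check: any low bits set means misaligned. */
bool sketch_page_aligned(uint64_t uaddr)
{
	return (uaddr & ~SKETCH_PAGE_MASK) == 0;
}

/* Behaviour as quoted: a misaligned address returns 0 and reserves nothing. */
struct reserve_result reserve_as_quoted(uint64_t uaddr)
{
	if (!sketch_page_aligned(uaddr))
		return (struct reserve_result){ .ret = 0, .reserved = false };
	return (struct reserve_result){ .ret = 0, .reserved = true };
}

/* Alternative raised in the review: a misaligned address returns -EINVAL. */
struct reserve_result reserve_strict(uint64_t uaddr)
{
	if (!sketch_page_aligned(uaddr))
		return (struct reserve_result){ .ret = -EINVAL, .reserved = false };
	return (struct reserve_result){ .ret = 0, .reserved = true };
}
```

With the quoted behaviour, a caller that only checks the return code cannot tell a successful reservation from a skipped one. The strict variant makes the two cases distinguishable, which is why the documented contract needs to match whichever behaviour the code actually has.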
Also, does this need to document -ENOMEM? If range_tree_clear() fails to
allocate a new node during a split operation, it seems this can return
-ENOMEM.
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26360590766