This series implements the TODO in vrealloc() to unmap and free unused
pages when shrinking across a page boundary.
Problem:
When vrealloc() shrinks an allocation, it updates bookkeeping
(requested_size, KASAN shadow) but does not free the underlying physical
pages. This wastes memory for the lifetime of the allocation.
Solution:
- Patch 1: Extracts a vm_area_free_pages(vm, start, end) helper from
vfree() that frees a range of pages with memcg and nr_vmalloc_pages
accounting. Freed page pointers are set to NULL to prevent stale
references.
- Patch 2: Uses the helper to free tail pages when vrealloc() shrinks
across a page boundary. Skips huge page allocations (page_order > 0)
since compound pages cannot be partially freed. Allocations with
VM_FLUSH_RESET_PERMS are also skipped. Also fixes the grow-in-place
path to check vm->nr_pages instead of get_vm_area_size(), which
reflects the virtual reservation and does not change on shrink.
- Patch 3: Adds a vrealloc test case to lib/test_vmalloc that exercises
grow-realloc, shrink-across-boundary, shrink-within-page, and
grow-in-place paths with data integrity validation.
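For readers who want the arithmetic spelled out, here is a rough userspace sketch of the shrink rule described in patches 1 and 2. It is not the code from the series: struct model_vm, model_alloc(), model_free_pages() and model_shrink() are hypothetical stand-ins, malloc()/free() replace the page allocator, and the accounting (memcg, nr_vmalloc_pages) and flag checks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Hypothetical stand-in for the few vm_struct fields that matter here. */
struct model_vm {
	void **pages;            /* one pointer per backing page */
	unsigned int nr_pages;   /* number of populated entries */
	unsigned int page_order; /* > 0 models a huge-page allocation */
};

/* Number of pages needed to back 'size' bytes (rounds up). */
static unsigned int model_pages_for(size_t size)
{
	return (unsigned int)((size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT);
}

/* Allocate a model area backed by 'nr' order-0 pages. */
static struct model_vm *model_alloc(unsigned int nr)
{
	struct model_vm *vm = calloc(1, sizeof(*vm));
	unsigned int i;

	assert(vm);
	vm->pages = calloc(nr, sizeof(*vm->pages));
	assert(vm->pages);
	for (i = 0; i < nr; i++) {
		vm->pages[i] = malloc(MODEL_PAGE_SIZE);
		assert(vm->pages[i]);
	}
	vm->nr_pages = nr;
	return vm;
}

/* Free pages[start, end) and clear the slots so no stale pointer remains. */
static unsigned int model_free_pages(struct model_vm *vm,
				     unsigned int start, unsigned int end)
{
	unsigned int i;

	assert(start <= end && end <= vm->nr_pages);
	for (i = start; i < end; i++) {
		free(vm->pages[i]);
		vm->pages[i] = NULL;
	}
	return end - start;
}

/* Shrink to 'new_size' bytes; returns how many tail pages were released. */
static unsigned int model_shrink(struct model_vm *vm, size_t new_size)
{
	unsigned int new_nr = model_pages_for(new_size);
	unsigned int freed;

	/* Huge pages are skipped; a shrink within the last page frees nothing. */
	if (vm->page_order > 0 || new_nr >= vm->nr_pages)
		return 0;

	freed = model_free_pages(vm, new_nr, vm->nr_pages);
	vm->nr_pages = new_nr;
	return freed;
}
```

With a 4-page area, shrinking to 3 pages plus one byte releases nothing, while shrinking to one page plus one byte releases the last two pages and leaves their slots NULL.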
The virtual address reservation is kept intact to preserve the range
for potential future grow-in-place support.
A concrete user is the Rust binder driver's KVVec::shrink_to [1], which
performs explicit vrealloc() shrinks for memory reclamation.
Tested:
- KASAN KUnit (vmalloc_oob passes)
- lib/test_vmalloc stress tests (3/3, 1M iterations each)
- checkpatch, sparse, W=1, allmodconfig, coccicheck clean
[1] https://lore.kernel.org/all/20260216-binder-shrink-vec-v3-v6-0-ece8e8593e53@zohomail.in/
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
Changes in v5:
- Skip vrealloc shrink for VM_FLUSH_RESET_PERMS (Uladzislau Rezki)
- Link to v4: https://lore.kernel.org/r/20260314-vmalloc-shrink-v4-0-c1e2e0bb5455@zohomail.in
Changes in v4:
- Rename vmalloc_free_pages() to vm_area_free_pages() to align with
vm_area_alloc_pages() (Uladzislau Rezki)
- NULL out freed vm->pages[] entries to prevent stale pointers (Alice Ryhl)
- Remove redundant if (vm->nr_pages) guard in vfree() (Uladzislau Rezki)
- Add vrealloc test case to lib/test_vmalloc (new patch 3/3)
- Link to v3: https://lore.kernel.org/r/20260309-vmalloc-shrink-v3-0-5590fd8de2eb@zohomail.in
Changes in v3:
- Restore the comment.
- Rebase to the latest mm-new
- Link to v2: https://lore.kernel.org/r/20260304-vmalloc-shrink-v2-0-28c291d60100@zohomail.in
Changes in v2:
- Updated the base-commit to mm-new
- Fix conflicts after rebase
- Ran `clang-format` on the changes made
- Use a single `kasan_vrealloc` (Alice Ryhl)
- Link to v1: https://lore.kernel.org/r/20260302-vmalloc-shrink-v1-0-46deff465b7e@zohomail.in
---
Shivam Kalra (3):
mm/vmalloc: extract vm_area_free_pages() helper from vfree()
mm/vmalloc: free unused pages on vrealloc() shrink
lib/test_vmalloc: add vrealloc test case
lib/test_vmalloc.c | 52 ++++++++++++++++++++++++++++++++++++++++++
mm/vmalloc.c | 67 ++++++++++++++++++++++++++++++++++++++----------------
2 files changed, 100 insertions(+), 19 deletions(-)
---
base-commit: 7d47a508dfdc335c107fb00b4d9ef46488281a52
change-id: 20260302-vmalloc-shrink-04b2fa688a14
Best regards,
--
Shivam Kalra <shivamkalra98@zohomail.in>
On 17/03/26 13:47, Shivam Kalra via B4 Relay wrote:
> This series implements the TODO in vrealloc() to unmap and free unused
> pages when shrinking across a page boundary.

Hi everyone,

Following up on the concerns raised regarding `get_vm_area_size()` versus
`vm->nr_pages << PAGE_SHIFT`, Andrew kindly ran the patchset through an
AI review which flagged several concrete issues. I've used those results
to audit the code and figure out exactly what breaks when we shrink
allocations while preserving the virtual area size.

Based on that research, here is what I am planning to include in the v6
series to address these edge cases:

1. Fixing the VM_USERMAP crash
Alice correctly pointed out that `remap_vmalloc_range_partial()` relies
on `get_vm_area_size()` to validate the mapping size. If we free tail
pages but keep `vm->size` unchanged, mapping the full original size
would cause a NULL pointer dereference in `vm_insert_page()`.
Plan: I'll update the shrink path to explicitly bail out if `VM_USERMAP`
is set, ensuring safety for these mappings.

2. Fixing the Kmemleak scanner panic
Kmemleak tracks the original allocation size and scans it periodically.
If we unmap and free tail pages without notifying kmemleak, its scanner
will fault on the unmapped virtual addresses.
Plan: I'll add a call to `kmemleak_free_part()` during the shrink to
keep its tracked object size updated.

3. Fixing a /proc/vmallocinfo race condition
`show_numa_info()` iterates over `v->nr_pages`. During a shrink,
modifying `nr_pages` and NULL-ing out the page pointers concurrently
could cause a reader to dereference a NULL page pointer.
Plan: I'll update the reader to use `READ_ONCE(v->nr_pages)`, and have
the shrink path do a `WRITE_ONCE(vm->nr_pages, new_nr_pages)` before
freeing the pages. This guarantees that concurrent readers either see
the old count with valid pages or the new, smaller count.

4. Fixing a stale data leak on grow
A vrealloc grow with `__GFP_ZERO` could leak previously discarded data
if an intermediate shrink happened without `__GFP_ZERO` (which skips
zeroing the freed region).
Plan: I will add mandatory zeroing in the grow-in-place path for
`want_init_on_alloc()` to clear any newly exposed bytes.

Thanks again to Alice and Danilo for prompting the closer look, and to
Andrew for providing the review. I should have v6 ready for review soon.

Best regards,
Shivam
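To make item 4 concrete, a small userspace sketch of the stale-data case follows. It is only an illustration under stated assumptions: struct model_buf and both helpers are hypothetical names, a plain byte array stands in for the still-mapped pages, and the 'zero' argument stands in for the `__GFP_ZERO` / `want_init_on_alloc()` decision.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an allocation whose backing pages stay mapped. */
struct model_buf {
	unsigned char *mem; /* backing store, still mapped */
	size_t capacity;    /* bytes still backed by pages */
	size_t size;        /* bytes the caller currently owns */
};

/* Shrink within the backed range: bookkeeping only, old bytes remain. */
static void model_shrink_size(struct model_buf *b, size_t new_size)
{
	assert(new_size <= b->size);
	b->size = new_size;
}

/*
 * Grow in place. When 'zero' is set, the newly exposed range
 * [old size, new size) is cleared so data discarded by an earlier
 * shrink cannot reappear. Returns 0 on success, -1 if the backing
 * store is too small and a real reallocation would be needed.
 */
static int model_grow_in_place(struct model_buf *b, size_t new_size, int zero)
{
	assert(new_size >= b->size);
	if (new_size > b->capacity)
		return -1;
	if (zero)
		memset(b->mem + b->size, 0, new_size - b->size);
	b->size = new_size;
	return 0;
}
```

Only the newly exposed bytes are cleared in this model; bytes the caller already owned, and bytes beyond the new size, are left untouched.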
On Sat, Mar 21, 2026 at 01:45:35PM +0530, Shivam Kalra wrote:
> On 17/03/26 13:47, Shivam Kalra via B4 Relay wrote:
> 3. Fixing a /proc/vmallocinfo race condition
> `show_numa_info()` iterates over `v->nr_pages`. During a shrink,
> modifying `nr_pages` and NULL-ing out the page pointers concurrently
> could cause a reader to dereference a NULL page pointer.
> Plan: I'll update the reader to use `READ_ONCE(v->nr_pages)`, and have
> the shrink path do a `WRITE_ONCE(vm->nr_pages, new_nr_pages)` before
> freeing the pages. This guarantees that concurrent readers either see
> the old count with valid pages or the new, smaller count.

This doesn't fix the race. Consider this:

nr < vm->nr_pages == true
                                vm->nr_pages = nr
                                free vm->pages[nr]
page_to_nid(v->pages[nr]) // UAF

Perhaps changing vm->nr_pages should happen under the vn->busy.lock
spinlock? show_numa_info() is called under that lock too.

Alice
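The interleaving above can be replayed deterministically in userspace. The sketch below is a single-threaded model, not kernel code: struct model_area and the three helpers are hypothetical, and setting a slot to NULL stands in for freeing the page, so the use-after-free shows up as "the reader's deferred access lands on a released slot".

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 4

/* Hypothetical model of the fields the reader and the shrink path share. */
struct model_area {
	int *pages[MODEL_MAX_PAGES]; /* NULL once the page is released */
	unsigned int nr_pages;
};

/* Reader, step 1: the bounds check made before touching pages[nr]. */
static int reader_bounds_ok(const struct model_area *a, unsigned int nr)
{
	return nr < a->nr_pages;
}

/* Writer: publish the smaller count first, then release the tail pages. */
static void writer_shrink(struct model_area *a, unsigned int new_nr)
{
	unsigned int i, old_nr = a->nr_pages;

	assert(new_nr <= old_nr);
	a->nr_pages = new_nr;
	for (i = new_nr; i < old_nr; i++)
		a->pages[i] = NULL; /* stands in for freeing the page */
}

/* Reader, step 2: would the deferred access hit a released page? */
static int reader_hits_released_page(const struct model_area *a,
				     unsigned int nr)
{
	assert(nr < MODEL_MAX_PAGES);
	return a->pages[nr] == NULL;
}
```

Running step 1, then the writer, then step 2 for the same index reproduces the window: the check passed against the old count, yet the access happens after the page is gone. Ordering the two stores differently does not close that window; the check and the access have to be covered by the same lock as the shrink.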
On Sun, Mar 22, 2026 at 12:48:28PM +0000, Alice Ryhl wrote:
> This doesn't fix the race. Consider this:
>
> nr < vm->nr_pages == true
>                                 vm->nr_pages = nr
>                                 free vm->pages[nr]
> page_to_nid(v->pages[nr]) // UAF
>
> perhaps changing vm->nr_pages should happen under the vn->busy.lock
> spinlock? show_numa_info() is called under that lock too.
>
vn->busy.lock protects the VAs in a busy tree. So if you update the
nr_pages of a given VA, you should hold the lock of the node that VA
belongs to.

--
Uladzislau Rezki
On Sat, 21 Mar 2026 13:45:35 +0530 Shivam Kalra <shivamkalra98@zohomail.in> wrote:
> Thanks again to Alice and Danilo for prompting the closer look, and to
> Andrew for providing the review.

Well, thanks to those who developed and provided the reviewbot!

Reviews seem to take 12+ hours at present. Go to https://sashiko.dev/,
select linux-mm list, paste in the subject. Or go directly to

id=<Message-ID>
https://sashiko.dev/#/patchset/$id
On Tue, 17 Mar 2026 13:47:32 +0530 Shivam Kalra via B4 Relay <devnull+shivamkalra98.zohomail.in@kernel.org> wrote:
> This series implements the TODO in vrealloc() to unmap and free unused
> pages when shrinking across a page boundary.

Lots of questions have been posed by AI review:

https://sashiko.dev/#/patchset/20260317-vmalloc-shrink-v5-0-bbfbf54c5265@zohomail.in
On 18/03/26 02:41, Andrew Morton wrote:
> On Tue, 17 Mar 2026 13:47:32 +0530 Shivam Kalra via B4 Relay <devnull+shivamkalra98.zohomail.in@kernel.org> wrote:
>
>> This series implements the TODO in vrealloc() to unmap and free unused
>> pages when shrinking across a page boundary.
>
> Lots of questions have been posed by AI review:
>
> https://sashiko.dev/#/patchset/20260317-vmalloc-shrink-v5-0-bbfbf54c5265@zohomail.in

These seem like valid concerns. I will revisit the patch series and
post a new, improved version of it. Allow me some time.