This series implements the TODO in vrealloc() to unmap and free unused
pages when shrinking across a page boundary.
Problem:
When vrealloc() shrinks an allocation, it updates bookkeeping
(requested_size, KASAN shadow) but does not free the underlying physical
pages. This wastes memory for the lifetime of the allocation.
Solution:
- Patch 1: Extracts a vm_area_free_pages(vm, start_idx, end_idx) helper
from vfree() that frees a range of pages with memcg and nr_vmalloc_pages
accounting. Freed page pointers are set to NULL to prevent stale
references.
- Patch 2: Fixes the grow-in-place path to check vm->nr_pages instead
of get_vm_area_size(), which reflects the virtual reservation and does
not change on shrink. This is a prerequisite for shrinking.
- Patch 3: Zeros newly exposed memory on vrealloc() grow if __GFP_ZERO
is requested, preventing stale data leaks from previously shrunk regions.
- Patch 4: Protects /proc/vmallocinfo readers with READ_ONCE() to safely
handle concurrent decreases to vm->nr_pages and NULL page pointers.
- Patch 5: Uses the helper to free tail pages when vrealloc() shrinks
across a page boundary. Skips huge page allocations, VM_FLUSH_RESET_PERMS,
and VM_USERMAP. Updates kmemleak tracking of the allocation.
- Patch 6: Adds a vrealloc test case to lib/test_vmalloc that exercises
grow-realloc, shrink-across-boundary, shrink-within-page, and
grow-in-place paths.
The virtual address reservation is kept intact to preserve the range
for potential future grow-in-place support.
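For readers less familiar with the internals, the shrink behavior described
above can be modeled in plain userspace C. This is only an illustrative
sketch: the struct and helper names below mirror the kernel ones, but the
real helper additionally handles memcg uncharging, nr_vmalloc_pages
accounting, and TLB flushing, none of which is modeled here.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct vm_struct; field names are illustrative. */
struct vm_area_model {
	void **pages;		/* per-page backing "pages" */
	unsigned int nr_pages;	/* number of populated pages */
};

/*
 * Model of the vm_area_free_pages(vm, start_idx, end_idx) helper from
 * patch 1: free pages in [start_idx, end_idx) and NULL out the slots so
 * later readers never see stale page pointers.
 */
static void vm_area_free_pages_model(struct vm_area_model *vm,
				     unsigned int start_idx,
				     unsigned int end_idx)
{
	for (unsigned int i = start_idx; i < end_idx; i++) {
		free(vm->pages[i]);
		vm->pages[i] = NULL;
	}
	vm->nr_pages -= end_idx - start_idx;
}

/*
 * Model of the vrealloc() shrink path from patch 5: when the new size
 * drops below the populated page count, free the now-unused tail pages
 * while keeping the reservation (here, the pages array itself) intact
 * for a later grow-in-place.
 */
static void vrealloc_shrink_model(struct vm_area_model *vm,
				  unsigned int new_nr_pages)
{
	if (new_nr_pages < vm->nr_pages)
		vm_area_free_pages_model(vm, new_nr_pages, vm->nr_pages);
}
```

In the actual series the shrink path additionally bails out early for huge
page allocations, VM_FLUSH_RESET_PERMS, and VM_USERMAP mappings.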
A concrete user is the Rust binder driver's KVVec::shrink_to [1], which
performs explicit vrealloc() shrinks for memory reclamation.
Tested:
- KASAN KUnit (vmalloc_oob passes)
- lib/test_vmalloc stress tests (3/3, 1M iterations each)
- checkpatch, sparse, W=1, allmodconfig, coccicheck clean
[1] https://lore.kernel.org/all/20260216-binder-shrink-vec-v3-v6-0-ece8e8593e53@zohomail.in/
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
Changes in v8:
- Strip the KASAN tag from the pointer before addr_to_node()
to avoid acquiring the wrong node lock (Sashiko).
- Rebase to latest mm-new.
- Link to v7: https://lore.kernel.org/r/20260324-vmalloc-shrink-v7-0-c0e62b8e5d83@zohomail.in
Changes in v7:
- Fix NULL pointer dereference in shrink path (Sashiko)
- Acquire vn->busy.lock when updating vm->nr_pages to synchronize
with concurrent readers (Uladzislau Rezki)
- Use READ_ONCE in vmalloc_dump_obj (Sashiko)
- Skip shrink path on GFP_NOIO or GFP_NOFS (Sashiko)
- Fix overflow issue for large allocations (Sashiko)
- Use vrealloc instead of vmalloc in vrealloc test.
- Link to v6: https://lore.kernel.org/r/20260321-vmalloc-shrink-v6-0-062ca7b7ceb2@zohomail.in
Changes in v6:
- Fix VM_USERMAP crash by bailing out early in the shrink path if the
flag is set. (Sashiko)
- Fix kmemleak scanner panic by calling kmemleak_free_part() to update
tracking on shrink. (Sashiko)
- Fix /proc/vmallocinfo race condition by protecting vm->nr_pages access
with READ_ONCE()/WRITE_ONCE() for concurrent readers. (Sashiko)
- Fix stale data leak on grow-after-shrink by enforcing mandatory zeroing
of the newly exposed memory. (Sashiko)
- Fix memory leaks in vrealloc_test() by using a temporary pointer to
preserve and free the original allocation upon failure. (Sashiko)
- Rename vmalloc_free_pages parameters from start/end to
start_idx/end_idx for better clarity. (Uladzislau Rezki)
- Link to v5: https://lore.kernel.org/r/20260317-vmalloc-shrink-v5-0-bbfbf54c5265@zohomail.in
- Link to Sashiko: https://sashiko.dev/#/patchset/20260317-vmalloc-shrink-v5-0-bbfbf54c5265%40zohomail.in
Changes in v5:
- Skip vrealloc shrink for VM_FLUSH_RESET_PERMS (Uladzislau Rezki)
- Link to v4: https://lore.kernel.org/r/20260314-vmalloc-shrink-v4-0-c1e2e0bb5455@zohomail.in
Changes in v4:
- Rename vmalloc_free_pages() to vm_area_free_pages() to align with
vm_area_alloc_pages() (Uladzislau Rezki)
- NULL out freed vm->pages[] entries to prevent stale pointers (Alice Ryhl)
- Remove redundant if (vm->nr_pages) guard in vfree() (Uladzislau Rezki)
- Add vrealloc test case to lib/test_vmalloc (new patch 3/3)
- Link to v3: https://lore.kernel.org/r/20260309-vmalloc-shrink-v3-0-5590fd8de2eb@zohomail.in
Changes in v3:
- Restore the comment.
- Rebase to the latest mm-new
- Link to v2: https://lore.kernel.org/r/20260304-vmalloc-shrink-v2-0-28c291d60100@zohomail.in
Changes in v2:
- Updated the base-commit to mm-new
- Fix conflicts after rebase
- Ran `clang-format` on the changes made
- Use a single `kasan_vrealloc` (Alice Ryhl)
- Link to v1: https://lore.kernel.org/r/20260302-vmalloc-shrink-v1-0-46deff465b7e@zohomail.in
---
Shivam Kalra (6):
mm/vmalloc: extract vm_area_free_pages() helper from vfree()
mm/vmalloc: fix vrealloc() grow-in-place check
mm/vmalloc: zero newly exposed memory on vrealloc() grow
mm/vmalloc: use READ_ONCE() for vmalloc nr_pages status readers
mm/vmalloc: free unused pages on vrealloc() shrink
lib/test_vmalloc: add vrealloc test case
lib/test_vmalloc.c | 62 +++++++++++++++++++++++
mm/vmalloc.c | 143 ++++++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 175 insertions(+), 30 deletions(-)
---
base-commit: f46991f1780ef97efff3b668627b763581032067
change-id: 20260302-vmalloc-shrink-04b2fa688a14
Best regards,
--
Shivam Kalra <shivamkalra98@zohomail.in>
On Fri, 27 Mar 2026 15:18:36 +0530 Shivam Kalra via B4 Relay <devnull+shivamkalra98.zohomail.in@kernel.org> wrote:

> This series implements the TODO in vrealloc() to unmap and free unused
> pages when shrinking across a page boundary.

Thanks. I'd prefer to defer this until the next -rc cycle
(https://lkml.kernel.org/r/20260323202941.08ddf2b0411501cae801ab4c@linux-foundation.org).
If Ulad would prefer that we push ahead then OK.

Are we able to describe how much memory this change might save under
various scenarios? If the savings are impressively large then that
would get attention.

AI review might have found a couple of things, one pre-existing:
https://sashiko.dev/#/patchset/20260327-vmalloc-shrink-v8-0-cc6b57059ed7@zohomail.in
On Fri, Mar 27, 2026 at 11:37:58AM -0700, Andrew Morton wrote:
> On Fri, 27 Mar 2026 15:18:36 +0530 Shivam Kalra via B4 Relay <devnull+shivamkalra98.zohomail.in@kernel.org> wrote:
>
> > This series implements the TODO in vrealloc() to unmap and free unused
> > pages when shrinking across a page boundary.
>
> Thanks. I'd prefer to defer this until the next -rc cycle
> (https://lkml.kernel.org/r/20260323202941.08ddf2b0411501cae801ab4c@linux-foundation.org).
> If Ulad would prefer that we push ahead then OK.

That's fine. No rush from my side.

> Are we able to describe how much memory this change might save under
> various scenarios? If the savings are impressively large then that
> would get attention.

The primary purpose of this is to ensure that this scenario is not
possible:

* There is a global list in the Binder driver. It's an array allocated
  using kvmalloc and resized on demand using kvrealloc.
* A process decides to add a bajillion elements to that global list.
* The process exits, taking its entries with it.
* The global list now remains extremely large for no good reason.

I don't know if it would save a significant amount of memory for
well-behaved programs.

Alice
On Mon, Mar 30, 2026 at 08:05:44AM +0000, Alice Ryhl wrote:
> On Fri, Mar 27, 2026 at 11:37:58AM -0700, Andrew Morton wrote:
> > On Fri, 27 Mar 2026 15:18:36 +0530 Shivam Kalra via B4 Relay <devnull+shivamkalra98.zohomail.in@kernel.org> wrote:
> >
> > > This series implements the TODO in vrealloc() to unmap and free unused
> > > pages when shrinking across a page boundary.
> >
> > Thanks. I'd prefer to defer this until the next -rc cycle
> > (https://lkml.kernel.org/r/20260323202941.08ddf2b0411501cae801ab4c@linux-foundation.org).
> > If Ulad would prefer that we push ahead then OK.
>
> That's fine. No rush from my side.
>
> > Are we able to describe how much memory this change might save under
> > various scenarios? If the savings are impressively large then that
> > would get attention.
>
> The primary purpose of this is to ensure that this scenario is not
> possible:
>
> * There is a global list in the Binder driver. It's an array allocated
>   using kvmalloc and resized on demand using kvrealloc.
> * A process decides to add a bajillion elements to that global list.
> * The process exits, taking its entries with it.
> * The global list now remains extremely large for no good reason.
>
> I don't know if it would save a significant amount of memory for
> well-behaved programs.
>
> Alice

Agree, let's postpone this, so I have a chance to check it one more
time even though it seems ready.

--
Uladzislau Rezki
On 30/03/26 17:57, Uladzislau Rezki wrote:
> Agree, let's postpone this, so I have a chance to check it one more
> time even though it seems ready.

Hey Uladzislau,

I missed rephrasing the commit message pointed out by Alice in v7.
If you want that changed or any other assistance from my end,
let me know.

Shivam
Hello, Shivam!

> On 30/03/26 17:57, Uladzislau Rezki wrote:
> > Agree, let's postpone this, so I have a chance to check it one more
> > time even though it seems ready.
>
> Hey Uladzislau,
>
> I missed rephrasing the commit message pointed out by Alice in v7.
> If you want that changed or any other assistance from my end,
> let me know.

I have added some comments. Please have a look. Do we need those
READ_ONCE()/WRITE_ONCE()?

--
Uladzislau Rezki
On 31/03/26 23:20, Uladzislau Rezki wrote:
> I have added some comments. Please have a look. Do we need those
> READ_ONCE()/WRITE_ONCE()?

You're right, the spinlock makes these redundant. I'll drop patch 4/6
and the WRITE_ONCE() in patch 5/6 for v9.

Thanks,
Shivam
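The point settled in this exchange can be illustrated with a generic
userspace sketch (not kernel code, and not the actual vmalloc locking):
when every reader and writer of a field takes the same lock, the lock's
acquire/release ordering already makes plain accesses safe, so
READ_ONCE()/WRITE_ONCE() add nothing. Marked accesses are only needed
for genuinely lockless readers.

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative stand-in for a structure with a lock-protected counter. */
struct counter {
	pthread_mutex_t lock;
	unsigned int nr_pages;	/* stand-in for vm->nr_pages */
};

/* Writer: updates the field under the lock, so a plain store suffices. */
static void counter_sub(struct counter *c, unsigned int n)
{
	pthread_mutex_lock(&c->lock);
	c->nr_pages -= n;	/* plain store: serialized by the lock */
	pthread_mutex_unlock(&c->lock);
}

/*
 * Reader: takes the same lock, so a plain load is fully ordered against
 * the writer and no READ_ONCE()-style marked access is required.
 */
static unsigned int counter_read(struct counter *c)
{
	unsigned int val;

	pthread_mutex_lock(&c->lock);
	val = c->nr_pages;	/* plain load: same lock as the writer */
	pthread_mutex_unlock(&c->lock);
	return val;
}
```

A reader that skipped the lock entirely is the case where marked accesses
would still be needed, which is why dropping patch 4 goes together with
taking vn->busy.lock on the update side.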
On Fri Mar 27, 2026 at 7:37 PM CET, Andrew Morton wrote:
> Are we able to describe how much memory this change might save under
> various scenarios? If the savings are impressively large then that
> would get attention.

We already have a workaround in place for shrinking vmalloc buffers
through a deep copy in Rust alloc. Given that binder, which motivated
this workaround, already uses it, the savings are presumably significant
enough to accept that overhead (I assume Alice has some numbers).

So the more interesting question is how badly the deep copy hurts binder
compared to unmapping and freeing the spare pages.