[PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink

Shivam Kalra via B4 Relay posted 3 patches 2 weeks, 6 days ago
There is a newer version of this series
lib/test_vmalloc.c | 52 ++++++++++++++++++++++++++++++++++++++++++
mm/vmalloc.c       | 67 ++++++++++++++++++++++++++++++++++++++----------------
2 files changed, 100 insertions(+), 19 deletions(-)
Posted by Shivam Kalra via B4 Relay 2 weeks, 6 days ago
This series implements the TODO in vrealloc() to unmap and free unused
pages when shrinking across a page boundary.

Problem:
When vrealloc() shrinks an allocation, it updates bookkeeping
(requested_size, KASAN shadow) but does not free the underlying physical
pages. This wastes memory for the lifetime of the allocation.

Solution:
- Patch 1: Extracts a vm_area_free_pages(vm, start, end) helper from
  vfree() that frees a range of pages with memcg and nr_vmalloc_pages
  accounting. Freed page pointers are set to NULL to prevent stale
  references.
- Patch 2: Uses the helper to free tail pages when vrealloc() shrinks
  across a page boundary. Skips huge page allocations (page_order > 0)
  since compound pages cannot be partially freed. Allocations with 
  VM_FLUSH_RESET_PERMS are also skipped. Also fixes the grow-in-place
  path to check vm->nr_pages instead of get_vm_area_size(), which 
  reflects the virtual reservation and does not change on shrink.
- Patch 3: Adds a vrealloc test case to lib/test_vmalloc that exercises
  grow-realloc, shrink-across-boundary, shrink-within-page, and
  grow-in-place paths with data integrity validation.

The virtual address reservation is kept intact to preserve the range
for potential future grow-in-place support.

A concrete user is the Rust binder driver's KVVec::shrink_to [1], which
performs explicit vrealloc() shrinks for memory reclamation.
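
The shrink behaviour described above can be modelled in userspace as a rough sketch. This is not the code from the series: `struct model_area`, every `model_*` function, `MODEL_PAGE_SHIFT` and the `MODEL_FLUSH_RESET_PERMS` bit are hypothetical stand-ins, and heap blocks play the role of physical pages. It only illustrates the page arithmetic, the two skip conditions, NULL-ing of freed slots, and the unchanged virtual reservation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE ((size_t)1 << MODEL_PAGE_SHIFT)
/* Placeholder bit for this sketch; not copied from kernel headers. */
#define MODEL_FLUSH_RESET_PERMS 0x1u

/* Hypothetical stand-in for the struct vm_struct fields discussed above. */
struct model_area {
	void **pages;            /* one heap block per backing page */
	unsigned int nr_pages;   /* pages currently backing the area */
	unsigned int page_order; /* > 0 models a huge page allocation */
	unsigned int flags;
	size_t reserved_size;    /* virtual reservation; shrink leaves it alone */
};

/* Number of pages needed to back `size` bytes (rounds up). */
static unsigned int model_pages_for(size_t size)
{
	return (unsigned int)((size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT);
}

/* Free pages[start..end) and NULL each slot so no stale pointer remains. */
static void model_free_pages(struct model_area *area,
			     unsigned int start, unsigned int end)
{
	assert(start <= end && end <= area->nr_pages);
	for (unsigned int i = start; i < end; i++) {
		free(area->pages[i]);
		area->pages[i] = NULL;
	}
}

/* Shrink to new_size; returns how many tail pages were released. */
static unsigned int model_shrink(struct model_area *area, size_t new_size)
{
	unsigned int new_nr = model_pages_for(new_size);
	unsigned int freed;

	/* Skipped cases: huge pages and areas flagged for permission reset. */
	if (area->page_order > 0 || (area->flags & MODEL_FLUSH_RESET_PERMS))
		return 0;
	/* No page boundary crossed: nothing to free. */
	if (new_nr >= area->nr_pages)
		return 0;

	freed = area->nr_pages - new_nr;
	model_free_pages(area, new_nr, area->nr_pages);
	area->nr_pages = new_nr;
	return freed;
}

/* Build an area backed by nr pages; returns 0 on success, -1 on failure. */
static int model_area_init(struct model_area *area, unsigned int nr)
{
	area->pages = calloc(nr, sizeof(*area->pages));
	if (!area->pages)
		return -1;
	area->nr_pages = nr;
	area->page_order = 0;
	area->flags = 0;
	area->reserved_size = (size_t)nr << MODEL_PAGE_SHIFT;
	for (unsigned int i = 0; i < nr; i++) {
		area->pages[i] = malloc(MODEL_PAGE_SIZE);
		if (!area->pages[i]) {
			model_free_pages(area, 0, i);
			free(area->pages);
			return -1;
		}
	}
	return 0;
}
```

In this model, a grow-in-place check would have to compare against `nr_pages` rather than `reserved_size`, since only the former shrinks.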

Tested:
- KASAN KUnit (vmalloc_oob passes)
- lib/test_vmalloc stress tests (3/3, 1M iterations each)
- checkpatch, sparse, W=1, allmodconfig, coccicheck clean

[1] https://lore.kernel.org/all/20260216-binder-shrink-vec-v3-v6-0-ece8e8593e53@zohomail.in/

Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
Changes in v5:
- Skip vrealloc shrink for VM_FLUSH_RESET_PERMS (Uladzislau Rezki)
- Link to v4: https://lore.kernel.org/r/20260314-vmalloc-shrink-v4-0-c1e2e0bb5455@zohomail.in

Changes in v4:
- Rename vmalloc_free_pages() to vm_area_free_pages() to align with
  vm_area_alloc_pages() (Uladzislau Rezki)
- NULL out freed vm->pages[] entries to prevent stale pointers (Alice Ryhl)
- Remove redundant if (vm->nr_pages) guard in vfree() (Uladzislau Rezki)
- Add vrealloc test case to lib/test_vmalloc (new patch 3/3)
- Link to v3: https://lore.kernel.org/r/20260309-vmalloc-shrink-v3-0-5590fd8de2eb@zohomail.in

Changes in v3:
- Restore the comment
- Rebase to the latest mm-new
- Link to v2: https://lore.kernel.org/r/20260304-vmalloc-shrink-v2-0-28c291d60100@zohomail.in

Changes in v2:
- Updated the base-commit to mm-new
- Fix conflicts after rebase
- Ran `clang-format` on the changes made
- Use a single `kasan_vrealloc` (Alice Ryhl)
- Link to v1: https://lore.kernel.org/r/20260302-vmalloc-shrink-v1-0-46deff465b7e@zohomail.in

---
Shivam Kalra (3):
      mm/vmalloc: extract vm_area_free_pages() helper from vfree()
      mm/vmalloc: free unused pages on vrealloc() shrink
      lib/test_vmalloc: add vrealloc test case

 lib/test_vmalloc.c | 52 ++++++++++++++++++++++++++++++++++++++++++
 mm/vmalloc.c       | 67 ++++++++++++++++++++++++++++++++++++++----------------
 2 files changed, 100 insertions(+), 19 deletions(-)
---
base-commit: 7d47a508dfdc335c107fb00b4d9ef46488281a52
change-id: 20260302-vmalloc-shrink-04b2fa688a14

Best regards,
-- 
Shivam Kalra <shivamkalra98@zohomail.in>
Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
Posted by Shivam Kalra 2 weeks, 2 days ago
On 17/03/26 13:47, Shivam Kalra via B4 Relay wrote:
> [...]

Hi everyone,

Following up on the concerns raised regarding `get_vm_area_size()` versus
`vm->nr_pages << PAGE_SHIFT`, Andrew kindly ran the patchset through an
AI review which flagged several concrete issues.

I've used those results to audit the code and figure out exactly what
breaks when we shrink allocations while preserving the virtual area size.
Based on that research, here is what I am planning to include in the v6
series to address these edge cases:

1. Fixing the VM_USERMAP crash
Alice correctly pointed out that `remap_vmalloc_range_partial()` relies
on `get_vm_area_size()` to validate the mapping size. If we free tail
pages but keep `vm->size` unchanged, mapping the full original size
would cause a NULL pointer dereference in `vm_insert_page()`.
Plan: I'll update the shrink path to explicitly bail out if `VM_USERMAP`
is set, ensuring safety for these mappings.

2. Fixing the Kmemleak scanner panic
Kmemleak tracks the original allocation size and scans it periodically.
If we unmap and free tail pages without notifying kmemleak, its scanner
will fault on the unmapped virtual addresses.
Plan: I'll add a call to `kmemleak_free_part()` during the shrink to
keep its tracked object size updated.

3. Fixing a /proc/vmallocinfo race condition
`show_numa_info()` iterates over `v->nr_pages`. During a shrink,
modifying `nr_pages` and NULL-ing out the page pointers concurrently
could cause a reader to dereference a NULL page pointer.
Plan: I'll update the reader to use `READ_ONCE(v->nr_pages)`, and have
the shrink path do a `WRITE_ONCE(vm->nr_pages, new_nr_pages)` before
freeing the pages. This guarantees that concurrent readers either see
the old count with valid pages or the new, smaller count.

4. Fixing a stale data leak on grow
A vrealloc grow with `__GFP_ZERO` could leak previously discarded data
if an intermediate shrink happened without `__GFP_ZERO` (which skips
zeroing the freed region).
Plan: I will add mandatory zeroing in the grow-in-place path for
`want_init_on_alloc()` to clear any newly exposed bytes.
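
The stale-data scenario in point 4 can be illustrated with a small userspace model. This is a sketch under assumptions, not the planned v6 code: `struct model_buf` and the `model_*` functions are hypothetical, and the `want_zero` parameter stands in for whatever the `want_init_on_alloc()` decision turns out to be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: backing memory plus the size the caller last asked for. */
struct model_buf {
	unsigned char *data;
	size_t capacity;       /* bytes still backed by memory */
	size_t requested_size; /* bytes the caller may currently use */
};

/* Shrink only updates bookkeeping; the discarded tail keeps its old bytes. */
static void model_shrink_in_place(struct model_buf *b, size_t new_size)
{
	assert(new_size <= b->requested_size);
	b->requested_size = new_size;
}

/*
 * Grow in place. When zeroing is requested, clear exactly the bytes that
 * become visible again, so data discarded by an earlier shrink cannot leak.
 * Returns false if the request does not fit the backing memory.
 */
static bool model_grow_in_place(struct model_buf *b, size_t new_size,
				bool want_zero)
{
	if (new_size < b->requested_size || new_size > b->capacity)
		return false;
	if (want_zero)
		memset(b->data + b->requested_size, 0,
		       new_size - b->requested_size);
	b->requested_size = new_size;
	return true;
}
```

Zeroing at grow time rather than at shrink time means a shrink without the zeroing flag cannot defeat a later grow that asks for zeroed memory.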

Thanks again to Alice and Danilo for prompting the closer look, and to
Andrew for providing the review. I should have v6 ready for review soon.

Best regards,
Shivam
Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
Posted by Alice Ryhl 2 weeks, 1 day ago
On Sat, Mar 21, 2026 at 01:45:35PM +0530, Shivam Kalra wrote:
> On 17/03/26 13:47, Shivam Kalra via B4 Relay wrote:
> 3. Fixing a /proc/vmallocinfo race condition
> `show_numa_info()` iterates over `v->nr_pages`. During a shrink,
> modifying `nr_pages` and NULL-ing out the page pointers concurrently
> could cause a reader to dereference a NULL page pointer.
> Plan: I'll update the reader to use `READ_ONCE(v->nr_pages)`, and have
> the shrink path do a `WRITE_ONCE(vm->nr_pages, new_nr_pages)` before
> freeing the pages. This guarantees that concurrent readers either see
> the old count with valid pages or the new, smaller count.

This doesn't fix the race. Consider this:

nr < vm->nr_pages == true
		vm->nr_pages = nr
		free vm->pages[nr]
page_to_nid(v->pages[nr]) // UAF

Perhaps changing vm->nr_pages should happen under the vn->busy.lock
spinlock? show_numa_info() is called under that lock too.

Alice
Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
Posted by Uladzislau Rezki 2 weeks, 1 day ago
On Sun, Mar 22, 2026 at 12:48:28PM +0000, Alice Ryhl wrote:
> On Sat, Mar 21, 2026 at 01:45:35PM +0530, Shivam Kalra wrote:
> > On 17/03/26 13:47, Shivam Kalra via B4 Relay wrote:
> > 3. Fixing a /proc/vmallocinfo race condition
> > `show_numa_info()` iterates over `v->nr_pages`. During a shrink,
> > modifying `nr_pages` and NULL-ing out the page pointers concurrently
> > could cause a reader to dereference a NULL page pointer.
> > Plan: I'll update the reader to use `READ_ONCE(v->nr_pages)`, and have
> > the shrink path do a `WRITE_ONCE(vm->nr_pages, new_nr_pages)` before
> > freeing the pages. This guarantees that concurrent readers either see
> > the old count with valid pages or the new, smaller count.
> 
> This doesn't fix the race. Consider this:
> 
> nr < vm->nr_pages == true
> 		vm->nr_pages = nr
> 		free vm->pages[nr]
> page_to_nid(v->pages[nr]) // UAF
> 
> perhaps changing vm->nr_pages should happen under the vn->busy.lock
> spinlock? show_numa_info() is called under that lock too.
> 
vn->busy.lock protects the VAs in a busy tree. So if you update the nr_pages
of a given VA, you should hold the lock of the node that VA belongs to.

--
Uladzislau Rezki
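
The locking suggestion in the two replies above can be sketched in userspace C11. Everything here is a hypothetical model: `struct model_node_lock` is a toy spinlock standing in for the node's busy lock, each "page" is a heap `int` holding a NUMA node id, and freeing after dropping the lock is one possible arrangement rather than what the series does. The point is only that the reader's bound check and page dereference sit in the same critical section as the writer's count update, which closes the window shown in the interleaving.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy spinlock standing in for the per-node busy lock. */
struct model_node_lock {
	atomic_flag held;
};

static void model_lock(struct model_node_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the holder releases */
}

static void model_unlock(struct model_node_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct model_vm {
	struct model_node_lock *lock;
	int **pages;           /* each "page" records its NUMA node id */
	unsigned int nr_pages;
};

/*
 * Writer: publish the smaller count under the lock, then free the tail.
 * A reader that took the lock earlier has already finished; a reader that
 * takes it later sees new_nr and never touches the freed entries.
 */
static void model_shrink_locked(struct model_vm *vm, unsigned int new_nr)
{
	unsigned int old_nr;

	model_lock(vm->lock);
	old_nr = vm->nr_pages;
	if (new_nr < old_nr)
		vm->nr_pages = new_nr;
	model_unlock(vm->lock);

	for (unsigned int i = new_nr; i < old_nr; i++) {
		free(vm->pages[i]);
		vm->pages[i] = NULL;
	}
}

/* Reader: count pages on `node`, holding the lock for the whole walk. */
static unsigned int model_count_on_node(struct model_vm *vm, int node)
{
	unsigned int count = 0;

	model_lock(vm->lock);
	for (unsigned int nr = 0; nr < vm->nr_pages; nr++)
		if (*vm->pages[nr] == node)
			count++;
	model_unlock(vm->lock);
	return count;
}
```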
Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
Posted by Andrew Morton 2 weeks, 2 days ago
On Sat, 21 Mar 2026 13:45:35 +0530 Shivam Kalra <shivamkalra98@zohomail.in> wrote:

> Thanks again to Alice and Danilo for prompting the closer look, and to
> Andrew for providing the review. 

Well, thanks to those who developed and provided the reviewbot!

Reviews seem to take 12+ hours at present.  Go to https://sashiko.dev/,
select the linux-mm list, and paste in the subject.  Or go directly to:

	id=<Message-ID>
	https://sashiko.dev/#/patchset/$id
Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
Posted by Andrew Morton 2 weeks, 6 days ago
On Tue, 17 Mar 2026 13:47:32 +0530 Shivam Kalra via B4 Relay <devnull+shivamkalra98.zohomail.in@kernel.org> wrote:

> This series implements the TODO in vrealloc() to unmap and free unused
> pages when shrinking across a page boundary.

Lots of questions have been posed by AI review:

https://sashiko.dev/#/patchset/20260317-vmalloc-shrink-v5-0-bbfbf54c5265@zohomail.in
Re: [PATCH v5 0/3] mm/vmalloc: free unused pages on vrealloc() shrink
Posted by Shivam Kalra 2 weeks, 5 days ago
On 18/03/26 02:41, Andrew Morton wrote:
> On Tue, 17 Mar 2026 13:47:32 +0530 Shivam Kalra via B4 Relay <devnull+shivamkalra98.zohomail.in@kernel.org> wrote:
> 
>> This series implements the TODO in vrealloc() to unmap and free unused
>> pages when shrinking across a page boundary.
> 
> Lots of questions have been posed by AI review:
> 
> https://sashiko.dev/#/patchset/20260317-vmalloc-shrink-v5-0-bbfbf54c5265@zohomail.in
These seem like valid concerns. I will revisit the patch series
and post an improved version. Please allow me some time.