[PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink

Shivam Kalra via B4 Relay posted 3 patches 2 weeks, 6 days ago
There is a newer version of this series
Posted by Shivam Kalra via B4 Relay 2 weeks, 6 days ago
From: Shivam Kalra <shivamkalra98@zohomail.in>

When vrealloc() shrinks an allocation and the new size crosses a page
boundary, unmap and free the tail pages that are no longer needed. This
reclaims physical memory that was previously wasted for the lifetime
of the allocation.

The heuristic is simple: always free when at least one full page becomes
unused. Huge page allocations (page_order > 0) are skipped, as partial
freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
are also skipped, as their direct-map permissions must be reset before
pages are returned to the page allocator, which is handled by
vm_reset_perms() during vfree().

The virtual address reservation (vm->size / vmap_area) is intentionally
kept unchanged, preserving the address for potential future grow-in-place
support.

Fix the grow-in-place check to compare against vm->nr_pages rather than
get_vm_area_size(), since the latter reflects the virtual reservation
which does not shrink. Without this fix, a grow after shrink would
access freed pages.

Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
---
 mm/vmalloc.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)
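
For readers who want to reason about the heuristic outside the kernel, here
is a minimal userspace sketch of the shrink decision and of the corrected
grow-in-place check described above. It is not the kernel code: struct
sk_area, the sk_* helpers, and the fixed 4 KiB page size are hypothetical
stand-ins for struct vm_struct, PAGE_ALIGN()/PAGE_SHIFT, and the flag tests
in the diff below, and no memory is actually unmapped or freed.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SK_PAGE_SIZE ((size_t)1 << SK_PAGE_SHIFT)

/* Hypothetical stand-in for the few struct vm_struct fields used here. */
struct sk_area {
	size_t reserved_bytes;   /* virtual reservation, never shrinks */
	unsigned int nr_pages;   /* pages currently backed by memory */
	unsigned int page_order; /* > 0 models a huge-page allocation */
	bool reset_perms;        /* models VM_FLUSH_RESET_PERMS */
};

/* Round a byte count up to whole pages. */
static unsigned int sk_pages_for(size_t size)
{
	return (unsigned int)((size + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT);
}

/*
 * Model a shrink to @size: release tail pages only if at least one full
 * page becomes unused and the area is neither huge-page backed nor
 * flagged for a permission reset. Returns the number of pages released.
 */
static unsigned int sk_shrink(struct sk_area *a, size_t size)
{
	unsigned int new_nr = sk_pages_for(size);
	unsigned int freed;

	if (new_nr >= a->nr_pages || a->page_order || a->reset_perms)
		return 0;

	freed = a->nr_pages - new_nr;
	a->nr_pages = new_nr;
	return freed;
}

/* Grow in place only if the pages that are still backed cover @size. */
static bool sk_can_grow_in_place(const struct sk_area *a, size_t size)
{
	return size <= ((size_t)a->nr_pages << SK_PAGE_SHIFT);
}
```

In this model, shrinking an 8-page area to three pages plus one byte leaves
4 pages backed. A later request for 6 pages still fits in reserved_bytes,
but sk_can_grow_in_place() rejects it because only 4 pages remain, which is
the case the changed comparison is meant to catch.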

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b29bf58c0e3f..f3820c6712c1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4345,14 +4345,24 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
 			goto need_realloc;
 	}
 
-	/*
-	 * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
-	 * would be a good heuristic for when to shrink the vm_area?
-	 */
 	if (size <= old_size) {
+		unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
 		/* Zero out "freed" memory, potentially for future realloc. */
 		if (want_init_on_free() || want_init_on_alloc(flags))
 			memset((void *)p + size, 0, old_size - size);
+
+		/* Free tail pages when shrink crosses a page boundary. */
+		if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
+		    !(vm->flags & VM_FLUSH_RESET_PERMS)) {
+			unsigned long addr = (unsigned long)p;
+
+			vunmap_range(addr + (new_nr_pages << PAGE_SHIFT),
+				     addr + (vm->nr_pages << PAGE_SHIFT));
+
+			vm_area_free_pages(vm, new_nr_pages, vm->nr_pages);
+			vm->nr_pages = new_nr_pages;
+		}
 		vm->requested_size = size;
 		kasan_vrealloc(p, old_size, size);
 		return (void *)p;
@@ -4361,7 +4371,7 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
 	/*
 	 * We already have the bytes available in the allocation; use them.
 	 */
-	if (size <= alloced_size) {
+	if (size <= (size_t)vm->nr_pages << PAGE_SHIFT) {
 		/*
 		 * No need to zero memory here, as unused memory will have
 		 * already been zeroed at initial allocation time or during

-- 
2.43.0
Re: [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink
Posted by Alice Ryhl 2 weeks, 6 days ago
On Tue, Mar 17, 2026 at 01:47:34PM +0530, Shivam Kalra wrote:
> When vrealloc() shrinks an allocation and the new size crosses a page
> boundary, unmap and free the tail pages that are no longer needed. This
> reclaims physical memory that was previously wasted for the lifetime
> of the allocation.
> 
> The heuristic is simple: always free when at least one full page becomes
> unused. Huge page allocations (page_order > 0) are skipped, as partial
> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
> are also skipped, as their direct-map permissions must be reset before
> pages are returned to the page allocator, which is handled by
> vm_reset_perms() during vfree().
> 
> The virtual address reservation (vm->size / vmap_area) is intentionally
> kept unchanged, preserving the address for potential future grow-in-place
> support.
> 
> Fix the grow-in-place check to compare against vm->nr_pages rather than
> get_vm_area_size(), since the latter reflects the virtual reservation
> which does not shrink. Without this fix, a grow after shrink would
> access freed pages.
> 
> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
> ---
>  mm/vmalloc.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index b29bf58c0e3f..f3820c6712c1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4345,14 +4345,24 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>  			goto need_realloc;
>  	}
>  
> -	/*
> -	 * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
> -	 * would be a good heuristic for when to shrink the vm_area?
> -	 */
>  	if (size <= old_size) {
> +		unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
>  		/* Zero out "freed" memory, potentially for future realloc. */
>  		if (want_init_on_free() || want_init_on_alloc(flags))
>  			memset((void *)p + size, 0, old_size - size);
> +
> +		/* Free tail pages when shrink crosses a page boundary. */
> +		if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
> +		    !(vm->flags & VM_FLUSH_RESET_PERMS)) {
> +			unsigned long addr = (unsigned long)p;
> +
> +			vunmap_range(addr + (new_nr_pages << PAGE_SHIFT),
> +				     addr + (vm->nr_pages << PAGE_SHIFT));
> +
> +			vm_area_free_pages(vm, new_nr_pages, vm->nr_pages);
> +			vm->nr_pages = new_nr_pages;
> +		}
>  		vm->requested_size = size;
>  		kasan_vrealloc(p, old_size, size);
>  		return (void *)p;
> @@ -4361,7 +4371,7 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>  	/*
>  	 * We already have the bytes available in the allocation; use them.
>  	 */
> -	if (size <= alloced_size) {
> +	if (size <= (size_t)vm->nr_pages << PAGE_SHIFT) {
>  		/*
>  		 * No need to zero memory here, as unused memory will have
>  		 * already been zeroed at initial allocation time or during

Hmm. Until now it has always been the case that
get_vm_area_size(area) == vm->nr_pages << PAGE_SHIFT, so the two values
were interchangeable. With this patch that is no longer the case.

For example, 'remap_vmalloc_range_partial' compares the vm area size
with the range being mapped, and then proceeds to look up the pages and
map them. But now those pages may be missing.

I can't really tell if there are other places in this file that need to
be updated too.
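
To make the concern concrete, here is a small userspace sketch, not the
actual remap_vmalloc_range_partial() code. The struct rm_area type and both
helpers are hypothetical; they only contrast a bounds check against the
virtual reservation with one against the pages that are still backed.

```c
#include <stdbool.h>
#include <stddef.h>

#define RM_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical model of the two sizes that can now diverge. */
struct rm_area {
	size_t area_bytes;     /* plays the role of get_vm_area_size() */
	unsigned int nr_pages; /* plays the role of vm->nr_pages */
};

/* Bounds check against the virtual reservation only. */
static bool rm_range_ok_by_area(const struct rm_area *a, size_t off, size_t len)
{
	return off <= a->area_bytes && len <= a->area_bytes - off;
}

/* Bounds check against the pages that are actually still backed. */
static bool rm_range_ok_by_pages(const struct rm_area *a, size_t off, size_t len)
{
	size_t backed = (size_t)a->nr_pages << RM_PAGE_SHIFT;

	return off <= backed && len <= backed - off;
}
```

With an 8-page reservation shrunk to 4 backed pages, a one-page range
starting at page 5 passes the first check and fails the second, so any
caller that validates against the area size and then looks up pages would
need the same kind of audit.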

Alice
Re: [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink
Posted by Danilo Krummrich 2 weeks, 6 days ago
On Tue Mar 17, 2026 at 3:39 PM CET, Alice Ryhl wrote:
> On Tue, Mar 17, 2026 at 01:47:34PM +0530, Shivam Kalra wrote:
>> When vrealloc() shrinks an allocation and the new size crosses a page
>> boundary, unmap and free the tail pages that are no longer needed. This
>> reclaims physical memory that was previously wasted for the lifetime
>> of the allocation.
>> 
>> The heuristic is simple: always free when at least one full page becomes
>> unused. Huge page allocations (page_order > 0) are skipped, as partial
>> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
>> are also skipped, as their direct-map permissions must be reset before
>> pages are returned to the page allocator, which is handled by
>> vm_reset_perms() during vfree().
>> 
>> The virtual address reservation (vm->size / vmap_area) is intentionally
>> kept unchanged, preserving the address for potential future grow-in-place
>> support.
>> 
>> Fix the grow-in-place check to compare against vm->nr_pages rather than
>> get_vm_area_size(), since the latter reflects the virtual reservation
>> which does not shrink. Without this fix, a grow after shrink would
>> access freed pages.
>> 
>> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>

Feel free to add

Suggested-by: Danilo Krummrich <dakr@kernel.org>

>> ---
>>  mm/vmalloc.c | 20 +++++++++++++++-----
>>  1 file changed, 15 insertions(+), 5 deletions(-)
>> 
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index b29bf58c0e3f..f3820c6712c1 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -4345,14 +4345,24 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>>  			goto need_realloc;
>>  	}
>>  
>> -	/*
>> -	 * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
>> -	 * would be a good heuristic for when to shrink the vm_area?
>> -	 */
>>  	if (size <= old_size) {
>> +		unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
>> +
>>  		/* Zero out "freed" memory, potentially for future realloc. */
>>  		if (want_init_on_free() || want_init_on_alloc(flags))
>>  			memset((void *)p + size, 0, old_size - size);
>> +
>> +		/* Free tail pages when shrink crosses a page boundary. */
>> +		if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
>> +		    !(vm->flags & VM_FLUSH_RESET_PERMS)) {
>> +			unsigned long addr = (unsigned long)p;
>> +
>> +			vunmap_range(addr + (new_nr_pages << PAGE_SHIFT),
>> +				     addr + (vm->nr_pages << PAGE_SHIFT));
>> +
>> +			vm_area_free_pages(vm, new_nr_pages, vm->nr_pages);
>> +			vm->nr_pages = new_nr_pages;
>> +		}
>>  		vm->requested_size = size;
>>  		kasan_vrealloc(p, old_size, size);
>>  		return (void *)p;
>> @@ -4361,7 +4371,7 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>>  	/*
>>  	 * We already have the bytes available in the allocation; use them.
>>  	 */
>> -	if (size <= alloced_size) {
>> +	if (size <= (size_t)vm->nr_pages << PAGE_SHIFT) {
>>  		/*
>>  		 * No need to zero memory here, as unused memory will have
>>  		 * already been zeroed at initial allocation time or during
>
> Hmm. Until now it has always been the case that
> get_vm_area_size(area) == vm->nr_pages << PAGE_SHIFT, so the two values
> were interchangeable. With this patch that is no longer the case.
>
> For example, 'remap_vmalloc_range_partial' compares the vm area size
> with the range being mapped, and then proceeds to look up the pages and
> map them. But now those pages may be missing.
>
> I can't really tell if there are other places in this file that need to
> be updated too.

This may well be possible. I remember that when I added vrealloc() and looked
into growing and shrinking, I concluded that it might need a bit of rework in
terms of tracking the sizes of the different layers. Unfortunately, I don't
remember the details anymore, but I'm quite sure there were some subtleties
along the lines of what Alice points out, so I recommend double-checking.
Re: [PATCH v5 2/3] mm/vmalloc: free unused pages on vrealloc() shrink
Posted by Shivam Kalra 2 weeks, 6 days ago
On 17/03/26 20:15, Danilo Krummrich wrote:
> On Tue Mar 17, 2026 at 3:39 PM CET, Alice Ryhl wrote:
>> On Tue, Mar 17, 2026 at 01:47:34PM +0530, Shivam Kalra wrote:
>>> When vrealloc() shrinks an allocation and the new size crosses a page
>>> boundary, unmap and free the tail pages that are no longer needed. This
>>> reclaims physical memory that was previously wasted for the lifetime
>>> of the allocation.
>>>
>>> The heuristic is simple: always free when at least one full page becomes
>>> unused. Huge page allocations (page_order > 0) are skipped, as partial
>>> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
>>> are also skipped, as their direct-map permissions must be reset before
>>> pages are returned to the page allocator, which is handled by
>>> vm_reset_perms() during vfree().
>>>
>>> The virtual address reservation (vm->size / vmap_area) is intentionally
>>> kept unchanged, preserving the address for potential future grow-in-place
>>> support.
>>>
>>> Fix the grow-in-place check to compare against vm->nr_pages rather than
>>> get_vm_area_size(), since the latter reflects the virtual reservation
>>> which does not shrink. Without this fix, a grow after shrink would
>>> access freed pages.
>>>
>>> Signed-off-by: Shivam Kalra <shivamkalra98@zohomail.in>
> 
> Feel free to add
> 
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
> 
>>> ---
>>>  mm/vmalloc.c | 20 +++++++++++++++-----
>>>  1 file changed, 15 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index b29bf58c0e3f..f3820c6712c1 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -4345,14 +4345,24 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>>>  			goto need_realloc;
>>>  	}
>>>  
>>> -	/*
>>> -	 * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
>>> -	 * would be a good heuristic for when to shrink the vm_area?
>>> -	 */
>>>  	if (size <= old_size) {
>>> +		unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
>>> +
>>>  		/* Zero out "freed" memory, potentially for future realloc. */
>>>  		if (want_init_on_free() || want_init_on_alloc(flags))
>>>  			memset((void *)p + size, 0, old_size - size);
>>> +
>>> +		/* Free tail pages when shrink crosses a page boundary. */
>>> +		if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
>>> +		    !(vm->flags & VM_FLUSH_RESET_PERMS)) {
>>> +			unsigned long addr = (unsigned long)p;
>>> +
>>> +			vunmap_range(addr + (new_nr_pages << PAGE_SHIFT),
>>> +				     addr + (vm->nr_pages << PAGE_SHIFT));
>>> +
>>> +			vm_area_free_pages(vm, new_nr_pages, vm->nr_pages);
>>> +			vm->nr_pages = new_nr_pages;
>>> +		}
>>>  		vm->requested_size = size;
>>>  		kasan_vrealloc(p, old_size, size);
>>>  		return (void *)p;
>>> @@ -4361,7 +4371,7 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
>>>  	/*
>>>  	 * We already have the bytes available in the allocation; use them.
>>>  	 */
>>> -	if (size <= alloced_size) {
>>> +	if (size <= (size_t)vm->nr_pages << PAGE_SHIFT) {
>>>  		/*
>>>  		 * No need to zero memory here, as unused memory will have
>>>  		 * already been zeroed at initial allocation time or during
>>
>> Hmm. Until now it has always been the case that
>> get_vm_area_size(area) == vm->nr_pages << PAGE_SHIFT, so the two values
>> were interchangeable. With this patch that is no longer the case.
>>
>> For example, 'remap_vmalloc_range_partial' compares the vm area size
>> with the range being mapped, and then proceeds to look up the pages and
>> map them. But now those pages may be missing.
>>
>> I can't really tell if there are other places in this file that need to
>> be updated too.
> 
> This may well be possible. I remember that when I added vrealloc() and looked
> into growing and shrinking, I concluded that it might need a bit of rework in
> terms of tracking the sizes of the different layers. Unfortunately, I don't
> remember the details anymore, but I'm quite sure there were some subtleties
> along the lines of what Alice points out, so I recommend double-checking.

I will post an update if I find any issues.