[PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop

Shardul Bankar posted 1 patch 1 week ago
mm/khugepaged.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
[PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop
Posted by Shardul Bankar 1 week ago
collapse_file() uses xas_create_range() in a retry loop that calls
xas_nomem() on -ENOMEM and then retries. xas_nomem() may allocate a
spare xa_node and store it in xas->xa_alloc.

If the lock is dropped after xas_nomem(), another thread can expand
the xarray tree in the meantime. On the next retry, xas_create_range()
can then succeed trivially without consuming the node stored in
xas->xa_alloc. If we then either succeed or give up and go to the
rollback path without calling xas_destroy(), that spare node leaks.

Fix this by calling xas_destroy(&xas) in both the success case
(!xas_error(&xas)) and the failure case where xas_nomem() returns
false and we abort. xas_destroy() will free any unused spare node in
xas->xa_alloc and is a no-op if there is nothing left to free.

Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
 v2:
 - Call xas_destroy() on both success and failure
 - Explained retry semantics and xa_alloc / concurrency risk
 - Dropped cleanup_empty_nodes from previous proposal

 mm/khugepaged.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..0794a99c807f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1872,11 +1872,14 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	do {
 		xas_lock_irq(&xas);
 		xas_create_range(&xas);
-		if (!xas_error(&xas))
+		if (!xas_error(&xas)) {
+			xas_destroy(&xas);
 			break;
+		}
 		xas_unlock_irq(&xas);
 		if (!xas_nomem(&xas, GFP_KERNEL)) {
 			result = SCAN_FAIL;
+			xas_destroy(&xas);
 			goto rollback;
 		}
 	} while (1);
-- 
2.34.1
Re: [PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop
Posted by Matthew Wilcox 1 week ago
On Mon, Nov 24, 2025 at 09:41:49PM +0530, Shardul Bankar wrote:
> collapse_file() uses xas_create_range() in a retry loop that calls
> xas_nomem() on -ENOMEM and then retries. xas_nomem() may allocate a
> spare xa_node and store it in xas->xa_alloc.
> 
> If the lock is dropped after xas_nomem(), another thread can expand
> the xarray tree in the meantime. On the next retry, xas_create_range()
> can then succeed trivially without consuming the node stored in
> xas->xa_alloc. If we then either succeed or give up and go to the
> rollback path without calling xas_destroy(), that spare node leaks.

Then wouldn't freeing the excess node in xas_create_range() be the
correct fix, instead of requiring the caller to think about this?

> Fix this by calling xas_destroy(&xas) in both the success case
> (!xas_error(&xas)) and the failure case where xas_nomem() returns
> false and we abort. xas_destroy() will free any unused spare node in
> xas->xa_alloc and is a no-op if there is nothing left to free.
> 
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
>  v2:
>  - Call xas_destroy() on both success and failure
>  - Explained retry semantics and xa_alloc / concurrency risk
>  - Dropped cleanup_empty_nodes from previous proposal
> 
>  mm/khugepaged.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index abe54f0043c7..0794a99c807f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1872,11 +1872,14 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  	do {
>  		xas_lock_irq(&xas);
>  		xas_create_range(&xas);
> -		if (!xas_error(&xas))
> +		if (!xas_error(&xas)) {
> +			xas_destroy(&xas);
>  			break;
> +		}
>  		xas_unlock_irq(&xas);
>  		if (!xas_nomem(&xas, GFP_KERNEL)) {
>  			result = SCAN_FAIL;
> +			xas_destroy(&xas);
>  			goto rollback;
>  		}
>  	} while (1);
> -- 
> 2.34.1
> 
>
Re: [PATCH v2] mm: khugepaged: fix memory leak in collapse_file xas retry loop
Posted by Shardul Bankar 1 week ago
On Mon, 2025-11-24 at 16:21 +0000, Matthew Wilcox wrote:
> 
> Then wouldn't freeing the excess node in xas_create_range() be the
> correct fix, instead of requiring the caller to think about this?
> 
> 
Hi Matthew,

Thanks for the feedback. Agreed, this is better fixed inside xarray
instead of in collapse_file(), so callers don’t need to think about
xas_destroy() at all.

Looking at the internals, xas_nomem() only allocates a spare node into
xas->xa_alloc and xas_alloc() consumes it only if it is required. The
only point where we know that the retry loop is truly finished is after
xas_create_range() (or xas_create()) succeeds — at that point, any
remaining xa_alloc must be unused.

So to align API expectations, I’m trying to understand where you would
prefer to enforce the invariant:

  - In xas_create_range() after success, ensuring no spare remains?
  - Or in xas_create(), so that non-range callers benefit as well?

Once that API boundary is clear, I can prepare a v3 that moves the fix
into lib/xarray.c.

Thanks,
Shardul
[PATCH v3] lib: xarray: free unused spare node in xas_create_range()
Posted by Shardul Bankar 16 hours ago
xas_create_range() is typically called in a retry loop that uses
xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
xa_node and store it in xas->xa_alloc for use in the retry.

If the lock is dropped after xas_nomem(), another thread can expand the
xarray tree in the meantime. On the next retry, xas_create_range() can
then succeed without consuming the spare node stored in xas->xa_alloc.
If the function returns without freeing this spare node, it leaks.

xas_create_range() calls xas_create() multiple times in a loop for
different index ranges. A spare node that isn't needed for one range
iteration might be needed for the next, so we cannot free it after each
xas_create() call. We can only safely free it after xas_create_range()
completes.

Fix this by calling xas_destroy() at the end of xas_create_range() to
free any unused spare node. This makes the API safer by default and
prevents callers from needing to remember cleanup.

This fixes a memory leak in mm/khugepaged.c and potentially other
callers that use xas_nomem() with xas_create_range().

Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
 v3:
 - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
 - Fix in library function makes API safer by default, preventing callers from needing
   to remember cleanup
 - Use shared cleanup label that both restore: and success: paths jump to
 - Clean up unused spare node on both success and error exit paths
 v2:
 - Call xas_destroy() on both success and failure
 - Explained retry semantics and xa_alloc / concurrency risk
 - Dropped cleanup_empty_nodes from previous proposal
 lib/xarray.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540c..a924421c0c4c 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -744,11 +744,17 @@ void xas_create_range(struct xa_state *xas)
 	xas->xa_shift = shift;
 	xas->xa_sibs = sibs;
 	xas->xa_index = index;
-	return;
+	goto cleanup;
+
 success:
 	xas->xa_index = index;
 	if (xas->xa_node)
 		xas_set_offset(xas);
+
+cleanup:
+	/* Free any unused spare node from xas_nomem() */
+	if (xas->xa_alloc)
+		xas_destroy(xas);
 }
 EXPORT_SYMBOL_GPL(xas_create_range);
 
-- 
2.34.1
Re: [PATCH v3] lib: xarray: free unused spare node in xas_create_range()
Posted by David Hildenbrand (Red Hat) 15 hours ago
On 12/1/25 08:45, Shardul Bankar wrote:

Please don't post new versions as reply to old versions.

> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
> 
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
> 
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
> 
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
> 
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
> 
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
>   v3:
>   - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
>   - Fix in library function makes API safer by default, preventing callers from needing
>     to remember cleanup
>   - Use shared cleanup label that both restore: and success: paths jump to
>   - Clean up unused spare node on both success and error exit paths
>   v2:
>   - Call xas_destroy() on both success and failure
>   - Explained retry semantics and xa_alloc / concurrency risk
>   - Dropped cleanup_empty_nodes from previous proposal
>   lib/xarray.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..a924421c0c4c 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,17 @@ void xas_create_range(struct xa_state *xas)
>   	xas->xa_shift = shift;
>   	xas->xa_sibs = sibs;
>   	xas->xa_index = index;
> -	return;
> +	goto cleanup;
> +
>   success:
>   	xas->xa_index = index;
>   	if (xas->xa_node)
>   		xas_set_offset(xas);
> +
> +cleanup:
> +	/* Free any unused spare node from xas_nomem() */
> +	if (xas->xa_alloc)
> +		xas_destroy(xas);

The first thing xas_destroy() does is check whether xa_alloc is set.

I'd assume that the compiler is smart enough to inline xas_destroy() 
completely here, so likely the xa_alloc check here can just be dropped.


Staring at xas_destroy() callers, we only have a single one outside of 
lib: mm/huge_memory.c:__folio_split()

Is that one still required?

-- 
Cheers

David