mm/khugepaged.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
collapse_file() uses xas_create_range() in a retry loop that calls
xas_nomem() on -ENOMEM and then retries. xas_nomem() may allocate a
spare xa_node and store it in xas->xa_alloc.
If the lock is dropped after xas_nomem(), another thread can expand
the xarray tree in the meantime. On the next retry, xas_create_range()
can then succeed trivially without consuming the node stored in
xas->xa_alloc. If we then either succeed or give up and go to the
rollback path without calling xas_destroy(), that spare node leaks.
Fix this by calling xas_destroy(&xas) in both the success case
(!xas_error(&xas)) and the failure case where xas_nomem() returns
false and we abort. xas_destroy() will free any unused spare node in
xas->xa_alloc and is a no-op if there is nothing left to free.
Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
v2:
- Call xas_destroy() on both success and failure
- Explained retry semantics and xa_alloc / concurrency risk
- Dropped cleanup_empty_nodes from previous proposal
mm/khugepaged.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..0794a99c807f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1872,11 +1872,14 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
do {
xas_lock_irq(&xas);
xas_create_range(&xas);
- if (!xas_error(&xas))
+ if (!xas_error(&xas)) {
+ xas_destroy(&xas);
break;
+ }
xas_unlock_irq(&xas);
if (!xas_nomem(&xas, GFP_KERNEL)) {
result = SCAN_FAIL;
+ xas_destroy(&xas);
goto rollback;
}
} while (1);
--
2.34.1
On Mon, Nov 24, 2025 at 09:41:49PM +0530, Shardul Bankar wrote:
> collapse_file() uses xas_create_range() in a retry loop that calls
> xas_nomem() on -ENOMEM and then retries. xas_nomem() may allocate a
> spare xa_node and store it in xas->xa_alloc.
>
> If the lock is dropped after xas_nomem(), another thread can expand
> the xarray tree in the meantime. On the next retry, xas_create_range()
> can then succeed trivially without consuming the node stored in
> xas->xa_alloc. If we then either succeed or give up and go to the
> rollback path without calling xas_destroy(), that spare node leaks.
Then wouldn't freeing the excess node in xas_create_range() be the
correct fix, instead of requiring the caller to think about this?
> Fix this by calling xas_destroy(&xas) in both the success case
> (!xas_error(&xas)) and the failure case where xas_nomem() returns
> false and we abort. xas_destroy() will free any unused spare node in
> xas->xa_alloc and is a no-op if there is nothing left to free.
>
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
> v2:
> - Call xas_destroy() on both success and failure
> - Explained retry semantics and xa_alloc / concurrency risk
> - Dropped cleanup_empty_nodes from previous proposal
>
> mm/khugepaged.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index abe54f0043c7..0794a99c807f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1872,11 +1872,14 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> do {
> xas_lock_irq(&xas);
> xas_create_range(&xas);
> - if (!xas_error(&xas))
> + if (!xas_error(&xas)) {
> + xas_destroy(&xas);
> break;
> + }
> xas_unlock_irq(&xas);
> if (!xas_nomem(&xas, GFP_KERNEL)) {
> result = SCAN_FAIL;
> + xas_destroy(&xas);
> goto rollback;
> }
> } while (1);
> --
> 2.34.1
>
>
On Mon, 2025-11-24 at 16:21 +0000, Matthew Wilcox wrote: > > Then wouldn't freeing the excess node in xas_create_range() be the > correct fix, instead of requiring the caller to think about this? > > Hi Matthew, Thanks for the feedback. Agreed, this is better fixed inside xarray instead of in collapse_file(), so callers don’t need to think about xas_destroy() at all. Looking at the internals, xas_nomem() only allocates a spare node into xas->xa_alloc and xas_alloc() consumes it only if it is required. The only point where we know that the retry loop is truly finished is after xas_create_range() (or xas_create()) succeeds — at that point, any remaining xa_alloc must be unused. So to align API expectations, I’m trying to understand where you would prefer to enforce the invariant: - In xas_create_range() after success, ensuring no spare remains? - Or in xas_create(), so that non-range callers benefit as well? Once that API boundary is clear, I can prepare a v3 that moves the fix into lib/xarray.c. Thanks, Shardul
xas_create_range() is typically called in a retry loop that uses
xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
xa_node and store it in xas->xa_alloc for use in the retry.
If the lock is dropped after xas_nomem(), another thread can expand the
xarray tree in the meantime. On the next retry, xas_create_range() can
then succeed without consuming the spare node stored in xas->xa_alloc.
If the function returns without freeing this spare node, it leaks.
xas_create_range() calls xas_create() multiple times in a loop for
different index ranges. A spare node that isn't needed for one range
iteration might be needed for the next, so we cannot free it after each
xas_create() call. We can only safely free it after xas_create_range()
completes.
Fix this by calling xas_destroy() at the end of xas_create_range() to
free any unused spare node. This makes the API safer by default and
prevents callers from needing to remember cleanup.
This fixes a memory leak in mm/khugepaged.c and potentially other
callers that use xas_nomem() with xas_create_range().
Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
v3:
- Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
- Fix in library function makes API safer by default, preventing callers from needing
to remember cleanup
- Use shared cleanup label that both restore: and success: paths jump to
- Clean up unused spare node on both success and error exit paths
v2:
- Call xas_destroy() on both success and failure
- Explained retry semantics and xa_alloc / concurrency risk
- Dropped cleanup_empty_nodes from previous proposal
lib/xarray.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540c..a924421c0c4c 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -744,11 +744,17 @@ void xas_create_range(struct xa_state *xas)
xas->xa_shift = shift;
xas->xa_sibs = sibs;
xas->xa_index = index;
- return;
+ goto cleanup;
+
success:
xas->xa_index = index;
if (xas->xa_node)
xas_set_offset(xas);
+
+cleanup:
+ /* Free any unused spare node from xas_nomem() */
+ if (xas->xa_alloc)
+ xas_destroy(xas);
}
EXPORT_SYMBOL_GPL(xas_create_range);
--
2.34.1
On 12/1/25 08:45, Shardul Bankar wrote:
Please don't post new versions as reply to old versions.
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
>
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
>
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
>
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
>
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
>
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
> v3:
> - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
> - Fix in library function makes API safer by default, preventing callers from needing
> to remember cleanup
> - Use shared cleanup label that both restore: and success: paths jump to
> - Clean up unused spare node on both success and error exit paths
> v2:
> - Call xas_destroy() on both success and failure
> - Explained retry semantics and xa_alloc / concurrency risk
> - Dropped cleanup_empty_nodes from previous proposal
> lib/xarray.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..a924421c0c4c 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,17 @@ void xas_create_range(struct xa_state *xas)
> xas->xa_shift = shift;
> xas->xa_sibs = sibs;
> xas->xa_index = index;
> - return;
> + goto cleanup;
> +
> success:
> xas->xa_index = index;
> if (xas->xa_node)
> xas_set_offset(xas);
> +
> +cleanup:
> + /* Free any unused spare node from xas_nomem() */
> + if (xas->xa_alloc)
> + xas_destroy(xas);
The first thing xas_destroy() does is check whether xa_alloc is set.
I'd assume that the compiler is smart enough to inline xas_destroy()
completely here, so likely the xa_alloc check here can just be dropped.
Staring at xas_destroy() callers, we only have a single one outside of
lib: mm/huge_memory.c:__folio_split()
Is that one still required?
--
Cheers
David
© 2016 - 2025 Red Hat, Inc.