lib/xarray.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
xas_create_range() is typically called in a retry loop that uses
xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
xa_node and store it in xas->xa_alloc for use in the retry.
If the lock is dropped after xas_nomem(), another thread can expand the
xarray tree in the meantime. On the next retry, xas_create_range() can
then succeed without consuming the spare node stored in xas->xa_alloc.
If the function returns without freeing this spare node, it leaks.
xas_create_range() calls xas_create() multiple times in a loop for
different index ranges. A spare node that isn't needed for one range
iteration might be needed for the next, so we cannot free it after each
xas_create() call. We can only safely free it after xas_create_range()
completes.
Fix this by calling xas_destroy() at the end of xas_create_range() to
free any unused spare node. This makes the API safer by default and
prevents callers from needing to remember cleanup.
This fixes a memory leak in mm/khugepaged.c and potentially other
callers that use xas_nomem() with xas_create_range().
Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
v3:
- Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
- Fix in library function makes API safer by default, preventing callers from needing
to remember cleanup
- Use shared cleanup label that both restore: and success: paths jump to
- Clean up unused spare node on both success and error exit paths
v2:
- Call xas_destroy() on both success and failure
- Explained retry semantics and xa_alloc / concurrency risk
- Dropped cleanup_empty_nodes from previous proposal
lib/xarray.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/lib/xarray.c b/lib/xarray.c
index 9a8b4916540c..a924421c0c4c 100644
--- a/lib/xarray.c
+++ b/lib/xarray.c
@@ -744,11 +744,17 @@ void xas_create_range(struct xa_state *xas)
xas->xa_shift = shift;
xas->xa_sibs = sibs;
xas->xa_index = index;
- return;
+ goto cleanup;
+
success:
xas->xa_index = index;
if (xas->xa_node)
xas_set_offset(xas);
+
+cleanup:
+ /* Free any unused spare node from xas_nomem() */
+ if (xas->xa_alloc)
+ xas_destroy(xas);
}
EXPORT_SYMBOL_GPL(xas_create_range);
--
2.34.1
On 12/1/25 08:45, Shardul Bankar wrote:
Please don't post new versions as reply to old versions.
> xas_create_range() is typically called in a retry loop that uses
> xas_nomem() to handle -ENOMEM errors. xas_nomem() may allocate a spare
> xa_node and store it in xas->xa_alloc for use in the retry.
>
> If the lock is dropped after xas_nomem(), another thread can expand the
> xarray tree in the meantime. On the next retry, xas_create_range() can
> then succeed without consuming the spare node stored in xas->xa_alloc.
> If the function returns without freeing this spare node, it leaks.
>
> xas_create_range() calls xas_create() multiple times in a loop for
> different index ranges. A spare node that isn't needed for one range
> iteration might be needed for the next, so we cannot free it after each
> xas_create() call. We can only safely free it after xas_create_range()
> completes.
>
> Fix this by calling xas_destroy() at the end of xas_create_range() to
> free any unused spare node. This makes the API safer by default and
> prevents callers from needing to remember cleanup.
>
> This fixes a memory leak in mm/khugepaged.c and potentially other
> callers that use xas_nomem() with xas_create_range().
>
> Link: https://syzkaller.appspot.com/bug?id=a274d65fc733448ed518ad15481ed575669dd98c
> Fixes: cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
> v3:
> - Move fix from collapse_file() to xas_create_range() as suggested by Matthew Wilcox
> - Fix in library function makes API safer by default, preventing callers from needing
> to remember cleanup
> - Use shared cleanup label that both restore: and success: paths jump to
> - Clean up unused spare node on both success and error exit paths
> v2:
> - Call xas_destroy() on both success and failure
> - Explained retry semantics and xa_alloc / concurrency risk
> - Dropped cleanup_empty_nodes from previous proposal
> lib/xarray.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/lib/xarray.c b/lib/xarray.c
> index 9a8b4916540c..a924421c0c4c 100644
> --- a/lib/xarray.c
> +++ b/lib/xarray.c
> @@ -744,11 +744,17 @@ void xas_create_range(struct xa_state *xas)
> xas->xa_shift = shift;
> xas->xa_sibs = sibs;
> xas->xa_index = index;
> - return;
> + goto cleanup;
> +
> success:
> xas->xa_index = index;
> if (xas->xa_node)
> xas_set_offset(xas);
> +
> +cleanup:
> + /* Free any unused spare node from xas_nomem() */
> + if (xas->xa_alloc)
> + xas_destroy(xas);
The first thing xas_destroy() does is check whether xa_alloc is set.
I'd assume that the compiler is smart enough to inline xas_destroy()
completely here, so likely the xa_alloc check here can just be dropped.
Staring at xas_destroy() callers, we only have a single one outside of
lib: mm/huge_memory.c:__folio_split()
Is that one still required?
--
Cheers
David
On Mon, 2025-12-01 at 09:39 +0100, David Hildenbrand (Red Hat) wrote: > Please don't post new versions as reply to old versions. > ... > > ... > The first thing xas_destroy() does is check whether xa_alloc is set. > > I'd assume that the compiler is smart enough to inline xas_destroy() > completely here, so likely the xa_alloc check here can just be > dropped. Got it, will share a v4 of the patch on a new chain with redundant xas_destroy() removed. > Staring at xas_destroy() callers, we only have a single one outside > of > lib: mm/huge_memory.c:__folio_split() > > Is that one still required? I checked the callers of xas_destroy(). Apart from the internal uses in lib/xarray.c and the unit tests in lib/test_xarray.c, the only external user is indeed mm/huge_memory.c:__folio_split(). That path is slightly different from the xas_nomem() retry loop I fixed in xas_create_range(): __folio_split() goes through xas_split_alloc() and then xas_split() / xas_try_split(), which allocate and consume nodes via xas->xa_alloc. The final xas_destroy(&xas) in __folio_split() is there to drop any leftover split-allocation nodes, not the xas_nomem() spare node I handled in xas_create_range(). So with the current code I don’t think I can safely declare that xas_destroy() in __folio_split() is redundant- it still acts as the last cleanup for the split helpers. For v4 I’d therefore like to keep the scope focused on the syzkaller leak and just drop the redundant "if (xa_alloc)" around xas_destroy() in xas_create_range() as you suggested. Separately, I agree it would be cleaner if the split helpers guaranteed that xa_alloc is always cleared on return, so callers never have to think about xas_destroy(). I can take a closer look at xas_split() / xas_try_split() and, if it looks sound, propose a small follow-up series that makes their cleanup behaviour explicit and then removes the xas_destroy() from __folio_split(). Thanks, Shardul
On 12/4/25 15:15, Shardul Bankar wrote: > On Mon, 2025-12-01 at 09:39 +0100, David Hildenbrand (Red Hat) wrote: >> Please don't post new versions as reply to old versions. >> ... >> >> ... >> The first thing xas_destroy() does is check whether xa_alloc is set. >> >> I'd assume that the compiler is smart enough to inline xas_destroy() >> completely here, so likely the xa_alloc check here can just be >> dropped. > > Got it, will share a v4 of the patch on a new chain with redundant > xas_destroy() removed. > >> Staring at xas_destroy() callers, we only have a single one outside >> of >> lib: mm/huge_memory.c:__folio_split() >> >> Is that one still required? > > I checked the callers of xas_destroy(). Apart from the internal uses in > lib/xarray.c and the unit tests in lib/test_xarray.c, the only external > user is indeed mm/huge_memory.c:__folio_split(). > > That path is slightly different from the xas_nomem() retry loop I fixed > in xas_create_range(): > > __folio_split() goes through xas_split_alloc() and then > xas_split() / xas_try_split(), which allocate and consume nodes via > xas->xa_alloc. > > The final xas_destroy(&xas) in __folio_split() is there to > drop any leftover split-allocation nodes, not the xas_nomem() spare > node I handled in xas_create_range(). > > So with the current code I don’t think I can safely declare that > xas_destroy() in __folio_split() is redundant- it still acts as the > last cleanup for the split helpers. > > For v4 I’d therefore like to keep the scope focused on the syzkaller > leak and just drop the redundant "if (xa_alloc)" around xas_destroy() > in xas_create_range() as you suggested. > > Separately, I agree it would be cleaner if the split helpers guaranteed > that xa_alloc is always cleared on return, so callers never have to > think about xas_destroy(). I can take a closer look at xas_split() / > xas_try_split() and, if it looks sound, propose a small follow-up > series that makes their cleanup behaviour explicit and then removes the > xas_destroy() from __folio_split(). That makes sense, thanks. Handling it internally is certainly harder to mess up by callers ... -- Cheers David
© 2016 - 2026 Red Hat, Inc.