xfs: detect cycles in recovered unlinked inode lists

[PATCH] xfs: detect cycles in recovered unlinked inode lists

Posted by Michael Bommarito 1 week ago

Two XFS walkers follow on-disk AGI unlinked bucket lists until the
NULLAGINO sentinel and trust the links unconditionally:

  xlog_recover_iunlink_bucket(), reached at mount time when the log
  needs recovery, via xlog_recover_finish() ->
  xlog_recover_process_iunlinks().

  xfs_inode_reload_unlinked_bucket(), reached post-mount via
  xfs_bulkstat_one_int() and other inode-reload paths.

A crafted image with an AGI bucket head pointing into a cycle traps
either walker forever. With the mount-time walker, the mount(8)
syscall itself never returns; the kernel thread is uninterruptible
inside the loop, so SIGKILL on the mount process is queued but never
delivered. With the bulkstat walker, an XFS_IOC_BULKSTAT call hangs
the same way.

Reject invalid AG inode numbers, bucket mismatches, and repeated AG
inode numbers in both walkers. Mark the AGI sick and return
-EFSCORRUPTED so callers handle the corrupt metadata instead of
walking the same on-disk list forever.

Reproduced on a crafted image whose AGI bucket holds a self-cycle and
whose on-disk log is dirty: a stock kernel hangs in the mount syscall
indefinitely, while the patched kernel completes log recovery,
reports the corruption, and the filesystem shuts down with
-EFSCORRUPTED.

Fixes: 04755d2e5821b3afbaadd09fe5df58d04de36484 ("xfs: refactor xlog_recover_process_iunlinks()")
Fixes: 83771c50e42b92de6740a63e152c96c052d37736 ("xfs: reload entire unlinked bucket lists")
Assisted-by: Codex:gpt-5-5-xhigh
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
 fs/xfs/xfs_inode.c       | 29 +++++++++++++++++++++++++++++
 fs/xfs/xfs_log_recover.c | 26 +++++++++++++++++++++++++-
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index beaa26ec62da4..f930d5c823c77 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -4,6 +4,7 @@
  * All Rights Reserved.
  */
 #include <linux/iversion.h>
+#include <linux/xarray.h>
 
 #include "xfs_platform.h"
 #include "xfs_fs.h"
@@ -2859,6 +2860,7 @@ xfs_inode_reload_unlinked_bucket(
 	struct xfs_buf		*agibp;
 	struct xfs_agi		*agi;
 	struct xfs_perag	*pag;
+	struct xarray		seen_aginos;
 	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
 	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
 	xfs_agino_t		prev_agino, next_agino;
@@ -2873,6 +2875,8 @@ xfs_inode_reload_unlinked_bucket(
 	if (error)
 		return error;
 
+	xa_init(&seen_aginos);
+
 	/*
 	 * We've taken ILOCK_SHARED and the AGI buffer lock to stabilize the
 	 * incore unlinked list pointers for this inode.  Check once more to
@@ -2897,6 +2901,30 @@ xfs_inode_reload_unlinked_bucket(
 	while (next_agino != NULLAGINO) {
 		struct xfs_inode	*next_ip = NULL;
 
+		/*
+		 * The on-disk unlinked list is corrupt if it points outside this
+		 * AG, into another bucket, or back to an inode that we already
+		 * saw during this reload walk.
+		 */
+		if (!xfs_verify_agino(pag, next_agino) ||
+		    next_agino % XFS_AGI_UNLINKED_BUCKETS != bucket) {
+			xfs_buf_mark_corrupt(agibp);
+			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+			error = -EFSCORRUPTED;
+			break;
+		}
+
+		if (xa_load(&seen_aginos, next_agino)) {
+			xfs_buf_mark_corrupt(agibp);
+			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+			error = -EFSCORRUPTED;
+			break;
+		}
+		error = xa_err(xa_store(&seen_aginos, next_agino,
+					xa_mk_value(1), GFP_NOFS));
+		if (error)
+			break;
+
 		/* Found this caller's inode, set its backlink. */
 		if (next_agino == agino) {
 			next_ip = ip;
@@ -2931,6 +2959,7 @@ xfs_inode_reload_unlinked_bucket(
 	}
 
 out_agibp:
+	xa_destroy(&seen_aginos);
 	xfs_trans_brelse(tp, agibp);
 	/* Should have found this inode somewhere in the iunlinked bucket. */
 	if (!error && !foundit)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 09e6678ca4878..8ca70bbbfac81 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3,6 +3,8 @@
  * Copyright (c) 2000-2006 Silicon Graphics, Inc.
  * All Rights Reserved.
  */
+#include <linux/xarray.h>
+
 #include "xfs_platform.h"
 #include "xfs_fs.h"
 #include "xfs_shared.h"
@@ -28,6 +30,7 @@
 #include "xfs_ag.h"
 #include "xfs_quota.h"
 #include "xfs_reflink.h"
+#include "xfs_health.h"
 
 #define BLK_AVG(blk1, blk2)	((blk1+blk2) >> 1)
 
@@ -2726,11 +2729,31 @@ xlog_recover_iunlink_bucket(
 	struct xfs_mount	*mp = pag_mount(pag);
 	struct xfs_inode	*prev_ip = NULL;
 	struct xfs_inode	*ip;
+	struct xarray		seen_aginos;
 	xfs_agino_t		prev_agino, agino;
 	int			error = 0;
 
+	xa_init(&seen_aginos);
+
 	agino = be32_to_cpu(agi->agi_unlinked[bucket]);
 	while (agino != NULLAGINO) {
+		if (!xfs_verify_agino(pag, agino) ||
+		    agino % XFS_AGI_UNLINKED_BUCKETS != bucket) {
+			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+			error = -EFSCORRUPTED;
+			break;
+		}
+
+		if (xa_load(&seen_aginos, agino)) {
+			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+			error = -EFSCORRUPTED;
+			break;
+		}
+		error = xa_err(xa_store(&seen_aginos, agino, xa_mk_value(1),
+					GFP_NOFS));
+		if (error)
+			break;
+
 		error = xfs_iget(mp, NULL, xfs_agino_to_ino(pag, agino), 0, 0,
 				&ip);
 		if (error)
@@ -2771,8 +2794,9 @@ xlog_recover_iunlink_bucket(
 
 		error2 = xfs_inodegc_flush(mp);
 		if (error2 && !error)
-			return error2;
+			error = error2;
 	}
+	xa_destroy(&seen_aginos);
 	return error;
 }
 
-- 
2.53.0

Re: [PATCH] xfs: detect cycles in recovered unlinked inode lists

Posted by Darrick J. Wong 6 days ago

On Sun, May 17, 2026 at 09:31:43PM -0400, Michael Bommarito wrote:
> Two XFS walkers follow on-disk AGI unlinked bucket lists until the
> NULLAGINO sentinel and trust the links unconditionally:
> 
>   xlog_recover_iunlink_bucket(), reached at mount time when the log
>   needs recovery, via xlog_recover_finish() ->
>   xlog_recover_process_iunlinks().
> 
>   xfs_inode_reload_unlinked_bucket(), reached post-mount via
>   xfs_bulkstat_one_int() and other inode-reload paths.
> 
> A crafted image with an AGI bucket head pointing into a cycle traps
> either walker forever. With the mount-time walker, the mount(8)
> syscall itself never returns; the kernel thread is uninterruptible
> inside the loop, so SIGKILL on the mount process is queued but never
> delivered. With the bulkstat walker, an XFS_IOC_BULKSTAT call hangs
> the same way.
> 
> Reject invalid AG inode numbers, bucket mismatches, and repeated AG
> inode numbers in both walkers. Mark the AGI sick and return
> -EFSCORRUPTED so callers handle the corrupt metadata instead of
> walking the same on-disk list forever.
> 
> Reproduced on a crafted image whose AGI bucket holds a self-cycle and
> whose on-disk log is dirty: a stock kernel hangs in the mount syscall
> indefinitely, while the patched kernel completes log recovery,
> reports the corruption, and the filesystem shuts down with
> -EFSCORRUPTED.

(Does xfs_repair fix the problem if the mount fails?)

> Fixes: 04755d2e5821b3afbaadd09fe5df58d04de36484 ("xfs: refactor xlog_recover_process_iunlinks()")
> Fixes: 83771c50e42b92de6740a63e152c96c052d37736 ("xfs: reload entire unlinked bucket lists")
> Assisted-by: Codex:gpt-5-5-xhigh
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
>  fs/xfs/xfs_inode.c       | 29 +++++++++++++++++++++++++++++
>  fs/xfs/xfs_log_recover.c | 26 +++++++++++++++++++++++++-
>  2 files changed, 54 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index beaa26ec62da4..f930d5c823c77 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -4,6 +4,7 @@
>   * All Rights Reserved.
>   */
>  #include <linux/iversion.h>
> +#include <linux/xarray.h>
>  
>  #include "xfs_platform.h"
>  #include "xfs_fs.h"
> @@ -2859,6 +2860,7 @@ xfs_inode_reload_unlinked_bucket(
>  	struct xfs_buf		*agibp;
>  	struct xfs_agi		*agi;
>  	struct xfs_perag	*pag;
> +	struct xarray		seen_aginos;
>  	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
>  	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
>  	xfs_agino_t		prev_agino, next_agino;
> @@ -2873,6 +2875,8 @@ xfs_inode_reload_unlinked_bucket(
>  	if (error)
>  		return error;
>  
> +	xa_init(&seen_aginos);

Why not use xbitmap.h for this?  Is it too memory-inefficient over an
xarray?

> +
>  	/*
>  	 * We've taken ILOCK_SHARED and the AGI buffer lock to stabilize the
>  	 * incore unlinked list pointers for this inode.  Check once more to
> @@ -2897,6 +2901,30 @@ xfs_inode_reload_unlinked_bucket(
>  	while (next_agino != NULLAGINO) {
>  		struct xfs_inode	*next_ip = NULL;
>  
> +		/*
> +		 * The on-disk unlinked list is corrupt if it points outside this
> +		 * AG, into another bucket, or back to an inode that we already
> +		 * saw during this reload walk.
> +		 */
> +		if (!xfs_verify_agino(pag, next_agino) ||
> +		    next_agino % XFS_AGI_UNLINKED_BUCKETS != bucket) {

Does online fsck need any corrections w.r.t. these new findings?

--D

> +			xfs_buf_mark_corrupt(agibp);
> +			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
> +			error = -EFSCORRUPTED;
> +			break;
> +		}
> +
> +		if (xa_load(&seen_aginos, next_agino)) {
> +			xfs_buf_mark_corrupt(agibp);
> +			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
> +			error = -EFSCORRUPTED;
> +			break;
> +		}
> +		error = xa_err(xa_store(&seen_aginos, next_agino,
> +					xa_mk_value(1), GFP_NOFS));
> +		if (error)
> +			break;
> +
>  		/* Found this caller's inode, set its backlink. */
>  		if (next_agino == agino) {
>  			next_ip = ip;
> @@ -2931,6 +2959,7 @@ xfs_inode_reload_unlinked_bucket(
>  	}
>  
>  out_agibp:
> +	xa_destroy(&seen_aginos);
>  	xfs_trans_brelse(tp, agibp);
>  	/* Should have found this inode somewhere in the iunlinked bucket. */
>  	if (!error && !foundit)
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 09e6678ca4878..8ca70bbbfac81 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3,6 +3,8 @@
>   * Copyright (c) 2000-2006 Silicon Graphics, Inc.
>   * All Rights Reserved.
>   */
> +#include <linux/xarray.h>
> +
>  #include "xfs_platform.h"
>  #include "xfs_fs.h"
>  #include "xfs_shared.h"
> @@ -28,6 +30,7 @@
>  #include "xfs_ag.h"
>  #include "xfs_quota.h"
>  #include "xfs_reflink.h"
> +#include "xfs_health.h"
>  
>  #define BLK_AVG(blk1, blk2)	((blk1+blk2) >> 1)
>  
> @@ -2726,11 +2729,31 @@ xlog_recover_iunlink_bucket(
>  	struct xfs_mount	*mp = pag_mount(pag);
>  	struct xfs_inode	*prev_ip = NULL;
>  	struct xfs_inode	*ip;
> +	struct xarray		seen_aginos;
>  	xfs_agino_t		prev_agino, agino;
>  	int			error = 0;
>  
> +	xa_init(&seen_aginos);
> +
>  	agino = be32_to_cpu(agi->agi_unlinked[bucket]);
>  	while (agino != NULLAGINO) {
> +		if (!xfs_verify_agino(pag, agino) ||
> +		    agino % XFS_AGI_UNLINKED_BUCKETS != bucket) {
> +			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
> +			error = -EFSCORRUPTED;
> +			break;
> +		}
> +
> +		if (xa_load(&seen_aginos, agino)) {
> +			xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
> +			error = -EFSCORRUPTED;
> +			break;
> +		}
> +		error = xa_err(xa_store(&seen_aginos, agino, xa_mk_value(1),
> +					GFP_NOFS));
> +		if (error)
> +			break;
> +
>  		error = xfs_iget(mp, NULL, xfs_agino_to_ino(pag, agino), 0, 0,
>  				&ip);
>  		if (error)
> @@ -2771,8 +2794,9 @@ xlog_recover_iunlink_bucket(
>  
>  		error2 = xfs_inodegc_flush(mp);
>  		if (error2 && !error)
> -			return error2;
> +			error = error2;
>  	}
> +	xa_destroy(&seen_aginos);
>  	return error;
>  }
>  
> -- 
> 2.53.0
>

Re: [PATCH] xfs: detect cycles in recovered unlinked inode lists

Posted by Christoph Hellwig 5 days, 20 hours ago

On Mon, May 18, 2026 at 10:02:35PM -0700, Darrick J. Wong wrote:
> (Does xfs_repair fix the problem if the mount fails?)

More importantly, what are we trying to fix here?  What is the
motivation?  And how are we going to test this code, as right now
we trade a potential issue with a crafted malicious image (which isn't
our usual threat model) against adding new code run at mountime without
any test coverage.

Re: [PATCH] xfs: detect cycles in recovered unlinked inode lists

Posted by Michael Bommarito 5 days, 16 hours ago

Hi Christoph,

This research agenda and patch comes from a hardening project
focused on automount filesystems in major modern distros.
Maybe the USB stick threat model isn't worth the patch for
just DoS impact, but that's at least why I thought it worth
submitting in the first place.

I looked more closely at current xfsprogs for-next
to answer Darrick's repair question.  Plain xfs_repair refuses
the reproducer because the log is dirty and needs replay; that
replay is the mount-time path that hangs on the vulnerable kernel.

xfs_repair -n diagnoses the AGI unlinked bucket and inode
next_unlinked damage, and xfs_repair -L clears the bucket /
next_unlinked state and repairs the image, with the usual log-loss
caveat.

I also found a narrower userspace diagnostic issue: xfs_db
dump_iunlinked follows di_next_unlinked until NULLAGINO and will
spin on the same self-cycle.  xfs_repair itself does not appear to
use that unbounded walker in the tested repair path, and xfs_scrub
userspace is mostly driving the kernel scrub ioctls.

To be fully safe from automount attacks, I think both sides would
need to be addressed.

Your bigger point about regressions and test coverage is still
valid though.  FWIW, I have been talking with some other fs
maintainers about adding test/ fuzzing coverage either in-tree
or external.  I would be happy to help with add XFS coverage
as part of that project, although I'd defer to your preference
about where those tests should actually live.

All that said, just let me know if you still want a v2 based
on the threat model.

Thanks,
Mike

Re: [PATCH] xfs: detect cycles in recovered unlinked inode lists

Posted by Carlos Maiolino 4 days, 16 hours ago

Please leave the message you're replying to quoted in your email...

On Tue, May 19, 2026 at 08:48:33AM -0400, Michael Bommarito wrote:
> Hi Christoph,
> 
> This research agenda and patch comes from a hardening project
> focused on automount filesystems in major modern distros.
> Maybe the USB stick threat model isn't worth the patch for
> just DoS impact, but that's at least why I thought it worth
> submitting in the first place.

The patch ain't completely invalid, but by the time you've somebody
untrusted with physical access to the system to put a USB stick on,
your security is already gone.
Automounting filesystems (a thing we've been fighting against for
years) just adds on top of it.

> 
> I looked more closely at current xfsprogs for-next
> to answer Darrick's repair question.  Plain xfs_repair refuses
> the reproducer because the log is dirty and needs replay; that
> replay is the mount-time path that hangs on the vulnerable kernel.
> 
> xfs_repair -n diagnoses the AGI unlinked bucket and inode
> next_unlinked damage, and xfs_repair -L clears the bucket /
> next_unlinked state and repairs the image, with the usual log-loss
> caveat.
> 
> I also found a narrower userspace diagnostic issue: xfs_db
> dump_iunlinked follows di_next_unlinked until NULLAGINO and will
> spin on the same self-cycle.  xfs_repair itself does not appear to
> use that unbounded walker in the tested repair path, and xfs_scrub
> userspace is mostly driving the kernel scrub ioctls.
> 
> To be fully safe from automount attacks, I think both sides would
> need to be addressed.
> 
> Your bigger point about regressions and test coverage is still
> valid though.  FWIW, I have been talking with some other fs
> maintainers about adding test/ fuzzing coverage either in-tree
> or external.  I would be happy to help with add XFS coverage
> as part of that project, although I'd defer to your preference
> about where those tests should actually live.

I don't think such tests belongs neither in-tree nor in xfstests, but
this seems a good idea to start a project related to test filesystems
resiliency against those issues. syscaller already does a lot of fuzzing
but mostly random and not containing a bunch of carefully crafted
filesystems to triggers specific issues.

> All that said, just let me know if you still want a v2 based
> on the threat model.

I don't think this is something we want to have without having a
reliable way to test this code to fix a theoretical attack vector, but,
if by any chance it comes to a v2, there are a lot of duplicated code
that should probably belong into its own helper.

Cheers,
Carlos

Re: [PATCH] xfs: detect cycles in recovered unlinked inode lists

Posted by Michael Bommarito 4 days, 16 hours ago

On Wed, May 20, 2026 at 8:19 AM Carlos Maiolino <cem@kernel.org> wrote:
>
> Please leave the message you're replying to quoted in your email...

(Sorry about that. Christoph's message was never actually delivered to
my @gmail.com so I had to respond like it was 1999.  @Christoph, not
sure if on my end or yours...)

> The patch ain't completely invalid, but by the time you've somebody
> untrusted with physical access to the system to put a USB stick on,
> your security is already gone.
> Automounting filesystems (a thing we've been fighting against for
> years) just adds on top of it.

I get it.  Better a known and accepted risk than hidden.

> I don't think such tests belongs neither in-tree nor in xfstests, but
> this seems a good idea to start a project related to test filesystems
> resiliency against those issues. syscaller already does a lot of fuzzing
> but mostly random and not containing a bunch of carefully crafted
> filesystems to triggers specific issues.

Yes, and this is where LLMs have been very helpful in particular.
Given an AST or parser / format specification, you can delegate the
mechanics of implementing hundreds of different tests / perturbations.
Many of those strategies are analogous across file systems.  For
example, this XFS finding was found by expanding from my NTFS finding
here:
https://lore.kernel.org/all/20260517234140.1261718-1-michael.bommarito@gmail.com/

I found a little prior work but nothing that seems active and comprehensive:
https://github.com/sslab-gatech/janus
https://github.com/stevegrubb/fsfuzzer

I'm close to finishing triage and disclosure for all of the stuff I've
found so far, so as soon as that's done, I'll share what I have
publicly and RFC to continue this thread separately.

In the meantime, it sounds like we just park this until we know it
won't break things.