fs/xfs/xfs_inode.c | 29 +++++++++++++++++++++++++++++ fs/xfs/xfs_log_recover.c | 26 +++++++++++++++++++++++++- 2 files changed, 54 insertions(+), 1 deletion(-)
Two XFS walkers follow on-disk AGI unlinked bucket lists until the
NULLAGINO sentinel and trust the links unconditionally:
xlog_recover_iunlink_bucket(), reached at mount time when the log
needs recovery, via xlog_recover_finish() ->
xlog_recover_process_iunlinks().
xfs_inode_reload_unlinked_bucket(), reached post-mount via
xfs_bulkstat_one_int() and other inode-reload paths.
A crafted image with an AGI bucket head pointing into a cycle traps
either walker forever. With the mount-time walker, the mount(8)
syscall itself never returns; the kernel thread is uninterruptible
inside the loop, so SIGKILL on the mount process is queued but never
delivered. With the bulkstat walker, an XFS_IOC_BULKSTAT call hangs
the same way.
Reject invalid AG inode numbers, bucket mismatches, and repeated AG
inode numbers in both walkers. Mark the AGI sick and return
-EFSCORRUPTED so callers handle the corrupt metadata instead of
walking the same on-disk list forever.
Reproduced on a crafted image whose AGI bucket holds a self-cycle and
whose on-disk log is dirty: a stock kernel hangs in the mount syscall
indefinitely, while the patched kernel completes log recovery,
reports the corruption, and the filesystem shuts down with
-EFSCORRUPTED.
Fixes: 04755d2e5821b3afbaadd09fe5df58d04de36484 ("xfs: refactor xlog_recover_process_iunlinks()")
Fixes: 83771c50e42b92de6740a63e152c96c052d37736 ("xfs: reload entire unlinked bucket lists")
Assisted-by: Codex:gpt-5-5-xhigh
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
fs/xfs/xfs_inode.c | 29 +++++++++++++++++++++++++++++
fs/xfs/xfs_log_recover.c | 26 +++++++++++++++++++++++++-
2 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index beaa26ec62da4..f930d5c823c77 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -4,6 +4,7 @@
* All Rights Reserved.
*/
#include <linux/iversion.h>
+#include <linux/xarray.h>
#include "xfs_platform.h"
#include "xfs_fs.h"
@@ -2859,6 +2860,7 @@ xfs_inode_reload_unlinked_bucket(
struct xfs_buf *agibp;
struct xfs_agi *agi;
struct xfs_perag *pag;
+ struct xarray seen_aginos;
xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
xfs_agino_t prev_agino, next_agino;
@@ -2873,6 +2875,8 @@ xfs_inode_reload_unlinked_bucket(
if (error)
return error;
+ xa_init(&seen_aginos);
+
/*
* We've taken ILOCK_SHARED and the AGI buffer lock to stabilize the
* incore unlinked list pointers for this inode. Check once more to
@@ -2897,6 +2901,30 @@ xfs_inode_reload_unlinked_bucket(
while (next_agino != NULLAGINO) {
struct xfs_inode *next_ip = NULL;
+ /*
+ * The on-disk unlinked list is corrupt if it points outside this
+ * AG, into another bucket, or back to an inode that we already
+ * saw during this reload walk.
+ */
+ if (!xfs_verify_agino(pag, next_agino) ||
+ next_agino % XFS_AGI_UNLINKED_BUCKETS != bucket) {
+ xfs_buf_mark_corrupt(agibp);
+ xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+ error = -EFSCORRUPTED;
+ break;
+ }
+
+ if (xa_load(&seen_aginos, next_agino)) {
+ xfs_buf_mark_corrupt(agibp);
+ xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+ error = -EFSCORRUPTED;
+ break;
+ }
+ error = xa_err(xa_store(&seen_aginos, next_agino,
+ xa_mk_value(1), GFP_NOFS));
+ if (error)
+ break;
+
/* Found this caller's inode, set its backlink. */
if (next_agino == agino) {
next_ip = ip;
@@ -2931,6 +2959,7 @@ xfs_inode_reload_unlinked_bucket(
}
out_agibp:
+ xa_destroy(&seen_aginos);
xfs_trans_brelse(tp, agibp);
/* Should have found this inode somewhere in the iunlinked bucket. */
if (!error && !foundit)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 09e6678ca4878..8ca70bbbfac81 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3,6 +3,8 @@
* Copyright (c) 2000-2006 Silicon Graphics, Inc.
* All Rights Reserved.
*/
+#include <linux/xarray.h>
+
#include "xfs_platform.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
@@ -28,6 +30,7 @@
#include "xfs_ag.h"
#include "xfs_quota.h"
#include "xfs_reflink.h"
+#include "xfs_health.h"
#define BLK_AVG(blk1, blk2) ((blk1+blk2) >> 1)
@@ -2726,11 +2729,31 @@ xlog_recover_iunlink_bucket(
struct xfs_mount *mp = pag_mount(pag);
struct xfs_inode *prev_ip = NULL;
struct xfs_inode *ip;
+ struct xarray seen_aginos;
xfs_agino_t prev_agino, agino;
int error = 0;
+ xa_init(&seen_aginos);
+
agino = be32_to_cpu(agi->agi_unlinked[bucket]);
while (agino != NULLAGINO) {
+ if (!xfs_verify_agino(pag, agino) ||
+ agino % XFS_AGI_UNLINKED_BUCKETS != bucket) {
+ xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+ error = -EFSCORRUPTED;
+ break;
+ }
+
+ if (xa_load(&seen_aginos, agino)) {
+ xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+ error = -EFSCORRUPTED;
+ break;
+ }
+ error = xa_err(xa_store(&seen_aginos, agino, xa_mk_value(1),
+ GFP_NOFS));
+ if (error)
+ break;
+
error = xfs_iget(mp, NULL, xfs_agino_to_ino(pag, agino), 0, 0,
&ip);
if (error)
@@ -2771,8 +2794,9 @@ xlog_recover_iunlink_bucket(
error2 = xfs_inodegc_flush(mp);
if (error2 && !error)
- return error2;
+ error = error2;
}
+ xa_destroy(&seen_aginos);
return error;
}
--
2.53.0
On Sun, May 17, 2026 at 09:31:43PM -0400, Michael Bommarito wrote:
> Two XFS walkers follow on-disk AGI unlinked bucket lists until the
> NULLAGINO sentinel and trust the links unconditionally:
>
> xlog_recover_iunlink_bucket(), reached at mount time when the log
> needs recovery, via xlog_recover_finish() ->
> xlog_recover_process_iunlinks().
>
> xfs_inode_reload_unlinked_bucket(), reached post-mount via
> xfs_bulkstat_one_int() and other inode-reload paths.
>
> A crafted image with an AGI bucket head pointing into a cycle traps
> either walker forever. With the mount-time walker, the mount(8)
> syscall itself never returns; the kernel thread is uninterruptible
> inside the loop, so SIGKILL on the mount process is queued but never
> delivered. With the bulkstat walker, an XFS_IOC_BULKSTAT call hangs
> the same way.
>
> Reject invalid AG inode numbers, bucket mismatches, and repeated AG
> inode numbers in both walkers. Mark the AGI sick and return
> -EFSCORRUPTED so callers handle the corrupt metadata instead of
> walking the same on-disk list forever.
>
> Reproduced on a crafted image whose AGI bucket holds a self-cycle and
> whose on-disk log is dirty: a stock kernel hangs in the mount syscall
> indefinitely, while the patched kernel completes log recovery,
> reports the corruption, and the filesystem shuts down with
> -EFSCORRUPTED.
(Does xfs_repair fix the problem if the mount fails?)
> Fixes: 04755d2e5821b3afbaadd09fe5df58d04de36484 ("xfs: refactor xlog_recover_process_iunlinks()")
> Fixes: 83771c50e42b92de6740a63e152c96c052d37736 ("xfs: reload entire unlinked bucket lists")
> Assisted-by: Codex:gpt-5-5-xhigh
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> fs/xfs/xfs_inode.c | 29 +++++++++++++++++++++++++++++
> fs/xfs/xfs_log_recover.c | 26 +++++++++++++++++++++++++-
> 2 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index beaa26ec62da4..f930d5c823c77 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -4,6 +4,7 @@
> * All Rights Reserved.
> */
> #include <linux/iversion.h>
> +#include <linux/xarray.h>
>
> #include "xfs_platform.h"
> #include "xfs_fs.h"
> @@ -2859,6 +2860,7 @@ xfs_inode_reload_unlinked_bucket(
> struct xfs_buf *agibp;
> struct xfs_agi *agi;
> struct xfs_perag *pag;
> + struct xarray seen_aginos;
> xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
> xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
> xfs_agino_t prev_agino, next_agino;
> @@ -2873,6 +2875,8 @@ xfs_inode_reload_unlinked_bucket(
> if (error)
> return error;
>
> + xa_init(&seen_aginos);
Why not use xbitmap.h for this? Is it too memory-inefficient over an
xarray?
> +
> /*
> * We've taken ILOCK_SHARED and the AGI buffer lock to stabilize the
> * incore unlinked list pointers for this inode. Check once more to
> @@ -2897,6 +2901,30 @@ xfs_inode_reload_unlinked_bucket(
> while (next_agino != NULLAGINO) {
> struct xfs_inode *next_ip = NULL;
>
> + /*
> + * The on-disk unlinked list is corrupt if it points outside this
> + * AG, into another bucket, or back to an inode that we already
> + * saw during this reload walk.
> + */
> + if (!xfs_verify_agino(pag, next_agino) ||
> + next_agino % XFS_AGI_UNLINKED_BUCKETS != bucket) {
Does online fsck need any corrections w.r.t. these new findings?
--D
> + xfs_buf_mark_corrupt(agibp);
> + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
> + error = -EFSCORRUPTED;
> + break;
> + }
> +
> + if (xa_load(&seen_aginos, next_agino)) {
> + xfs_buf_mark_corrupt(agibp);
> + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
> + error = -EFSCORRUPTED;
> + break;
> + }
> + error = xa_err(xa_store(&seen_aginos, next_agino,
> + xa_mk_value(1), GFP_NOFS));
> + if (error)
> + break;
> +
> /* Found this caller's inode, set its backlink. */
> if (next_agino == agino) {
> next_ip = ip;
> @@ -2931,6 +2959,7 @@ xfs_inode_reload_unlinked_bucket(
> }
>
> out_agibp:
> + xa_destroy(&seen_aginos);
> xfs_trans_brelse(tp, agibp);
> /* Should have found this inode somewhere in the iunlinked bucket. */
> if (!error && !foundit)
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 09e6678ca4878..8ca70bbbfac81 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3,6 +3,8 @@
> * Copyright (c) 2000-2006 Silicon Graphics, Inc.
> * All Rights Reserved.
> */
> +#include <linux/xarray.h>
> +
> #include "xfs_platform.h"
> #include "xfs_fs.h"
> #include "xfs_shared.h"
> @@ -28,6 +30,7 @@
> #include "xfs_ag.h"
> #include "xfs_quota.h"
> #include "xfs_reflink.h"
> +#include "xfs_health.h"
>
> #define BLK_AVG(blk1, blk2) ((blk1+blk2) >> 1)
>
> @@ -2726,11 +2729,31 @@ xlog_recover_iunlink_bucket(
> struct xfs_mount *mp = pag_mount(pag);
> struct xfs_inode *prev_ip = NULL;
> struct xfs_inode *ip;
> + struct xarray seen_aginos;
> xfs_agino_t prev_agino, agino;
> int error = 0;
>
> + xa_init(&seen_aginos);
> +
> agino = be32_to_cpu(agi->agi_unlinked[bucket]);
> while (agino != NULLAGINO) {
> + if (!xfs_verify_agino(pag, agino) ||
> + agino % XFS_AGI_UNLINKED_BUCKETS != bucket) {
> + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
> + error = -EFSCORRUPTED;
> + break;
> + }
> +
> + if (xa_load(&seen_aginos, agino)) {
> + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
> + error = -EFSCORRUPTED;
> + break;
> + }
> + error = xa_err(xa_store(&seen_aginos, agino, xa_mk_value(1),
> + GFP_NOFS));
> + if (error)
> + break;
> +
> error = xfs_iget(mp, NULL, xfs_agino_to_ino(pag, agino), 0, 0,
> &ip);
> if (error)
> @@ -2771,8 +2794,9 @@ xlog_recover_iunlink_bucket(
>
> error2 = xfs_inodegc_flush(mp);
> if (error2 && !error)
> - return error2;
> + error = error2;
> }
> + xa_destroy(&seen_aginos);
> return error;
> }
>
> --
> 2.53.0
>
On Mon, May 18, 2026 at 10:02:35PM -0700, Darrick J. Wong wrote: > (Does xfs_repair fix the problem if the mount fails?) More importantly, what are we trying to fix here? What is the motivation? And how are we going to test this code, as right now we trade a potential issue with a crafted malicious image (which isn't our usual threat model) against adding new code run at mountime without any test coverage.
Hi Christoph, This research agenda and patch comes from a hardening project focused on automount filesystems in major modern distros. Maybe the USB stick threat model isn't worth the patch for just DoS impact, but that's at least why I thought it worth submitting in the first place. I looked more closely at current xfsprogs for-next to answer Darrick's repair question. Plain xfs_repair refuses the reproducer because the log is dirty and needs replay; that replay is the mount-time path that hangs on the vulnerable kernel. xfs_repair -n diagnoses the AGI unlinked bucket and inode next_unlinked damage, and xfs_repair -L clears the bucket / next_unlinked state and repairs the image, with the usual log-loss caveat. I also found a narrower userspace diagnostic issue: xfs_db dump_iunlinked follows di_next_unlinked until NULLAGINO and will spin on the same self-cycle. xfs_repair itself does not appear to use that unbounded walker in the tested repair path, and xfs_scrub userspace is mostly driving the kernel scrub ioctls. To be fully safe from automount attacks, I think both sides would need to be addressed. Your bigger point about regressions and test coverage is still valid though. FWIW, I have been talking with some other fs maintainers about adding test/ fuzzing coverage either in-tree or external. I would be happy to help with add XFS coverage as part of that project, although I'd defer to your preference about where those tests should actually live. All that said, just let me know if you still want a v2 based on the threat model. Thanks, Mike
Please leave the message you're replying to quoted in your email... On Tue, May 19, 2026 at 08:48:33AM -0400, Michael Bommarito wrote: > Hi Christoph, > > This research agenda and patch comes from a hardening project > focused on automount filesystems in major modern distros. > Maybe the USB stick threat model isn't worth the patch for > just DoS impact, but that's at least why I thought it worth > submitting in the first place. The patch ain't completely invalid, but by the time you've somebody untrusted with physical access to the system to put a USB stick on, your security is already gone. Automounting filesystems (a thing we've been fighting against for years) just adds on top of it. > > I looked more closely at current xfsprogs for-next > to answer Darrick's repair question. Plain xfs_repair refuses > the reproducer because the log is dirty and needs replay; that > replay is the mount-time path that hangs on the vulnerable kernel. > > xfs_repair -n diagnoses the AGI unlinked bucket and inode > next_unlinked damage, and xfs_repair -L clears the bucket / > next_unlinked state and repairs the image, with the usual log-loss > caveat. > > I also found a narrower userspace diagnostic issue: xfs_db > dump_iunlinked follows di_next_unlinked until NULLAGINO and will > spin on the same self-cycle. xfs_repair itself does not appear to > use that unbounded walker in the tested repair path, and xfs_scrub > userspace is mostly driving the kernel scrub ioctls. > > To be fully safe from automount attacks, I think both sides would > need to be addressed. > > Your bigger point about regressions and test coverage is still > valid though. FWIW, I have been talking with some other fs > maintainers about adding test/ fuzzing coverage either in-tree > or external. I would be happy to help with add XFS coverage > as part of that project, although I'd defer to your preference > about where those tests should actually live. I don't think such tests belongs neither in-tree nor in xfstests, but this seems a good idea to start a project related to test filesystems resiliency against those issues. syscaller already does a lot of fuzzing but mostly random and not containing a bunch of carefully crafted filesystems to triggers specific issues. > All that said, just let me know if you still want a v2 based > on the threat model. I don't think this is something we want to have without having a reliable way to test this code to fix a theoretical attack vector, but, if by any chance it comes to a v2, there are a lot of duplicated code that should probably belong into its own helper. Cheers, Carlos
On Wed, May 20, 2026 at 8:19 AM Carlos Maiolino <cem@kernel.org> wrote: > > Please leave the message you're replying to quoted in your email... (Sorry about that. Christoph's message was never actually delivered to my @gmail.com so I had to respond like it was 1999. @Christoph, not sure if on my end or yours...) > The patch ain't completely invalid, but by the time you've somebody > untrusted with physical access to the system to put a USB stick on, > your security is already gone. > Automounting filesystems (a thing we've been fighting against for > years) just adds on top of it. I get it. Better a known and accepted risk than hidden. > I don't think such tests belongs neither in-tree nor in xfstests, but > this seems a good idea to start a project related to test filesystems > resiliency against those issues. syscaller already does a lot of fuzzing > but mostly random and not containing a bunch of carefully crafted > filesystems to triggers specific issues. Yes, and this is where LLMs have been very helpful in particular. Given an AST or parser / format specification, you can delegate the mechanics of implementing hundreds of different tests / perturbations. Many of those strategies are analogous across file systems. For example, this XFS finding was found by expanding from my NTFS finding here: https://lore.kernel.org/all/20260517234140.1261718-1-michael.bommarito@gmail.com/ I found a little prior work but nothing that seems active and comprehensive: https://github.com/sslab-gatech/janus https://github.com/stevegrubb/fsfuzzer I'm close to finishing triage and disclosure for all of the stuff I've found so far, so as soon as that's done, I'll share what I have publicly and RFC to continue this thread separately. In the meantime, it sounds like we just park this until we know it won't break things.
© 2016 - 2026 Red Hat, Inc.