xfs: do not inactivate inodes on a failed mount

[PATCH v2] xfs: do not inactivate inodes on a failed mount

Posted by Mikhail Lobanov 5 days, 12 hours ago

A corrupt/crafted XFS image can make mount fail after background inode
inactivation has already been enabled.  xfs_mountfs() turns on inodegc
(xfs_inodegc_start()) right after log recovery, but the quota subsystem
(mp->m_quotainfo) is only allocated much later, in xfs_qm_newmount() /
xfs_qm_mount_quotas().  The quota accounting flags in mp->m_qflags are
parsed from the mount options before xfs_mountfs() even runs.

If the mount then aborts in between - e.g. xfs_rtmount_inodes() failing
with "failed to read RT inodes" - the unwind path inactivates the inodes
that are still queued, and xfs_inactive() calls xfs_qm_dqattach().  That
path trusts XFS_IS_QUOTA_ON() (the flag is set) and dereferences the not
yet allocated mp->m_quotainfo:

  XFS (loop0): failed to read RT inodes
  Oops: general protection fault, probably for non-canonical address
        0xdffffc000000002a: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x0000000000000150-0x0000000000000157]
  Workqueue: xfs-inodegc/loop0 xfs_inodegc_worker
  RIP: 0010:__mutex_lock+0xfe/0x930
  Call Trace:
   xfs_qm_dqget_cache_lookup+0x63/0x7f0
   xfs_qm_dqget_inode+0x336/0x860
   xfs_qm_dqattach_one+0x232/0x4e0
   xfs_qm_dqattach_locked+0x2c6/0x470
   xfs_qm_dqattach+0x46/0x70
   xfs_inactive+0x988/0xe80
   xfs_inodegc_worker+0x27c/0x730
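
The window behind this trace can be shown with a small user-space model.
This is only a sketch under stated assumptions: toy_mount, toy_quotainfo,
TOY_QUOTA_ACCT and the toy_*() helpers are hypothetical stand-ins, not
kernel code.  It models a flag-only "quota is on" test that is true before
the quotainfo pointer has been allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUOTA_ACCT	0x1u	/* hypothetical "quota accounting on" bit */

/* Hypothetical stand-in for the late-allocated quota state. */
struct toy_quotainfo {
	int	tree_lock;	/* stands in for the mutex taken during lookup */
};

/* Hypothetical stand-in for the mount structure. */
struct toy_mount {
	unsigned int		qflags;		/* set from mount options */
	struct toy_quotainfo	*quotainfo;	/* allocated late in mount */
};

/* Flag-only test, in the spirit of XFS_IS_QUOTA_ON(). */
bool toy_is_quota_on(const struct toy_mount *mp)
{
	assert(mp != NULL);
	return (mp->qflags & TOY_QUOTA_ACCT) != 0;
}

/* True when a dquot lookup would dereference a NULL quotainfo. */
bool toy_lookup_would_fault(const struct toy_mount *mp)
{
	return toy_is_quota_on(mp) && mp->quotainfo == NULL;
}

/* Returns 1 if the window is open at the given mount stage, else 0. */
int toy_window_open(bool quotainfo_allocated)
{
	struct toy_quotainfo qi = { 0 };
	struct toy_mount mp = { .qflags = TOY_QUOTA_ACCT, .quotainfo = NULL };

	if (quotainfo_allocated)
		mp.quotainfo = &qi;
	return toy_lookup_would_fault(&mp) ? 1 : 0;
}
```

In this model toy_window_open(false) corresponds to a mount that aborts
before the quota subsystem is allocated, which is the state the trace above
was captured in.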

The NULL m_quotainfo deref is only one symptom.  The deeper problem is
that a failed mount should not be inactivating inodes at all: it must not
write to the (possibly corrupt, only partially set up) persistent
metadata of a filesystem we just refused to mount, and the subsystems
inactivation relies on may not be initialised.

XFS already encodes this rule: xfs_inode_needs_inactive() returns false
when the mount is shut down ("If the log isn't running, push inodes
straight to reclaim"), so an inode destroyed on a shut down mount is
never queued for inactivation.  The gap is that this is only evaluated at
queue time; an inode queued while the mount was still live is then
inactivated by the worker even after the mount has been torn down.  Honour
the same invariant at gc time: in xfs_inodegc_inactivate(), skip
xfs_inactive() when the mount is shut down and just make the inode
reclaimable (xfs_inodegc_set_reclaimable() already handles the shutdown
case).  This is not a new policy, just consistency with the existing one.
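
The queue-time versus gc-time gap can be illustrated with a minimal
user-space model.  It is a sketch, not kernel code: toy_mount, toy_inode
and the toy_*() functions are hypothetical names, and the "recheck"
parameter exists only to compare the old worker behaviour with the one
proposed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mount; not struct xfs_mount. */
struct toy_mount {
	bool	shutdown;	/* models xfs_is_shutdown(mp) */
	int	disk_writes;	/* counts simulated metadata updates */
};

/* Hypothetical stand-in for an in-memory inode. */
struct toy_inode {
	struct toy_mount	*mount;
	bool			queued;		/* sitting on the gc queue */
	bool			reclaimable;
};

/* Queue-time check: a shut down mount never queues for inactivation. */
bool toy_needs_inactive(const struct toy_inode *ip)
{
	return !ip->mount->shutdown;
}

/* Destroy path: queue for inactivation or go straight to reclaim. */
void toy_destroy(struct toy_inode *ip)
{
	if (toy_needs_inactive(ip))
		ip->queued = true;
	else
		ip->reclaimable = true;
}

/* Simulated inactivation: touches the "on-disk" metadata. */
void toy_inactive(struct toy_inode *ip)
{
	ip->mount->disk_writes++;
}

/* gc worker; recheck selects old (false) or proposed (true) behaviour. */
void toy_gc_worker(struct toy_inode *ip, bool recheck)
{
	assert(ip->queued);
	if (!recheck || !ip->mount->shutdown)
		toy_inactive(ip);
	ip->queued = false;
	ip->reclaimable = true;
}

/* Queue while live, shut down, run the worker; return the write count. */
int toy_failed_mount(bool recheck)
{
	struct toy_mount mp = { .shutdown = false, .disk_writes = 0 };
	struct toy_inode ip = { .mount = &mp };

	toy_destroy(&ip);	/* mount still live: inode is queued */
	mp.shutdown = true;	/* mount fails and is shut down */
	toy_gc_worker(&ip, recheck);
	assert(ip.reclaimable);
	return mp.disk_writes;
}
```

Without the recheck the model still performs one metadata update after the
shutdown; with it the inode only becomes reclaimable.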

Then, in the xfs_mountfs() failure path, shut the mount down before
flushing the inodegc queue, so the queued inodes are dropped to reclaim
instead of inactivated.  They are still pulled down so reclaim can free
them (which is why the flush was added in commit ab23a7768739 ("xfs:
per-cpu deferred inode inactivation queues")), but without touching the
on-disk structures - matching that comment's own "pull down all the state
and flee" intent.

Note that shutting down alone is not enough to stop the crash:
xfs_inactive() calls xfs_qm_dqattach() before any shutdown-sensitive
transaction, and neither xfs_qm_need_dqattach() nor xfs_qm_dqattach()
tests for shutdown - so the worker change is what actually closes it.

Found by fuzzing XFS with syzkaller (corrupt image mount); reproduced and
verified under QEMU/KASAN.

Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru>
---
v2: change approach after Christoph Hellwig's review of v1.  Instead of
    guarding xfs_qm_need_dqattach() in the quota code, do not inactivate
    inodes at all on a failed/shut-down mount: skip xfs_inactive() in the
    inodegc worker when the mount is shut down (consistent with
    xfs_inode_needs_inactive(), which already pushes inodes straight to
    reclaim on shutdown), and shut the mount down in the xfs_mountfs()
    failure path before flushing the inodegc queue.

Review of v1: https://lore.kernel.org/linux-xfs/ah6BIsvEitNW5Edb@infradead.org/

Open question: in the failure path I used xfs_force_shutdown(mp,
SHUTDOWN_FORCE_UMOUNT) to mark the fs down.  It logs "User initiated
shutdown received", which is a bit misleading for a mount failure (the tag
actually shown is "Metadata I/O Error (0x4)").  Would a different flag, or
just quietly setting the shutdown state, be preferable here?

 fs/xfs/xfs_icache.c | 14 ++++++++++++--
 fs/xfs/xfs_mount.c  | 15 +++++++++++----
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 2040a9292ee6..1f725804be17 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1940,10 +1940,20 @@ static int
 xfs_inodegc_inactivate(
 	struct xfs_inode	*ip)
 {
-	int			error;
+	int			error = 0;

 	trace_xfs_inode_inactivating(ip);
-	error = xfs_inactive(ip);
+
+	/*
+	 * If the filesystem has been shut down - for example a mount that failed
+	 * after background inactivation was enabled - do not inactivate the
+	 * inode.  Inactivation modifies the persistent metadata and its
+	 * transactions cannot complete on a shut down mount anyway, and the
+	 * subsystems it relies on (e.g. quota, mp->m_quotainfo) may not be set
+	 * up.  Just make the inode reclaimable so it can be freed.
+	 */
+	if (!xfs_is_shutdown(ip->i_mount))
+		error = xfs_inactive(ip);
 	xfs_inodegc_set_reclaimable(ip);
 	return error;

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index b24195f570cd..37fb69165502 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1243,11 +1243,18 @@ xfs_mountfs(
 		xfs_irele(mp->m_metadirip);

 	/*
-	 * Inactivate all inodes that might still be in memory after a log
-	 * intent recovery failure so that reclaim can free them.  Metadata
-	 * inodes and the root directory shouldn't need inactivation, but the
-	 * mount failed for some reason, so pull down all the state and flee.
+	 * The mount has failed.  Mark the filesystem shut down so that any
+	 * inodes still queued for background inactivation are dropped straight
+	 * to reclaim instead of being inactivated: a failed mount must not write
+	 * to the (possibly corrupt, only partially set up) persistent metadata,
+	 * and parts of the mount it would need - e.g. the quota subsystem
+	 * (mp->m_quotainfo) - may never have been initialised.
+	 *
+	 * Flush the queue so that those inodes are pulled down and reclaim can
+	 * free them; with the fs shut down xfs_inodegc_inactivate() turns each
+	 * one reclaimable without touching the on-disk structures.
 	 */
+	xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
 	xfs_inodegc_flush(mp);

 	/*
--
2.43.0

Re: [PATCH v2] xfs: do not inactivate inodes on a failed mount

Posted by Christoph Hellwig 2 days, 19 hours ago

On Tue, Jun 02, 2026 at 05:37:17PM +0300, Mikhail Lobanov wrote:
> XFS already encodes this rule: xfs_inode_needs_inactive() returns false
> when the mount is shut down ("If the log isn't running, push inodes
> straight to reclaim"), so an inode destroyed on a shut down mount is
> never queued for inactivation.  The gap is that this is only evaluated at
> queue time; an inode queued while the mount was still live is then
> inactivated by the worker even after the mount has been torn down.  Honour
> the same invariant at gc time: in xfs_inodegc_inactivate(), skip
> xfs_inactive() when the mount is shut down and just make the inode
> reclaimable (xfs_inodegc_set_reclaimable() already handles the shutdown
> case).  This is not a new policy, just consistency with the existing one.
> 
> Then, in the xfs_mountfs() failure path, shut the mount down before
> flushing the inodegc queue, so the queued inodes are dropped to reclaim
> instead of inactivated.

Doing a shutdown on failed mount is actually a really nice idea!  I
hadn't thought of that before, but it makes a lot of sense.

> Note that shutting down alone is not enough to stop the crash:
> xfs_inactive() calls xfs_qm_dqattach() before any shutdown-sensitive
> transaction, and neither xfs_qm_need_dqattach() nor xfs_qm_dqattach()
> tests for shutdown - so the worker change is what actually closes it.

The Sashiko review points out that skipping xfs_inactive() entirely can
leak the dquot references when we get here due to a normal shutdown,
and I think it is right, so we might still need to call into the quota
code in an else branch that checks if quotas actually were attached
and drops the references very carefully.  It probably makes sense to
split all this shutdown handling in the inactivation path into a separate
prep patch from shutting down in the mount failure path as well.
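
The leak and the suggested else branch can be sketched in user space.  All
names here (model_dquot, model_inode, model_*()) are hypothetical, and the
model tracks a single dquot reference per inode.  Which kernel helper such
an else branch would call is an assumption left open here; the sketch only
shows the reference-counting shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted quota object. */
struct model_dquot {
	int	refcount;
};

/* Hypothetical inode holding at most one dquot reference. */
struct model_inode {
	bool			fs_shutdown;
	struct model_dquot	*udquot;	/* NULL when none attached */
};

/* Drop the inode's dquot reference, if one is attached. */
void model_dqdetach(struct model_inode *ip)
{
	if (ip->udquot == NULL)
		return;
	assert(ip->udquot->refcount > 0);
	ip->udquot->refcount--;
	ip->udquot = NULL;
}

/* Full inactivation: does its work, then releases attached dquots. */
int model_inactive(struct model_inode *ip)
{
	/* Metadata updates would happen here in the real path. */
	model_dqdetach(ip);
	return 0;
}

/* Worker: skip inactivation on shutdown, optionally dropping the refs. */
int model_gc_inactivate(struct model_inode *ip, bool drop_refs_on_skip)
{
	int error = 0;

	if (!ip->fs_shutdown)
		error = model_inactive(ip);
	else if (drop_refs_on_skip)
		model_dqdetach(ip);
	return error;
}

/* Returns the dquot refcount left after gc on a shut down filesystem. */
int model_leftover_refs(bool drop_refs_on_skip)
{
	struct model_dquot dq = { .refcount = 1 };
	struct model_inode ip = { .fs_shutdown = true, .udquot = &dq };

	model_gc_inactivate(&ip, drop_refs_on_skip);
	return dq.refcount;
}
```

With drop_refs_on_skip false the reference taken before the shutdown is
never released, which is the leak described above; with it true the else
branch releases it only when a dquot was actually attached.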

> Open question: in the failure path I used xfs_force_shutdown(mp,
> SHUTDOWN_FORCE_UMOUNT) to mark the fs down.  It logs "User initiated
> shutdown received", which is a bit misleading for a mount failure (the tag
> actually shown is "Metadata I/O Error (0x4)").  Would a different flag, or
> just quietly setting the shutdown state, be preferable here?

I think a different state makes sense.  It could be quiet, as we usually
have other messages for mount failures, but I don't think a message
really hurts either, so when in doubt avoid the special casing.

> +	 * If the filesystem has been shut down - for example a mount that failed

Overly long line here.

> +	 * to reclaim instead of being inactivated: a failed mount must not write

.. and here