[PATCH] xfs: drain inodegc before quota teardown on mount failure

Zhang Cen posted 1 patch 4 weeks ago
There is a newer version of this series
[PATCH] xfs: drain inodegc before quota teardown on mount failure
Posted by Zhang Cen 4 weeks ago
xfs_mountfs() starts background inodegc before mount completion. If a
later setup step fails, the out_agresv unwind can release the quota
inodes via xfs_qm_unmount_quotas() while m_qflags still leave dqattach
enabled, and the common failure path can destroy mp->m_quotainfo via
xfs_qm_unmount() before the first common inodegc drain.

That leaves two teardown windows for an inodegc worker that resumes
xfs_inactive(): dqget can still need qi_*quotaip after
xfs_qm_unmount_quotas(), and the quota cache lookup still dereferences
mp->m_quotainfo until xfs_qm_unmount() finishes. Normal unmount and the
quotacheck abort path already drain inodegc before those teardown
stages.

Fix this by draining inodegc twice on the mount failure path: once on
out_agresv before dropping AG reservations and calling
xfs_qm_unmount_quotas(), and again on the common teardown path before
xfs_qm_unmount(). Keep xfs_unmount_flush_inodes() in place afterwards so
that reclaim queued by quota, realtime, root, or metadir teardown still
runs after xfs_qm_unmount() releases quota inodes on partially
initialized mounts.

The buggy scenario involves two paths, with each column showing the order
within that path:

path A label: mount failure unwind      path B label: inodegc worker
1. xfs_mountfs() starts inodegc and     1. Recovery or eviction leaves an
   later jumps into out_agresv.            ordinary inode queued for
                                           background inactivation.
2. out_agresv used to reach             2. xfs_inodegc_worker() runs
   xfs_qm_unmount_quotas(), which          xfs_inactive() on that inode.
   releases qi_*quotaip while           3. xfs_inactive() calls
   m_qflags still allow dqattach.          xfs_qm_dqattach().
3. The common unwind later reached      4. dqattach follows m_qflags into
   xfs_qm_unmount(), which purged          dqget, which may need the quota
   dquots and destroyed                    inodes and mp->m_quotainfo being
   mp->m_quotainfo.                        torn down by the unwind.
4. The first common inodegc drain
   used to happen only after quota
   teardown had started.

Sanitizer validation reported:
general protection fault
Call trace:
  xfs_qm_dqget_cache_lookup() (?:?)
  trace_function() (kernel/trace/trace.h:806)
  xfs_qm_dqget_inode() (?:?)
  ring_buffer_unlock_commit() (kernel/trace/ring_buffer.c:4306)
  xfs_qm_dqattach_locked() (?:?)
  srso_alias_return_thunk() (arch/x86/include/asm/nospec-branch.h:375)
  xfs_qm_dqattach() (?:?)
  xfs_free_eofblocks() (?:?)
  xfs_inactive() (?:?)
  xfs_inodegc_worker() (?:?)
  process_one_work() (kernel/workqueue.c:3200)
  worker_thread() (?:?)
  kthread() (?:?)
  _raw_spin_unlock_irq() (kernel/locking/spinlock.c:204)
  ret_from_fork() (?:?)
  __switch_to() (?:?)
  ret_from_fork_asm() (?:?)

Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")

Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>

---
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index b24195f570cd..e8a870482d1a 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1228,6 +1228,12 @@ xfs_mountfs(
 	return 0;
 
  out_agresv:
+	/*
+	 * Drain ordinary inodegc before we drop per-AG reservations or tear
+	 * down quota state.  Background inactivation can still attach dquots
+	 * here, and dqget can still need the quota inodes.
+	 */
+	xfs_inodegc_flush(mp);
 	xfs_fs_unreserve_ag_blocks(mp);
 	xfs_qm_unmount_quotas(mp);
 	if (xfs_has_zoned(mp))
@@ -1236,24 +1242,25 @@ xfs_mountfs(
 	xfs_rtunmount_inodes(mp);
  out_rele_rip:
 	xfs_irele(rip);
-	/* Clean out dquots that might be in memory after quotacheck. */
-	xfs_qm_unmount(mp);
  out_free_metadir:
 	if (mp->m_metadirip)
 		xfs_irele(mp->m_metadirip);
 
 	/*
-	 * Inactivate all inodes that might still be in memory after a log
-	 * intent recovery failure so that reclaim can free them.  Metadata
-	 * inodes and the root directory shouldn't need inactivation, but the
-	 * mount failed for some reason, so pull down all the state and flee.
+	 * Flush inodegc before we destroy quotainfo.  This is the first drain
+	 * for paths that bypass out_agresv, and it also catches any reclaim
+	 * queued by the quota, realtime, root, or metadir teardown above.
 	 */
 	xfs_inodegc_flush(mp);
 
+	/* Clean out dquots that might be in memory after quotacheck. */
+	xfs_qm_unmount(mp);
+
 	/*
 	 * Flush all inode reclamation work and flush the log.
 	 * We have to do this /after/ rtunmount and qm_unmount because those
-	 * two will have scheduled delayed reclaim for the rt/quota inodes.
+	 * will have scheduled delayed reclaim for the rt/quota/root/metadir
+	 * inodes.
 	 *
 	 * This is slightly different from the unmountfs call sequence
 	 * because we could be tearing down a partially set up mount.  In
Re: [PATCH] xfs: drain inodegc before quota teardown on mount failure
Posted by Christoph Hellwig 3 weeks, 3 days ago
On Fri, May 15, 2026 at 12:07:12AM +0800, Zhang Cen wrote:
> Sanitizer validation reported:

How did you run into this?  Normal xfstests under a new sanitizer?

> Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
> 
> Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>

Nit: no empty line between the Fixes and the Signoff or other tags,
please.

> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index b24195f570cd..e8a870482d1a 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1228,6 +1228,12 @@ xfs_mountfs(
>  	return 0;
>  
>   out_agresv:
> +	/*
> +	 * Drain ordinary inodegc before we drop per-AG reservations or tear
> +	 * down quota state.  Background inactivation can still attach dquots
> +	 * here, and dqget can still need the quota inodes.
> +	 */
> +	xfs_inodegc_flush(mp);

This looks good.

>   out_free_metadir:
>  	if (mp->m_metadirip)
>  		xfs_irele(mp->m_metadirip);
>  
>  	/*
> -	 * Inactivate all inodes that might still be in memory after a log
> -	 * intent recovery failure so that reclaim can free them.  Metadata
> -	 * inodes and the root directory shouldn't need inactivation, but the
> -	 * mount failed for some reason, so pull down all the state and flee.
> +	 * Flush inodegc before we destroy quotainfo.  This is the first drain
> +	 * for paths that bypass out_agresv, and it also catches any reclaim
> +	 * queued by the quota, realtime, root, or metadir teardown above.
>  	 */
>  	xfs_inodegc_flush(mp);
>  
> +	/* Clean out dquots that might be in memory after quotacheck. */
> +	xfs_qm_unmount(mp);

This is later in the teardown order than where I'd expect it. The
first thing that can bring in quotainodes is xfs_qm_newmount.  I'd
expect a new label placed above out_rtunmount to handle this.  Is
there a reason I'm missing why this won't work?
Re: [PATCH] xfs: drain inodegc before quota teardown on mount failure
Posted by Zhang Cen 2 weeks, 1 day ago
Hi Christoph,

Sorry, I missed this reply earlier and only noticed it now from the
public archive.

> How did you run into this?  Normal xfstests under a new sanitizer?

This was not from normal xfstests.  I reproduced it in a KASAN-enabled
QEMU guest with a disposable quota-enabled XFS image, then forced a late
mount failure after quota setup while tracing the inodegc and quota
teardown path.  The failing ordering was:

  xfs_qm_mount_quotas()
  late mount failure
  xfs_qm_unmount_quotas()
  xfs_qm_unmount() / xfs_qm_destroy_quotainfo()
  xfs_inodegc_flush()

with inodegc still able to reach the dqattach/dqget side during the
teardown window.

> This looks good.

Thanks.

> This is later in the teardown order than where I'd expect it. The
> first thing that can bring in quotainodes is xfs_qm_newmount.  I'd
> expect a new label placed above out_rtunmount to handle this.  Is
> there a reason I'm missing why this won't work?

No, I don't think you are missing anything.  A separate label above
out_rtunmount is the cleaner place for this.  I'll send a v2 that keeps
the out_agresv inodegc drain before xfs_qm_unmount_quotas(), adds an
out_qm_unmount label above out_rtunmount, and routes the later
post-quota failure paths through that label.

I'll also fix the trailer spacing in v2.

Thanks,
Zhang
Re: [PATCH] xfs: drain inodegc before quota teardown on mount failure
Posted by Christoph Hellwig 2 weeks, 1 day ago
On Thu, May 28, 2026 at 01:31:28PM +0800, Zhang Cen wrote:
> This was not from normal xfstests.  I reproduced it in a KASAN-enabled
> QEMU guest with a disposable quota-enabled XFS image, then forced a late
> mount failure after quota setup while tracing the inodegc and quota
> teardown path.  The failing ordering was:
> 
>   xfs_qm_mount_quotas()
>   late mount failure
>   xfs_qm_unmount_quotas()
>   xfs_qm_unmount() / xfs_qm_destroy_quotainfo()
>   xfs_inodegc_flush()
> 
> with inodegc still able to reach the dqattach/dqget side during the
> teardown window.

Can you come up with an xfstest for this?  I guess you'd need a new
error injection point, but that should not be horrible.
Re: [PATCH] xfs: drain inodegc before quota teardown on mount failure
Posted by Cen Zhang 2 weeks ago
Hi Christoph,

> Can you come up with an xfstest for this?  I guess you'd need a new
> error injection point, but that should not be horrible.

I tried to turn this into an xfstest.

In my local attempts, a mount-failure errortag after
xfs_qm_mount_quotas() was not enough by itself, because ordinary test
setup did not reliably leave an inodegc item that would enter the
dqattach/eofblocks path during the mount-failure unwind.

The draft I have locally uses two DEBUG-only errortags:

  - mount_fail_after_qm_mount_quotas

    This injects -EIO immediately after xfs_qm_mount_quotas().

  - force_inodegc_eofblocks

    This makes a regular inode released after quotacheck scanning go
    through inodegc/eofblocks cleanup, and delays that inodegc work so
    the mount-failure unwind can reach quota teardown first.

The xfstest creates one regular scratch file without quotas enabled,
unmounts, and then remounts with:

  -o uquota,errortag=force_inodegc_eofblocks,errortag=mount_fail_after_qm_mount_quotas

On a kernel with the errortags but without the inodegc drain fix, this
reproduces the same teardown ordering problem:

  KASAN: null-ptr-deref
  Workqueue: xfs-inodegc/vdc xfs_inodegc_worker
  xfs_qm_dqget_cache_lookup()
  xfs_qm_dqget_inode()
  xfs_qm_dqattach_locked()
  xfs_qm_dqattach()
  xfs_free_eofblocks()
  xfs_inactive()
  xfs_inodegc_worker()

With the drain fix applied, the same xfstest passes and the filesystem
can be mounted again cleanly afterwards.

I do not have much experience with XFS-specific fstests, so I wanted to
check whether this is the right shape before posting it formally.  If
this approach looks acceptable, I can send a small kernel series adding
the DEBUG errortags and the mount-failure fix, followed by a separate
xfstests patch.

Thanks,
Zhang Cen
Re: [PATCH] xfs: drain inodegc before quota teardown on mount failure
Posted by Christoph Hellwig 1 week, 3 days ago
On Thu, May 28, 2026 at 09:28:02PM +0800, Cen Zhang wrote:
> In my local attempts, a mount-failure errortag after
> xfs_qm_mount_quotas() was not enough by itself, because ordinary test
> setup did not reliably leave an inodegc item that would enter the
> dqattach/eofblocks path during the mount-failure unwind.
> 
> The draft I have locally uses two DEBUG-only errortags:
> 
>   - mount_fail_after_qm_mount_quotas
> 
>     This injects -EIO immediately after xfs_qm_mount_quotas().
> 
>   - force_inodegc_eofblocks
> 
>     This makes a regular inode released after quotacheck scanning go
>     through inodegc/eofblocks cleanup, and delays that inodegc work so
>     the mount-failure unwind can reach quota teardown first.

This sounds pretty useful, but I actually have a different
question:

>   Workqueue: xfs-inodegc/vdc xfs_inodegc_worker
>   xfs_qm_dqget_cache_lookup()
>   xfs_qm_dqget_inode()
>   xfs_qm_dqattach_locked()
>   xfs_qm_dqattach()
>   xfs_free_eofblocks()
>   xfs_inactive()
>   xfs_inodegc_worker()

What kind of inode do we actually get here?  I guess your test
case reproduces an unclean shutdown and this processes inodes
that otherwise would have been processed on unmount?

I'm kinda tempted to just make xfs_can_free_eofblocks return false
during early mount (or rather during mount failure), as trying
to fix all the corner cases here seems like an uphill battle (
also see the sashiko review)