xfs_mountfs() starts background inodegc before mount completion. If a
later setup step fails, the out_agresv unwind can release the quota
inodes via xfs_qm_unmount_quotas() while m_qflags still leave dqattach
enabled, and the common failure path can destroy mp->m_quotainfo via
xfs_qm_unmount() before the first common inodegc drain.
That leaves two teardown windows for an inodegc worker that resumes
xfs_inactive(): dqget can still need qi_*quotaip after
xfs_qm_unmount_quotas(), and the quota cache lookup still dereferences
mp->m_quotainfo until xfs_qm_unmount() finishes. Normal unmount and the
quotacheck abort path already drain inodegc before those teardown
stages.
Fix this by draining inodegc twice on the mount failure path: once on
out_agresv before dropping AG reservations and calling
xfs_qm_unmount_quotas(), and again on the common teardown path before
xfs_qm_unmount(). Keep xfs_unmount_flush_inodes() in place afterwards so
that reclaim queued by quota, realtime, root, or metadir teardown still
runs after xfs_qm_unmount() releases quota inodes on partially
initialized mounts.
The buggy scenario involves two paths, with each column showing the order
within that path:
path A label: mount failure unwind path B label: inodegc worker
1. xfs_mountfs() starts inodegc and 1. Recovery or eviction leaves an
later jumps into out_agresv. ordinary inode queued for
background inactivation.
2. out_agresv used to reach 2. xfs_inodegc_worker() runs
xfs_qm_unmount_quotas(), which xfs_inactive() on that inode.
releases qi_*quotaip while 3. xfs_inactive() calls
m_qflags still allow dqattach. xfs_qm_dqattach().
3. The common unwind later reached 4. dqattach follows m_qflags into
xfs_qm_unmount(), which purged dqget, which may need the quota
dquots and destroyed inodes and mp->m_quotainfo being
mp->m_quotainfo. torn down by the unwind.
4. The first common inodegc drain
used to happen only after quota
teardown had started.
Sanitizer validation reported:
general protection fault
Call trace:
xfs_qm_dqget_cache_lookup() (?:?)
trace_function() (kernel/trace/trace.h:806)
xfs_qm_dqget_inode() (?:?)
ring_buffer_unlock_commit() (kernel/trace/ring_buffer.c:4306)
xfs_qm_dqattach_locked() (?:?)
srso_alias_return_thunk() (arch/x86/include/asm/nospec-branch.h:375)
xfs_qm_dqattach() (?:?)
xfs_free_eofblocks() (?:?)
xfs_inactive() (?:?)
xfs_inodegc_worker() (?:?)
process_one_work() (kernel/workqueue.c:3200)
worker_thread() (?:?)
kthread() (?:?)
_raw_spin_unlock_irq() (kernel/locking/spinlock.c:204)
ret_from_fork() (?:?)
__switch_to() (?:?)
ret_from_fork_asm() (?:?)
Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
---
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index b24195f570cd..e8a870482d1a 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1228,6 +1228,12 @@ xfs_mountfs(
return 0;
out_agresv:
+ /*
+ * Drain ordinary inodegc before we drop per-AG reservations or tear
+ * down quota state. Background inactivation can still attach dquots
+ * here, and dqget can still need the quota inodes.
+ */
+ xfs_inodegc_flush(mp);
xfs_fs_unreserve_ag_blocks(mp);
xfs_qm_unmount_quotas(mp);
if (xfs_has_zoned(mp))
@@ -1236,24 +1242,25 @@ xfs_mountfs(
xfs_rtunmount_inodes(mp);
out_rele_rip:
xfs_irele(rip);
- /* Clean out dquots that might be in memory after quotacheck. */
- xfs_qm_unmount(mp);
out_free_metadir:
if (mp->m_metadirip)
xfs_irele(mp->m_metadirip);
/*
- * Inactivate all inodes that might still be in memory after a log
- * intent recovery failure so that reclaim can free them. Metadata
- * inodes and the root directory shouldn't need inactivation, but the
- * mount failed for some reason, so pull down all the state and flee.
+ * Flush inodegc before we destroy quotainfo. This is the first drain
+ * for paths that bypass out_agresv, and it also catches any reclaim
+ * queued by the quota, realtime, root, or metadir teardown above.
*/
xfs_inodegc_flush(mp);
+ /* Clean out dquots that might be in memory after quotacheck. */
+ xfs_qm_unmount(mp);
+
/*
* Flush all inode reclamation work and flush the log.
* We have to do this /after/ rtunmount and qm_unmount because those
- * two will have scheduled delayed reclaim for the rt/quota inodes.
+ * will have scheduled delayed reclaim for the rt/quota/root/metadir
+ * inodes.
*
* This is slightly different from the unmountfs call sequence
* because we could be tearing down a partially set up mount. In
On Fri, May 15, 2026 at 12:07:12AM +0800, Zhang Cen wrote:
> Sanitizer validation reported:
How did you run into this? Normal xfstests under a new sanitizer?
> Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
>
> Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
Nit: no empty line between the Fixes and the Signoff or other tags,
please.
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index b24195f570cd..e8a870482d1a 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1228,6 +1228,12 @@ xfs_mountfs(
> return 0;
>
> out_agresv:
> + /*
> + * Drain ordinary inodegc before we drop per-AG reservations or tear
> + * down quota state. Background inactivation can still attach dquots
> + * here, and dqget can still need the quota inodes.
> + */
> + xfs_inodegc_flush(mp);
This looks good.
> out_free_metadir:
> if (mp->m_metadirip)
> xfs_irele(mp->m_metadirip);
>
> /*
> - * Inactivate all inodes that might still be in memory after a log
> - * intent recovery failure so that reclaim can free them. Metadata
> - * inodes and the root directory shouldn't need inactivation, but the
> - * mount failed for some reason, so pull down all the state and flee.
> + * Flush inodegc before we destroy quotainfo. This is the first drain
> + * for paths that bypass out_agresv, and it also catches any reclaim
> + * queued by the quota, realtime, root, or metadir teardown above.
> */
> xfs_inodegc_flush(mp);
>
> + /* Clean out dquots that might be in memory after quotacheck. */
> + xfs_qm_unmount(mp);
This is later in the teardown order than where I'd expect it. The
first thing that can bring in quotainodes is xfs_qm_newmount. I'd
expect a new label placed above out_rtunmount to handle this. Is
there a reason I'm missing why this won't work?
Hi Christoph, Sorry, I missed this reply earlier and only noticed it now from the public archive. > How did you run into this? Normal xfstests under a new sanitizer? This was not from normal xfstests. I reproduced it in a KASAN-enabled QEMU guest with a disposable quota-enabled XFS image, then forced a late mount failure after quota setup while tracing the inodegc and quota teardown path. The failing ordering was: xfs_qm_mount_quotas() late mount failure xfs_qm_unmount_quotas() xfs_qm_unmount() / xfs_qm_destroy_quotainfo() xfs_inodegc_flush() with inodegc still able to reach the dqattach/dqget side during the teardown window. > This looks good. Thanks. > This is later in the teardown order than where I'd expect it. The > first thing that can bring in quotainodes is xfs_qm_newmount. I'd > expect a new label placed above out_rtunmount to handle this. Is > there a reason I'm missing why this won't work? No, I don't think you are missing anything. A separate label above out_rtunmount is the cleaner place for this. I'll send a v2 that keeps the out_agresv inodegc drain before xfs_qm_unmount_quotas(), adds an out_qm_unmount label above out_rtunmount, and routes the later post-quota failure paths through that label. I'll also fix the trailer spacing in v2. Thanks, Zhang
On Thu, May 28, 2026 at 01:31:28PM +0800, Zhang Cen wrote: > This was not from normal xfstests. I reproduced it in a KASAN-enabled > QEMU guest with a disposable quota-enabled XFS image, then forced a late > mount failure after quota setup while tracing the inodegc and quota > teardown path. The failing ordering was: > > xfs_qm_mount_quotas() > late mount failure > xfs_qm_unmount_quotas() > xfs_qm_unmount() / xfs_qm_destroy_quotainfo() > xfs_inodegc_flush() > > with inodegc still able to reach the dqattach/dqget side during the > teardown window. Can you come up with an xfstest for this? I guess you'd need a new error injection point, but that should not be horrible.
Hi Christoph,
> Can you come up with an xfstest for this? I guess you'd need a new
> error injection point, but that should not be horrible.
I tried to turn this into an xfstest.
In my local attempts, a mount-failure errortag after
xfs_qm_mount_quotas() was not enough by itself, because ordinary test
setup did not reliably leave an inodegc item that would enter the
dqattach/eofblocks path during the mount-failure unwind.
The draft I have locally uses two DEBUG-only errortags:
- mount_fail_after_qm_mount_quotas
This injects -EIO immediately after xfs_qm_mount_quotas().
- force_inodegc_eofblocks
This makes a regular inode released after quotacheck scanning go
through inodegc/eofblocks cleanup, and delays that inodegc work so
the mount-failure unwind can reach quota teardown first.
The xfstest creates one regular scratch file without quotas enabled,
unmounts, and then remounts with:
-o uquota,errortag=force_inodegc_eofblocks,errortag=mount_fail_after_qm_mount_quotas
On a kernel with the errortags but without the inodegc drain fix, this
reproduces the same teardown ordering problem:
KASAN: null-ptr-deref
Workqueue: xfs-inodegc/vdc xfs_inodegc_worker
xfs_qm_dqget_cache_lookup()
xfs_qm_dqget_inode()
xfs_qm_dqattach_locked()
xfs_qm_dqattach()
xfs_free_eofblocks()
xfs_inactive()
xfs_inodegc_worker()
With the drain fix applied, the same xfstest passes and the filesystem
can be mounted again cleanly afterwards.
I do not have much experience with XFS-specific fstests, so I wanted to
check whether this is the right shape before posting it formally. If
this approach looks acceptable, I can send a small kernel series adding
the DEBUG errortags and the mount-failure fix, followed by a separate
xfstests patch.
Thanks,
Zhang Cen
On Thu, May 28, 2026 at 09:28:02PM +0800, Cen Zhang wrote: > In my local attempts, a mount-failure errortag after > xfs_qm_mount_quotas() was not enough by itself, because ordinary test > setup did not reliably leave an inodegc item that would enter the > dqattach/eofblocks path during the mount-failure unwind. > > The draft I have locally uses two DEBUG-only errortags: > > - mount_fail_after_qm_mount_quotas > > This injects -EIO immediately after xfs_qm_mount_quotas(). > > - force_inodegc_eofblocks > > This makes a regular inode released after quotacheck scanning go > through inodegc/eofblocks cleanup, and delays that inodegc work so > the mount-failure unwind can reach quota teardown first. This sounds pretty useful, but I actually have a different question: > Workqueue: xfs-inodegc/vdc xfs_inodegc_worker > xfs_qm_dqget_cache_lookup() > xfs_qm_dqget_inode() > xfs_qm_dqattach_locked() > xfs_qm_dqattach() > xfs_free_eofblocks() > xfs_inactive() > xfs_inodegc_worker() What kind of inode do we actually get here? I guess your test case reproduces an unclean shutdown and this processes inodes that otherwise would have been processed on unmount? I'm kinda tempted to just make xfs_can_free_eofblocks return false during early mount (or rather during mount failure), as trying to fix all the corner cases here seems like an uphill battle ( also see the sashiko review)
© 2016 - 2026 Red Hat, Inc.