[PATCH v3 1/4] xfs: stop reclaim before pushing AIL during unmount

Yuto Ohnuki posted 4 patches 1 month ago
There is a newer version of this series
[PATCH v3 1/4] xfs: stop reclaim before pushing AIL during unmount
Posted by Yuto Ohnuki 1 month ago
The unmount sequence in xfs_unmount_flush_inodes() pushed the AIL while
background reclaim and inodegc are still running. This creates a race
where reclaim can free inodes and their log items while the AIL push is
still referencing them.

Reorder xfs_unmount_flush_inodes() to cancel background reclaim and stop
inodegc before pushing the AIL, so that background reclaim and inodegc
are no longer running while the AIL is pushed.

Reported-by: syzbot+652af2b3c5569c4ab63c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=652af2b3c5569c4ab63c
Fixes: 90c60e164012 ("xfs: xfs_iflush() is no longer necessary")
Cc: <stable@vger.kernel.org> # v5.9
Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
---
 fs/xfs/xfs_mount.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 9c295abd0a0a..786e1fc720e5 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -621,9 +621,9 @@ xfs_unmount_flush_inodes(
 
 	xfs_set_unmounting(mp);
 
-	xfs_ail_push_all_sync(mp->m_ail);
-	xfs_inodegc_stop(mp);
 	cancel_delayed_work_sync(&mp->m_reclaim_work);
+	xfs_inodegc_stop(mp);
+	xfs_ail_push_all_sync(mp->m_ail);
 	xfs_reclaim_inodes(mp);
 	xfs_health_unmount(mp);
 	xfs_healthmon_unmount(mp);
-- 
2.50.1




Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284

Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
Re: [PATCH v3 1/4] xfs: stop reclaim before pushing AIL during unmount
Posted by Darrick J. Wong 1 month ago
On Sun, Mar 08, 2026 at 06:28:06PM +0000, Yuto Ohnuki wrote:
> The unmount sequence in xfs_unmount_flush_inodes() pushed the AIL while
> background reclaim and inodegc are still running. This creates a race
> where reclaim can free inodes and their log items while the AIL push is
> still referencing them.

Is this a general race between background inode reclaim and AIL pushes?
Or is the race between an AIL push and the explicit call to
xfs_reclaim_inodes below?

I ask because there's a call to xfs_ail_push_all_sync from various
places in the codebase:

- Log covering/quiescing activities

- xchk_checkpoint_log in the online fsck code if the inode btree
  scrubber thinks it's racing with inode reclaim.

If inode reclaim happens to be running at the same time as these AIL
pushes, won't the same race condition manifest there?  But maybe you
meant the race is with the explicit xfs_reclaim_inodes below?

> Reorder xfs_unmount_flush_inodes() to cancel background reclaim and stop
> inodegc before pushing the AIL, so that background reclaim and inodegc
> are no longer running while the AIL is pushed.
> 
> Reported-by: syzbot+652af2b3c5569c4ab63c@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=652af2b3c5569c4ab63c
> Fixes: 90c60e164012 ("xfs: xfs_iflush() is no longer necessary")
> Cc: <stable@vger.kernel.org> # v5.9
> Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com>
> ---
>  fs/xfs/xfs_mount.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 9c295abd0a0a..786e1fc720e5 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -621,9 +621,9 @@ xfs_unmount_flush_inodes(
>  
>  	xfs_set_unmounting(mp);
>  
> -	xfs_ail_push_all_sync(mp->m_ail);
> -	xfs_inodegc_stop(mp);
>  	cancel_delayed_work_sync(&mp->m_reclaim_work);
> +	xfs_inodegc_stop(mp);

xfs_inodegc_inactivate (aka the inodegc worker) can call
xfs_inodegc_set_reclaimable, which in turn calls xfs_reclaim_work_queue.
That will re-queue m_reclaim_work, which we just cancelled.  I think
inodegc_stop has to come before cancelling m_reclaim_work.

--D

> +	xfs_ail_push_all_sync(mp->m_ail);
>  	xfs_reclaim_inodes(mp);
>  	xfs_health_unmount(mp);
>  	xfs_healthmon_unmount(mp);
> -- 
> 2.50.1
> 
> 
> 
> 
> Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284
> 
> Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
> 
> 
> 
>
Re: [PATCH v3 1/4] xfs: stop reclaim before pushing AIL during unmount
Posted by Yuto Ohnuki 4 weeks, 1 day ago
> Is this a general race between background inode reclaim and AIL pushes?
> Or is the race between an AIL push and the explicit call to
> xfs_reclaim_inodes below?
> 
> I ask because there's a call to xfs_ail_push_all_sync from various
> places in the codebase:
> 
> - Log covering/quiescing activities
> 
> - xchk_checkpoint_log in the online fsck code if the inode btree
>   scrubber thinks it's racing with inode reclaim.
> 
> If inode reclaim happens to be running at the same time as these AIL
> pushes, won't the same race condition manifest there?  But maybe you
> meant the race is with the explicit xfs_reclaim_inodes below?

The UAF itself is a general race between background inode reclaim (and
the dquot shrinker) and AIL pushes, not a race between the AIL push
and the explicit xfs_reclaim_inodes call below. The syzbot report
triggered it during shutdown because aborting dirty inodes makes them
reclaimable while still referenced by the AIL, but the unsafe
post-push dereferences fixed in patches 2/4 and 3/4 in v4 are not
shutdown-specific. Those patches address the general race by
capturing log item fields before push callbacks and saving the ailp
pointer before dropping the AIL lock.

This patch (patch 1/4) is a separate correctness fix for the unmount
path. As Dave analysed in his v1 review [1], the unmount sequence is
broken independently of the UAF - background reclaim and inodegc
should not be running while the AIL is being pushed during unmount.
This patch eliminates the conditions that make the general race
particularly likely to trigger during unmount.

[1] https://lore.kernel.org/all/aai66aCvGC66P8cN@dread/

> xfs_inodegc_inactivate (aka the inodegc worker) can call
> xfs_inodegc_set_reclaimable, which in turn calls xfs_reclaim_work_queue.
> That will re-queue m_reclaim_work, which we just cancelled.  I think
> inodegc_stop has to come before cancelling m_reclaim_work.
> 
> --D

Thank you for your valuable feedback.
Fixed in v4 - xfs_inodegc_stop is now called before
cancel_delayed_work_sync, and the function comment is updated to
reflect the new ordering.

Yuto



Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284

Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705