fs/xfs/xfs_log_cil.c | 34 +++++++++++++++++----------------- fs/xfs/xfs_log_priv.h | 2 +- 2 files changed, 18 insertions(+), 18 deletions(-)
xlog_cil_force() and xlog_cil_flush() both use the current CIL
checkpoint sequence to request an immediate push. They currently read
xc_current_sequence before xlog_cil_push_now() takes xc_push_lock.
The CIL push worker advances xc_current_sequence under xc_push_lock while
switching to a new checkpoint context. If a current-checkpoint force reads
an old sequence and then publishes it as xc_push_seq after the worker has
moved to the next context, xlog_cil_push_work() can treat the request as a
previously pushed sequence and skip it via the push_seq < ctx->sequence
check. That can leave the current dirty checkpoint unqueued by a
force/flush request.
Make xlog_cil_push_now() resolve push_seq == 0 to the current sequence
while holding xc_push_lock, and return the resolved sequence for tracing
and waiting. Route xlog_cil_force() and xlog_cil_flush() through that path
so the current-sequence snapshot and xc_push_seq publication are serialized
with the context switch.
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
---
fs/xfs/xfs_log_cil.c | 34 +++++++++++++++++-----------------
fs/xfs/xfs_log_priv.h | 2 +-
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index edc368938f30..6a1ced3c9314 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -1707,24 +1707,25 @@ xlog_cil_push_background(
* mechanism. Hence in this case we need to pass a flag to the push work to
* indicate it needs to flush the commit record itself.
*/
-static void
+static xfs_csn_t
xlog_cil_push_now(
struct xlog *log,
- xfs_lsn_t push_seq,
+ xfs_csn_t push_seq,
bool async)
{
struct xfs_cil *cil = log->l_cilp;
if (!cil)
- return;
-
- ASSERT(push_seq && push_seq <= cil->xc_current_sequence);
+ return 0;
/* start on any pending background push to minimise wait time on it */
if (!async)
flush_workqueue(cil->xc_push_wq);
spin_lock(&cil->xc_push_lock);
+ if (!push_seq)
+ push_seq = cil->xc_current_sequence;
+ ASSERT(push_seq && push_seq <= cil->xc_current_sequence);
/*
* If this is an async flush request, we always need to set the
@@ -1742,12 +1743,13 @@ xlog_cil_push_now(
if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags) ||
push_seq <= cil->xc_push_seq) {
spin_unlock(&cil->xc_push_lock);
- return;
+ return push_seq;
}
cil->xc_push_seq = push_seq;
queue_work(cil->xc_push_wq, &cil->xc_ctx->push_work);
spin_unlock(&cil->xc_push_lock);
+ return push_seq;
}
bool
@@ -1880,16 +1882,16 @@ void
xlog_cil_flush(
struct xlog *log)
{
- xfs_csn_t seq = log->l_cilp->xc_current_sequence;
+ struct xfs_cil *cil = log->l_cilp;
+ xfs_csn_t seq = xlog_cil_push_now(log, 0, true);
trace_xfs_log_force(log->l_mp, seq, _RET_IP_);
- xlog_cil_push_now(log, seq, true);
/*
* If the CIL is empty, make sure that any previous checkpoint that may
* still be in an active iclog is pushed to stable storage.
*/
- if (test_bit(XLOG_CIL_EMPTY, &log->l_cilp->xc_flags))
+ if (test_bit(XLOG_CIL_EMPTY, &cil->xc_flags))
xfs_log_force(log->l_mp, 0);
}
@@ -1911,12 +1913,7 @@ xlog_cil_force_seq(
struct xfs_cil *cil = log->l_cilp;
struct xfs_cil_ctx *ctx;
xfs_lsn_t commit_lsn = NULLCOMMITLSN;
-
- ASSERT(sequence <= cil->xc_current_sequence);
-
- if (!sequence)
- sequence = cil->xc_current_sequence;
- trace_xfs_log_force(log->l_mp, sequence, _RET_IP_);
+ bool traced = false;
/*
* check to see if we need to force out the current context.
@@ -1924,7 +1921,11 @@ xlog_cil_force_seq(
* so no need to deal with it here.
*/
restart:
- xlog_cil_push_now(log, sequence, false);
+ sequence = xlog_cil_push_now(log, sequence, false);
+ if (!traced) {
+ trace_xfs_log_force(log->l_mp, sequence, _RET_IP_);
+ traced = true;
+ }
/*
* See if we can find a previous sequence still committing.
@@ -2066,4 +2067,3 @@ xlog_cil_destroy(
destroy_workqueue(cil->xc_push_wq);
kfree(cil);
}
-
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index cf1e4ce61a8c..4ae98d3800ea 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -582,7 +582,7 @@ xfs_lsn_t xlog_cil_force_seq(struct xlog *log, xfs_csn_t sequence);
static inline void
xlog_cil_force(struct xlog *log)
{
- xlog_cil_force_seq(log, log->l_cilp->xc_current_sequence);
+ xlog_cil_force_seq(log, 0);
}
/*
On Wed, May 06, 2026 at 09:58:11AM +0800, Cen Zhang wrote: > xlog_cil_force() and xlog_cil_flush() both use the current CIL > checkpoint sequence to request an immediate push. They currently read > xc_current_sequence before xlog_cil_push_now() takes xc_push_lock. > > The CIL push worker advances xc_current_sequence under xc_push_lock while > switching to a new checkpoint context. If a current-checkpoint force reads > an old sequence and then publishes it as xc_push_seq after the worker has > moved to the next context, xlog_cil_push_work() can treat the request as a > previously pushed sequence and skip it via the push_seq < ctx->sequence > check. That can leave the current dirty checkpoint unqueued by a > force/flush request. Which does not matter at all. Integrity in XFS is defined by completion-to-submission semantics. IOWs, the only thing that a CIL -force- guarantees is that it will ensure the current CIL context containing fully completed transactions when it is called. See, for example, __xfs_trans_commit() -> xfs_log_force() for XFS_TRANS_SYNC transactions. So if xlog_cil_force() races with a push switching to a new sequence number, it means that all the transactions that were completed at the time xfs_log_force() was called are -already being pushed-. Hence all we need to do is wait for the CIL context at that sequence to be committed. The new sequence number, by definition, contains modifications that completed -after- the log force was started, and so are outside the scope of the current force operation. IOWs, we can safely skip the new sequence number in teh case of a push/force race, and the force will still correctly wait on the old sequence number being forced to disk. The race is benign, and not a bug. As for xlog_cil_flush(), it is a latency reduction mechanism and not a integrity mechanism. We just don't care about racing, because the caller will call it again if sufficient progress wasn't made by the previous call. Hence I don't think there is a bug that needs fixing here. Yes, the fact that deducing it is ok requires understanding competion-to-submission integrity semantics, but that's kinda assumed knowledge because it's exactly the same semantics as we relyon for data/metadata ordering and integrity w/ O_SYNC/O_DSYNC operations... > Make xlog_cil_push_now() resolve push_seq == 0 to the current sequence > while holding xc_push_lock, and return the resolved sequence for tracing > and waiting. Route xlog_cil_force() and xlog_cil_flush() through that path > so the current-sequence snapshot and xc_push_seq publication are serialized > with the context switch. So the change is likely correct, but somewhat nasty in places. Regardless, I'm inclined to reject it based on "if it isn't broken, don't fix it" principles. In future, when reporting problems and fixes like this, it is very important that you say how the issue was found, whether it can be reproduced, hardware configs, etc. Presenting multiple fixes in a short period for a wide variety of different conditions in extremely complex code without any context of how/when they occur or what impact they have on systems smells like AI slop more than anything.... So, please document what tools/AI are you using to find these problems and how they can be reproduced. I haven't commented on the other two "fixes" you've just posted because I haven't got hours to dig into them right now. You also posted a fix a month ago for a race condition that wasn't a race condition in buf log item handling, so something is clearly directing you at them.... -Dave. -- Dave Chinner dgc@kernel.org
Dear Chinner > > Which does not matter at all. > > Integrity in XFS is defined by completion-to-submission semantics. > IOWs, the only thing that a CIL -force- guarantees is that it will > ensure the current CIL context containing fully completed > transactions when it is called. > > See, for example, __xfs_trans_commit() -> xfs_log_force() for > XFS_TRANS_SYNC transactions. > > So if xlog_cil_force() races with a push switching to a new sequence > number, it means that all the transactions that were completed at > the time xfs_log_force() was called are -already being pushed-. > Hence all we need to do is wait for the CIL context at that sequence > to be committed. The new sequence number, by definition, contains > modifications that completed -after- the log force was started, and > so are outside the scope of the current force operation. > > IOWs, we can safely skip the new sequence number in teh case of a > push/force race, and the force will still correctly wait on the old > sequence number being forced to disk. The race is benign, and not a > bug. > > As for xlog_cil_flush(), it is a latency reduction mechanism and not > a integrity mechanism. We just don't care about racing, because the > caller will call it again if sufficient progress wasn't made by the > previous call. > > Hence I don't think there is a bug that needs fixing here. Yes, the > fact that deducing it is ok requires understanding > competion-to-submission integrity semantics, but that's kinda > assumed knowledge because it's exactly the same semantics as we > relyon for data/metadata ordering and integrity w/ O_SYNC/O_DSYNC > operations... > > > So the change is likely correct, but somewhat nasty in places. > Regardless, I'm inclined to reject it based on "if it isn't broken, > don't fix it" principles. > > In future, when reporting problems and fixes like this, it is very > important that you say how the issue was found, whether it can be > reproduced, hardware configs, etc. Presenting multiple fixes in a > short period for a wide variety of different conditions in extremely > complex code without any context of how/when they occur or what > impact they have on systems smells like AI slop more than > anything.... > > So, please document what tools/AI are you using to find these > problems and how they can be reproduced. I haven't commented on the > other two "fixes" you've just posted because I haven't got hours to > dig into them right now. You also posted a fix a month ago for a > race condition that wasn't a race condition in buf log item > handling, so something is clearly directing you at them.... > Thanks for the detailed explanation. You are right. My analysis of the XFS CIL checkpoint semantics was not careful enough, and I overstated the possible effect of this window. The report originally came from dynamic testing with a customized syzkaller setup, which observed an unsynchronised access pattern at runtime. I then tried to analyse whether that pattern could lead to a real correctness problem, with some AI assistance during the reasoning process. However, I did not validate that analysis sufficiently against the actual XFS CIL semantics, and that led to an incorrect conclusion on my side. I am sorry for the noise and for taking your time on this. In the past, when I sent more direct/raw reports from my testing, some of them were reasonably treated as bot-like reports. Since then I have been trying to better understand the reports before sending them, so that I can submit more useful and actionable issues or patches. Clearly I still got this wrong here. I will take your comments seriously and be more careful in future reports, especially when reasoning about subsystem-specific semantics such as XFS log/CIL behavior. Thanks again for the explanation, and sorry again for the noise. Best regards Cen
On Thu, May 07, 2026 at 12:34:23PM +0800, Cen Zhang wrote: > In the past, when I sent more direct/raw reports from my testing, some of > them were reasonably treated as bot-like reports. Since then I have been > trying to better understand the reports before sending them, so that I > can submit more useful and actionable issues or patches. Clearly I still > got this wrong here. The important thing is to explain how you found the issue, how it can be reproduced the impact of the bug being fixed, etc. Describing a race condition and it's fix purely in theoretical terms and omitting all other context makes it feel very "bot driven" as they tend to lack all context other than code analysis and the change itself... Think about how you'd describe the impact of the bug to someone, and how'd they'd diagnose the problem if it were occuring on their system. A good commit message should allow a user to identify the problem (and the fix) from it's contents... > I will take your comments seriously and be more careful in future > reports, especially when reasoning about subsystem-specific semantics > such as XFS log/CIL behavior. You don't need to get it 100% right before you post a fix - it's often much faster to post an RFC or ask a question and get immediate feedback than to try to be perfect on the first submission. > Thanks again for the explanation, and sorry again for the noise. It's not noise when people are listening and learning - that makes it time well spent IMO. :) -Dave, -- Dave Chinner dgc@kernel.org
© 2016 - 2026 Red Hat, Inc.