This helper is mandatory to be called before bdrv_co_flush(). In the
other case bdrv_co_flush() will be noop. This helper should be called
after actual write is completed for subsequent flush to perform some
work.
Actually this change is important, without it BDRV_REQ_FUA semantics is
broken completely. flush() is not called. This smells like potential
data loss if somebody relies on BDRV_REQ_FUA.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Hanna Reitz <hreitz@redhat.com>
---
block/io.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/block/io.c b/block/io.c
index f39ff862c11..820b41fab12 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1163,8 +1163,11 @@ bdrv_driver_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov, flags);
emulate_flags:
- if (ret == 0 && emulate_fua) {
- ret = bdrv_co_flush(bs);
+ if (ret == 0) {
+ bdrv_advance_flush_gen(bs);
+ if (emulate_fua) {
+ ret = bdrv_co_flush(bs);
+ }
}
if (qiov == &local_qiov) {
@@ -2077,8 +2080,11 @@ bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
}
fail:
- if (ret == 0 && need_flush) {
- ret = bdrv_co_flush(bs);
+ if (ret == 0) {
+ bdrv_advance_flush_gen(bs);
+ if (need_flush) {
+ ret = bdrv_co_flush(bs);
+ }
}
qemu_vfree(buf);
return ret;
@@ -2147,8 +2153,6 @@ bdrv_co_write_req_finish(BdrvChild *child, int64_t offset, int64_t bytes,
bdrv_check_request(offset, bytes, &error_abort);
- bdrv_advance_flush_gen(bs);
-
/*
* Discard cannot extend the image, but in error handling cases, such as
* when reverting a qcow2 cluster allocation, the discarded range can pass
@@ -3720,6 +3724,7 @@ int coroutine_fn bdrv_co_pdiscard(BdrvChild *child, int64_t offset,
}
ret = 0;
out:
+ bdrv_advance_flush_gen(bs);
bdrv_co_write_req_finish(child, req.offset, req.bytes, &req, ret);
tracked_request_end(&req);
bdrv_dec_in_flight(bs);
@@ -3994,6 +4000,7 @@ static int coroutine_fn GRAPH_RDLOCK bdrv_co_copy_range_internal(
bytes,
read_flags, write_flags);
}
+ bdrv_advance_flush_gen(dst->bs);
bdrv_co_write_req_finish(dst, dst_offset, bytes, &req, ret);
tracked_request_end(&req);
bdrv_dec_in_flight(dst->bs);
@@ -4187,6 +4194,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
} else {
offset = bs->total_sectors * BDRV_SECTOR_SIZE;
}
+ bdrv_advance_flush_gen(bs);
/*
* It's possible that truncation succeeded but bdrv_refresh_total_sectors
* failed, but the latter doesn't affect how we should finish the request.
--
2.43.5
On 29.12.25 17:07, Denis V. Lunev wrote: > This helper is mandatory to be called before bdrv_co_flush(). In the > other case bdrv_co_flush() will be noop. This helper should be called > after actual write is completed for subsequent flush to perform some > work. > > Actually this change is important, without it BDRV_REQ_FUA semantics is > broken completely. flush() is not called. This smells like potential > data loss if somebody relies on BDRV_REQ_FUA. > > Signed-off-by: Denis V. Lunev <den@openvz.org> > CC: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> > CC: Kevin Wolf <kwolf@redhat.com> > CC: Hanna Reitz <hreitz@redhat.com> > --- > block/io.c | 20 ++++++++++++++------ > 1 file changed, 14 insertions(+), 6 deletions(-) Definitely an improvement, but if possible, I would still prefer keeping the increment it in bdrv_co_write_req_finish(), because that is where all write requests eventually end up in, and that’s also where dirty bitmaps are handled. It seems slightly error-prone to manually have to call this every time the BDS data is modified. Can we maybe instead pull the FUA emulation out into bdrv_aligned_pwritev(), after bdrv_co_write_req_finish()? I think we can only pull out the flush, not the FUA emulation detection because for write-zeroes, we have to check either supported_zero_flags or supported_write_flags, depending on whether we fall back or not. But bdrv_driver_pwritev() and bdrv_co_do_pwrite_zeroes() could return an need_emulate_fua flag via pointer parameter, and if set, bdrv_aligned_pwritev() can then do the flush after bdrv_co_write_req_finish(). (This would also allow dropping both “only flush once at the end” blocks.) Maybe it would then also make sense to move the bdrv_co_flush() into bdrv_co_write_req_finish(), and add a need_flush parameter to it, so that it can see whether the whole request (write + flush) was successful. Hanna
On 1/26/26 16:07, Hanna Czenczek wrote:
> On 29.12.25 17:07, Denis V. Lunev wrote:
>> This helper is mandatory to be called before bdrv_co_flush(). In the
>> other case bdrv_co_flush() will be noop. This helper should be called
>> after actual write is completed for subsequent flush to perform some
>> work.
>>
>> Actually this change is important, without it BDRV_REQ_FUA semantics is
>> broken completely. flush() is not called. This smells like potential
>> data loss if somebody relies on BDRV_REQ_FUA.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Hanna Reitz <hreitz@redhat.com>
>> ---
>> block/io.c | 20 ++++++++++++++------
>> 1 file changed, 14 insertions(+), 6 deletions(-)
> Definitely an improvement, but if possible, I would still prefer keeping
> the increment it in bdrv_co_write_req_finish(), because that is where
> all write requests eventually end up in, and that’s also where dirty
> bitmaps are handled. It seems slightly error-prone to manually have to
> call this every time the BDS data is modified.
>
> Can we maybe instead pull the FUA emulation out into
> bdrv_aligned_pwritev(), after bdrv_co_write_req_finish()?
>
> I think we can only pull out the flush, not the FUA emulation detection
> because for write-zeroes, we have to check either supported_zero_flags
> or supported_write_flags, depending on whether we fall back or not. But
> bdrv_driver_pwritev() and bdrv_co_do_pwrite_zeroes() could return an
> need_emulate_fua flag via pointer parameter, and if set,
> bdrv_aligned_pwritev() can then do the flush after
> bdrv_co_write_req_finish(). (This would also allow dropping both “only
> flush once at the end” blocks.)
>
> Maybe it would then also make sense to move the bdrv_co_flush() into
> bdrv_co_write_req_finish(), and add a need_flush parameter to it, so
> that it can see whether the whole request (write + flush) was successful.
>
> Hanna
>
I have tried to think about this problem in a this and that way
and found yet another missed place in the code. This has
frustrated me even more.
For the reference:
raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr, uint64_t bytes,
QEMUIOVector *qiov, int type, int flags)
{
...
assert(qiov->size == bytes);
ret = raw_thread_pool_submit(handle_aiocb_rw, &acb);
if (ret == 0 && (flags & BDRV_REQ_FUA)) {
/* TODO Use pwritev2() instead if it's available */
ret = bdrv_co_flush(bs);
}
goto out; /* Avoid the compiler err of unused label */
This definitely states that the code is cumbersome and you
right that I need to try harder to move FUA emulation out
into generic code.
I'll invest more here.
Thank you for the point,
Den
© 2016 - 2026 Red Hat, Inc.