block/mirror: check range when setting zero bitmap for sync write

[PATCH] block/mirror: check range when setting zero bitmap for sync write

Posted by Fiona Ebner 3 weeks, 6 days ago

Some Proxmox users reported an occasional assertion failure [0][1] in
busy VMs when using drive mirror with active mode. In particular, the
failure may occur for zero writes shorter than the job granularity:

> #0  0x00007b421154b507 in abort ()
> #1  0x00007b421154b420 in ?? ()
> #2  0x0000641c582e061f in bitmap_set (map=0x7b4204014e00, start=14, nr=-1)
> #3  0x0000641c58062824 in do_sync_target_write (job=0x641c7e73d1e0,
>       method=MIRROR_METHOD_ZERO, offset=852480, bytes=4096, qiov=0x0, flags=0)
> #4  0x0000641c58062250 in bdrv_mirror_top_do_write (bs=0x641c7e62e1f0,
        method=MIRROR_METHOD_ZERO, copy_to_target=true, offset=852480,
        bytes=4096, qiov=0x0, flags=0)
> #5  0x0000641c58061f31 in bdrv_mirror_top_pwrite_zeroes (bs=0x641c7e62e1f0,
        offset=852480, bytes=4096, flags=0)

The range for the dirty bitmap described by dirty_bitmap_offset and
dirty_bitmap_end is narrower than the original range and in fact,
dirty_bitmap_end might be smaller than dirty_bitmap_offset. There
already is a check for 'dirty_bitmap_offset < dirty_bitmap_end' before
resetting the dirty bitmap. Add such a check for setting the zero
bitmap too, which uses the same narrower range.

[0]: https://forum.proxmox.com/threads/177981/
[1]: https://bugzilla.proxmox.com/show_bug.cgi?id=7222

Cc: qemu-stable@nongnu.org
Fixes: 7e277545b9 ("mirror: Skip writing zeroes when target is already zero")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 block/mirror.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b344182c74..bc982cb99a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1514,9 +1514,12 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
         assert(!qiov);
         ret = blk_co_pwrite_zeroes(job->target, offset, bytes, flags);
         if (job->zero_bitmap && ret >= 0) {
-            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
-                       (dirty_bitmap_end - dirty_bitmap_offset) /
-                       job->granularity);
+            if (dirty_bitmap_offset < dirty_bitmap_end) {
+                bitmap_set(job->zero_bitmap,
+                           dirty_bitmap_offset / job->granularity,
+                           (dirty_bitmap_end - dirty_bitmap_offset) /
+                           job->granularity);
+            }
         }
         break;
 
-- 
2.47.3

Re: [PATCH] block/mirror: check range when setting zero bitmap for sync write

Posted by Stefan Hajnoczi 2 weeks, 6 days ago

On Mon, Jan 12, 2026 at 04:23:51PM +0100, Fiona Ebner wrote:
> Some Proxmox users reported an occasional assertion failure [0][1] in
> busy VMs when using drive mirror with active mode. In particular, the
> failure may occur for zero writes shorter than the job granularity:
> 
> > #0  0x00007b421154b507 in abort ()
> > #1  0x00007b421154b420 in ?? ()
> > #2  0x0000641c582e061f in bitmap_set (map=0x7b4204014e00, start=14, nr=-1)
> > #3  0x0000641c58062824 in do_sync_target_write (job=0x641c7e73d1e0,
> >       method=MIRROR_METHOD_ZERO, offset=852480, bytes=4096, qiov=0x0, flags=0)
> > #4  0x0000641c58062250 in bdrv_mirror_top_do_write (bs=0x641c7e62e1f0,
>         method=MIRROR_METHOD_ZERO, copy_to_target=true, offset=852480,
>         bytes=4096, qiov=0x0, flags=0)
> > #5  0x0000641c58061f31 in bdrv_mirror_top_pwrite_zeroes (bs=0x641c7e62e1f0,
>         offset=852480, bytes=4096, flags=0)
> 
> The range for the dirty bitmap described by dirty_bitmap_offset and
> dirty_bitmap_end is narrower than the original range and in fact,
> dirty_bitmap_end might be smaller than dirty_bitmap_offset. There
> already is a check for 'dirty_bitmap_offset < dirty_bitmap_end' before
> resetting the dirty bitmap. Add such a check for setting the zero
> bitmap too, which uses the same narrower range.
> 
> [0]: https://forum.proxmox.com/threads/177981/
> [1]: https://bugzilla.proxmox.com/show_bug.cgi?id=7222
> 
> Cc: qemu-stable@nongnu.org
> Fixes: 7e277545b9 ("mirror: Skip writing zeroes when target is already zero")
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>  block/mirror.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index b344182c74..bc982cb99a 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1514,9 +1514,12 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
>          assert(!qiov);
>          ret = blk_co_pwrite_zeroes(job->target, offset, bytes, flags);
>          if (job->zero_bitmap && ret >= 0) {
> -            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
> -                       (dirty_bitmap_end - dirty_bitmap_offset) /
> -                       job->granularity);
> +            if (dirty_bitmap_offset < dirty_bitmap_end) {
> +                bitmap_set(job->zero_bitmap,
> +                           dirty_bitmap_offset / job->granularity,
> +                           (dirty_bitmap_end - dirty_bitmap_offset) /
> +                           job->granularity);
> +            }

Why does this case clause use dirty_bitmap_offset and dirty_bitmap_end
instead of zero_bitmap_offset and zero_bitmap_end like the other
zero_bitmap operations in this switch statement?

   if (job->zero_bitmap && ret >= 0) {
-      bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
-                 (dirty_bitmap_end - dirty_bitmap_offset) /
-                 job->granularity);
+      bitmap_set(job->zero_bitmap, zero_bitmap_offset,
+                 zero_bitmap_end - zero_bitmap_offset);
   }

I'm probably missing something, but it's not obvious to me :).

Stefan

Re: [PATCH] block/mirror: check range when setting zero bitmap for sync write

Posted by Fiona Ebner 2 weeks, 5 days ago

Am 19.01.26 um 9:10 PM schrieb Stefan Hajnoczi:
> On Mon, Jan 12, 2026 at 04:23:51PM +0100, Fiona Ebner wrote:
>> Some Proxmox users reported an occasional assertion failure [0][1] in
>> busy VMs when using drive mirror with active mode. In particular, the
>> failure may occur for zero writes shorter than the job granularity:
>>
>>> #0  0x00007b421154b507 in abort ()
>>> #1  0x00007b421154b420 in ?? ()
>>> #2  0x0000641c582e061f in bitmap_set (map=0x7b4204014e00, start=14, nr=-1)
>>> #3  0x0000641c58062824 in do_sync_target_write (job=0x641c7e73d1e0,
>>>       method=MIRROR_METHOD_ZERO, offset=852480, bytes=4096, qiov=0x0, flags=0)
>>> #4  0x0000641c58062250 in bdrv_mirror_top_do_write (bs=0x641c7e62e1f0,
>>         method=MIRROR_METHOD_ZERO, copy_to_target=true, offset=852480,
>>         bytes=4096, qiov=0x0, flags=0)
>>> #5  0x0000641c58061f31 in bdrv_mirror_top_pwrite_zeroes (bs=0x641c7e62e1f0,
>>         offset=852480, bytes=4096, flags=0)
>>
>> The range for the dirty bitmap described by dirty_bitmap_offset and
>> dirty_bitmap_end is narrower than the original range and in fact,
>> dirty_bitmap_end might be smaller than dirty_bitmap_offset. There
>> already is a check for 'dirty_bitmap_offset < dirty_bitmap_end' before
>> resetting the dirty bitmap. Add such a check for setting the zero
>> bitmap too, which uses the same narrower range.
>>
>> [0]: https://forum.proxmox.com/threads/177981/
>> [1]: https://bugzilla.proxmox.com/show_bug.cgi?id=7222
>>
>> Cc: qemu-stable@nongnu.org
>> Fixes: 7e277545b9 ("mirror: Skip writing zeroes when target is already zero")
>> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
>> ---
>>  block/mirror.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index b344182c74..bc982cb99a 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -1514,9 +1514,12 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
>>          assert(!qiov);
>>          ret = blk_co_pwrite_zeroes(job->target, offset, bytes, flags);
>>          if (job->zero_bitmap && ret >= 0) {
>> -            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
>> -                       (dirty_bitmap_end - dirty_bitmap_offset) /
>> -                       job->granularity);
>> +            if (dirty_bitmap_offset < dirty_bitmap_end) {
>> +                bitmap_set(job->zero_bitmap,
>> +                           dirty_bitmap_offset / job->granularity,
>> +                           (dirty_bitmap_end - dirty_bitmap_offset) /
>> +                           job->granularity);
>> +            }
> 
> Why does this case clause use dirty_bitmap_offset and dirty_bitmap_end
> instead of zero_bitmap_offset and zero_bitmap_end like the other
> zero_bitmap operations in this switch statement?
> 
>    if (job->zero_bitmap && ret >= 0) {
> -      bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
> -                 (dirty_bitmap_end - dirty_bitmap_offset) /
> -                 job->granularity);
> +      bitmap_set(job->zero_bitmap, zero_bitmap_offset,
> +                 zero_bitmap_end - zero_bitmap_offset);
>    }
> 
> I'm probably missing something, but it's not obvious to me :).

Sorry, I could've mentioned this in the commit message. There is a
comment explaining it further up in the function:

>     /*
>      * Tails are either clean or shrunk, so for dirty bitmap resetting
>      * we safely align the range narrower.  But for zero bitmap, round
>      * range wider for checking or clearing, and narrower for setting.
>      */
>     dirty_bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
>     dirty_bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
>     if (dirty_bitmap_offset < dirty_bitmap_end) {
>         bdrv_reset_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
>                                 dirty_bitmap_end - dirty_bitmap_offset);
>     }
>     zero_bitmap_offset = offset / job->granularity;
>     zero_bitmap_end = DIV_ROUND_UP(offset + bytes, job->granularity);

Re: [PATCH] block/mirror: check range when setting zero bitmap for sync write

Posted by Stefan Hajnoczi 2 weeks, 4 days ago

On Tue, Jan 20, 2026 at 09:57:42AM +0100, Fiona Ebner wrote:
> Am 19.01.26 um 9:10 PM schrieb Stefan Hajnoczi:
> > On Mon, Jan 12, 2026 at 04:23:51PM +0100, Fiona Ebner wrote:
> >> Some Proxmox users reported an occasional assertion failure [0][1] in
> >> busy VMs when using drive mirror with active mode. In particular, the
> >> failure may occur for zero writes shorter than the job granularity:
> >>
> >>> #0  0x00007b421154b507 in abort ()
> >>> #1  0x00007b421154b420 in ?? ()
> >>> #2  0x0000641c582e061f in bitmap_set (map=0x7b4204014e00, start=14, nr=-1)
> >>> #3  0x0000641c58062824 in do_sync_target_write (job=0x641c7e73d1e0,
> >>>       method=MIRROR_METHOD_ZERO, offset=852480, bytes=4096, qiov=0x0, flags=0)
> >>> #4  0x0000641c58062250 in bdrv_mirror_top_do_write (bs=0x641c7e62e1f0,
> >>         method=MIRROR_METHOD_ZERO, copy_to_target=true, offset=852480,
> >>         bytes=4096, qiov=0x0, flags=0)
> >>> #5  0x0000641c58061f31 in bdrv_mirror_top_pwrite_zeroes (bs=0x641c7e62e1f0,
> >>         offset=852480, bytes=4096, flags=0)
> >>
> >> The range for the dirty bitmap described by dirty_bitmap_offset and
> >> dirty_bitmap_end is narrower than the original range and in fact,
> >> dirty_bitmap_end might be smaller than dirty_bitmap_offset. There
> >> already is a check for 'dirty_bitmap_offset < dirty_bitmap_end' before
> >> resetting the dirty bitmap. Add such a check for setting the zero
> >> bitmap too, which uses the same narrower range.
> >>
> >> [0]: https://forum.proxmox.com/threads/177981/
> >> [1]: https://bugzilla.proxmox.com/show_bug.cgi?id=7222
> >>
> >> Cc: qemu-stable@nongnu.org
> >> Fixes: 7e277545b9 ("mirror: Skip writing zeroes when target is already zero")
> >> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> >> ---
> >>  block/mirror.c | 9 ++++++---
> >>  1 file changed, 6 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/block/mirror.c b/block/mirror.c
> >> index b344182c74..bc982cb99a 100644
> >> --- a/block/mirror.c
> >> +++ b/block/mirror.c
> >> @@ -1514,9 +1514,12 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
> >>          assert(!qiov);
> >>          ret = blk_co_pwrite_zeroes(job->target, offset, bytes, flags);
> >>          if (job->zero_bitmap && ret >= 0) {
> >> -            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
> >> -                       (dirty_bitmap_end - dirty_bitmap_offset) /
> >> -                       job->granularity);
> >> +            if (dirty_bitmap_offset < dirty_bitmap_end) {
> >> +                bitmap_set(job->zero_bitmap,
> >> +                           dirty_bitmap_offset / job->granularity,
> >> +                           (dirty_bitmap_end - dirty_bitmap_offset) /
> >> +                           job->granularity);
> >> +            }
> > 
> > Why does this case clause use dirty_bitmap_offset and dirty_bitmap_end
> > instead of zero_bitmap_offset and zero_bitmap_end like the other
> > zero_bitmap operations in this switch statement?
> > 
> >    if (job->zero_bitmap && ret >= 0) {
> > -      bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
> > -                 (dirty_bitmap_end - dirty_bitmap_offset) /
> > -                 job->granularity);
> > +      bitmap_set(job->zero_bitmap, zero_bitmap_offset,
> > +                 zero_bitmap_end - zero_bitmap_offset);
> >    }
> > 
> > I'm probably missing something, but it's not obvious to me :).
> 
> Sorry, I could've mentioned this in the commit message. There is a
> comment explaining it further up in the function:
> 
> >     /*
> >      * Tails are either clean or shrunk, so for dirty bitmap resetting
> >      * we safely align the range narrower.  But for zero bitmap, round
> >      * range wider for checking or clearing, and narrower for setting.
> >      */

Thanks!

Stefan

Re: [PATCH] block/mirror: check range when setting zero bitmap for sync write

Posted by Vladimir Sementsov-Ogievskiy 2 weeks, 6 days ago

On 12.01.26 18:23, Fiona Ebner wrote:
> Some Proxmox users reported an occasional assertion failure [0][1] in
> busy VMs when using drive mirror with active mode. In particular, the
> failure may occur for zero writes shorter than the job granularity:
> 
>> #0  0x00007b421154b507 in abort ()
>> #1  0x00007b421154b420 in ?? ()
>> #2  0x0000641c582e061f in bitmap_set (map=0x7b4204014e00, start=14, nr=-1)
>> #3  0x0000641c58062824 in do_sync_target_write (job=0x641c7e73d1e0,
>>        method=MIRROR_METHOD_ZERO, offset=852480, bytes=4096, qiov=0x0, flags=0)
>> #4  0x0000641c58062250 in bdrv_mirror_top_do_write (bs=0x641c7e62e1f0,
>          method=MIRROR_METHOD_ZERO, copy_to_target=true, offset=852480,
>          bytes=4096, qiov=0x0, flags=0)
>> #5  0x0000641c58061f31 in bdrv_mirror_top_pwrite_zeroes (bs=0x641c7e62e1f0,
>          offset=852480, bytes=4096, flags=0)
> 
> The range for the dirty bitmap described by dirty_bitmap_offset and
> dirty_bitmap_end is narrower than the original range and in fact,
> dirty_bitmap_end might be smaller than dirty_bitmap_offset. There
> already is a check for 'dirty_bitmap_offset < dirty_bitmap_end' before
> resetting the dirty bitmap. Add such a check for setting the zero
> bitmap too, which uses the same narrower range.
> 
> [0]: https://forum.proxmox.com/threads/177981/
> [1]: https://bugzilla.proxmox.com/show_bug.cgi?id=7222
> 
> Cc: qemu-stable@nongnu.org
> Fixes: 7e277545b9 ("mirror: Skip writing zeroes when target is already zero")
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Thanks for fixing! Adding a test would be great still.

Would be good if someone take it into larger PULL request, but if not I'll do.
Queued in my block branch.

-- 
Best regards,
Vladimir

[PATCH] iotests: test active mirror with unaligned, small write zeroes op

Posted by Fiona Ebner 2 weeks, 5 days ago

This tests the scenario fixed by "block/mirror: check range
when setting zero bitmap for sync write" [0].

[0] https://lore.kernel.org/qemu-devel/20260112152544.261923-1-f.ebner@proxmox.com/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 tests/qemu-iotests/151     | 20 ++++++++++++++++++++
 tests/qemu-iotests/151.out |  4 ++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/151 b/tests/qemu-iotests/151
index 06ee3585db..9b9c815db5 100755
--- a/tests/qemu-iotests/151
+++ b/tests/qemu-iotests/151
@@ -191,6 +191,26 @@ class TestActiveMirror(iotests.QMPTestCase):
 
         self.potential_writes_in_flight = False
 
+    def testUnalignedSmallerThanGranularityWriteZeroes(self):
+        # Fill the source image
+        self.vm.hmp_qemu_io('source', 'write -P 1 0 %i' % self.image_len);
+
+        # Start the block job
+        self.vm.cmd('blockdev-mirror',
+                    job_id='mirror',
+                    filter_node_name='mirror-node',
+                    device='source-node',
+                    target='target-node',
+                    sync='full',
+                    copy_mode='write-blocking')
+
+        # Wait for the READY event
+        self.wait_ready(drive='mirror')
+
+        for offset in range(6 * self.image_len // 8, 7 * self.image_len // 8, 1024 * 1024):
+            self.vm.hmp_qemu_io('source', 'aio_write -z %i 512' % (offset + 512))
+
+        self.complete_and_wait(drive='mirror', wait_ready=False)
 
 class TestThrottledWithNbdExportBase(iotests.QMPTestCase):
     image_len = 128 * 1024 * 1024  # MB
diff --git a/tests/qemu-iotests/151.out b/tests/qemu-iotests/151.out
index 3f8a935a08..2f7d3902f2 100644
--- a/tests/qemu-iotests/151.out
+++ b/tests/qemu-iotests/151.out
@@ -1,5 +1,5 @@
-......
+.......
 ----------------------------------------------------------------------
-Ran 6 tests
+Ran 7 tests
 
 OK
-- 
2.47.3

Re: [PATCH] iotests: test active mirror with unaligned, small write zeroes op

Posted by Vladimir Sementsov-Ogievskiy 2 weeks, 4 days ago

On 20.01.26 14:38, Fiona Ebner wrote:
> This tests the scenario fixed by "block/mirror: check range
> when setting zero bitmap for sync write" [0].
> 
> [0]https://lore.kernel.org/qemu-devel/20260112152544.261923-1-f.ebner@proxmox.com/
> 
> Signed-off-by: Fiona Ebner<f.ebner@proxmox.com>

Thanks!

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

and queued.

-- 
Best regards,
Vladimir