HBitmaps allow us to search for set bits quickly. In contrast, when
searching for zeroes, we may be forced to fully traverse the lowest
level.
When we run blockdev-backup with mode=full on top of the snapshot
filter + cbw filter, the job fills the copy bitmap by calling
block_status() with range (X, virtual_size). The problem is that we
check for zeroes in this whole range. We also hit the worst case here,
as the access bitmap is fully set and we need to scan the entire lowest
level. Only after scanning the full bitmap do we ask for the block
status of the original image, which may report a significantly smaller
number of empty clusters.
Because of this, the backup job 'hangs' on block copy initialization
for a long time with 100% CPU.
Example copy bitmap buildup time for an image with clu_size=65536 and
preallocated metadata:

                      size     10T     11T
  blockdev-backup              52s     57s
  cbw + snap                  325s    413s
  cbw + snap + patch           55s     61s
To fix it, reverse the access bitmap in the cbw filter: set bits for
clusters the user is not allowed to read.
Update qemu-iotest 257: the access bitmap now has count 0 instead of
the image size 67108864
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
---
block/copy-before-write.c | 17 ++++++++++-------
tests/qemu-iotests/257.out | 28 ++++++++++++++--------------
2 files changed, 24 insertions(+), 21 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index fd470f5f92..5f5b3e7515 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -53,7 +53,7 @@ typedef struct BDRVCopyBeforeWriteState {
CoMutex lock;
/*
- * @access_bitmap: represents areas allowed for reading by fleecing user.
+ * @access_bitmap: represents areas disallowed for reading by fleecing user.
- * Reading from non-dirty areas leads to -EACCES.
+ * Reading from dirty areas leads to -EACCES.
*/
BdrvDirtyBitmap *access_bitmap;
@@ -220,7 +220,7 @@ cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset, int64_t bytes,
return NULL;
}
- if (bdrv_dirty_bitmap_next_zero(s->access_bitmap, offset, bytes) != -1) {
+ if (bdrv_dirty_bitmap_next_dirty(s->access_bitmap, offset, bytes) != -1) {
g_free(req);
return NULL;
}
@@ -338,8 +338,8 @@ cbw_co_pdiscard_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes)
aligned_bytes = aligned_end - aligned_offset;
WITH_QEMU_LOCK_GUARD(&s->lock) {
- bdrv_reset_dirty_bitmap(s->access_bitmap, aligned_offset,
- aligned_bytes);
+ bdrv_set_dirty_bitmap(s->access_bitmap, aligned_offset,
+ aligned_bytes);
}
block_copy_reset(s->bcs, aligned_offset, aligned_bytes);
@@ -501,9 +501,12 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
return -EINVAL;
}
bdrv_disable_dirty_bitmap(s->access_bitmap);
- bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
- block_copy_dirty_bitmap(s->bcs), NULL,
- true);
+ if (bitmap) {
+ bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
+ block_copy_dirty_bitmap(s->bcs), NULL,
+ true);
+ bdrv_dirty_bitmap_reverse(s->access_bitmap);
+ }
qemu_co_mutex_init(&s->lock);
QLIST_INIT(&s->frozen_read_reqs);
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index c33dd7f3a9..55efb418e6 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -109,7 +109,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -585,7 +585,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -854,7 +854,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -1330,7 +1330,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -1599,7 +1599,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -2075,7 +2075,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -2344,7 +2344,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -2820,7 +2820,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -3089,7 +3089,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -3565,7 +3565,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -3834,7 +3834,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -4310,7 +4310,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -4579,7 +4579,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
@@ -5055,7 +5055,7 @@ write -P0x67 0x3fe0000 0x20000
"backup-top": [
{
"busy": false,
- "count": 67108864,
+ "count": 0,
"granularity": 65536,
"persistent": false,
"recording": false
--
2.43.0
On Tue, May 13, 2025 at 03:32:37AM +0200, Andrey Zhadchenko wrote:
> HBitmaps allow us to search set bits pretty fast. On the contrary,
> when searching zeroes, we may be forced to fully traverse the lower
> level.
> When we run blockdev-backup with mode=full on top of snapshot filter
> + cbw filter, the job fills copy bitmap by calling block_status()
> with range (X, virtual_size). The problem is that we check for zeroes
> in this whole range. We also hit the worst case here, as access
> bitmap is fully set and we need to scan the entire lowest level.
> After scanning the full bitmap we actually ask the block status of
> original image, which may return significantly lower amount of empty
> clusters.
> Beacuse of this, the backup job 'hangs' on block copy initializaiton
> for a long time with 100% CPU.
>
> Example copy bitmap buildup time for image with clu_size=65536 and
> preallocated metadata
> size 10T 11T
> blockdev-backup 52s 57s
> cbw + snap 325s 413s
> cbw + snap + patch 55s 61s
>
> To fix it, reverse the access bitmap in cbw filter: rather set it
s/reverse/invert/
> when the user is not allowed to read the cluster.
>
> Update qemu-iotest 257: now access bitmap have count 0 instead of
> the image size 67108864
>
> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
> ---
> @@ -501,9 +501,12 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
> return -EINVAL;
> }
> bdrv_disable_dirty_bitmap(s->access_bitmap);
> - bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
> - block_copy_dirty_bitmap(s->bcs), NULL,
> - true);
> + if (bitmap) {
> + bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
> + block_copy_dirty_bitmap(s->bcs), NULL,
> + true);
> + bdrv_dirty_bitmap_reverse(s->access_bitmap);
Is this setting the bits correctly? Inverting a bitmap is a
reversible operation, but it looks odd that you are only reversing
once around the merge. Either the two sources of the merge have the
same sense (whether that be 0 for dirty 1 for clean, or 0 for readable
1 for avoid) and no inverting is needed before or after the merge, or
the two sources have opposite sense (in which case, I would have
expected inverting one of the bitmaps before the merge to get them to
agree on sense, then merging, then inverting back to the desired
sense). Am I missing something?
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
On 5/20/25 19:26, Eric Blake wrote:
> On Tue, May 13, 2025 at 03:32:37AM +0200, Andrey Zhadchenko wrote:
>> HBitmaps allow us to search set bits pretty fast. On the contrary,
>> when searching zeroes, we may be forced to fully traverse the lower
>> level.
>> When we run blockdev-backup with mode=full on top of snapshot filter
>> + cbw filter, the job fills copy bitmap by calling block_status()
>> with range (X, virtual_size). The problem is that we check for zeroes
>> in this whole range. We also hit the worst case here, as access
>> bitmap is fully set and we need to scan the entire lowest level.
>> After scanning the full bitmap we actually ask the block status of
>> original image, which may return significantly lower amount of empty
>> clusters.
>> Beacuse of this, the backup job 'hangs' on block copy initializaiton
>> for a long time with 100% CPU.
>>
>> Example copy bitmap buildup time for image with clu_size=65536 and
>> preallocated metadata
>> size 10T 11T
>> blockdev-backup 52s 57s
>> cbw + snap 325s 413s
>> cbw + snap + patch 55s 61s
>>
>> To fix it, reverse the access bitmap in cbw filter: rather set it
>
> s/reverse/invert/
>
>> when the user is not allowed to read the cluster.
>>
>> Update qemu-iotest 257: now access bitmap have count 0 instead of
>> the image size 67108864
>>
>> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
>> ---
>> @@ -501,9 +501,12 @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags,
>> return -EINVAL;
>> }
>> bdrv_disable_dirty_bitmap(s->access_bitmap);
>> - bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
>> - block_copy_dirty_bitmap(s->bcs), NULL,
>> - true);
>> + if (bitmap) {
>> + bdrv_dirty_bitmap_merge_internal(s->access_bitmap,
>> + block_copy_dirty_bitmap(s->bcs), NULL,
>> + true);
>> + bdrv_dirty_bitmap_reverse(s->access_bitmap);
>
> Is this setting the bits correctly? Inverting a bitmap is a
> reversible operation, but it looks odd that you are only reversing
> once around the merge. Either the two sources of the merge have the
> same sense (whether that be 0 for dirty 1 for clean, or 0 for readable
> 1 for avoid) and no inverting is needed before or after the merge, or
> the two sources have opposite sense (in which case, I would have
> expected inverting one of the bitmaps before the merge to get them to
> agree on sense, then merging, then inverting back to the desired
> sense). Am I missing something?
We have just created the access bitmap a few lines above, so it is
empty: everything is accessible.
- If cbw did not have any bitmap, we don't need to do anything
- If someone set a bitmap on cbw, we used it to init the block-copy
dirty bitmap. So to get the access bitmap, we invert the bitmap that
marks to-be-copied blocks: the access bitmap now has all of them as 0
(allowed) and the not-to-be-copied blocks as 1 (disallowed)
This case is also covered by iotests/image-fleecing in the snapshot
filter + cbw bitmap case: some blocks outside the bitmap are accessed
and we expect to get errors:
--- Sanity Check ---
read -P0x5d 0 64k
read -P0xd5 1M 64k
read -P0xdc 32M 64k
read -P0xcd 0x3ff0000 64k
read -P0 0x00f8000 32k
read failed: Invalid argument
read -P0 0x2010000 32k
read failed: Invalid argument
read -P0 0x3fe0000 64k
read failed: Invalid argument
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization: qemu.org | libguestfs.org
>