[Qemu-devel] [PATCH v8 18/20] qcow2: Switch store_bitmap_data() to byte-based iteration

Eric Blake posted 20 patches 8 years ago
There is a newer version of this series
[Qemu-devel] [PATCH v8 18/20] qcow2: Switch store_bitmap_data() to byte-based iteration
Posted by Eric Blake 8 years ago
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

---
v7: rebase to earlier change, make rounding of offset obvious (no semantic
change, so R-b kept) [Kevin]
v5-v6: no change
v4: new patch
---
 block/qcow2-bitmap.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 692ce0de88..302fffd6e1 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -1072,10 +1072,9 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
 {
     int ret;
     BDRVQcow2State *s = bs->opaque;
-    int64_t sector;
-    uint64_t limit, sbc;
+    int64_t offset;
+    uint64_t limit;
     uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);
-    uint64_t bm_sectors = DIV_ROUND_UP(bm_size, BDRV_SECTOR_SIZE);
     const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
     uint8_t *buf = NULL;
     BdrvDirtyBitmapIter *dbi;
@@ -1100,18 +1099,17 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
     dbi = bdrv_dirty_iter_new(bitmap);
     buf = g_malloc(s->cluster_size);
     limit = bytes_covered_by_bitmap_cluster(s, bitmap);
-    sbc = limit >> BDRV_SECTOR_BITS;
     assert(DIV_ROUND_UP(bm_size, limit) == tb_size);

-    while ((sector = bdrv_dirty_iter_next(dbi) >> BDRV_SECTOR_BITS) >= 0) {
-        uint64_t cluster = sector / sbc;
+    while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
+        uint64_t cluster = offset / limit;
         uint64_t end, write_size;
         int64_t off;

-        sector = cluster * sbc;
-        end = MIN(bm_sectors, sector + sbc);
-        write_size = bdrv_dirty_bitmap_serialization_size(bitmap,
-            sector * BDRV_SECTOR_SIZE, (end - sector) * BDRV_SECTOR_SIZE);
+        offset = QEMU_ALIGN_DOWN(offset, limit);
+        end = MIN(bm_size, offset + limit);
+        write_size = bdrv_dirty_bitmap_serialization_size(bitmap, offset,
+                                                          end - offset);
         assert(write_size <= s->cluster_size);

         off = qcow2_alloc_clusters(bs, s->cluster_size);
@@ -1123,9 +1121,7 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
         }
         tb[cluster] = off;

-        bdrv_dirty_bitmap_serialize_part(bitmap, buf,
-                                         sector * BDRV_SECTOR_SIZE,
-                                         (end - sector) * BDRV_SECTOR_SIZE);
+        bdrv_dirty_bitmap_serialize_part(bitmap, buf, offset, end - offset);
         if (write_size < s->cluster_size) {
             memset(buf + write_size, 0, s->cluster_size - write_size);
         }
@@ -1143,11 +1139,11 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
             goto fail;
         }

-        if (end >= bm_sectors) {
+        if (end >= bm_size) {
             break;
         }

-        bdrv_set_dirty_iter(dbi, end * BDRV_SECTOR_SIZE);
+        bdrv_set_dirty_iter(dbi, end);
     }

     *bitmap_table_size = tb_size;
-- 
2.13.5


Re: [Qemu-devel] [PATCH v8 18/20] qcow2: Switch store_bitmap_data() to byte-based iteration
Posted by Kevin Wolf 8 years ago
Am 18.09.2017 um 20:58 hat Eric Blake geschrieben:
> Now that we have adjusted the majority of the calls this function
> makes to be byte-based, it is easier to read the code if it makes
> passes over the image using bytes rather than sectors.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: John Snow <jsnow@redhat.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> ---
> v7: rebase to earlier change, make rounding of offset obvious (no semantic
> change, so R-b kept) [Kevin]
> v5-v6: no change
> v4: new patch
> ---
>  block/qcow2-bitmap.c | 26 +++++++++++---------------
>  1 file changed, 11 insertions(+), 15 deletions(-)
> 
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index 692ce0de88..302fffd6e1 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -1072,10 +1072,9 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
>  {
>      int ret;
>      BDRVQcow2State *s = bs->opaque;
> -    int64_t sector;
> -    uint64_t limit, sbc;
> +    int64_t offset;
> +    uint64_t limit;
>      uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);
> -    uint64_t bm_sectors = DIV_ROUND_UP(bm_size, BDRV_SECTOR_SIZE);
>      const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
>      uint8_t *buf = NULL;
>      BdrvDirtyBitmapIter *dbi;
> @@ -1100,18 +1099,17 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
>      dbi = bdrv_dirty_iter_new(bitmap);
>      buf = g_malloc(s->cluster_size);
>      limit = bytes_covered_by_bitmap_cluster(s, bitmap);
> -    sbc = limit >> BDRV_SECTOR_BITS;
>      assert(DIV_ROUND_UP(bm_size, limit) == tb_size);
> 
> -    while ((sector = bdrv_dirty_iter_next(dbi) >> BDRV_SECTOR_BITS) >= 0) {
> -        uint64_t cluster = sector / sbc;
> +    while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
> +        uint64_t cluster = offset / limit;
>          uint64_t end, write_size;
>          int64_t off;
> 
> -        sector = cluster * sbc;
> -        end = MIN(bm_sectors, sector + sbc);
> -        write_size = bdrv_dirty_bitmap_serialization_size(bitmap,
> -            sector * BDRV_SECTOR_SIZE, (end - sector) * BDRV_SECTOR_SIZE);
> +        offset = QEMU_ALIGN_DOWN(offset, limit);

With the question that I asked in v6, apart from changing the spelling
to be more explicit, I actually hoped that you would explain why
aligning down is the right thing to do.

It looks plausible to me that we can do this in correct code because we
don't support granularities < 512 bytes (a qemu restriction that is
written down as a note in the qcow2 spec).

The part that I'm missing yet is why we need to do it. The bitmap
granularity is also the granularity of bdrv_dirty_iter_next(), so isn't
offset already aligned and we could even assert that instead of aligning
down? (As long we enforce our restriction, which we seem to do in
bitmap_list_load().)

> +        end = MIN(bm_size, offset + limit);
> +        write_size = bdrv_dirty_bitmap_serialization_size(bitmap, offset,
> +                                                          end - offset);
>          assert(write_size <= s->cluster_size);
> 
>          off = qcow2_alloc_clusters(bs, s->cluster_size);

Kevin

Re: [Qemu-devel] [PATCH v8 18/20] qcow2: Switch store_bitmap_data() to byte-based iteration
Posted by Eric Blake 8 years ago
On 09/19/2017 07:44 AM, Kevin Wolf wrote:
> Am 18.09.2017 um 20:58 hat Eric Blake geschrieben:
>> Now that we have adjusted the majority of the calls this function
>> makes to be byte-based, it is easier to read the code if it makes
>> passes over the image using bytes rather than sectors.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> Reviewed-by: John Snow <jsnow@redhat.com>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

>> @@ -1100,18 +1099,17 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
>>      dbi = bdrv_dirty_iter_new(bitmap);
>>      buf = g_malloc(s->cluster_size);
>>      limit = bytes_covered_by_bitmap_cluster(s, bitmap);

Limit is huge.  For ease of math, let's consider an image with 512-byte
clusters, and an explicit bitmap granularity of 512 bytes.  One cluster
(512 bytes, or 4k bits) of the bitmap covers 4k multiples of the
granularity, or 2M of image; and the minimum serialization alignment of
64 bits covers 32k bytes of image.  bytes_covered_by_bitmap_cluster()
computes granularity (512 bytes per bit), limit (2M bytes per cluster of
bits), and checks that 2M is indeed aligned to
bdrv_dirty_bitmap_serialization_align() (which is 32k).

Next, an image with default 64k clusters and a default bitmap
granularity of 64k bytes; one cluster (64k bytes, or 512k bits) now
covers 512k multiples of the granularity, or 32G of image size.

However...

>> -    sbc = limit >> BDRV_SECTOR_BITS;
>>      assert(DIV_ROUND_UP(bm_size, limit) == tb_size);
>>
>> -    while ((sector = bdrv_dirty_iter_next(dbi) >> BDRV_SECTOR_BITS) >= 0) {
>> -        uint64_t cluster = sector / sbc;
>> +    while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
>> +        uint64_t cluster = offset / limit;

bdrv_dirty_iter_next() returns the next dirty bit (which is not
necessarily the first bit in the cluster).  For the purposes of
serialization, we want to serialize the entire cluster in one go, even
though we will be serializing 0 bits up until the first dirty bit.  So
offset at this point may be unaligned,

>>          uint64_t end, write_size;
>>          int64_t off;
>>
>> -        sector = cluster * sbc;
>> -        end = MIN(bm_sectors, sector + sbc);
>> -        write_size = bdrv_dirty_bitmap_serialization_size(bitmap,
>> -            sector * BDRV_SECTOR_SIZE, (end - sector) * BDRV_SECTOR_SIZE);
>> +        offset = QEMU_ALIGN_DOWN(offset, limit);

and we round it down to the start of the cluster that we will actually
be serializing.

> 
> With the question that I asked in v6, apart from changing the spelling
> to be more explicit, I actually hoped that you would explain why
> aligning down is the right thing to do.
> 
> It looks plausible to me that we can do this in correct code because we
> don't support granularities < 512 bytes (a qemu restriction that is
> written down as a note in the qcow2 spec).

It's not granularities < 512 that we have to worry about, but dirty bits
with an offset < 4M (because we are serializing an entire cluster of
bitmap data, where the first dirty offset is not necessarily aligned to
that large).

> 
> The part that I'm missing yet is why we need to do it. The bitmap
> granularity is also the granularity of bdrv_dirty_iter_next(), so isn't
> offset already aligned and we could even assert that instead of aligning
> down? (As long we enforce our restriction, which we seem to do in
> bitmap_list_load().)

Sadly, a quick:

diff --git i/block/qcow2-bitmap.c w/block/qcow2-bitmap.c
index 302fffd6e1..160f3b99f9 100644
--- i/block/qcow2-bitmap.c
+++ w/block/qcow2-bitmap.c
@@ -1106,6 +1106,7 @@ static uint64_t
*store_bitmap_data(BlockDriverState *bs,
         uint64_t end, write_size;
         int64_t off;

+        assert(QEMU_IS_ALIGNED(offset, limit));
         offset = QEMU_ALIGN_DOWN(offset, limit);
         end = MIN(bm_size, offset + limit);
         write_size = bdrv_dirty_bitmap_serialization_size(bitmap, offset,

does NOT fail iotests 165, which appears to be the only test that
actually hammers on qcow2 bitmaps (changing it to an 'assert(false)'
only shows an effect on 165) - which means our test is NOT exercising
all possible alignments.  And it's python-based, with lame output, which
makes debugging it painful.  But never fear, for v9 I will improve the
test to actually affect the bitmap at a point that would fail with my
temporary assertion in place, and thus proving that we DO need to align
down.  Note that test 165 is testing only a 1G image, but I just showed
that 64k clusters with 64k granularity covers up to 32G of image space
in one cluster of the bitmap, so the test is only covering one cluster
of serialization in the first place, and as written, the test is
dirtying byte 0, which explains why it happens to get an offset aligned
to limit, even though that is not a valid assertion.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v8 18/20] qcow2: Switch store_bitmap_data() to byte-based iteration
Posted by Kevin Wolf 8 years ago
Am 19.09.2017 um 21:42 hat Eric Blake geschrieben:
> However...
> 
> >> -    sbc = limit >> BDRV_SECTOR_BITS;
> >>      assert(DIV_ROUND_UP(bm_size, limit) == tb_size);
> >>
> >> -    while ((sector = bdrv_dirty_iter_next(dbi) >> BDRV_SECTOR_BITS) >= 0) {
> >> -        uint64_t cluster = sector / sbc;
> >> +    while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
> >> +        uint64_t cluster = offset / limit;
> 
> bdrv_dirty_iter_next() returns the next dirty bit (which is not
> necessarily the first bit in the cluster).  For the purposes of
> serialization, we want to serialize the entire cluster in one go, even
> though we will be serializing 0 bits up until the first dirty bit.  So
> offset at this point may be unaligned,

Ok, this is the part that I was missing. It makes a lot more sense now.

Also, I think 'cluster' meaning bitmap clusters and not qcow2 clusters
here confused me a bit.

> > The part that I'm missing yet is why we need to do it. The bitmap
> > granularity is also the granularity of bdrv_dirty_iter_next(), so isn't
> > offset already aligned and we could even assert that instead of aligning
> > down? (As long we enforce our restriction, which we seem to do in
> > bitmap_list_load().)
> 
> Sadly, a quick:
> [...]
> does NOT fail iotests 165, which appears to be the only test that
> actually hammers on qcow2 bitmaps (changing it to an 'assert(false)'
> only shows an effect on 165) - which means our test is NOT exercising
> all possible alignments.  And it's python-based, with lame output, which
> makes debugging it painful.  But never fear, for v9 I will improve the
> test to actually affect the bitmap at a point that would fail with my
> temporary assertion in place, and thus proving that we DO need to align
> down.  Note that test 165 is testing only a 1G image, but I just showed
> that 64k clusters with 64k granularity covers up to 32G of image space
> in one cluster of the bitmap, so the test is only covering one cluster
> of serialization in the first place, and as written, the test is
> dirtying byte 0, which explains why it happens to get an offset aligned
> to limit, even though that is not a valid assertion.

More tests are always welcome and a good argument for getting a series
merged. :-)

Kevin