[PATCH] migration/block-dirty-bitmap: fix larger granularity bitmaps

Stefan Reiter posted 1 patch 3 years, 6 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20201021144456.1072-1-s.reiter@proxmox.com
Maintainers: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, John Snow <jsnow@redhat.com>, Eric Blake <eblake@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, Fam Zheng <fam@euphon.net>, Stefan Hajnoczi <stefanha@redhat.com>, Juan Quintela <quintela@redhat.com>
Posted by Stefan Reiter 3 years, 6 months ago
sectors_per_chunk is a 64 bit integer, but the calculation is done in 32
bits, leading to an overflow for coarse bitmap granularities.

If that results in the value 0, it leads to a hang where no progress is
made but send_bitmap_bits is constantly called with nr_sectors being 0.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 migration/block-dirty-bitmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 5bef793ac0..5398869e2b 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -562,8 +562,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
         dbms->bitmap_alias = g_strdup(bitmap_alias);
         dbms->bitmap = bitmap;
         dbms->total_sectors = bdrv_nb_sectors(bs);
-        dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
+        dbms->sectors_per_chunk = CHUNK_SIZE * 8lu *
             bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
+        assert(dbms->sectors_per_chunk != 0);
         if (bdrv_dirty_bitmap_enabled(bitmap)) {
             dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
         }
-- 
2.20.1
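
To make the overflow described in the commit message concrete, here is a minimal standalone sketch of the old and the fixed expression. It assumes CHUNK_SIZE is (1 << 10) and that bdrv_dirty_bitmap_granularity() returns a uint32_t; neither is visible in the hunk above, so both should be checked against the tree. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; check the tree for the real definitions. */
#define CHUNK_SIZE       (1 << 10)
#define BDRV_SECTOR_BITS 9

/* Old expression: no operand is wider than 32 bits, so the product wraps. */
static uint64_t sectors_per_chunk_32(uint32_t granularity)
{
    return CHUNK_SIZE * 8 * granularity >> BDRV_SECTOR_BITS;
}

/* Fixed expression: the 8ULL constant promotes the product to 64 bits. */
static uint64_t sectors_per_chunk_64(uint32_t granularity)
{
    uint64_t n = CHUNK_SIZE * 8ULL * granularity >> BDRV_SECTOR_BITS;

    assert(n != 0); /* mirrors the assertion added by the patch */
    return n;
}
```

Under these assumptions a granularity of 512 KiB (1u << 19) makes the 32-bit product exactly 2^32, which wraps to 0, while the 64-bit version yields 2^23 sectors. For a 64 KiB granularity both versions agree.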



Re: [PATCH] migration/block-dirty-bitmap: fix larger granularity bitmaps
Posted by Vladimir Sementsov-Ogievskiy 3 years, 6 months ago
21.10.2020 17:44, Stefan Reiter wrote:
> sectors_per_chunk is a 64 bit integer, but the calculation is done in 32
> bits, leading to an overflow for coarse bitmap granularities.
> 
> If that results in the value 0, it leads to a hang where no progress is
> made but send_bitmap_bits is constantly called with nr_sectors being 0.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
>   migration/block-dirty-bitmap.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 5bef793ac0..5398869e2b 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -562,8 +562,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
>           dbms->bitmap_alias = g_strdup(bitmap_alias);
>           dbms->bitmap = bitmap;
>           dbms->total_sectors = bdrv_nb_sectors(bs);
> -        dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
> +        dbms->sectors_per_chunk = CHUNK_SIZE * 8lu *

I'd prefer 8llu for absolute safety.

>               bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
> +        assert(dbms->sectors_per_chunk != 0);

I doubt that we need this assertion. With the bug fixed, a zero result is obviously impossible.
And if we really want to assert that there is no overflow (assuming future changes),
it should look like this:

   assert(bdrv_dirty_bitmap_granularity(bitmap) < (1ull << 63) / CHUNK_SIZE / 8 >> BDRV_SECTOR_BITS);

to cover not only the corner case but any overflow. And of course we should modify the original expression
to do ">> BDRV_SECTOR_BITS" before all the multiplications, like

   dbms->sectors_per_chunk = CHUNK_SIZE * 8llu * (bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS);


But I think that only s/8/8ull/ change is enough.
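
For illustration only, the shift-first ordering together with a general overflow check could be sketched as below. This is a hypothetical helper, not part of the patch, and its bound is written differently from the assertion suggested above. It assumes CHUNK_SIZE is (1 << 10) and that the granularity is a multiple of the sector size, since shifting first would otherwise drop low bits.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE       (1 << 10) /* assumed value, check the tree */
#define BDRV_SECTOR_BITS 9

/* Hypothetical helper: shift before multiplying, then bound the product. */
static uint64_t sectors_per_chunk_checked(uint64_t granularity)
{
    uint64_t sectors = granularity >> BDRV_SECTOR_BITS;
    uint64_t factor = CHUNK_SIZE * 8ULL;

    /* Reject any input whose product would not fit in 64 bits. */
    assert(sectors <= UINT64_MAX / factor);
    return factor * sectors;
}
```

Shifting first keeps the intermediate value small, so the check only has to bound a single multiplication.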

>           if (bdrv_dirty_bitmap_enabled(bitmap)) {
>               dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
>           }
> 


With 8llu and with or without assertion:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

Re: [PATCH] migration/block-dirty-bitmap: fix larger granularity bitmaps
Posted by Stefan Reiter 3 years, 6 months ago
On 10/21/20 5:17 PM, Vladimir Sementsov-Ogievskiy wrote:
> 21.10.2020 17:44, Stefan Reiter wrote:
>> sectors_per_chunk is a 64 bit integer, but the calculation is done in 32
>> bits, leading to an overflow for coarse bitmap granularities.
>>
>> If that results in the value 0, it leads to a hang where no progress is
>> made but send_bitmap_bits is constantly called with nr_sectors being 0.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
>>   migration/block-dirty-bitmap.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/block-dirty-bitmap.c 
>> b/migration/block-dirty-bitmap.c
>> index 5bef793ac0..5398869e2b 100644
>> --- a/migration/block-dirty-bitmap.c
>> +++ b/migration/block-dirty-bitmap.c
>> @@ -562,8 +562,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, 
>> BlockDriverState *bs,
>>           dbms->bitmap_alias = g_strdup(bitmap_alias);
>>           dbms->bitmap = bitmap;
>>           dbms->total_sectors = bdrv_nb_sectors(bs);
>> -        dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
>> +        dbms->sectors_per_chunk = CHUNK_SIZE * 8lu *
> 
> I'd prefer 8llu for absolute safety.
> 
>>               bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
>> +        assert(dbms->sectors_per_chunk != 0);
> 
> I doubt that we need this assertion. With the bug fixed, a zero result is 
> obviously impossible.
> And if we really want to assert that there is no overflow (assuming 
> future changes),
> it should look like this:
> 
>    assert(bdrv_dirty_bitmap_granularity(bitmap) < (1ull << 63) / 
> CHUNK_SIZE / 8 >> BDRV_SECTOR_BITS);
> 
> to cover not only the corner case but any overflow. And of course we should 
> modify the original expression
> to do ">> BDRV_SECTOR_BITS" before all the multiplications, like
> 
>    dbms->sectors_per_chunk = CHUNK_SIZE * 8llu * 
> (bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS);
> 
> 
> But I think that only s/8/8ull/ change is enough.
>

I agree, and I wouldn't mind removing the assert, but just to clarify it 
was mostly meant to prevent the case where the migration gets stuck 
entirely. Even if the calculation is wrong, it would at least do 
_something_ instead of endlessly looping.

Maybe an

     assert(nr_sectors != 0);

in send_bitmap_bits instead for that?

>>           if (bdrv_dirty_bitmap_enabled(bitmap)) {
>>               dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
>>           }
>>
> 
> 
> With 8llu and with or without assertion:
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 


Re: [PATCH] migration/block-dirty-bitmap: fix larger granularity bitmaps
Posted by Vladimir Sementsov-Ogievskiy 3 years, 6 months ago
22.10.2020 10:46, Stefan Reiter wrote:
> On 10/21/20 5:17 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 21.10.2020 17:44, Stefan Reiter wrote:
>>> sectors_per_chunk is a 64 bit integer, but the calculation is done in 32
>>> bits, leading to an overflow for coarse bitmap granularities.
>>>
>>> If that results in the value 0, it leads to a hang where no progress is
>>> made but send_bitmap_bits is constantly called with nr_sectors being 0.
>>>
>>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>>> ---
>>>   migration/block-dirty-bitmap.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
>>> index 5bef793ac0..5398869e2b 100644
>>> --- a/migration/block-dirty-bitmap.c
>>> +++ b/migration/block-dirty-bitmap.c
>>> @@ -562,8 +562,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
>>>           dbms->bitmap_alias = g_strdup(bitmap_alias);
>>>           dbms->bitmap = bitmap;
>>>           dbms->total_sectors = bdrv_nb_sectors(bs);
>>> -        dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
>>> +        dbms->sectors_per_chunk = CHUNK_SIZE * 8lu *
>>
>> I'd prefer 8llu for absolute safety.
>>
>>>               bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
>>> +        assert(dbms->sectors_per_chunk != 0);
>>
>> I doubt that we need this assertion. With the bug fixed, a zero result is obviously impossible.
>> And if we really want to assert that there is no overflow (assuming future changes),
>> it should look like this:
>>
>>    assert(bdrv_dirty_bitmap_granularity(bitmap) < (1ull << 63) / CHUNK_SIZE / 8 >> BDRV_SECTOR_BITS);
>>
>> to cover not only the corner case but any overflow. And of course we should modify the original expression
>> to do ">> BDRV_SECTOR_BITS" before all the multiplications, like
>>
>>    dbms->sectors_per_chunk = CHUNK_SIZE * 8llu * (bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS);
>>
>>
>> But I think that only s/8/8ull/ change is enough.
>>
> 
> I agree, and I wouldn't mind removing the assert, but just to clarify it was mostly meant to prevent the case where the migration gets stuck entirely. Even if the calculation is wrong, it would at least do _something_ instead of endlessly looping.
> 
> Maybe an
> 
>      assert(nr_sectors != 0);
> 
> in send_bitmap_bits instead for that?

Hmm, just sending 0 sectors should not be a problem by itself. It's a problem when we don't make progress in the loop in bulk_phase(). So, I'd prefer your original assertion, as sectors_per_chunk=0 is definitely wrong.
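
The point about progress can be shown with a simplified model of a chunked walk. This is not QEMU's bulk_phase() code; the function and parameter names are invented for the sketch, and the iteration cap exists only so that the stuck case terminates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model, not QEMU's code: walk total_sectors in steps of
 * sectors_per_chunk. Returns the number of steps taken, or -1 if the
 * walk has not finished after max_iters steps.
 */
static int64_t count_chunks(uint64_t total_sectors,
                            uint64_t sectors_per_chunk,
                            int64_t max_iters)
{
    uint64_t cur_sector = 0;
    int64_t iters = 0;

    assert(max_iters > 0);
    while (cur_sector < total_sectors) {
        if (iters == max_iters) {
            return -1; /* an uncapped loop would spin here forever */
        }
        cur_sector += sectors_per_chunk; /* a zero step never advances */
        iters++;
    }
    return iters;
}
```

With a step of 0 the cursor never moves, so only the cap ends the loop. Sending 0 sectors in one iteration is harmless in this model; the missing advance is what causes the hang.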

> 
>>>           if (bdrv_dirty_bitmap_enabled(bitmap)) {
>>>               dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
>>>           }
>>>
>>
>>
>> With 8llu and with or without assertion:
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>
> 


-- 
Best regards,
Vladimir

Re: [PATCH] migration/block-dirty-bitmap: fix larger granularity bitmaps
Posted by Eric Blake 3 years, 6 months ago
On 10/21/20 9:44 AM, Stefan Reiter wrote:
> sectors_per_chunk is a 64 bit integer, but the calculation is done in 32
> bits, leading to an overflow for coarse bitmap granularities.
> 
> If that results in the value 0, it leads to a hang where no progress is
> made but send_bitmap_bits is constantly called with nr_sectors being 0.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
>  migration/block-dirty-bitmap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
> index 5bef793ac0..5398869e2b 100644
> --- a/migration/block-dirty-bitmap.c
> +++ b/migration/block-dirty-bitmap.c
> @@ -562,8 +562,9 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
>          dbms->bitmap_alias = g_strdup(bitmap_alias);
>          dbms->bitmap = bitmap;
>          dbms->total_sectors = bdrv_nb_sectors(bs);
> -        dbms->sectors_per_chunk = CHUNK_SIZE * 8 *
> +        dbms->sectors_per_chunk = CHUNK_SIZE * 8lu *

8lu is not necessarily 64-bit; you need llu.  Also, I prefer
capitalizing the type suffix, as in 8LLU.

I can touch that up while queuing through my bitmaps tree, so:

Reviewed-by: Eric Blake <eblake@redhat.com>
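
A small sketch of why the suffix matters: the C standard only guarantees that unsigned long has at least 32 bits, and on LLP64 platforms such as 64-bit Windows it is exactly 32 bits wide, whereas unsigned long long has at least 64 bits everywhere. The helper names below are made up for this illustration, and the constant 1024 stands in for an assumed CHUNK_SIZE.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Width in bits of the type that each suffix selects on this platform. */
static size_t bits_of_lu(void)  { return sizeof(8LU) * CHAR_BIT; }
static size_t bits_of_llu(void) { return sizeof(8LLU) * CHAR_BIT; }

/* Returns 1 if a product that needs 33 bits survives with the LLU suffix. */
static int llu_product_is_exact(void)
{
    /* 1024 * 8 * 2^19 == 2^32, one bit more than a 32-bit type can hold. */
    unsigned long long wide = 1024 * 8LLU * (1u << 19);

    assert(bits_of_llu() >= 64); /* guaranteed since C99 */
    return wide == (1ULL << 32);
}
```

bits_of_lu() may report 32 or 64 depending on the platform, which is the reason for preferring LLU here.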

>              bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
> +        assert(dbms->sectors_per_chunk != 0);
>          if (bdrv_dirty_bitmap_enabled(bitmap)) {
>              dbms->flags |= DIRTY_BITMAP_MIG_START_FLAG_ENABLED;
>          }
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org