[PATCH 3/3] qcow2: handle_dependencies(): relax conflict detection

Vladimir Sementsov-Ogievskiy posted 3 patches 4 years, 6 months ago
Maintainers: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, Hanna Reitz <hreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>
There is a newer version of this series
Posted by Vladimir Sementsov-Ogievskiy 4 years, 6 months ago
There is no conflict and no dependency if we have parallel writes to
different subclusters of one cluster when the cluster itself is already
allocated. So, relax the extra dependency.

Measure performance:
First, build the old and new qemu-img binaries as build/qemu-img-old and
build/qemu-img-new.

cd scripts/simplebench
./img_bench_templater.py

Paste the following into the stdin of the running script:

qemu_img=../../build/qemu-img-{old|new}
$qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G
$qemu_img bench -c 100000 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \
        -w -t none -n /ssd/x.qcow2

The result:

All results are in seconds

------------------  ---------  ----------------
                    old        new
-s 2K               6.7 ± 15%  6.2 ± 12% (-7%)
-s 2K -o 512        13 ± 3%    11 ± 5% (-16%)
-s $((1024*2+512))  9.5 ± 4%   8.4 (-12%)
------------------  ---------  ----------------

So small writes are now more independent, which helps keep the I/O
queue deeper and improves performance.

The output of iotest 271 becomes racy with three allocations in one
cluster: the second and third writes may finish in either order, because
they no longer depend on each other (both still depend on the first
request). Keep only one of them for consistent output.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-cluster.c      | 11 +++++++++++
 tests/qemu-iotests/271     |  4 +---
 tests/qemu-iotests/271.out |  2 --
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 967121c7e6..8f56de5516 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1403,6 +1403,17 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
             continue;
         }
 
+        if (old_alloc->keep_old_clusters &&
+            (end <= l2meta_cow_start(old_alloc) ||
+             start >= l2meta_cow_end(old_alloc)))
+        {
+            /*
+             * Clusters intersect but COW areas don't. And cluster itself is
+             * already allocated. So, there is no actual conflict.
+             */
+            continue;
+        }
+
         /* Conflict */
 
         if (start < old_start) {
diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
index 599b849cc6..939e88ee88 100755
--- a/tests/qemu-iotests/271
+++ b/tests/qemu-iotests/271
@@ -866,7 +866,7 @@ echo
 
 _concurrent_io()
 {
-# Allocate three subclusters in the same cluster.
+# Allocate two subclusters in the same cluster.
 # This works because handle_dependencies() checks whether the requests
 # allocate the same cluster, even if the COW regions don't overlap (in
 # this case they don't).
@@ -876,7 +876,6 @@ break write_aio A
 aio_write -P 10 30k 2k
 wait_break A
 aio_write -P 11 20k 2k
-aio_write -P 12 40k 2k
 resume A
 aio_flush
 EOF
@@ -888,7 +887,6 @@ cat <<EOF
 open -o driver=$IMGFMT $TEST_IMG
 read -q -P 10 30k 2k
 read -q -P 11 20k 2k
-read -q -P 12 40k 2k
 EOF
 }
 
diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out
index 81043ba4d7..d94c8fe061 100644
--- a/tests/qemu-iotests/271.out
+++ b/tests/qemu-iotests/271.out
@@ -721,6 +721,4 @@ wrote 2048/2048 bytes at offset 30720
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 2048/2048 bytes at offset 20480
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 2048/2048 bytes at offset 40960
-2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
2.29.2


Re: [PATCH 3/3] qcow2: handle_dependencies(): relax conflict detection
Posted by Eric Blake 4 years, 5 months ago
On Sat, Jul 24, 2021 at 04:38:46PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> There is no conflict and no dependency if we have parallel writes to
> different subclusters of one cluster when cluster itself is already

when the cluster itself

> allocated. So, relax extra dependency.
> 
> Measure performance:
> First, prepare build/qemu-img-old and build/qemu-img-new images.
> 
> cd scripts/simplebench
> ./img_bench_templater.py
> 
> Paste the following to stdin of running script:
> 
> qemu_img=../../build/qemu-img-{old|new}
> $qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G
> $qemu_img bench -c 100000 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \
>         -w -t none -n /ssd/x.qcow2
> 
> The result:
> 
> All results are in seconds
> 
> ------------------  ---------  ---------
>                     old        new
> -s 2K               6.7 ± 15%  6.2 ± 12%
>                                  -7%
> -s 2K -o 512        13 ± 3%    11 ± 5%
>                                  -16%
> -s $((1024*2+512))  9.5 ± 4%   8.4
>                                  -12%
> ------------------  ---------  ---------

Cool improvement.

> 
> So small writes are more independent now and that helps to keep deeper
> io queue which improves performance.
> 
> 271 iotest output becomes racy for three allocation in one cluster.
> Second and third writes may finish in different order. Second and
> third requests don't depend on each other any more. Still they both
> depend on first request anyway. Keep only one for consistent output.

Interesting fallout.  Yes, it looks like the test is still robust
enough without the extra request.

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-cluster.c      | 11 +++++++++++
>  tests/qemu-iotests/271     |  4 +---
>  tests/qemu-iotests/271.out |  2 --
>  3 files changed, 12 insertions(+), 5 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Re: [PATCH 3/3] qcow2: handle_dependencies(): relax conflict detection
Posted by Hanna Reitz 4 years, 5 months ago
On 24.07.21 15:38, Vladimir Sementsov-Ogievskiy wrote:
> There is no conflict and no dependency if we have parallel writes to
> different subclusters of one cluster when cluster itself is already
> allocated. So, relax extra dependency.
>
> Measure performance:
> First, prepare build/qemu-img-old and build/qemu-img-new images.
>
> cd scripts/simplebench
> ./img_bench_templater.py
>
> Paste the following to stdin of running script:
>
> qemu_img=../../build/qemu-img-{old|new}
> $qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G
> $qemu_img bench -c 100000 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \
>          -w -t none -n /ssd/x.qcow2
>
> The result:
>
> All results are in seconds
>
> ------------------  ---------  ---------
>                      old        new
> -s 2K               6.7 ± 15%  6.2 ± 12%
>                                   -7%
> -s 2K -o 512        13 ± 3%    11 ± 5%
>                                   -16%
> -s $((1024*2+512))  9.5 ± 4%   8.4
>                                   -12%
> ------------------  ---------  ---------
>
> So small writes are more independent now and that helps to keep deeper
> io queue which improves performance.
>
> 271 iotest output becomes racy for three allocation in one cluster.
> Second and third writes may finish in different order. Second and
> third requests don't depend on each other any more. Still they both
> depend on first request anyway. Keep only one for consistent output.

I mean, we could also just filter the result 
(`s/\(20480\|40960\)/FILTERED/` or something).  Perhaps there was some
idea behind doing three writes; I don’t know exactly.

I think I’d prefer a filter, because I guess this is the only test that 
actually will do two subcluster requests in parallel...?

Hanna



Re: [PATCH 3/3] qcow2: handle_dependencies(): relax conflict detection
Posted by Vladimir Sementsov-Ogievskiy 4 years, 5 months ago
20.08.2021 16:21, Hanna Reitz wrote:
> On 24.07.21 15:38, Vladimir Sementsov-Ogievskiy wrote:
>> There is no conflict and no dependency if we have parallel writes to
>> different subclusters of one cluster when cluster itself is already
>> allocated. So, relax extra dependency.
>>
>> Measure performance:
>> First, prepare build/qemu-img-old and build/qemu-img-new images.
>>
>> cd scripts/simplebench
>> ./img_bench_templater.py
>>
>> Paste the following to stdin of running script:
>>
>> qemu_img=../../build/qemu-img-{old|new}
>> $qemu_img create -f qcow2 -o extended_l2=on /ssd/x.qcow2 1G
>> $qemu_img bench -c 100000 -d 8 [-s 2K|-s 2K -o 512|-s $((1024*2+512))] \
>>          -w -t none -n /ssd/x.qcow2
>>
>> The result:
>>
>> All results are in seconds
>>
>> ------------------  ---------  ---------
>>                      old        new
>> -s 2K               6.7 ± 15%  6.2 ± 12%
>>                                   -7%
>> -s 2K -o 512        13 ± 3%    11 ± 5%
>>                                   -16%
>> -s $((1024*2+512))  9.5 ± 4%   8.4
>>                                   -12%
>> ------------------  ---------  ---------
>>
>> So small writes are more independent now and that helps to keep deeper
>> io queue which improves performance.
>>
>> 271 iotest output becomes racy for three allocation in one cluster.
>> Second and third writes may finish in different order. Second and
>> third requests don't depend on each other any more. Still they both
>> depend on first request anyway. Keep only one for consistent output.
> 
> I mean, we could also just filter the result (`s/\(20480\|40960\)/FILTERED/` or something).  Perhaps there was some idea behind doing three writes, I don’t know exactly.
> 
> I think I’d prefer a filter, because I guess this is the only test that actually will do two subcluster requests in parallel...?
> 

Reasonable, will do



-- 
Best regards,
Vladimir