[PATCH v8 0/6] block: seriously improve savevm/loadvm performance

Denis V. Lunev posted 6 patches 3 years, 9 months ago
Test checkpatch passed
Test docker-mingw@fedora passed
Test FreeBSD passed
Test docker-quick@centos7 passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20200709132644.28470-1-den@openvz.org
Maintainers: Fam Zheng <fam@euphon.net>, Kevin Wolf <kwolf@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Max Reitz <mreitz@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>
Posted by Denis V. Lunev 3 years, 9 months ago
This series does two standard, basic things:
- it creates an intermediate buffer for all writes from the QEMU migration
  code to the QCOW2 image,
- this buffer is sent to disk asynchronously, allowing several writes to
  run in parallel.
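
The two points above can be illustrated with a minimal sketch. This is not
the code from the series (which is C, in the generic block layer); the class
name, the write_at() backend callback, and the buf_size and max_in_flight
parameters are all hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class BufferedVMStateWriter:
    """Coalesce small sequential writes into large chunks written in parallel.

    Illustrative only: write_at(offset, data) is an assumed backend callback,
    not a QEMU API.
    """

    def __init__(self, write_at, buf_size=1024 * 1024, max_in_flight=4):
        self._write_at = write_at
        self._buf_size = buf_size
        self._buf = bytearray()
        self._buf_offset = 0  # image offset of the first buffered byte
        # At most max_in_flight backend writes run concurrently.
        self._pool = ThreadPoolExecutor(max_workers=max_in_flight)
        self._pending = []

    def write(self, data):
        """Append data; hand off every completely filled buffer."""
        self._buf += data
        while len(self._buf) >= self._buf_size:
            self._submit(self._buf_size)

    def _submit(self, length):
        chunk = bytes(self._buf[:length])
        del self._buf[:length]
        future = self._pool.submit(self._write_at, self._buf_offset, chunk)
        self._pending.append(future)
        self._buf_offset += length

    def finalize(self):
        """Write out the partial tail and wait for all in-flight writes."""
        if self._buf:
            self._submit(len(self._buf))
        for future in self._pending:
            future.result()  # re-raises any backend error
        self._pending.clear()
        self._pool.shutdown()
```

The caller issues many small write() calls and one finalize() at the end, so
the backend only ever sees buf_size-aligned chunks plus one short tail.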

In general, the migration code is fantastically inefficient (by observation):
buffers are not aligned and are sent in arbitrary pieces, often less than
100 bytes per chunk, which results in read-modify-write cycles for
non-cached operations. It should also be noted that all operations are
performed into unallocated image blocks, which also suffer from partial
writes to such new clusters.
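
To make the read-modify-write cost concrete, here is a small sketch that
counts how many blocks a write covers only partially. The 4096-byte block
size and both function names are assumptions made for the example, not
values taken from the series.

```python
def partial_blocks(offset, length, block_size=4096):
    """Return how many blocks the write covers only partially.

    Each such block has to be read, patched, and written back
    (read-modify-write) when the write bypasses the page cache.
    """
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    head_partial = offset % block_size != 0
    tail_partial = (offset + length) % block_size != 0
    if first == last:
        # The write stays inside a single block.
        return 1 if (head_partial or tail_partial) else 0
    return int(head_partial) + int(tail_partial)


def count_rmw(writes, block_size=4096):
    """Sum the partial blocks over a list of (offset, length) writes."""
    return sum(partial_blocks(off, ln, block_size) for off, ln in writes)


# 4096 sequential 100-byte writes versus one coalesced write of the same data.
small_writes = [(i * 100, 100) for i in range(4096)]
coalesced = [(0, 4096 * 100)]
print(count_rmw(small_writes), count_rmw(coalesced))
```

Every 100-byte write touches at least one block partially, while the single
coalesced write starts and ends on a block boundary and touches none.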

This patch series is an implementation of the idea discussed in the RFC
posted by Denis Plotnikov:
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg01925.html
Results with this series over NVMe are better than with the original code:
                original    rfc      this
cached:         1.79s       2.38s    1.27s
non-cached:     3.29s       1.31s    0.81s
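
For reference, the speedups implied by these timings can be computed
directly. The snippet only restates the numbers from the table; the
dictionary layout and the speedup() helper are illustrative.

```python
# Timings in seconds, copied from the results table.
timings = {
    "cached":     {"original": 1.79, "rfc": 2.38, "this": 1.27},
    "non-cached": {"original": 3.29, "rfc": 1.31, "this": 0.81},
}


def speedup(mode, baseline="original", candidate="this"):
    """Ratio of baseline time to candidate time (higher is better)."""
    return timings[mode][baseline] / timings[mode][candidate]


for mode in timings:
    print(f"{mode}: {speedup(mode):.2f}x vs original, "
          f"{speedup(mode, 'rfc'):.2f}x vs rfc")
```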

Changes from v7:
- dropped lock from LoadVMState
- fixed assert in last patch
- dropped patch 1 as queued

Changes from v6:
- blk_load_vmstate kludges added (patchew problem fixed)

Changes from v5:
- loadvm optimizations added, with Vladimir's comments incorporated

Changes from v4:
- added patch 4 with blk_save_vmstate() cleanup
- added R-By
- bdrv_flush_vmstate -> bdrv_finalize_vmstate
- fixed return code of bdrv_co_do_save_vmstate
- fixed typos in comments (Eric, thanks!)
- fixed patchew warnings

Changes from v3:
- rebased to master
- added patch 3 which removes aio_task_pool_wait_one()
- added R-By to patch 1
- patch 4 is rewritten via bdrv_run_co
- error path in blk_save_vmstate() is rewritten to call bdrv_flush_vmstate
  unconditionally
- added some comments
- fixed initialization in bdrv_co_vmstate_save_task_entry as suggested

Changes from v2:
- code moved from QCOW2 level to generic block level
- created bdrv_flush_vmstate helper to fix 022, 029 tests
- added recursion into bs->file in bdrv_co_flush_vmstate (fixes 267)
- fixed blk_save_vmstate helper
- fixed coroutine wait as Vladimir suggested, with additional waiting fixes
  from me

Changes from v1:
- patchew warning fixed
- fixed validation that only 1 waiter is allowed in patch 1

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>



Re: [PATCH v8 0/6] block: seriously improve savevm/loadvm performance
Posted by Denis V. Lunev 3 years, 8 months ago
On 7/9/20 4:26 PM, Denis V. Lunev wrote:
ping

Re: [PATCH v8 0/6] block: seriously improve savevm/loadvm performance
Posted by Denis V. Lunev 3 years, 8 months ago
On 8/20/20 10:42 AM, Denis V. Lunev wrote:
> ping
ping V2

Re: [PATCH v8 0/6] block: seriously improve savevm/loadvm performance
Posted by Stefan Hajnoczi 3 years, 7 months ago
On Thu, Jul 09, 2020 at 04:26:38PM +0300, Denis V. Lunev wrote:
> This series does two standard, basic things:
> - it creates an intermediate buffer for all writes from the QEMU migration
>   code to the QCOW2 image,
> - this buffer is sent to disk asynchronously, allowing several writes to
>   run in parallel.
>
> In general, the migration code is fantastically inefficient (by observation):
> buffers are not aligned and are sent in arbitrary pieces, often less than
> 100 bytes per chunk, which results in read-modify-write cycles for
> non-cached operations. It should also be noted that all operations are
> performed into unallocated image blocks, which also suffer from partial
> writes to such new clusters.
>
> This patch series is an implementation of the idea discussed in the RFC
> posted by Denis Plotnikov:
> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg01925.html
> Results with this series over NVMe are better than with the original code:
>                 original    rfc      this
> cached:         1.79s       2.38s    1.27s
> non-cached:     3.29s       1.31s    0.81s

Kevin and Max: Is this going through one of your trees?

Stefan