This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not transferred by the migration thread to the destination.
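
To make the device-side action concrete, here is a minimal, self-contained
sketch of clearing dirty-bitmap bits for a hinted page range. The names are
illustrative only; the series' actual entry point is
qemu_guest_free_page_hint() in migration/ram.c, which also adjusts the dirty
page counter under the bitmap mutex.

    #include <limits.h>
    #include <stdint.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    /* Sketch: clear the migration dirty-bitmap bits for a hinted free
     * page range, so the migration thread skips those pages instead of
     * sending them. Not the series' code. */
    static void clear_free_page_bits(unsigned long *dirty_bitmap,
                                     uint64_t start_pfn, uint64_t npages)
    {
        for (uint64_t pfn = start_pfn; pfn < start_pfn + npages; pfn++) {
            dirty_bitmap[pfn / BITS_PER_LONG] &=
                ~(1UL << (pfn % BITS_PER_LONG));
        }
    }
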
- Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Guest: 8G RAM, 4 vCPU
Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds
- Test Results
- Idle Guest Live Migration Time (results are averaged over 10 runs):
- Optimization vs. Legacy = 271ms vs. 1769ms --> ~86% reduction
- Guest with Linux Compilation Workload (make bzImage -j4):
- Live Migration Time (average)
Optimization vs. Legacy = 1265ms vs. 2634ms --> ~51% reduction
- Linux Compilation Time
Optimization vs. Legacy = 4min56s vs. 5min3s
--> no obvious difference
- Source Code
- QEMU: https://github.com/wei-w-wang/qemu-free-page-lm.git
- Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
ChangeLog:
v6->v7:
virtio-balloon/virtio_balloon_poll_free_page_hints:
- add virtio_notify() at the end to notify the driver that
the optimization is done, which indicates that the entries
have all been put back to the vq and are ready to be detached.
v5->v6:
virtio-balloon: use iothread to get free page hint
v4->v5:
1) migration:
- bitmap_clear_dirty: update the dirty bitmap and dirty page
count under the bitmap mutex, as other functions do;
- qemu_guest_free_page_hint:
- add comments for this function;
- check the !block case;
- check "offset > block->used_length" before proceed;
- assign used_len inside the for{} body;
- update the dirty bitmap and dirty page counter under the
bitmap mutex;
- ram_state_reset:
- rs->free_page_support: && with "migrate_postcopy"
instead of "migration_in_postcopy";
- clear the ram_bulk_stage flag if free_page_support is true;
2) balloon:
- add the usage documentation of balloon_free_page_start and
balloon_free_page_stop in code;
- the optimization thread is named "balloon_fpo" to meet the
requirement of "less than 14 characters";
- virtio_balloon_poll_free_page_hints:
- run only when runstate_is_running() is true;
- add a qemu spin lock to synchronize accesses to the free
page reporting related fields shared among the migration
thread and the optimization thread;
- virtio_balloon_free_page_start: just return if
runstate_is_running is false;
- virtio_balloon_free_page_stop: access to the free page
reporting related fields under a qemu spin lock;
- virtio_balloon_device_unrealize/reset: call
virtio_balloon_free_page_stop if the free page hint feature is
used;
- virtio_balloon_set_status: call virtio_balloon_free_page_stop
in case the guest is stopped by qmp when the optimization is
running;
v3->v4:
1) bitmap: add a new API to count 1s starting from an offset of a
bitmap (see the sketch after this changelog)
2) migration:
- qemu_guest_free_page_hint: calculate
ram_state->migration_dirty_pages by counting how many bits of
free pages are truly cleared. If some of the bits were
already 0, they shouldn't be deducted from
ram_state->migration_dirty_pages. This wasn't needed for
previous versions since we optimized the bulk stage only,
where all bits are guaranteed to be set. It's needed now
because we extended the usage of this optimization to all stages
except the last stop&copy stage. From the 2nd stage onward, it
is possible that some bits of free pages are already 0.
3) virtio-balloon:
- virtio_balloon_free_page_report_status: introduce a new status,
FREE_PAGE_REPORT_S_EXIT. This status indicates that the
optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
the reporting is stopped, but the optimization thread still needs
to be joined by the migration thread.
v2->v3:
1) virtio-balloon
- virtio_balloon_free_page_start: poll the hints using a new
thread;
- use cmd id between [0x80000000, UINT_MAX];
- virtio_balloon_poll_free_page_hints:
- stop the optimization only when it has started;
- don't skip free pages when !poison_val;
- add poison_val to vmsd to migrate;
- virtio_balloon_get_features: add the F_PAGE_POISON feature when
host has F_FREE_PAGE_HINT;
- remove the timer patch which is not needed now.
2) migration
- new api, qemu_guest_free_page_hint;
- rs->free_page_support set only in the precopy case;
- use the new balloon APIs.
v1->v2:
1) virtio-balloon
- use subsections to save free_page_report_cmd_id;
- poll the free page vq after sending a cmd id to the driver;
- change the free page vq size to VIRTQUEUE_MAX_SIZE;
- virtio_balloon_poll_free_page_hints: handle the corner case
that the free page block reported from the driver may cross
the RAMBlock boundary.
2) migration/ram.c
- use balloon_free_page_poll to start the optimization
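
As referenced in the v3->v4 notes above, here is a self-contained sketch of
the new bitmap counting helper. It is for illustration only: the actual patch
adds bitmap_count_one_with_offset() to include/qemu/bitmap.h and reuses the
existing word-at-a-time bitmap_count_one(), rather than walking bit by bit.

    #include <limits.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    /* Count set bits in [offset, offset + nbits), bit by bit for
     * clarity. This is what lets migration_dirty_pages be reduced only
     * by the bits that a hint really cleared (i.e. that were 1). */
    static long count_one_with_offset(const unsigned long *bitmap,
                                      long offset, long nbits)
    {
        long count = 0;

        for (long bit = offset; bit < offset + nbits; bit++) {
            if (bitmap[bit / BITS_PER_LONG] &
                (1UL << (bit % BITS_PER_LONG))) {
                count++;
            }
        }
        return count;
    }
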
Wei Wang (5):
bitmap: bitmap_count_one_with_offset
migration: use bitmap_mutex in migration_bitmap_clear_dirty
migration: API to clear bits of guest free pages from the dirty bitmap
virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
migration: use the free page hint feature from balloon
balloon.c | 58 +++++-
hw/virtio/virtio-balloon.c | 241 ++++++++++++++++++++++--
include/hw/virtio/virtio-balloon.h | 27 ++-
include/migration/misc.h | 2 +
include/qemu/bitmap.h | 13 ++
include/standard-headers/linux/virtio_balloon.h | 7 +
include/sysemu/balloon.h | 15 +-
migration/ram.c | 73 ++++++-
8 files changed, 406 insertions(+), 30 deletions(-)
--
1.8.3.1
On 04/24/2018 02:13 PM, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device.
> [...]

Hi Dave,

Thanks for reviewing this patch series. Do you have more comments on it?
If not, would it be possible to get your Reviewed-by?

The current kernel part is done already. Hope we can finish the QEMU
part soon and have people start to use this feature.

Thanks.

Best,
Wei
On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device. The
> device receives the guest free page hints from the driver and clears
> the corresponding bits in the dirty bitmap, so that those free pages
> are not transferred by the migration thread to the destination.
> [...]

Hi, Wei,

I have a very high-level question about the series.

IIUC the core idea of this series is that we can avoid sending some of
the pages if we know that we don't need to send them. I think this is
based on the fact that on the destination side all the pages are by
default zero after they are malloced. Before this series, IIUC any
migration sends every single page to the destination, no matter whether
it's zeroed or not. So I'm uncertain about whether this will affect the
received bitmap on the destination side. Say, before this series, the
received bitmap will directly cover the whole RAM bitmap after
migration is finished; now it won't. Will there be any side effect? I
don't see an obvious issue now, but I just want to raise this question.

Meanwhile, this reminds me of another idea: whether we can just avoid
sending the zero pages directly from QEMU's perspective. In other
words, can we just do nothing if save_zero_page() detected that the
page is zero (I guess is_zero_range() can be fast too, but I don't know
exactly how fast it is)? And how would that differ from this page
hinting approach, in performance and other aspects?

I haven't dug into the kernel patches yet, so I have no idea about the
detailed implementation of the page hinting. Please feel free to
correct me if there are obvious misunderstandings.

Regards,

--
Peter Xu
On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:
> Meanwhile, this reminds me of another idea: whether we can just avoid
> sending the zero pages directly from QEMU's perspective. In other
> words, can we just do nothing if save_zero_page() detected that the
> page is zero?
> [...]

I noticed a problem (after I wrote the above paragraph 5 minutes
ago...): a page may be valid and sent to the destination (with non-zero
data), but after a while that page gets zeroed. If we don't send zero
pages at all, we won't send the page after it's zeroed, and on the
destination side we'll have a stale non-zero page. Is my understanding
correct? Will that be a problem for this series too, where a valid page
can possibly be freed and hinted?

--
Peter Xu
On 06/01/2018 01:07 PM, Peter Xu wrote:
> I noticed a problem (after I wrote the above paragraph 5 minutes
> ago...): a page may be valid and sent to the destination (with
> non-zero data), but after a while that page gets zeroed. If we don't
> send zero pages at all, we won't send the page after it's zeroed, and
> on the destination side we'll have a stale non-zero page. Is my
> understanding correct? Will that be a problem for this series too,
> where a valid page can possibly be freed and hinted?

I think that won't be an issue for either the zero page optimization or
this free page optimization.

For the zero page optimization, QEMU always sends compressed 0s to the
destination. The zero page is detected at the time QEMU checks it
(before sending the page). If it is a 0 page, QEMU compresses all 0s
(actually just a flag) and sends it.

For the free page optimization, we skip free pages (which could be
thought of as 0 pages in this context). The free pages are detected at
the time the guest reports them to QEMU. A page won't be reported if it
is non-zero (i.e. used).

Best,
Wei
On Fri, Jun 01, 2018 at 03:29:45PM +0800, Wei Wang wrote:
> For the zero page optimization, QEMU always sends compressed 0s to the
> destination. The zero page is detected at the time QEMU checks it
> (before sending the page). If it is a 0 page, QEMU compresses all 0s
> (actually just a flag) and sends it.

What I meant is: can we just not send that ZERO flag at all? :)

> For the free page optimization, we skip free pages (which could be
> thought of as 0 pages in this context). The free pages are detected at
> the time the guest reports them to QEMU. A page won't be reported if
> it is non-zero (i.e. used).

Sorry, I must not have explained myself well. Let's assume the page
hint is used. I meant this:

- start precopy; page P is non-zero (let's say page P has content P1,
  which is non-zero)
- we send page P with content P1 on src, so the latest destination copy
  of page P is P1
- page P is freed by the guest, then it becomes zero; the dirty bit of
  P is set since it changed (from P1 to a zeroed page)
- page P is provided as a hint that we can skip it since it's zeroed,
  so the dirty bit of P is cleared
- ... (page P is never used until migration completes)

After migration completes, page P should be a zeroed page on the
source, while IIUC on the destination side it still has the stale data
P1. Did I miss anything important?

Thanks,

--
Peter Xu
On 06/01/2018 06:02 PM, Peter Xu wrote:
> What I meant is: can we just not send that ZERO flag at all? :)

I think you just figured out that zero pages and free pages are not
completely the same case, so I guess this question is done :) Please
let me know if not.

> - page P is freed by the guest, then it becomes zero; the dirty bit of
>   P is set since it changed (from P1 to a zeroed page)
> [...]

The page doesn't become 0 by itself when it sits on the free page list.
Probably the above referred to this:

#1 memset(pageP, 0, PAGE_SIZE);
#2 kfree(pageP);

#1 causes the page to be tracked in the bitmap, and #2 may cause the
page to be cleared from the bitmap. This is no different from the
general case: a page is used, written with any value, and then freed.

Essentially, this leads to the question asked in another thread: does
the data in free pages matter? As far as I know, Linux treats the
contents of free pages as garbage; people don't rely on values from
free pages. It is similar to using an uninitialized variable, for which
the compiler pops out a warning (that is not correct behavior).

Best,
Wei
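
To make the #1/#2 sequence above concrete, here is a small guest-side sketch
(illustrative only, not code from the series; the comments describe what the
host's dirty logging observes):

    #include <stdlib.h>
    #include <string.h>

    void guest_use_and_free(size_t page_size)
    {
        char *page = malloc(page_size);

        memset(page, 0, page_size); /* #1: the write dirties the page,
                                     *     so the host sets its dirty
                                     *     bit */
        free(page);                 /* #2: the page joins the free list;
                                     *     a later free page hint lets
                                     *     the host clear the bit again */
        /* Any later reuse dirties the page once more, so its new
         * contents will still be migrated. */
    }
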
On 06/01/2018 12:58 PM, Peter Xu wrote:
> Hi, Wei,
>
> I have a very high-level question about the series.

Hi Peter,

Thanks for joining the discussion :)

> So I'm uncertain about whether this will affect the received bitmap on
> the destination side. Say, before this series, the received bitmap
> will directly cover the whole RAM bitmap after migration is finished;
> now it won't. Will there be any side effect?

This feature currently only supports pre-copy (I think the received
bitmap is something that matters to post-copy only). That's why we have

rs->free_page_support = .. && !migrate_postcopy();

> Meanwhile, this reminds me of another idea: whether we can just avoid
> sending the zero pages directly from QEMU's perspective.
> [...]

I guess you are referring to the zero page optimization. I think the
major overhead comes from the zero page checking - lots of memory
accesses, which also waste memory bandwidth. Please see the results
attached in the cover letter; the legacy case already includes the zero
page optimization.

Best,
Wei
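
To illustrate why the zero page check itself costs memory bandwidth, here is
a naive sketch of a zero-page test. QEMU's real buffer_is_zero() is
vectorized, but a page that really is zero still forces a full page worth of
memory reads, whereas the free page hint avoids touching the page at all.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Naive zero-page check: every byte must be read (until the first
     * non-zero byte) before we can decide to skip the page. */
    static bool page_is_zero(const uint8_t *page, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            if (page[i] != 0) {
                return false;
            }
        }
        return true;
    }
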
On Fri, Jun 01, 2018 at 03:21:54PM +0800, Wei Wang wrote:
> Thanks for joining the discussion :)

Thanks for letting me know about this thread. It's an interesting
idea. :)

> This feature currently only supports pre-copy (I think the received
> bitmap is something that matters to post-copy only). That's why we have
> rs->free_page_support = .. && !migrate_postcopy();

Okay.

> I guess you are referring to the zero page optimization. I think the
> major overhead comes from the zero page checking - lots of memory
> accesses, which also waste memory bandwidth. Please see the results
> attached in the cover letter; the legacy case already includes the
> zero page optimization.

I replied in the other thread; we can discuss there altogether.

Actually, on second thought, I think maybe what I worried about there
is exactly the reason why we must send the zero page flag - otherwise
there can be a stale non-zero page on the destination. Here "zero page"
and "freed page" are totally different ideas, since even if a page is
zeroed it might still be in use (not freed)! Conversely, for a "free
page", even if it's non-zero we might be able to not send it at all,
though I am not sure whether that mismatch of data might cause any side
effect either. I think the corresponding question would be: if a page
is freed in the Linux kernel, does its data matter any more?

Thanks,

--
Peter Xu
* Peter Xu (peterx@redhat.com) wrote:
> Actually, on second thought, I think maybe what I worried about there
> is exactly the reason why we must send the zero page flag - otherwise
> there can be a stale non-zero page on the destination.
> [...]
> I think the corresponding question would be: if a page is freed in the
> Linux kernel, does its data matter any more?

I think the answer is no - it doesn't matter; by telling the hypervisor
the page is 'free', the kernel gives the hypervisor the freedom to
discard the page contents.

Now, that means trusting the kernel to get its 'free' flags right, and
we wouldn't want a malicious guest kernel to be able to read random
data, so we have to be a little careful that what actually lands in
there is something the guest has had at some point - or zero, which is
a very nice empty value.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Jun 01, 2018 at 04:33:29PM +0100, Dr. David Alan Gilbert wrote:
[...]
> I think the answer is no - it doesn't matter; by telling the hypervisor
> the page is 'free', the kernel gives the hypervisor the freedom to
> discard the page contents.

Yeah, it seems so. I just read over the whole work, and I think there
is future work to do for the poisoned bits. If that's the only usage
that might make the content of a freed page meaningful, then it seems
fine to me. After all, I don't know much about that...

However, this still seems a bit tricky. E.g., we need to be very
careful on the guest OS side (when writing the balloon driver for a
guest OS) to make sure of that; otherwise it'll be very easy to break a
guest when something similar is enabled without our notice, just like
the poisoning feature.

> Now, that means trusting the kernel to get its 'free' flags right, and
> we wouldn't want a malicious guest kernel to be able to read random
> data, so we have to be a little careful that what actually lands in
> there is something the guest has had at some point - or zero, which is
> a very nice empty value.

Yeah, I agree - basically this feature brings more trouble from the
security POV, but I don't know whether that can be a problem, since
after all we can disable it when we care very much about security.

Regards,

--
Peter Xu
On Tue, Jun 05, 2018 at 02:42:40PM +0800, Peter Xu wrote:
> > I think the answer is no - it doesn't matter; by telling the
> > hypervisor the page is 'free', the kernel gives the hypervisor the
> > freedom to discard the page contents.
>
> Yeah, it seems so.

Well, not exactly. I replied to the parent with a clarification.
On Fri, Jun 01, 2018 at 04:33:29PM +0100, Dr. David Alan Gilbert wrote:
> I think the answer is no - it doesn't matter; by telling the hypervisor
> the page is 'free', the kernel gives the hypervisor the freedom to
> discard the page contents.

I'd like to call attention to this since it's easy to get confused:
that's not exactly true with the current interface. It's a *hint*, not
a guarantee. Let me explain.

It all starts with a request from the hypervisor, and each free page
report is matched to a request. What the report says is that the page
was free *sometime after the request was sent to the guest*.

If the hypervisor has been tracking changes to the page the whole time
since before sending the request, it can conclude that the page is free
and can discard its contents. If it hasn't, then it can't be sure and
cannot discard the page; it can maybe use the hint for other decisions
(e.g. unused => should be sent before other pages).

--
MST
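
A minimal sketch of the rule described above, with hypothetical helper names
(none of these are the series' actual functions): a hinted page may only be
skipped if dirty logging has been active since before the matching request was
sent and the page has not been dirtied since.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helpers, for illustration only. */
    extern bool dirty_tracking_active_since(uint32_t cmd_id);
    extern bool page_dirtied_since(uint32_t cmd_id, uint64_t pfn);

    /* The report only says "pfn was free at some point after the request
     * identified by cmd_id was sent" - a hint, not a guarantee. */
    static bool can_skip_hinted_page(uint32_t cmd_id,
                                     uint32_t current_cmd_id,
                                     uint64_t pfn)
    {
        if (cmd_id != current_cmd_id) {
            return false; /* stale report from an earlier request */
        }
        return dirty_tracking_active_since(cmd_id) &&
               !page_dirtied_since(cmd_id, pfn);
    }
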
On 2018/4/24 14:13, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> receives the guest free page hints from the driver and clears the
> corresponding bits in the dirty bitmap, so that those free pages are
> not transferred by the migration thread to the destination.
> [...]

Nice optimization. In the first stage of the current migration method, we need to migrate all the pages of
the VM to the destination; with this capability, we can avoid migrating lots of unnecessary pages.

Just a small piece of advice: it would be better to split the fourth patch into smaller ones, to make it easier
to review. Besides, should we make this capability an optional one, just like other migration capabilities?

On Tue, May 29, 2018 at 11:00:21PM +0800, Hailiang Zhang wrote:
> On 2018/4/24 14:13, Wei Wang wrote:
> > This is the device part implementation to add a new feature,
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > receives the guest free page hints from the driver and clears the
> > corresponding bits in the dirty bitmap, so that those free pages are
> > not transferred by the migration thread to the destination.
> > [...]
>
> Nice optimization. In the first stage of the current migration method, we need to migrate all the pages of
> the VM to the destination; with this capability, we can avoid migrating lots of unnecessary pages.
>
> Just a small piece of advice: it would be better to split the fourth patch into smaller ones, to make it easier
> to review. Besides, should we make this capability an optional one, just like other migration capabilities?
That's already the case: one has to enable it in the balloon and set
the iothread.
--
MST
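
For reference, with the iothread wired up, enabling the optimization would
look something like this on the QEMU command line (a sketch; the
free-page-hint property name is assumed from this series and may differ):

    qemu-system-x86_64 ... \
        -object iothread,id=io1 \
        -device virtio-balloon,free-page-hint=on,iothread=io1
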
On 04/24/2018 02:13 PM, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> receives the guest free page hints from the driver and clears the
> corresponding bits in the dirty bitmap, so that those free pages are
> not transferred by the migration thread to the destination.
> [...]

Ping for comments, thanks.
Best,
Wei