This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not transferred by the migration thread to the destination.
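
To make the device-side action concrete, here is a minimal, self-contained
sketch of clearing dirty-bitmap bits for a hinted page range. The names are
illustrative only; the series' actual entry point is
qemu_guest_free_page_hint() in migration/ram.c, which also adjusts the dirty
page counter under the bitmap mutex.

    #include <limits.h>
    #include <stdint.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    /* Sketch: clear the migration dirty-bitmap bits for a hinted free
     * page range, so the migration thread skips those pages instead of
     * sending them. Not the series' code. */
    static void clear_free_page_bits(unsigned long *dirty_bitmap,
                                     uint64_t start_pfn, uint64_t npages)
    {
        for (uint64_t pfn = start_pfn; pfn < start_pfn + npages; pfn++) {
            dirty_bitmap[pfn / BITS_PER_LONG] &=
                ~(1UL << (pfn % BITS_PER_LONG));
        }
    }
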
- Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Guest: 8G RAM, 4 vCPU
Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds
- Test Results
- Idle Guest Live Migration Time (results are averaged over 10 runs):
- Optimization vs. Legacy = 271ms vs. 1769ms --> ~86% reduction
- Guest with Linux Compilation Workload (make bzImage -j4):
- Live Migration Time (average)
Optimization vs. Legacy = 1265ms vs. 2634ms --> ~51% reduction
- Linux Compilation Time
Optimization vs. Legacy = 4min56s vs. 5min3s
--> no obvious difference
- Source Code
- QEMU: https://github.com/wei-w-wang/qemu-free-page-lm.git
- Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
ChangeLog:
v6->v7:
virtio-balloon/virtio_balloon_poll_free_page_hints:
- add virtio_notify() at the end to notify the driver that
the optimization is done, which indicates that the entries
have all been put back to the vq and are ready to be detached.
v5->v6:
virtio-balloon: use iothread to get free page hint
v4->v5:
1) migration:
- bitmap_clear_dirty: update the dirty bitmap and dirty page
count under the bitmap mutex, as other functions do;
- qemu_guest_free_page_hint:
- add comments for this function;
- check the !block case;
- check "offset > block->used_length" before proceed;
- assign used_len inside the for{} body;
- update the dirty bitmap and dirty page counter under the
bitmap mutex;
- ram_state_reset:
- rs->free_page_support: && with "migrate_postcopy"
instead of "migration_in_postcopy";
- clear the ram_bulk_stage flag if free_page_support is true;
2) balloon:
- add the usage documentation of balloon_free_page_start and
balloon_free_page_stop in code;
- the optimization thread is named "balloon_fpo" to meet the
requirement of "less than 14 characters";
- virtio_balloon_poll_free_page_hints:
- run only when runstate_is_running() is true;
- add a qemu spin lock to synchronize accesses to the free
page reporting related fields shared among the migration
thread and the optimization thread;
- virtio_balloon_free_page_start: just return if
runstate_is_running is false;
- virtio_balloon_free_page_stop: access to the free page
reporting related fields under a qemu spin lock;
- virtio_balloon_device_unrealize/reset: call
virtio_balloon_free_page_stop if the free page hint feature is
used;
- virtio_balloon_set_status: call virtio_balloon_free_page_stop
in case the guest is stopped by qmp when the optimization is
running;
v3->v4:
1) bitmap: add a new API to count 1s starting from an offset of a
bitmap (see the sketch after this changelog)
2) migration:
- qemu_guest_free_page_hint: calculate
ram_state->migration_dirty_pages by counting how many bits of
free pages are truly cleared. If some of the bits were
already 0, they shouldn't be deducted from
ram_state->migration_dirty_pages. This wasn't needed for
previous versions since we optimized the bulk stage only,
where all bits are guaranteed to be set. It's needed now
because we extended the usage of this optimization to all stages
except the last stop&copy stage. From the 2nd stage onward, it
is possible that some bits of free pages are already 0.
3) virtio-balloon:
- virtio_balloon_free_page_report_status: introduce a new status,
FREE_PAGE_REPORT_S_EXIT. This status indicates that the
optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
the reporting is stopped, but the optimization thread still needs
to be joined by the migration thread.
v2->v3:
1) virtio-balloon
- virtio_balloon_free_page_start: poll the hints using a new
thread;
- use cmd id between [0x80000000, UINT_MAX];
- virtio_balloon_poll_free_page_hints:
- stop the optimization only when it has started;
- don't skip free pages when !poison_val;
- add poison_val to vmsd to migrate;
- virtio_balloon_get_features: add the F_PAGE_POISON feature when
host has F_FREE_PAGE_HINT;
- remove the timer patch which is not needed now.
2) migration
- new api, qemu_guest_free_page_hint;
- rs->free_page_support set only in the precopy case;
- use the new balloon APIs.
v1->v2:
1) virtio-balloon
- use subsections to save free_page_report_cmd_id;
- poll the free page vq after sending a cmd id to the driver;
- change the free page vq size to VIRTQUEUE_MAX_SIZE;
- virtio_balloon_poll_free_page_hints: handle the corner case
that the free page block reported from the driver may cross
the RAMBlock boundary.
2) migration/ram.c
- use balloon_free_page_poll to start the optimization
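
As referenced in the v3->v4 notes above, here is a self-contained sketch of
the new bitmap counting helper. It is for illustration only: the actual patch
adds bitmap_count_one_with_offset() to include/qemu/bitmap.h and reuses the
existing word-at-a-time bitmap_count_one(), rather than walking bit by bit.

    #include <limits.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    /* Count set bits in [offset, offset + nbits), bit by bit for
     * clarity. This is what lets migration_dirty_pages be reduced only
     * by the bits that a hint really cleared (i.e. that were 1). */
    static long count_one_with_offset(const unsigned long *bitmap,
                                      long offset, long nbits)
    {
        long count = 0;

        for (long bit = offset; bit < offset + nbits; bit++) {
            if (bitmap[bit / BITS_PER_LONG] &
                (1UL << (bit % BITS_PER_LONG))) {
                count++;
            }
        }
        return count;
    }
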
Wei Wang (5):
bitmap: bitmap_count_one_with_offset
migration: use bitmap_mutex in migration_bitmap_clear_dirty
migration: API to clear bits of guest free pages from the dirty bitmap
virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
migration: use the free page hint feature from balloon
balloon.c | 58 +++++-
hw/virtio/virtio-balloon.c | 241 ++++++++++++++++++++++--
include/hw/virtio/virtio-balloon.h | 27 ++-
include/migration/misc.h | 2 +
include/qemu/bitmap.h | 13 ++
include/standard-headers/linux/virtio_balloon.h | 7 +
include/sysemu/balloon.h | 15 +-
migration/ram.c | 73 ++++++-
8 files changed, 406 insertions(+), 30 deletions(-)
--
1.8.3.1
On 04/24/2018 02:13 PM, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device.
> [...]

Hi Dave,

Thanks for reviewing this patch series. Do you have more comments on it?
If not, would it be possible to get your Reviewed-by?

The current kernel part is done already. Hope we can finish the QEMU
part soon and have people start to use this feature.

Thanks.

Best,
Wei
On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device. The
> device receives the guest free page hints from the driver and clears
> the corresponding bits in the dirty bitmap, so that those free pages
> are not transferred by the migration thread to the destination.
> [...]

Hi, Wei,

I have a very high-level question about the series.

IIUC the core idea of this series is that we can avoid sending some of
the pages if we know that we don't need to send them. I think this is
based on the fact that on the destination side all the pages are by
default zero after they are malloced. Before this series, IIUC any
migration sends every single page to the destination, no matter whether
it's zeroed or not. So I'm uncertain about whether this will affect the
received bitmap on the destination side. Say, before this series, the
received bitmap will directly cover the whole RAM bitmap after
migration is finished; now it won't. Will there be any side effect? I
don't see an obvious issue now, but I just want to raise this question.

Meanwhile, this reminds me of another idea: whether we can just avoid
sending the zero pages directly from QEMU's perspective. In other
words, can we just do nothing if save_zero_page() detected that the
page is zero (I guess is_zero_range() can be fast too, but I don't know
exactly how fast it is)? And how would that differ from this page
hinting approach, in performance and other aspects?

I haven't dug into the kernel patches yet, so I have no idea about the
detailed implementation of the page hinting. Please feel free to
correct me if there are obvious misunderstandings.

Regards,

--
Peter Xu
On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:
> Meanwhile, this reminds me of another idea: whether we can just avoid
> sending the zero pages directly from QEMU's perspective. In other
> words, can we just do nothing if save_zero_page() detected that the
> page is zero?
> [...]

I noticed a problem (after I wrote the above paragraph 5 minutes
ago...): a page may be valid and sent to the destination (with non-zero
data), but after a while that page gets zeroed. If we don't send zero
pages at all, we won't send the page after it's zeroed, and on the
destination side we'll have a stale non-zero page. Is my understanding
correct? Will that be a problem for this series too, where a valid page
can possibly be freed and hinted?

--
Peter Xu
On 06/01/2018 01:07 PM, Peter Xu wrote:
> I noticed a problem (after I wrote the above paragraph 5 minutes
> ago...): a page may be valid and sent to the destination (with
> non-zero data), but after a while that page gets zeroed. If we don't
> send zero pages at all, we won't send the page after it's zeroed, and
> on the destination side we'll have a stale non-zero page. Is my
> understanding correct? Will that be a problem for this series too,
> where a valid page can possibly be freed and hinted?

I think that won't be an issue for either the zero page optimization or
this free page optimization.

For the zero page optimization, QEMU always sends compressed 0s to the
destination. The zero page is detected at the time QEMU checks it
(before sending the page). If it is a 0 page, QEMU compresses all 0s
(actually just a flag) and sends it.

For the free page optimization, we skip free pages (which could be
thought of as 0 pages in this context). The free pages are detected at
the time the guest reports them to QEMU. A page won't be reported if it
is non-zero (i.e. used).

Best,
Wei
On Fri, Jun 01, 2018 at 03:29:45PM +0800, Wei Wang wrote:
> For the zero page optimization, QEMU always sends compressed 0s to the
> destination. The zero page is detected at the time QEMU checks it
> (before sending the page). If it is a 0 page, QEMU compresses all 0s
> (actually just a flag) and sends it.

What I meant is: can we just not send that ZERO flag at all? :)

> For the free page optimization, we skip free pages (which could be
> thought of as 0 pages in this context). The free pages are detected at
> the time the guest reports them to QEMU. A page won't be reported if
> it is non-zero (i.e. used).

Sorry, I must not have explained myself well. Let's assume the page
hint is used. I meant this:

- start precopy; page P is non-zero (let's say page P has content P1,
  which is non-zero)
- we send page P with content P1 on src, so the latest destination copy
  of page P is P1
- page P is freed by the guest, then it becomes zero; the dirty bit of
  P is set since it changed (from P1 to a zeroed page)
- page P is provided as a hint that we can skip it since it's zeroed,
  so the dirty bit of P is cleared
- ... (page P is never used until migration completes)

After migration completes, page P should be a zeroed page on the
source, while IIUC on the destination side it still has the stale data
P1. Did I miss anything important?

Thanks,

--
Peter Xu
On 06/01/2018 06:02 PM, Peter Xu wrote:
> What I meant is: can we just not send that ZERO flag at all? :)

I think you just figured out that zero pages and free pages are not
completely the same case, so I guess this question is done :) Please
let me know if not.

> - page P is freed by the guest, then it becomes zero; the dirty bit of
>   P is set since it changed (from P1 to a zeroed page)
> [...]

The page doesn't become 0 by itself when it sits on the free page list.
Probably the above referred to this:

#1 memset(pageP, 0, PAGE_SIZE);
#2 kfree(pageP);

#1 causes the page to be tracked in the bitmap, and #2 may cause the
page to be cleared from the bitmap. This is no different from the
general case: a page is used, written with any value, and then freed.

Essentially, this leads to the question asked in another thread: does
the data in free pages matter? As far as I know, Linux treats the
contents of free pages as garbage; people don't rely on values from
free pages. It is similar to using an uninitialized variable, for which
the compiler pops out a warning (that is not correct behavior).

Best,
Wei
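
To make the #1/#2 sequence above concrete, here is a small guest-side sketch
(illustrative only, not code from the series; the comments describe what the
host's dirty logging observes):

    #include <stdlib.h>
    #include <string.h>

    void guest_use_and_free(size_t page_size)
    {
        char *page = malloc(page_size);

        memset(page, 0, page_size); /* #1: the write dirties the page,
                                     *     so the host sets its dirty
                                     *     bit */
        free(page);                 /* #2: the page joins the free list;
                                     *     a later free page hint lets
                                     *     the host clear the bit again */
        /* Any later reuse dirties the page once more, so its new
         * contents will still be migrated. */
    }
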
On 06/01/2018 12:58 PM, Peter Xu wrote:
> Hi, Wei,
>
> I have a very high-level question about the series.

Hi Peter,

Thanks for joining the discussion :)

> So I'm uncertain about whether this will affect the received bitmap on
> the destination side. Say, before this series, the received bitmap
> will directly cover the whole RAM bitmap after migration is finished;
> now it won't. Will there be any side effect?

This feature currently only supports pre-copy (I think the received
bitmap is something that matters to post-copy only). That's why we have

rs->free_page_support = .. && !migrate_postcopy();

> Meanwhile, this reminds me of another idea: whether we can just avoid
> sending the zero pages directly from QEMU's perspective.
> [...]

I guess you are referring to the zero page optimization. I think the
major overhead comes from the zero page checking - lots of memory
accesses, which also waste memory bandwidth. Please see the results
attached in the cover letter; the legacy case already includes the zero
page optimization.

Best,
Wei
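
To illustrate why the zero page check itself costs memory bandwidth, here is
a naive sketch of a zero-page test. QEMU's real buffer_is_zero() is
vectorized, but a page that really is zero still forces a full page worth of
memory reads, whereas the free page hint avoids touching the page at all.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Naive zero-page check: every byte must be read (until the first
     * non-zero byte) before we can decide to skip the page. */
    static bool page_is_zero(const uint8_t *page, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            if (page[i] != 0) {
                return false;
            }
        }
        return true;
    }
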
On Fri, Jun 01, 2018 at 03:21:54PM +0800, Wei Wang wrote:
> Thanks for joining the discussion :)

Thanks for letting me know about this thread. It's an interesting
idea. :)

> This feature currently only supports pre-copy (I think the received
> bitmap is something that matters to post-copy only). That's why we have
> rs->free_page_support = .. && !migrate_postcopy();

Okay.

> I guess you are referring to the zero page optimization. I think the
> major overhead comes from the zero page checking - lots of memory
> accesses, which also waste memory bandwidth. Please see the results
> attached in the cover letter; the legacy case already includes the
> zero page optimization.

I replied in the other thread; we can discuss there altogether.

Actually, on second thought, I think maybe what I worried about there
is exactly the reason why we must send the zero page flag - otherwise
there can be a stale non-zero page on the destination. Here "zero page"
and "freed page" are totally different ideas, since even if a page is
zeroed it might still be in use (not freed)! Conversely, for a "free
page", even if it's non-zero we might be able to not send it at all,
though I am not sure whether that mismatch of data might cause any side
effect either. I think the corresponding question would be: if a page
is freed in the Linux kernel, does its data matter any more?

Thanks,

--
Peter Xu
* Peter Xu (peterx@redhat.com) wrote:
> Actually, on second thought, I think maybe what I worried about there
> is exactly the reason why we must send the zero page flag - otherwise
> there can be a stale non-zero page on the destination.
> [...]
> I think the corresponding question would be: if a page is freed in the
> Linux kernel, does its data matter any more?

I think the answer is no - it doesn't matter; by telling the hypervisor
the page is 'free', the kernel gives the hypervisor the freedom to
discard the page contents.

Now, that means trusting the kernel to get its 'free' flags right, and
we wouldn't want a malicious guest kernel to be able to read random
data, so we have to be a little careful that what actually lands in
there is something the guest has had at some point - or zero, which is
a very nice empty value.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Jun 01, 2018 at 04:33:29PM +0100, Dr. David Alan Gilbert wrote:
[...]
> I think the answer is no - it doesn't matter; by telling the hypervisor
> the page is 'free', the kernel gives the hypervisor the freedom to
> discard the page contents.

Yeah, it seems so. I just read over the whole work, and I think there
is future work to do for the poisoned bits. If that's the only usage
that might make the content of a freed page meaningful, then it seems
fine to me. After all, I don't know much about that...

However, this still seems a bit tricky. E.g., we need to be very
careful on the guest OS side (when writing the balloon driver for a
guest OS) to make sure of that; otherwise it'll be very easy to break a
guest when something similar is enabled without our notice, just like
the poisoning feature.

> Now, that means trusting the kernel to get its 'free' flags right, and
> we wouldn't want a malicious guest kernel to be able to read random
> data, so we have to be a little careful that what actually lands in
> there is something the guest has had at some point - or zero, which is
> a very nice empty value.

Yeah, I agree - basically this feature brings more trouble from the
security POV, but I don't know whether that can be a problem, since
after all we can disable it when we care very much about security.

Regards,

--
Peter Xu
On Tue, Jun 05, 2018 at 02:42:40PM +0800, Peter Xu wrote:
> > I think the answer is no - it doesn't matter; by telling the
> > hypervisor the page is 'free', the kernel gives the hypervisor the
> > freedom to discard the page contents.
>
> Yeah, it seems so.

Well, not exactly. I replied to the parent with a clarification.
On Fri, Jun 01, 2018 at 04:33:29PM +0100, Dr. David Alan Gilbert wrote:
> I think the answer is no - it doesn't matter; by telling the hypervisor
> the page is 'free', the kernel gives the hypervisor the freedom to
> discard the page contents.

I'd like to call attention to this since it's easy to get confused:
that's not exactly true with the current interface. It's a *hint*, not
a guarantee. Let me explain.

It all starts with a request from the hypervisor, and each free page
report is matched to a request. What the report says is that the page
was free *sometime after the request was sent to the guest*.

If the hypervisor has been tracking changes to the page the whole time
since before sending the request, it can conclude that the page is free
and can discard its contents. If it hasn't, then it can't be sure and
cannot discard the page; it can maybe use the hint for other decisions
(e.g. unused => should be sent before other pages).

--
MST
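
A minimal sketch of the rule described above, with hypothetical helper names
(none of these are the series' actual functions): a hinted page may only be
skipped if dirty logging has been active since before the matching request was
sent and the page has not been dirtied since.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helpers, for illustration only. */
    extern bool dirty_tracking_active_since(uint32_t cmd_id);
    extern bool page_dirtied_since(uint32_t cmd_id, uint64_t pfn);

    /* The report only says "pfn was free at some point after the request
     * identified by cmd_id was sent" - a hint, not a guarantee. */
    static bool can_skip_hinted_page(uint32_t cmd_id,
                                     uint32_t current_cmd_id,
                                     uint64_t pfn)
    {
        if (cmd_id != current_cmd_id) {
            return false; /* stale report from an earlier request */
        }
        return dirty_tracking_active_since(cmd_id) &&
               !page_dirtied_since(cmd_id, pfn);
    }
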
On 2018/4/24 14:13, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> receives the guest free page hints from the driver and clears the
> corresponding bits in the dirty bitmap, so that those free pages are
> not transferred by the migration thread to the destination.
> [...]

Nice optimization. In the first stage of the current migration method, we need to migrate all the pages of
the VM to the destination; with this capability, we can avoid migrating lots of unnecessary pages.

Just a small piece of advice: it would be better to split the fourth patch into smaller ones, to make it easier
to review. Besides, should we make this capability an optional one, just like other migration capabilities?

On Tue, May 29, 2018 at 11:00:21PM +0800, Hailiang Zhang wrote:
> On 2018/4/24 14:13, Wei Wang wrote:
> > This is the device part implementation to add a new feature,
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > receives the guest free page hints from the driver and clears the
> > corresponding bits in the dirty bitmap, so that those free pages are
> > not transferred by the migration thread to the destination.
> > [...]
>
> Nice optimization. In the first stage of the current migration method, we need to migrate all the pages of
> the VM to the destination; with this capability, we can avoid migrating lots of unnecessary pages.
>
> Just a small piece of advice: it would be better to split the fourth patch into smaller ones, to make it easier
> to review. Besides, should we make this capability an optional one, just like other migration capabilities?
That's already the case: one has to enable it in the balloon and set
the iothread.
--
MST
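
For reference, with the iothread wired up, enabling the optimization would
look something like this on the QEMU command line (a sketch; the
free-page-hint property name is assumed from this series and may differ):

    qemu-system-x86_64 ... \
        -object iothread,id=io1 \
        -device virtio-balloon,free-page-hint=on,iothread=io1
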
On 04/24/2018 02:13 PM, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> receives the guest free page hints from the driver and clears the
> corresponding bits in the dirty bitmap, so that those free pages are
> not transferred by the migration thread to the destination.
> [...]

Ping for comments, thanks.
Best,
Wei