Starting from pss->page, ram_save_host_page() checks every target page
and sends the dirty ones, up to the end of the current host page or
the used_length boundary of the block. If the host page is a huge
page, this per-page check takes a long time.
Use migration_bitmap_find_dirty() to jump directly to the next dirty
page instead, which improves performance.
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 migration/ram.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 9fc5b2997c..28215aefe4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1991,6 +1991,8 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
     int pages = 0;
     size_t pagesize_bits =
         qemu_ram_pagesize(pss->block) >> TARGET_PAGE_BITS;
+    unsigned long hostpage_boundary =
+        QEMU_ALIGN_UP(pss->page + 1, pagesize_bits);
     unsigned long start_page = pss->page;
     int res;
 
@@ -2003,30 +2005,27 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss,
         int pages_this_iteration = 0;
 
         /* Check if the page is dirty and send it if it is */
-        if (!migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
-            pss->page++;
-            continue;
-        }
-
-        pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
-        if (pages_this_iteration < 0) {
-            return pages_this_iteration;
-        }
+        if (migration_bitmap_clear_dirty(rs, pss->block, pss->page)) {
+            pages_this_iteration = ram_save_target_page(rs, pss, last_stage);
+            if (pages_this_iteration < 0) {
+                return pages_this_iteration;
+            }
 
-        pages += pages_this_iteration;
-        pss->page++;
-        /*
-         * Allow rate limiting to happen in the middle of huge pages if
-         * something is sent in the current iteration.
-         */
-        if (pagesize_bits > 1 && pages_this_iteration > 0) {
-            migration_rate_limit();
+            pages += pages_this_iteration;
+            /*
+             * Allow rate limiting to happen in the middle of huge pages if
+             * something is sent in the current iteration.
+             */
+            if (pagesize_bits > 1 && pages_this_iteration > 0) {
+                migration_rate_limit();
+            }
         }
-    } while ((pss->page & (pagesize_bits - 1)) &&
+        pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
+    } while ((pss->page < hostpage_boundary) &&
              offset_in_ramblock(pss->block,
                                 ((ram_addr_t)pss->page) << TARGET_PAGE_BITS));
-    /* The offset we leave with is the last one we looked at */
-    pss->page--;
+    /* The offset we leave with is the min boundary of host page and block */
+    pss->page = MIN(pss->page, hostpage_boundary) - 1;
 
     res = ram_save_release_protection(rs, pss, start_page);
     return (res < 0 ? res : pages);
-- 
2.23.0
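
The effect of the loop rewrite in the patch above can be sketched outside
QEMU. This is a minimal toy model, not QEMU code: the dirty bitmap is a plain
array of unsigned long, and toy_set_dirty(), toy_test_and_clear(),
toy_find_dirty(), visit_per_page() and visit_skip_clean() are hypothetical
names standing in for the bitmap helpers and the old and new loop shapes.
Sending a page is reduced to incrementing a counter.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Mark one target page dirty in the toy bitmap. */
static void toy_set_dirty(unsigned long *map, unsigned long page)
{
    map[page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}

/* Stand-in for migration_bitmap_clear_dirty(): test the bit, then clear it. */
static int toy_test_and_clear(unsigned long *map, unsigned long page)
{
    unsigned long mask = 1UL << (page % BITS_PER_WORD);
    int was_dirty = (map[page / BITS_PER_WORD] & mask) != 0;

    map[page / BITS_PER_WORD] &= ~mask;
    return was_dirty;
}

/*
 * Stand-in for migration_bitmap_find_dirty(): first dirty page at or after
 * 'start', or 'size' if there is none. A clean word is skipped in one step.
 */
static unsigned long toy_find_dirty(const unsigned long *map,
                                    unsigned long size, unsigned long start)
{
    unsigned long page = start;

    while (page < size) {
        unsigned long word = map[page / BITS_PER_WORD] >> (page % BITS_PER_WORD);

        if (word) {
            while (!(word & 1)) {
                word >>= 1;
                page++;
            }
            return page < size ? page : size;
        }
        page = (page / BITS_PER_WORD + 1) * BITS_PER_WORD;
    }
    return size;
}

/*
 * Old loop shape: every target page of the host page costs one check.
 * 'pages_per_host' must be a power of two. Returns the number of checks.
 */
static unsigned long visit_per_page(unsigned long *map, unsigned long size,
                                    unsigned long page,
                                    unsigned long pages_per_host,
                                    unsigned long *sent)
{
    unsigned long checks = 0;

    do {
        checks++;
        if (toy_test_and_clear(map, page)) {
            (*sent)++;
        }
        page++;
    } while ((page & (pages_per_host - 1)) && page < size);
    return checks;
}

/* New loop shape: after each check, jump straight to the next dirty page. */
static unsigned long visit_skip_clean(unsigned long *map, unsigned long size,
                                      unsigned long page,
                                      unsigned long pages_per_host,
                                      unsigned long *sent)
{
    /* Same value as QEMU_ALIGN_UP(page + 1, pages_per_host). */
    unsigned long boundary = (page / pages_per_host + 1) * pages_per_host;
    unsigned long checks = 0;

    assert(page < size);
    do {
        checks++;
        if (toy_test_and_clear(map, page)) {
            (*sent)++;
        }
        page = toy_find_dirty(map, size, page);
    } while (page < boundary && page < size);
    return checks;
}
```

With pages 0 and 300 of a single 512-page host page dirty, visit_per_page()
performs 512 checks and visit_skip_clean() performs 2, and both send the same
two pages.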
On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
> Starting from pss->page, ram_save_host_page() will check every page
> and send the dirty pages up to the end of the current host page or
> the boundary of used_length of the block. If the host page size is
> a huge page, the step "check" will take a lot of time.
>
> This will improve performance to use migration_bitmap_find_dirty().
Is there any measurement done?
This looks like an optimization, but to me it seems to have changed a lot
context that it doesn't need to... Do you think it'll also work to just look up
dirty again and update pss->page properly if migration_bitmap_clear_dirty()
returned zero?
Thanks,
--
Peter Xu
Hi,
On 2021/3/5 22:30, Peter Xu wrote:
> On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
>> Starting from pss->page, ram_save_host_page() will check every page
>> and send the dirty pages up to the end of the current host page or
>> the boundary of used_length of the block. If the host page size is
>> a huge page, the step "check" will take a lot of time.
>>
>> This will improve performance to use migration_bitmap_find_dirty().
> Is there any measurement done?
I tested it on Kunpeng 920. VM params: 1U 4G( page size 1G).
The time of ram_save_host_page() in the last round of ram saving:
before optimize: 9250us after optimize: 34us
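
As a rough cross-check of these numbers, the sketch below computes how many
target pages one host page spans and what the old loop would cost per check.
It rests on assumptions the thread does not state: 4 KiB target pages
(TARGET_PAGE_BITS = 12) and a measured call that walked one full 1 GiB host
page. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors 'qemu_ram_pagesize(block) >> TARGET_PAGE_BITS' from the patch. */
static uint64_t target_pages_per_host_page(uint64_t host_page_size,
                                           unsigned target_page_bits)
{
    assert(target_page_bits < 64);
    return host_page_size >> target_page_bits;
}

/* Average cost of one check in nanoseconds, from a total in microseconds. */
static double ns_per_check(double total_us, uint64_t checks)
{
    assert(checks > 0);
    return total_us * 1000.0 / (double)checks;
}
```

Under those assumptions, target_pages_per_host_page(1ULL << 30, 12) is 262144,
and ns_per_check(9250.0, 262144) comes to roughly 35 ns per check in the old
loop.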
> This looks like an optimization, but to me it seems to have changed a lot
> context that it doesn't need to... Do you think it'll also work to just look up
> dirty again and update pss->page properly if migration_bitmap_clear_dirty()
> returned zero?
>
> Thanks,
This just inverted the body of the loop, suggested by @David Edmondson.
Here is the v2[1]. Do you mean to change it like this?
[1]:
http://patchwork.ozlabs.org/project/qemu-devel/patch/20210301082132.1107-4-jiangkunkun@huawei.com/
Thanks,
Kunkun Jiang
On Mon, Mar 08, 2021 at 09:58:02PM +0800, Kunkun Jiang wrote:
> Hi,
>
> On 2021/3/5 22:30, Peter Xu wrote:
> > On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
> > > Starting from pss->page, ram_save_host_page() will check every page
> > > and send the dirty pages up to the end of the current host page or
> > > the boundary of used_length of the block. If the host page size is
> > > a huge page, the step "check" will take a lot of time.
> > >
> > > This will improve performance to use migration_bitmap_find_dirty().
> > Is there any measurement done?
> I tested it on Kunpeng 920. VM params: 1U 4G( page size 1G).
> The time of ram_save_host_page() in the last round of ram saving:
> before optimize: 9250us after optimize: 34us

Looks like an idle VM, but still this is a great improvement. Would you mind
add this into the commit message too?

> > This looks like an optimization, but to me it seems to have changed a lot
> > context that it doesn't need to... Do you think it'll also work to just look up
> > dirty again and update pss->page properly if migration_bitmap_clear_dirty()
> > returned zero?
> >
> > Thanks,
> This just inverted the body of the loop, suggested by @David Edmondson.
> Here is the v2[1]. Do you mean to change it like this?
>
> [1]: http://patchwork.ozlabs.org/project/qemu-devel/patch/20210301082132.1107-4-jiangkunkun@huawei.com/

I see, then it's okay - But indeed I still prefer your previous version. :)

Thanks,

-- 
Peter Xu
Hi,

On 2021/3/9 5:36, Peter Xu wrote:
> On Mon, Mar 08, 2021 at 09:58:02PM +0800, Kunkun Jiang wrote:
>> Hi,
>>
>> On 2021/3/5 22:30, Peter Xu wrote:
>>> On Fri, Mar 05, 2021 at 03:50:35PM +0800, Kunkun Jiang wrote:
>>>> Starting from pss->page, ram_save_host_page() will check every page
>>>> and send the dirty pages up to the end of the current host page or
>>>> the boundary of used_length of the block. If the host page size is
>>>> a huge page, the step "check" will take a lot of time.
>>>>
>>>> This will improve performance to use migration_bitmap_find_dirty().
>>> Is there any measurement done?
>> I tested it on Kunpeng 920. VM params: 1U 4G( page size 1G).
>> The time of ram_save_host_page() in the last round of ram saving:
>> before optimize: 9250us after optimize: 34us
> Looks like an idle VM, but still this is a great improvement. Would you mind
> add this into the commit message too?

Ok, I will add it in the next version. 😉

>>> This looks like an optimization, but to me it seems to have changed a lot
>>> context that it doesn't need to... Do you think it'll also work to just look up
>>> dirty again and update pss->page properly if migration_bitmap_clear_dirty()
>>> returned zero?
>>>
>>> Thanks,
>> This just inverted the body of the loop, suggested by @David Edmondson.
>> Here is the v2[1]. Do you mean to change it like this?
>>
>> [1]: http://patchwork.ozlabs.org/project/qemu-devel/patch/20210301082132.1107-4-jiangkunkun@huawei.com/
> I see, then it's okay - But indeed I still prefer your previous version. :)
>
> Thanks,
>

Both versions are fine to me. This version may make the final code slightly
cleaner, I think.

Thanks,

Kunkun Jiang