[v1] RE: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint

Posted by Zhanghailiang 5 years, 6 months ago
Hi Lukas Straub & Derek,

Sorry for the late reply; I have been too busy these days ;)

> -----Original Message-----
> From: Lukas Straub [mailto:lukasstraub2@web.de]
> Sent: Friday, July 31, 2020 3:52 PM
> To: Derek Su <dereksu@qnap.com>
> Cc: qemu-devel@nongnu.org; Zhanghailiang
> <zhang.zhanghailiang@huawei.com>; chyang@qnap.com;
> quintela@redhat.com; dgilbert@redhat.com; ctcheng@qnap.com;
> jwsu1986@gmail.com
> Subject: Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo
> checkpoint
> 
> On Sun, 21 Jun 2020 10:10:03 +0800
> Derek Su <dereksu@qnap.com> wrote:
> 
> > This series is to reduce the guest's downtime during colo checkpoint
> > by migrating dirty ram pages as many as possible before colo checkpoint.
> >
> > If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or ram
> > pending size is lower than 'x-colo-migrate-ram-threshold', stop the
> > ram migration and do colo checkpoint.
> >
> > Test environment:
> > The both primary VM and secondary VM has 1GiB ram and 10GbE NIC for FT
> > traffic.
> > One fio buffer write job runs on the guest.
> > The result shows the total primary VM downtime is decreased by ~40%.
> >
> > Please help to review it and suggestions are welcomed.
> > Thanks.
> 
> Hello Derek,
> Sorry for the late reply.
> I think this is not a good idea, because it unnecessarily introduces a delay
> between checkpoint request and the checkpoint itself and thus impairs
> network bound workloads due to increased network latency. Workloads that
> are independent from network don't cause many checkpoints anyway, so it
> doesn't help there either.
> 

Agreed. Although it seems to reduce the VM's downtime during a checkpoint, it
doesn't help reduce network latency: the network packets that differ between the
SVM and PVM are what triggered this checkpoint request, and they will be blocked
until the checkpoint process finishes.
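
To make the trade-off concrete, here is a minimal Python sketch of the stopping
rule described in the cover letter. It is a model for discussion, not the QEMU
code: the function name, the migrate_once callback, and the numeric values are
assumptions for illustration. Only the names COLO_RAM_MIGRATE_ITERATION_MAX and
x-colo-migrate-ram-threshold come from the series.

```python
from typing import Callable, Tuple

# Illustrative value only; the real limit is defined by the patch series.
COLO_RAM_MIGRATE_ITERATION_MAX = 100


def pre_checkpoint_iterations(
    pending_bytes: int,
    threshold_bytes: int,
    migrate_once: Callable[[int], int],
    max_iterations: int = COLO_RAM_MIGRATE_ITERATION_MAX,
) -> Tuple[int, int]:
    """Model the loop that runs between a checkpoint request and the checkpoint.

    migrate_once(pending) is a hypothetical callback that sends dirty pages
    and returns the dirty bytes still pending afterwards.
    Returns (iterations performed, bytes left for the checkpoint itself).
    """
    iterations = 0
    # Stop when the iteration cap is reached or pending RAM drops below
    # the threshold (x-colo-migrate-ram-threshold in the series).
    while iterations < max_iterations and pending_bytes >= threshold_bytes:
        pending_bytes = migrate_once(pending_bytes)
        iterations += 1
    return iterations, pending_bytes


# Example: each round leaves half of the dirty bytes pending.
print(pre_checkpoint_iterations(1024, 100, lambda p: p // 2))
```

Every iteration counted here runs after the checkpoint has been requested, so
the differing packets stay blocked for that whole time in addition to the
checkpoint itself.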


> Hailang did have a patch to migrate ram between checkpoints, which should
> help all workloads, but it wasn't merged back then. I think you can pick it up
> again, rebase and address David's and Eric's comments:
> https://lore.kernel.org/qemu-devel/20200217012049.22988-3-zhang.zhanghailiang@huawei.com/T/#u
>  

The second one, which can help reduce the downtime, has not been merged.

> Hailang, are you ok with that?
> 

Yes. @Derek, please feel free to pick it up if you would like to ;)


Thanks,
Hailiang

> Regards,
> Lukas Straub