> -----Original Message-----
> From: Peter Xu <peterx@redhat.com>
> Sent: Wednesday, July 10, 2024 11:19 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: Wang, Yichen <yichen.wang@bytedance.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>; Eduardo
> Habkost <eduardo@habkost.net>; Marc-André Lureau
> <marcandre.lureau@redhat.com>; Thomas Huth <thuth@redhat.com>; Philippe
> Mathieu-Daudé <philmd@linaro.org>; Fabiano Rosas <farosas@suse.de>; Eric
> Blake <eblake@redhat.com>; Markus Armbruster <armbru@redhat.com>; Laurent
> Vivier <lvivier@redhat.com>; qemu-devel@nongnu.org; Hao Xiang
> <hao.xiang@linux.dev>; Zou, Nanhai <nanhai.zou@intel.com>; Ho-Ren (Jack)
> Chuang <horenchuang@bytedance.com>
> Subject: Re: [PATCH v4 0/4] Implement using Intel QAT to offload ZLIB
>
> On Wed, Jul 10, 2024 at 01:55:23PM +0000, Liu, Yuan1 wrote:
>
> [...]
>
> > migrate_set_parameter max-bandwidth 1250M
> >
> > |-----------|--------|---------|----------|----------|------|------|
> > |8 Channels |Total   |down     |throughput|pages per | send | recv |
> > |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
> > |-----------|--------|---------|----------|----------|------|------|
> > |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
> > |-----------|--------|---------|----------|----------|------|------|
> > |zstd       |   20165|       24|      8579|   2391465|   810|   340|
> > |-----------|--------|---------|----------|----------|------|------|
> > |none       |   46063|       40|     10848|    330240|    45|    85|
> > |-----------|--------|---------|----------|----------|------|------|
> >
> > QATzip's dirty page processing throughput is much higher than that of
> > no compression. In this test, the vCPUs are in idle state, so the
> > migration can be successful even without compression.
>
> Thanks!  Maybe good material to be put into the docs/ too, if Yichen's
> going to pick up your doc patch when he reposts.

Sure, Yichen will add my doc patch. If he doesn't add this part in the
next version, I will add it later.

> [...]
>
> > I don't have much experience with postcopy; here are some of my thoughts:
> >
> > 1. For write-intensive VMs, this solution can improve the migration
> > success rate, because in a limited-bandwidth network scenario the dirty
> > page processing throughput drops significantly without compression. The
> > previous data shows this (pages_per_second): in the no-compression
> > precopy, the dirty pages generated by the workload outpace what the
> > migration can process, resulting in migration failure.
>
> Yes.
>
> > 2. If the VM is read-intensive or has low vCPU utilization (for
> > example, my current test scenario is that the vCPUs are all idle), I
> > think no compression + precopy + postcopy also cannot improve the
> > migration performance, and may also cause timeout failure due to long
> > migration time, same as no-compression precopy.
>
> I don't think postcopy will trigger timeout failures - postcopy should use
> constant time to complete a migration, that is guest memsize / bw.

Yes, the migration total time is predictable; "failure due to timeout" was
the wrong wording, "migration taking a long time" is more accurate.

> The challenge is normally on the delay of page requests higher than
> precopy, but in this case it might not be a big deal.  And I wonder if on
> 100G*2 cards it can also perform pretty well, as the delay might be
> minimal even if bandwidth is throttled.

I got your point; I don't have much experience in this area.
So you mean to reserve a small amount of bandwidth on a NIC for postcopy
migration, and compare the migration performance with and without traffic
on the NIC? Will data plane traffic affect page request delays in postcopy?

> > 3. In my opinion, postcopy is a good solution in this scenario (low
> > network bandwidth, VM is not critical), because even if compression is
> > turned on, the migration may still fail (pages_per_second may still be
> > less than the new dirty pages), and it is hard to predict whether VM
> > memory is compression-friendly.
>
> Yes.
>
> Thanks,
>
> --
> Peter Xu
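
For context, a multifd test like the one in the table above could be driven
with HMP commands roughly as follows. Only max-bandwidth appears in the
thread; the other parameter names are taken from QEMU's existing multifd
migration options, and the qatzip compression value is the one this series
proposes to add, so treat this as a sketch rather than the exact setup used
for the numbers above:

    migrate_set_capability multifd on
    migrate_set_parameter multifd-channels 8
    migrate_set_parameter max-bandwidth 1250M
    migrate_set_parameter multifd-compression qatzip
    migrate -d tcp:<dest-host>:<port>

Switching multifd-compression between none, zstd and qatzip (on both the
source and the destination) would correspond to the three rows of the table.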
On Wed, Jul 10, 2024 at 03:39:43PM +0000, Liu, Yuan1 wrote:
> > I don't think postcopy will trigger timeout failures - postcopy should use
> > constant time to complete a migration, that is guest memsize / bw.
>
> Yes, the migration total time is predictable; "failure due to timeout" was
> the wrong wording, "migration taking a long time" is more accurate.

It shouldn't: postcopy always runs together with precopy, so if you start
postcopy after one round of precopy, the total migration time should
always be smaller than running precopy for two rounds. With postcopy, the
migration completes after that; with precopy, the two rounds are followed
by a dirty sync which may say "there are unfortunately more dirty pages,
let's move on with the 3rd round and more".

> > The challenge is normally on the delay of page requests higher than
> > precopy, but in this case it might not be a big deal. And I wonder if on
> > 100G*2 cards it can also perform pretty well, as the delay might be
> > minimal even if bandwidth is throttled.
>
> I got your point; I don't have much experience in this area.
> So you mean to reserve a small amount of bandwidth on a NIC for postcopy
> migration, and compare the migration performance with and without traffic
> on the NIC? Will data plane traffic affect page request delays in postcopy?

I'm not sure what the "data plane" you're describing here is, but logically
VMs should be migrated over mgmt networks, which should be somehow separate
from the I/O within the VMs.

I'm not really asking for another test, sorry for the confusion; this is
only a pure discussion. I just feel like postcopy hasn't really been
seriously considered even for many valid cases, and in some of them
postcopy can perform pretty well without requiring any modern hardware.

There's no need to prove which is better for this series.

Thanks,

-- 
Peter Xu
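
To make the two postcopy points above concrete: postcopy is enabled on top
of a normal precopy and then switched over explicitly, using the existing
postcopy-ram capability (set on both sides) and the migrate_start_postcopy
HMP command. The exact switchover timing is left to the user or management
tool; this is only an illustrative sketch:

    migrate_set_capability postcopy-ram on
    migrate -d tcp:<dest-host>:<port>
    migrate_start_postcopy        (typically after the first precopy round)

Once postcopy starts, the remaining transfer is bounded by memsize / bw;
for example, at the 1250 MB/s cap used earlier, a 64 GiB guest would need
roughly 64 * 1024 / 1250, i.e. about 52 seconds, regardless of the dirty
rate (illustrative numbers, not measurements from this thread).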