RE: [PATCH v5 0/7] Live Migration With IAA

Posted by Liu, Yuan1 1 month ago
> -----Original Message-----
> From: Peter Xu <peterx@redhat.com>
> Sent: Thursday, March 28, 2024 11:22 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: farosas@suse.de; qemu-devel@nongnu.org; hao.xiang@bytedance.com;
> bryan.zhang@bytedance.com; Zou, Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH v5 0/7] Live Migration With IAA
> 
> On Thu, Mar 28, 2024 at 03:02:30AM +0000, Liu, Yuan1 wrote:
> > Yes, I will support software fallback to ensure CI testing and users can
> > still use qpl compression without IAA hardware.
> >
> > Although the qpl software solution will have better performance than
> > zlib, I still don't think it has a greater advantage than zstd. I don't
> > think there is a need to add a migration option to configure the qpl
> > software or hardware path. So I will still only use QPL as an independent
> > compression in the next version, and no other migration options are
> > needed.
> 
> That should be fine.
> 
> >
> > I will also add a guide to qpl-compression.rst about IAA permission
> > issues and how to determine whether the hardware path is available.
> 
> OK.
> 
> [...]
> 
> > > > Yes, I use iperf3 to check the bandwidth for one core; the bandwidth
> > > > is 60Gbps.
> > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > [  5]   0.00-1.00   sec  7.00 GBytes  60.1 Gbits/sec    0   2.87 MBytes
> > > > [  5]   1.00-2.00   sec  7.05 GBytes  60.6 Gbits/sec    0   2.87 MBytes
> > > >
> > > > And in the live migration test, a multifd thread's CPU utilization is
> > > > almost 100%
> > >
> > > This 60Gbps per-channel is definitely impressive..
> > >
> > > Have you tried migration without multifd on your system? Would that
> > > also perform similarly vs. 2 channels multifd?
> >
> > Simple Test result below:
> > VM Type: 16vCPU, 64G memory
> > Workload in VM: fill 56G memory with Silesia data and vCPUs are idle
> > Migration Configurations:
> > 1. migrate_set_parameter max-bandwidth 100G
> > 2. migrate_set_parameter downtime-limit 300
> > 3. migrate_set_capability multifd on (multiFD test case)
> > 4. migrate_set_parameter multifd-channels 2 (multiFD test case)
> >
> >                  Total time (ms)  Downtime (ms)  Throughput (mbps)  Pages-per-second
> > Without multifd  23580            307            21221              689588
> > Multifd 2        7657             198            65410              2221176
> 
> Thanks for the test results.
> 
> So I am guessing the migration overheads besides pushing the socket are
> high enough to make it drop drastically, even if in this case zero
> detection shouldn't play a major role considering most of guest mem is
> pre-filled.

Yes, for non-multifd migration, besides the network stack overhead, the zero
page detection overhead (on both the source and the destination) is indeed very
high. Moving zero page detection into the multifd threads reduces the
performance degradation caused by that overhead.
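
To make the idea concrete, below is a minimal sketch of what a multifd sender
thread could do per packet. QEMU's real helper is buffer_is_zero() in
util/bufferiszero.c; the function names and the zero/normal index split here
are only illustrative.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Word-wise zero check; same idea as QEMU's buffer_is_zero(). */
static bool page_is_zero(const void *page)
{
    const uint64_t *p = page;
    for (size_t i = 0; i < PAGE_SIZE / sizeof(uint64_t); i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Illustrative multifd worker step: split the pages of one packet into
 * "zero" and "normal" sets, so only normal pages get compressed and sent
 * while the zero-page indexes travel in the packet header.
 */
static void scan_packet_pages(const uint8_t *pages, int npages,
                              int *zero_idx, int *nzero,
                              int *normal_idx, int *nnormal)
{
    *nzero = 0;
    *nnormal = 0;
    for (int i = 0; i < npages; i++) {
        if (page_is_zero(pages + (size_t)i * PAGE_SIZE)) {
            zero_idx[(*nzero)++] = i;
        } else {
            normal_idx[(*nnormal)++] = i;
        }
    }
}

Only the pages in normal_idx[] then go through compression and the socket; the
zero indexes can be re-created as zero pages on the destination.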

I also think migration does not need to detect zero pages with memcmp in all
cases. Zero page detection only really pays off when the VM's memory actually
contains a large number of zero pages.
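
If the zero-page-detection parameter from Hao's multifd zero page series lands
as proposed (the final name and values may differ), a user who knows the guest
memory is not sparse could simply turn the scan off, e.g.:

    migrate_set_parameter zero-page-detection none        (skip the per-page scan)
    migrate_set_parameter zero-page-detection multifd     (scan inside the multifd threads)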

My experience in this area may be insufficient; I am working with Hao and Bryan
to see whether DSA hardware can accelerate this part (both detecting zero pages
and writing zero pages).

DSA is an accelerator for memory operations such as comparing memory (including
compare-against-pattern), filling memory, and moving memory:
https://cdrdv2-public.intel.com/671116/341204-intel-data-streaming-accelerator-spec.pdf
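
Nothing is implemented on our side yet; just to illustrate the direction, here
is a rough user-space sketch against the idxd character device interface. It
assumes a dedicated work queue already configured with accel-config, its portal
mmap()ed at 'portal', and a CPU with MOVDIR64B (build with -mmovdir64b); fault
handling, batching and shared-WQ (ENQCMD) submission are omitted, and the
completion status check is simplified.

#include <linux/idxd.h>        /* struct dsa_hw_desc, dsa_completion_record */
#include <x86intrin.h>         /* _movdir64b, _mm_pause */
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096

static struct dsa_completion_record comp __attribute__((aligned(32)));

static uint8_t submit_and_wait(void *portal, struct dsa_hw_desc *desc)
{
    comp.status = 0;
    desc->flags |= IDXD_OP_FLAG_RCR | IDXD_OP_FLAG_CRAV;
    desc->completion_addr = (uintptr_t)&comp;
    _movdir64b(portal, desc);            /* dedicated WQ submission */
    while (__atomic_load_n(&comp.status, __ATOMIC_ACQUIRE) == 0) {
        _mm_pause();                     /* polling; interrupts also possible */
    }
    return comp.status;
}

/* Compare Pattern op: the page is zero iff the 8-byte pattern 0 matches. */
static bool dsa_page_is_zero(void *portal, const void *page)
{
    struct dsa_hw_desc desc = {
        .opcode = DSA_OPCODE_COMPVAL,
        .src_addr = (uintptr_t)page,
        .comp_pattern = 0,
        .xfer_size = PAGE_SIZE,
    };
    return submit_and_wait(portal, &desc) == DSA_COMP_SUCCESS &&
           comp.result == 0;             /* result 0 == pattern matched */
}

/* Memory Fill op: write a zero page on the destination side. */
static void dsa_write_zero_page(void *portal, void *page)
{
    struct dsa_hw_desc desc = {
        .opcode = DSA_OPCODE_MEMFILL,
        .pattern = 0,
        .dst_addr = (uintptr_t)page,
        .xfer_size = PAGE_SIZE,
    };
    submit_and_wait(portal, &desc);
}

The interesting part for migration is the batch descriptor (DSA_OPCODE_BATCH):
one submission could carry the compare/fill operations for a whole multifd
packet, so the per-page CPU cost should shrink to mostly descriptor setup.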