RE: [PATCH v5 00/13] WIP: Use Intel DSA accelerator to offload zero page checking in multifd live migration.

Liu, Yuan1 posted 13 patches 4 months, 1 week ago
Posted by Liu, Yuan1 4 months, 1 week ago
> -----Original Message-----
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, July 16, 2024 12:24 AM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: Wang, Yichen <yichen.wang@bytedance.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Marc-André Lureau <marcandre.lureau@redhat.com>;
> Daniel P. Berrangé <berrange@redhat.com>; Thomas Huth <thuth@redhat.com>;
> Philippe Mathieu-Daudé <philmd@linaro.org>; Peter Xu <peterx@redhat.com>;
> Fabiano Rosas <farosas@suse.de>; Eric Blake <eblake@redhat.com>; Markus
> Armbruster <armbru@redhat.com>; Cornelia Huck <cohuck@redhat.com>; qemu-
> devel@nongnu.org; Hao Xiang <hao.xiang@linux.dev>; Kumar, Shivam
> <shivam.kumar1@nutanix.com>; Ho-Ren (Jack) Chuang
> <horenchuang@bytedance.com>
> Subject: Re: [PATCH v5 00/13] WIP: Use Intel DSA accelerator to offload
> zero page checking in multifd live migration.
> 
> On Mon, Jul 15, 2024 at 03:57:42PM +0000, Liu, Yuan1 wrote:
> > > > > > > > > that is 23% total CPU usage savings.
> > > > > > > >
> > > > > > > >
> > > > > > > > Here the DSA was mostly idle.
> > > > > > > >
> > > > > > > > > > Sounds good but a question: what if several qemu instances are
> > > > > > > > > > migrated in parallel?
> > > > > > > >
> > > > > > > > Some accelerators tend to basically stall if several tasks
> > > > > > > > are trying to use them at the same time.
> > > > > > > >
> > > > > > > > Where is the boundary here?
> >
> > If I understand correctly, you are concerned that in some scenarios the
> > accelerator itself is the migration bottleneck, causing the migration
> > performance to be degraded.
> >
> > My understanding is to make full use of the accelerator bandwidth, and once
> > the accelerator is the bottleneck, it will fall back to zero-page detection
> > by the CPU.
> >
> > For example, when the enqcmd command returns an error which means the work
> > queue is full, then we can add some retry mechanisms or directly use CPU
> > detection.
> 
> 
> How is it handled in your patch? If you just abort migration unless
> enqcmd succeeds then would that not be a bug, where loading the system
> leads to migration failures?

Sorry, I have only just started reviewing this patch series. What we
discussed earlier concerns only the DSA device itself and may not reflect
how this patch is implemented. I will look carefully into the issue you
raised. Thank you for pointing it out.
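
To make the retry-then-fallback idea from the earlier discussion concrete,
here is a minimal sketch. It is not taken from the patch series and makes no
claim about how the patches behave: submit_fn, submit_or_fallback and
fake_submit are hypothetical names, and the submit hook only stands in for an
enqcmd submission that reports "work queue full".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical submit hook (assumption, not a QEMU or DSA API): returns true
 * if the work queue accepted the descriptor, false if the queue was full.
 */
typedef bool (*submit_fn)(const void *page, size_t len, void *opaque);

enum zero_check_path { PATH_ACCEL, PATH_CPU };

/* Plain CPU zero-page check, used as the fallback path. */
static bool cpu_page_is_zero(const uint8_t *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Try the accelerator up to max_retries + 1 times. If the queue stays full,
 * check the page on the CPU instead of failing, so a busy accelerator slows
 * migration down but never aborts it.
 */
static enum zero_check_path
submit_or_fallback(const uint8_t *page, size_t len, submit_fn submit,
                   void *opaque, unsigned max_retries, bool *is_zero)
{
    for (unsigned attempt = 0; attempt <= max_retries; attempt++) {
        if (submit(page, len, opaque)) {
            return PATH_ACCEL; /* result would arrive via a completion record */
        }
    }
    *is_zero = cpu_page_is_zero(page, len);
    return PATH_CPU;
}

/* Test double: rejects submissions until the counter in opaque reaches 0. */
static bool fake_submit(const void *page, size_t len, void *opaque)
{
    unsigned *remaining_rejects = opaque;
    (void)page;
    (void)len;
    if (*remaining_rejects > 0) {
        (*remaining_rejects)--;
        return false;
    }
    return true;
}
```

With a bounded retry count the worst case under contention is the CPU-only
cost plus a few failed submissions, which is the property the question about
parallel migrations is really asking for.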

> --
> MST