Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Tue, Sep 19, 2023 at 12:49:46PM -0400, Peter Xu wrote:
>> On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote:
>> > Oh dear, where to start. There's so much wrong, and in pretty obvious
>> > ways. This code should never have passed review. I'm refraining from
>> > saying more; see the commit messages instead.
>> >
>> > Issues remaining after this series include:
>> >
>> > * Terrible error messages
>> >
>> > * Some error message cascades remain
>> >
>> > * There is no written contract for QEMUFileHooks, and the
>> > responsibility for reporting errors is unclear
>>
>> Even being removed.. because no one is really extending that..
>>
>> https://lore.kernel.org/all/20230509120700.78359-1-quintela@redhat.com/#t
>
> One day (in another 5-10 years) I still hope we'll get to
> the point where QEMUFile itself is obsolete :-)
If you see the patches on list, I have move rate limit check outside of
QEMUFile, so right now it is only a buffer to write in the main
migration thread.
> Getting
> rid of QEMUFileHooks is a great step in that direction.
Thanks.
> Me finishing a old PoC to bring buffering to QIOChannel
> would be another big step.
I want to get rid of qemu_file_set_error() and friends first. We should
handle errors correctly when they happen, and go from there.
> The data rate limiting would be the biggest missing piece
> to enable migration/vmstate logic to directly consume
> a QIOChannel.
As said, that is done. I have three atomic counters:
- qemu_file_bytes
- rdma_bytes
- multifd_bytes
We do the rate limiting adding that 3 counters. The only thing that
QEMUFile does know is increase qemu_file_bytes when it needs to do it.
> Eliminating QEMUFile would help to bring Error **errp
> to all the vmstate codepaths.
Yes!
See the last problem on the list where they couldn't use
migrate_set_error() in vmstate.c because that breaks test-vmstate.c.
>> I always see rdma as "odd fixes" stage.. for a long time. But maybe I was
>> wrong.
>
> In the MAINTAINERS file RDMA still get classified as formally
> supported under the migration maintainers. I'm not convinced
> that is an accurate description of its status. I tend to agree
> with you that it is 'odd fixes' at the very best.
It is not exact, we wanted to get rid of it.
> Dave Gilbert had previously speculated about whether we should
> even consider deprecating it on the basis that latest non-RDMA
> migration is too much better than in the past, with multifd
> and zerocopy, that RDMA might not even offer a significant
> enough peformance win to justify.
The main problem with RDMA is that it uses a really weird model for
migration point of view:
- everything there is asynchronous (nothing else is like that)
- it uses its own everything, i.e. abuses QEMUFile and QIOChannel to try
to look like one, but it fails.
- Its error handling is "peculiar", to be friendly. See this series
from Markus to make it look normal.
Thanks, Juan.