[PATCH v3 12/14] migration/multifd: Allow multifd sync without flush
Posted by Fabiano Rosas 3 months, 3 weeks ago
Separate the multifd sync from flushing the client data to the
channels. These two operations are closely related, but they do not
strictly need to be executed together.

The multifd sync is intrinsic to how multifd works. The multiple
channels operate independently and may finish IO out of order in
relation to each other. This also applies between the source and
destination QEMU.

Flushing the data that is left in the client-owned data structures
(e.g. MultiFDPages_t) prior to sync is usually the right thing to do,
but that is particular to how the ram migration is implemented with
several passes over dirty data.

Make these two routines separate, allowing future code to call the
sync by itself if needed. This also allows the usage of
multifd_ram_send to be isolated to ram code.
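
For instance, a future client that always sends the data it produces
could call the sync directly, with nothing left over to flush:

  <populate MultiFDSendData>
  multifd_send()
  multifd_send_sync_main()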

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 13 +++++++++----
 migration/multifd.h |  1 +
 migration/ram.c     |  8 ++++----
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 1c16879495..c25ab4924c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -921,11 +921,8 @@ static int multifd_zero_copy_flush(QIOChannel *c)
     return ret;
 }
 
-int multifd_send_sync_main(void)
+int multifd_ram_flush_and_sync(void)
 {
-    int i;
-    bool flush_zero_copy;
-
     if (!migrate_multifd()) {
         return 0;
     }
@@ -937,6 +934,14 @@ int multifd_send_sync_main(void)
         }
     }
 
+    return multifd_send_sync_main();
+}
+
+int multifd_send_sync_main(void)
+{
+    int i;
+    bool flush_zero_copy;
+
     flush_zero_copy = migrate_zero_copy_send();
 
     for (i = 0; i < migrate_multifd_channels(); i++) {
diff --git a/migration/multifd.h b/migration/multifd.h
index e3d0120837..19a43d46c0 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -278,4 +278,5 @@ static inline uint32_t multifd_ram_page_count(void)
 
 void multifd_ram_save_setup(void);
 void multifd_ram_save_cleanup(void);
+int multifd_ram_flush_and_sync(void);
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index 1815b2557b..67ca3d5d51 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1326,7 +1326,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
                 (!migrate_multifd_flush_after_each_section() ||
                  migrate_mapped_ram())) {
                 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
-                int ret = multifd_send_sync_main();
+                int ret = multifd_ram_flush_and_sync();
                 if (ret < 0) {
                     return ret;
                 }
@@ -3066,7 +3066,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
     }
 
     bql_unlock();
-    ret = multifd_send_sync_main();
+    ret = multifd_ram_flush_and_sync();
     bql_lock();
     if (ret < 0) {
         error_setg(errp, "%s: multifd synchronization failed", __func__);
@@ -3213,7 +3213,7 @@ out:
         && migration_is_setup_or_active()) {
         if (migrate_multifd() && migrate_multifd_flush_after_each_section() &&
             !migrate_mapped_ram()) {
-            ret = multifd_send_sync_main();
+            ret = multifd_ram_flush_and_sync();
             if (ret < 0) {
                 return ret;
             }
@@ -3285,7 +3285,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         }
     }
 
-    ret = multifd_send_sync_main();
+    ret = multifd_ram_flush_and_sync();
     if (ret < 0) {
         return ret;
     }
-- 
2.35.3
Re: [PATCH v3 12/14] migration/multifd: Allow multifd sync without flush
Posted by Peter Xu 3 months ago
On Thu, Aug 01, 2024 at 09:35:14AM -0300, Fabiano Rosas wrote:
> Separate the multifd sync from flushing the client data to the
> channels. These two operations are closely related, but they do not
> strictly need to be executed together.
> 
> The multifd sync is intrinsic to how multifd works. The multiple
> channels operate independently and may finish IO out of order in
> relation to each other. This also applies between the source and
> destination QEMU.
> 
> Flushing the data that is left in the client-owned data structures
> (e.g. MultiFDPages_t) prior to sync is usually the right thing to do,
> but that is particular to how the ram migration is implemented with
> several passes over dirty data.
> 
> Make these two routines separate, allowing future code to call the
> sync by itself if needed. This also allows the usage of
> multifd_ram_send to be isolated to ram code.

What's the usage of sync but not flush here?

-- 
Peter Xu
Re: [PATCH v3 12/14] migration/multifd: Allow multifd sync without flush
Posted by Peter Xu 3 months ago
On Thu, Aug 22, 2024 at 12:03:47PM -0400, Peter Xu wrote:
> On Thu, Aug 01, 2024 at 09:35:14AM -0300, Fabiano Rosas wrote:
> > Separate the multifd sync from flushing the client data to the
> > channels. These two operations are closely related, but they do not
> > strictly need to be executed together.
> > 
> > The multifd sync is intrinsic to how multifd works. The multiple
> > channels operate independently and may finish IO out of order in
> > relation to each other. This also applies between the source and
> > destination QEMU.
> > 
> > Flushing the data that is left in the client-owned data structures
> > (e.g. MultiFDPages_t) prior to sync is usually the right thing to do,
> > but that is particular to how the ram migration is implemented with
> > several passes over dirty data.
> > 
> > Make these two routines separate, allowing future code to call the
> > sync by itself if needed. This also allows the usage of
> > multifd_ram_send to be isolated to ram code.
> 
> What's the usage of sync but not flush here?

Oh I think I see your point.. I think flush+sync is always needed, it's
just that RAM may not always be the one to flush, but something else.
Makes sense then.

If you want, you may touch up the commit message to clarify that.  E.g. I
still don't see any use case that we want to sync without a flush, that
part might be a bit ambiguous.

If my understanding is correct, take this:

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
Re: [PATCH v3 12/14] migration/multifd: Allow multifd sync without flush
Posted by Fabiano Rosas 3 months ago
Peter Xu <peterx@redhat.com> writes:

> On Thu, Aug 22, 2024 at 12:03:47PM -0400, Peter Xu wrote:
>> On Thu, Aug 01, 2024 at 09:35:14AM -0300, Fabiano Rosas wrote:
>> > Separate the multifd sync from flushing the client data to the
>> > channels. These two operations are closely related, but they do not
>> > strictly need to be executed together.
>> > 
>> > The multifd sync is intrinsic to how multifd works. The multiple
>> > channels operate independently and may finish IO out of order in
>> > relation to each other. This also applies between the source and
>> > destination QEMU.
>> > 
>> > Flushing the data that is left in the client-owned data structures
>> > (e.g. MultiFDPages_t) prior to sync is usually the right thing to do,
>> > but that is particular to how the ram migration is implemented with
>> > several passes over dirty data.
>> > 
>> > Make these two routines separate, allowing future code to call the
>> > sync by itself if needed. This also allows the usage of
>> > multifd_ram_send to be isolated to ram code.
>> 
>> What's the usage of sync but not flush here?
>
> Oh I think I see your point.. I think flush+sync is always needed, it's
> just that RAM may not always be the one to flush, but something else.
> Makes sense then.
>

I'm thinking of "flush" here as a last multifd_send() before sync. We
need multiple multifd_send() along the migration to send the data, but
we might not need this extra flush. It could be that there's nothing to
flush and the code guarantees it:

 <populate MultiFDSendData>
 multifd_send()
 sync

Where RAM currently does:

 multifd_queue_page()
 multifd_queue_page()
 multifd_queue_page()
 ...
 multifd_queue_page()
 multifd_send()
 sync

Today there is a multifd_send() inside multifd_queue_page() and the
amount sent depends on the ram.c code. At the time sync gets called,
there could be data queued but not yet sent. Another client (not ram)
could just produce data in a deterministic manner and match that with
calls to multifd_send().
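
A hypothetical non-ram client could look something like this (a rough
sketch, not part of this series; it assumes multifd_send() and
multifd_send_data_alloc() are callable from outside multifd.c, and
error/cleanup handling is omitted):

  static int other_client_save(void)
  {
      MultiFDSendData *data = multifd_send_data_alloc();

      /* ... fill in the payload deterministically ... */

      /* send right away, so nothing is ever left queued */
      if (!multifd_send(&data)) {
          return -1;
      }

      /* no leftover data to flush, a bare sync is enough */
      return multifd_send_sync_main();
  }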

> If you want, you may touch up the commit message to clarify that.  E.g. I
> still don't see any use case that we want to sync without a flush, that
> part might be a bit ambiguous.
>
> If my understanding is correct, take this:
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
Re: [PATCH v3 12/14] migration/multifd: Allow multifd sync without flush
Posted by Peter Xu 3 months ago
On Thu, Aug 22, 2024 at 02:05:30PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Aug 22, 2024 at 12:03:47PM -0400, Peter Xu wrote:
> >> On Thu, Aug 01, 2024 at 09:35:14AM -0300, Fabiano Rosas wrote:
> >> > Separate the multifd sync from flushing the client data to the
> >> > channels. These two operations are closely related, but they do not
> >> > strictly need to be executed together.
> >> > 
> >> > The multifd sync is intrinsic to how multifd works. The multiple
> >> > channels operate independently and may finish IO out of order in
> >> > relation to each other. This also applies between the source and
> >> > destination QEMU.
> >> > 
> >> > Flushing the data that is left in the client-owned data structures
> >> > (e.g. MultiFDPages_t) prior to sync is usually the right thing to do,
> >> > but that is particular to how the ram migration is implemented with
> >> > several passes over dirty data.
> >> > 
> >> > Make these two routines separate, allowing future code to call the
> >> > sync by itself if needed. This also allows the usage of
> >> > multifd_ram_send to be isolated to ram code.
> >> 
> >> What's the usage of sync but not flush here?
> >
> > Oh I think I see your point.. I think flush+sync is always needed, it's
> > just that RAM may not always be the one to flush, but something else.
> > Makes sense then.
> >
> 
> I'm thinking of "flush" here as a last multifd_send() before sync. We
> need multiple multifd_send() along the migration to send the data, but
> we might not need this extra flush. It could be that there's nothing to
> flush and the code guarantees it:
> 
>  <populate MultiFDSendData>
>  multifd_send()
>  sync
> 
> Where RAM currently does:
> 
>  multifd_queue_page()
>  multifd_queue_page()
>  multifd_queue_page()
>  ...
>  multifd_queue_page()
>  multifd_send()
>  sync
> 
> Today there is a multifd_send() inside multifd_queue_page() and the
> amount sent depends on the ram.c code. At the time sync gets called,
> there could be data queued but not yet sent. Another client (not ram)
> could just produce data in a deterministic manner and match that with
> calls to multifd_send().

I hope I read it alright.. I suppose you meant we have chance to do:

  ram_send()
  vfio_send()
  flush()

Instead of:

  ram_send()
  flush()
  vfio_send()
  flush()

Am I right?

> 
> > If you want, you may touch up the commit message to clarify that.  E.g. I
> > still don't see any use case that we want to sync without a flush, that
> > part might be a bit ambiguous.
> >
> > If my understanding is correct, take this:
> >
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> 

-- 
Peter Xu
Re: [PATCH v3 12/14] migration/multifd: Allow multifd sync without flush
Posted by Fabiano Rosas 3 months ago
Peter Xu <peterx@redhat.com> writes:

> On Thu, Aug 22, 2024 at 02:05:30PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Thu, Aug 22, 2024 at 12:03:47PM -0400, Peter Xu wrote:
>> >> On Thu, Aug 01, 2024 at 09:35:14AM -0300, Fabiano Rosas wrote:
>> >> > Separate the multifd sync from flushing the client data to the
>> >> > channels. These two operations are closely related, but they do not
>> >> > strictly need to be executed together.
>> >> > 
>> >> > The multifd sync is intrinsic to how multifd works. The multiple
>> >> > channels operate independently and may finish IO out of order in
>> >> > relation to each other. This also applies between the source and
>> >> > destination QEMU.
>> >> > 
>> >> > Flushing the data that is left in the client-owned data structures
>> >> > (e.g. MultiFDPages_t) prior to sync is usually the right thing to do,
>> >> > but that is particular to how the ram migration is implemented with
>> >> > several passes over dirty data.
>> >> > 
>> >> > Make these two routines separate, allowing future code to call the
>> >> > sync by itself if needed. This also allows the usage of
>> >> > multifd_ram_send to be isolated to ram code.
>> >> 
>> >> What's the usage of sync but not flush here?
>> >
>> > Oh I think I see your point.. I think flush+sync is always needed, it's
>> > just that RAM may not always be the one to flush, but something else.
>> > Makes sense then.
>> >
>> 
>> I'm thinking of "flush" here as a last multifd_send() before sync. We
>> need multiple multifd_send() along the migration to send the data, but
>> we might not need this extra flush. It could be that there's nothing to
>> flush and the code guarantees it:
>> 
>>  <populate MultiFDSendData>
>>  multifd_send()
>>  sync
>> 
>> Where RAM currently does:
>> 
>>  multifd_queue_page()
>>  multifd_queue_page()
>>  multifd_queue_page()
>>  ...
>>  multifd_queue_page()
>>  multifd_send()
>>  sync
>> 
>> Today there is a multifd_send() inside multifd_queue_page() and the
>> amount sent depends on the ram.c code. At the time sync gets called,
>> there could be data queued but not yet sent. Another client (not ram)
>> could just produce data in a deterministic manner and match that with
>> calls to multifd_send().
>
> I hope I read it alright.. I suppose you meant we have chance to do:
>
>   ram_send()
>   vfio_send()
>   flush()
>
> Instead of:
>
>   ram_send()
>   flush()
>   vfio_send()
>   flush()
>
> Am I right?

Not really. I'm saying that RAM doesn't always send the data, that's why
it needs a final flush before sync:

multifd_queue_page()
    if (multifd_queue_empty(pages)) {
        multifd_enqueue(pages, offset);
    }
    
    if (multifd_queue_full(pages)) {
        multifd_send_pages()   <-- this might not happen
    }
    multifd_enqueue()

multifd_send_sync_main()
    if (pages->num) { <-- data left unsent
       multifd_send() <-- flush
    }
    <sync routine>

>
>> 
>> > If you want, you may touch up the commit message to clarify that.  E.g. I
>> > still don't see any use case that we want to sync without a flush, that
>> > part might be a bit ambiguous.
>> >
>> > If my understanding is correct, take this:
>> >
>> > Reviewed-by: Peter Xu <peterx@redhat.com>
>>
Re: [PATCH v3 12/14] migration/multifd: Allow multifd sync without flush
Posted by Peter Xu 3 months ago
On Thu, Aug 22, 2024 at 03:07:55PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Aug 22, 2024 at 02:05:30PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Thu, Aug 22, 2024 at 12:03:47PM -0400, Peter Xu wrote:
> >> >> On Thu, Aug 01, 2024 at 09:35:14AM -0300, Fabiano Rosas wrote:
> >> >> > Separate the multifd sync from flushing the client data to the
> >> >> > channels. These two operations are closely related, but they do not
> >> >> > strictly need to be executed together.
> >> >> > 
> >> >> > The multifd sync is intrinsic to how multifd works. The multiple
> >> >> > channels operate independently and may finish IO out of order in
> >> >> > relation to each other. This also applies between the source and
> >> >> > destination QEMU.
> >> >> > 
> >> >> > Flushing the data that is left in the client-owned data structures
> >> >> > (e.g. MultiFDPages_t) prior to sync is usually the right thing to do,
> >> >> > but that is particular to how the ram migration is implemented with
> >> >> > several passes over dirty data.
> >> >> > 
> >> >> > Make these two routines separate, allowing future code to call the
> >> >> > sync by itself if needed. This also allows the usage of
> >> >> > multifd_ram_send to be isolated to ram code.
> >> >> 
> >> >> What's the usage of sync but not flush here?
> >> >
> >> > Oh I think I see your point.. I think flush+sync is always needed, it's
> >> > just that RAM may not always be the one to flush, but something else.
> >> > Makes sense then.
> >> >
> >> 
> >> I'm thinking of "flush" here as a last multifd_send() before sync. We
> >> need multiple multifd_send() along the migration to send the data, but
> >> we might not need this extra flush. It could be that there's nothing to
> >> flush and the code guarantees it:
> >> 
> >>  <populate MultiFDSendData>
> >>  multifd_send()
> >>  sync
> >> 
> >> Where RAM currently does:
> >> 
> >>  multifd_queue_page()
> >>  multifd_queue_page()
> >>  multifd_queue_page()
> >>  ...
> >>  multifd_queue_page()
> >>  multifd_send()
> >>  sync
> >> 
> >> Today there is a multifd_send() inside multifd_queue_page() and the
> >> amount sent depends on the ram.c code. At the time sync gets called,
> >> there could be data queued but not yet sent. Another client (not ram)
> >> could just produce data in a deterministic manner and match that with
> >> calls to multifd_send().
> >
> > I hope I read it alright.. I suppose you meant we have chance to do:
> >
> >   ram_send()
> >   vfio_send()
> >   flush()
> >
> > Instead of:
> >
> >   ram_send()
> >   flush()
> >   vfio_send()
> >   flush()
> >
> > Am I right?
> 
> Not really. I'm saying that RAM doesn't always send the data, that's why
> it needs a final flush before sync:
> 
> multifd_queue_page()
>     if (multifd_queue_empty(pages)) {
>         multifd_enqueue(pages, offset);
>     }
>     
>     if (multifd_queue_full(pages)) {
>         multifd_send_pages()   <-- this might not happen
>     }
>     multifd_enqueue()
> 
> multifd_send_sync_main()
>     if (pages->num) { <-- data left unsent
>        multifd_send() <-- flush
>     }
>     <sync routine>

I see now.

At least for now I'd expect VFIO doesn't need to use sync at all, not
before the kernel ABI changes. It's just that when that comes we may need
to move that final sync() out of ram code then, but before a migration
completes, to cover both.
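
One possible shape of that (just a sketch):

  migration completion:
      <ram flushes its leftover pages>
      <vfio sends its final device state>
      multifd_send_sync_main()   <- shared sync, outside ram code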

It's ok then, my R-b holds.

-- 
Peter Xu