[PATCH v3 16/24] migration/multifd: Send final SYNC only after device state is complete

Posted by Maciej S. Szmigiero 1 year, 2 months ago
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

Currently, ram_save_complete() sends a final SYNC multifd packet near the
end of the function, after sending all of the remaining RAM data.

On the receive side, this SYNC packet will cause multifd channel threads
to block, waiting for the final sem_sync posting in
multifd_recv_terminate_threads().

However, multifd_recv_terminate_threads() won't be called until the
migration is complete, which causes a problem if multifd channels are
still required for transferring device state data after RAM transfer is
complete but before finishing the migration process.

If device state transfer is possible, defer sending the final SYNC packet
until the end of the post-switchover iterable data transfer instead.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
 migration/multifd-nocomp.c | 18 +++++++++++++++++-
 migration/multifd.h        |  1 +
 migration/ram.c            | 10 +++++++++-
 migration/savevm.c         | 11 +++++++++++
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 90c0927b9bcb..db87b1262ffa 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -348,7 +348,7 @@ retry:
     return true;
 }
 
-int multifd_ram_flush_and_sync(void)
+int multifd_ram_flush(void)
 {
     if (!migrate_multifd()) {
         return 0;
@@ -361,6 +361,22 @@ int multifd_ram_flush_and_sync(void)
         }
     }
 
+    return 0;
+}
+
+int multifd_ram_flush_and_sync(void)
+{
+    int ret;
+
+    if (!migrate_multifd()) {
+        return 0;
+    }
+
+    ret = multifd_ram_flush();
+    if (ret) {
+        return ret;
+    }
+
     return multifd_send_sync_main();
 }
 
diff --git a/migration/multifd.h b/migration/multifd.h
index 05ddfb4bf119..3abf9578e2ae 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -376,6 +376,7 @@ static inline uint32_t multifd_ram_page_count(void)
 
 void multifd_ram_save_setup(void);
 void multifd_ram_save_cleanup(void);
+int multifd_ram_flush(void);
 int multifd_ram_flush_and_sync(void);
 void multifd_ram_payload_alloc(MultiFDPages_t *pages);
 void multifd_ram_payload_free(MultiFDPages_t *pages);
diff --git a/migration/ram.c b/migration/ram.c
index 05ff9eb32876..cf7bea3f073b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3283,7 +3283,15 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         }
     }
 
-    ret = multifd_ram_flush_and_sync();
+    if (migration_has_device_state_support()) {
+        /*
+         * Can't do the final SYNC here since device state might still
+         * be transferring via multifd channels.
+         */
+        ret = multifd_ram_flush();
+    } else {
+        ret = multifd_ram_flush_and_sync();
+    }
     if (ret < 0) {
         return ret;
     }
diff --git a/migration/savevm.c b/migration/savevm.c
index 6ea9054c4083..98049cb9b09a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -37,6 +37,7 @@
 #include "migration/register.h"
 #include "migration/global_state.h"
 #include "migration/channel-block.h"
+#include "multifd.h"
 #include "ram.h"
 #include "qemu-file.h"
 #include "savevm.h"
@@ -1496,6 +1497,7 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
     int64_t start_ts_each, end_ts_each;
     SaveStateEntry *se;
     int ret;
+    bool multifd_device_state = migration_has_device_state_support();
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops ||
@@ -1528,6 +1530,15 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
                                     end_ts_each - start_ts_each);
     }
 
+    if (multifd_device_state) {
+        /* Send the final SYNC */
+        ret = multifd_send_sync_main();
+        if (ret) {
+            qemu_file_set_error(f, ret);
+            return -1;
+        }
+    }
+
     trace_vmstate_downtime_checkpoint("src-iterable-saved");
 
     return 0;
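
For readers following the sem_sync discussion in the commit message above,
here is a small, self-contained toy model of the receive-side parking it
describes.  It only mirrors the shape of QEMU's multifd_recv_thread();
every name in it is illustrative, not the actual QEMU code:

  #include <pthread.h>
  #include <semaphore.h>
  #include <stdio.h>

  /* Toy stand-in for one multifd recv channel (loosely modeled on
   * QEMU's MultiFDRecvParams; names are made up for this sketch). */
  typedef struct {
      sem_t sem_sync;    /* posted to release this channel from a SYNC */
      sem_t *main_sync;  /* posted by the channel once it reaches SYNC */
  } ToyRecvChannel;

  static void *toy_recv_thread(void *opaque)
  {
      ToyRecvChannel *p = opaque;

      /* ...packets would be consumed here; on one with the SYNC flag: */
      sem_post(p->main_sync);   /* tell the main thread we hit the sync */
      sem_wait(&p->sem_sync);   /* park until terminate_threads() posts */
      /* While parked here, this channel cannot carry device state. */
      return NULL;
  }

  int main(void)
  {
      ToyRecvChannel p;
      sem_t main_sync;
      pthread_t th;

      sem_init(&p.sem_sync, 0, 0);
      sem_init(&main_sync, 0, 0);
      p.main_sync = &main_sync;

      pthread_create(&th, NULL, toy_recv_thread, &p);
      sem_wait(&main_sync);   /* the channel reached the final SYNC... */
      printf("recv channel parked until termination\n");
      sem_post(&p.sem_sync);  /* ...which terminate_threads() undoes */
      pthread_join(th, NULL);
      return 0;
  }

With the patch above, that final parking point is reached only after the
deferred SYNC sent at the end of
qemu_savevm_state_complete_precopy_iterable(), so the channels stay
usable for device state until then.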
Re: [PATCH v3 16/24] migration/multifd: Send final SYNC only after device state is complete
Posted by Fabiano Rosas 1 year, 2 months ago
"Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:

> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>
> Currently, ram_save_complete() sends a final SYNC multifd packet near the
> end of the function, after sending all of the remaining RAM data.
>
> On the receive side, this SYNC packet will cause multifd channel threads
> to block, waiting for the final sem_sync posting in
> multifd_recv_terminate_threads().
>
> However, multifd_recv_terminate_threads() won't be called until the
> migration is complete, which causes a problem if multifd channels are
> still required for transferring device state data after RAM transfer is
> complete but before finishing the migration process.
>
> If device state transfer is possible, defer sending the final SYNC packet
> until the end of the post-switchover iterable data transfer instead.
>
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>

I wonder whether we could just defer the sync for the !device_state case
as well.
Re: [PATCH v3 16/24] migration/multifd: Send final SYNC only after device state is complete
Posted by Maciej S. Szmigiero 1 year, 2 months ago
On 26.11.2024 21:52, Fabiano Rosas wrote:
> "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
> 
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> Currently, ram_save_complete() sends a final SYNC multifd packet near the
>> end of the function, after sending all of the remaining RAM data.
>>
>> On the receive side, this SYNC packet will cause multifd channel threads
>> to block, waiting for the final sem_sync posting in
>> multifd_recv_terminate_threads().
>>
>> However, multifd_recv_terminate_threads() won't be called until the
>> migration is complete, which causes a problem if multifd channels are
>> still required for transferring device state data after RAM transfer is
>> complete but before finishing the migration process.
>>
>> If device state transfer is possible, defer sending the final SYNC packet
>> until the end of the post-switchover iterable data transfer instead.
>>
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> 
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> 
> I wonder whether we could just defer the sync for the !device_state case
> as well.
> 

AFAIK this should work; I just wanted to be extra cautious about bit
stream timing changes in case there's, for example, some race in an
older QEMU version.

Thanks,
Maciej
Re: [PATCH v3 16/24] migration/multifd: Send final SYNC only after device state is complete
Posted by Peter Xu 1 year, 2 months ago
On Tue, Nov 26, 2024 at 10:22:42PM +0100, Maciej S. Szmigiero wrote:
> On 26.11.2024 21:52, Fabiano Rosas wrote:
> > "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
> > 
> > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> > > 
> > > Currently, ram_save_complete() sends a final SYNC multifd packet near the
> > > end of the function, after sending all of the remaining RAM data.
> > > 
> > > On the receive side, this SYNC packet will cause multifd channel threads
> > > to block, waiting for the final sem_sync posting in
> > > multifd_recv_terminate_threads().
> > > 
> > > However, multifd_recv_terminate_threads() won't be called until the
> > > migration is complete, which causes a problem if multifd channels are
> > > still required for transferring device state data after RAM transfer is
> > > complete but before finishing the migration process.
> > > 
> > > If device state transfer is possible, defer sending the final SYNC packet
> > > until the end of the post-switchover iterable data transfer instead.
> > > 
> > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> > 
> > Reviewed-by: Fabiano Rosas <farosas@suse.de>
> > 
> > I wonder whether we could just defer the sync for the !device_state case
> > as well.
> > 
> 
> AFAIK this should work; I just wanted to be extra cautious about bit
> stream timing changes in case there's, for example, some race in an
> older QEMU version.

I see the issue, but maybe we don't even need this patch..

When I was working on commit 637280aeb2 previously, I forgot that the SYNC
messages go together with the FLUSH that got removed.  It means that now in
complete() we will always send SYNCs, but always without FLUSH.

On new binaries, it means SYNCs are not collected properly on dest threads
so it'll hang all threads there.

So yeah, at least from that part it's me to blame..

I think maybe VFIO doesn't need to change the generic path to sync, because
logically speaking VFIO can also use multifd_send_sync_main() in its own
complete() hook to flush everything.  The trick here is that such a sync
doesn't need to be attached to any message (either SYNC or FLUSH, which
only RAM uses).  The sync is about "syncing against all sender threads",
exactly like what we already do with mapped-ram.  Mapped-ram tricked that
path with a use_packet check in the sender thread; however, for VFIO we
could expose a new parameter to multifd_send_sync_main() saying "let's only
sync threads".

I sent two small patches here:

https://lore.kernel.org/r/20241205185303.897010-1-peterx@redhat.com

The 1st patch should fix the SYNC message hang for 637280aeb2 that I did.
The 2nd patch introduced the flag that I said.  I think after that applied
VFIO should be able to sync directly with:

  multifd_send_sync_main(MULTIFD_SYNC_THREADS);

Then maybe we don't need this patch anymore.  Please have a look.

PS: the two patches could be ready to merge already even before VFIO, if
they're properly reviewed and acked.

Thanks,

-- 
Peter Xu
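
Sketched out, Peter's suggestion for a device-state user could look like
the following complete() hook.  This is purely an illustration of the
proposed API: MULTIFD_SYNC_THREADS is the flag name floated in this
thread (it may differ from what finally lands upstream), and the hook
itself is hypothetical, not code from this series:

  /*
   * Hypothetical sketch: a device-state user (e.g. VFIO) syncs against
   * the sender threads from its own complete() hook.  No SYNC or FLUSH
   * message goes on the wire, so the recv channels are never parked.
   */
  static int toy_device_state_save_complete(QEMUFile *f, void *opaque)
  {
      /* ...queue the remaining device state onto the multifd channels... */

      /* Wait until every sender thread has drained its queued work. */
      return multifd_send_sync_main(MULTIFD_SYNC_THREADS);
  }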
Re: [PATCH v3 16/24] migration/multifd: Send final SYNC only after device state is complete
Posted by Maciej S. Szmigiero 1 year, 2 months ago
On 5.12.2024 20:02, Peter Xu wrote:
> On Tue, Nov 26, 2024 at 10:22:42PM +0100, Maciej S. Szmigiero wrote:
>> On 26.11.2024 21:52, Fabiano Rosas wrote:
>>> "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
>>>
>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>
>>>> Currently, ram_save_complete() sends a final SYNC multifd packet near the
>>>> end of the function, after sending all of the remaining RAM data.
>>>>
>>>> On the receive side, this SYNC packet will cause multifd channel threads
>>>> to block, waiting for the final sem_sync posting in
>>>> multifd_recv_terminate_threads().
>>>>
>>>> However, multifd_recv_terminate_threads() won't be called until the
>>>> migration is complete, which causes a problem if multifd channels are
>>>> still required for transferring device state data after RAM transfer is
>>>> complete but before finishing the migration process.
>>>>
>>>> If device state transfer is possible, defer sending the final SYNC packet
>>>> until the end of the post-switchover iterable data transfer instead.
>>>>
>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>>
>>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>>>
>>> I wonder whether we could just defer the sync for the !device_state case
>>> as well.
>>>
>>
>> AFAIK this should work; I just wanted to be extra cautious about bit
>> stream timing changes in case there's, for example, some race in an
>> older QEMU version.
> 
> I see the issue, but maybe we don't even need this patch..
> 
> When I was working on commit 637280aeb2 previously, I forgot that the SYNC
> messages go together with the FLUSH that got removed.  It means that now in
> complete() we will always send SYNCs, but always without FLUSH.
> 
> On new binaries, it means SYNCs are not collected properly on dest threads
> so it'll hang all threads there.
> 
> So yeah, at least from that part it's me to blame..
> 
> I think maybe VFIO doesn't need to change the generic path to sync, because
> logically speaking VFIO can also use multifd_send_sync_main() in its own
> complete() hook to flush everything.  The trick here is that such a sync
> doesn't need to be attached to any message (either SYNC or FLUSH, which
> only RAM uses).  The sync is about "syncing against all sender threads",
> exactly like what we already do with mapped-ram.  Mapped-ram tricked that
> path with a use_packet check in the sender thread; however, for VFIO we
> could expose a new parameter to multifd_send_sync_main() saying "let's only
> sync threads".
> 
> I sent two small patches here:
> 
> https://lore.kernel.org/r/20241205185303.897010-1-peterx@redhat.com
> 
> The 1st patch should fix the SYNC message hang for 637280aeb2 that I did.
> The 2nd patch introduced the flag that I said.  I think after that applied
> VFIO should be able to sync directly with:
> 
>    multifd_send_sync_main(MULTIFD_SYNC_THREADS);
> 
> Then maybe we don't need this patch anymore.  Please have a look.
> 
> PS: the two patches could be ready to merge already even before VFIO, if
> they're properly reviewed and acked.

Thanks, Peter, for this alternate solution.

I think/hope that by the time I'm preparing the next version of this
multifd device state patch set, these SYNC patches will already be merged
and I can develop/test against them.
  
> Thanks,
> 

Thanks,
Maciej
Re: [PATCH v3 16/24] migration/multifd: Send final SYNC only after device state is complete
Posted by Peter Xu 1 year, 2 months ago
On Wed, Dec 11, 2024 at 12:05:40AM +0100, Maciej S. Szmigiero wrote:
> > I sent two small patches here:
> > 
> > https://lore.kernel.org/r/20241205185303.897010-1-peterx@redhat.com
> > 
> > The 1st patch should fix the SYNC message hang for 637280aeb2 that I did.
> > The 2nd patch introduced the flag that I said.  I think after that applied
> > VFIO should be able to sync directly with:
> > 
> >    multifd_send_sync_main(MULTIFD_SYNC_THREADS);
> > 
> > Then maybe we don't need this patch anymore.  Please have a look.
> > 
> > PS: the two patches could be ready to merge already even before VFIO, if
> > they're properly reviewed and acked.
> 
> Thanks, Peter, for this alternate solution.
> 
> I think/hope that by the time I'm preparing the next version of this
> multifd device state patch set, these SYNC patches will already be merged
> and I can develop/test against them.

Yes, that's the plan; even if it hasn't landed yet you can also collect the
first two patches, especially if you agree with the changes.  I think we
should fix it one way or another, so basing this series on top of that
might be best (it should hopefully mean less code to change).

Just to mention: when rebased on top, multifd_send_sync_main() may or may
not need a lock to protect it when VFIO uses it.  I think not, as long as
the call always comes from the migration thread, but it's worth
double-checking as I don't 100% know what the next version will look like
(or it can simply share the same multifd mutex, I think).

Thanks,

-- 
Peter Xu
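
Peter's last point can be encoded directly, if the next version keeps
driving the sync from the migration thread.  The guard below is a
hypothetical sketch, not code from this series; qemu_thread_is_self()
and migrate_get_current() are existing QEMU helpers, but wrapping them
this way is just one possible way to document the no-lock assumption:

  /*
   * Hypothetical guard: assert the threads-only sync is driven from the
   * migration thread, so no extra lock is needed around it.  If this
   * ever fires, the locking question above needs revisiting.
   */
  static int toy_device_state_sync(void)
  {
      assert(qemu_thread_is_self(&migrate_get_current()->thread));
      return multifd_send_sync_main(MULTIFD_SYNC_THREADS);
  }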