[PATCH v2 4/8] multifd: Add COLO support

Lukas Straub posted 8 patches 2 weeks, 6 days ago
Maintainers: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Lukas Straub <lukasstraub2@web.de>, Laurent Vivier <lvivier@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
There is a newer version of this series
[PATCH v2 4/8] multifd: Add COLO support
Posted by Lukas Straub 2 weeks, 6 days ago
Like in the normal ram_load() path, put the received pages into the
colo cache and mark the pages in the bitmap so that they will be
flushed to the guest later.
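
In outline, the receive path with this patch becomes (a condensed sketch
using the helper names from the diff below, not the literal code; the hooks
only run when COLO is enabled):

    multifd_colo_prepare_recv(p);   /* point p->host at p->block->colo_cache */
    ret = multifd_recv_state->ops->recv(p, errp);
    /* While still in precopy, also copy the received pages from the cache
     * into guest memory, so that guest and cache stay in sync. */
    multifd_colo_process_recv(p);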

Multifd with COLO is useful for reducing the VM pause time during
checkpointing for latency-sensitive workloads, where the worst-case latency
is especially important.

Also, multifd is the preferred way to do migration nowadays, and this allows
multifd compression to be used with COLO.

Benchmark:
Cluster nodes
 - Intel Xeon E5-2630 v3
 - 48 GB RAM
 - 10G Ethernet
Guest
 - Windows Server 2016
 - 6 GB RAM
 - 4 cores
Workload
 - Upload a file to the guest with SMB to simulate moderate
   memory dirtying
 - Measure the memory transfer time portion of each checkpoint
 - 600ms COLO checkpoint interval

Results
Plain
 idle mean: 4.50ms 99per: 10.33ms
 load mean: 24.30ms 99per: 78.05ms
Multifd-4
 idle mean: 6.48ms 99per: 10.41ms
 load mean: 14.12ms 99per: 31.27ms

Evaluation
While multifd has slightly higher latency when the guest idles, it is
10ms faster under load and, more importantly, its worst-case latency under
load is less than half that of plain, as can be seen in the 99th percentile.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
---
 MAINTAINERS              |  1 +
 migration/meson.build    |  2 +-
 migration/multifd-colo.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
 migration/multifd-colo.h | 26 +++++++++++++++++++++++++
 migration/multifd.c      | 12 ++++++++++++
 migration/multifd.h      |  1 +
 6 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 563804345fec68ee72793dbb7c1b7e5be4c32083..dbb217255c2cf35dc0ce971c2021b130fac5469b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3837,6 +3837,7 @@ COLO Framework
 M: Lukas Straub <lukasstraub2@web.de>
 S: Maintained
 F: migration/colo*
+F: migration/multifd-colo.*
 F: include/migration/colo.h
 F: include/migration/failover.h
 F: docs/COLO-FT.txt
diff --git a/migration/meson.build b/migration/meson.build
index 16909d54c5110fc5d8187fd3a68c4a5b08b59ea7..1e59fe4f1f0bbfffed90df38e8f39fa87bceb9b9 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -40,7 +40,7 @@ system_ss.add(files(
 ), gnutls, zlib)
 
 if get_option('replication').allowed()
-  system_ss.add(files('colo-failover.c', 'colo.c'))
+  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
 else
   system_ss.add(files('colo-stubs.c'))
 endif
diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
new file mode 100644
index 0000000000000000000000000000000000000000..d8d98e79b12ed52c41f341052a682d7786e221b5
--- /dev/null
+++ b/migration/multifd-colo.c
@@ -0,0 +1,49 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * multifd colo implementation
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "exec/target_page.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "ram.h"
+#include "multifd.h"
+#include "options.h"
+#include "io/channel-socket.h"
+#include "migration/colo.h"
+#include "multifd-colo.h"
+#include "system/ramblock.h"
+
+void multifd_colo_prepare_recv(MultiFDRecvParams *p)
+{
+    assert(p->block->colo_cache);
+
+    /*
+     * While we're still in precopy state (not yet in colo state), we copy
+     * received pages to both guest and cache. No need to set dirty bits,
+     * since guest and cache memory are in sync.
+     */
+    if (migration_incoming_in_colo_state()) {
+        colo_record_bitmap(p->block, p->normal, p->normal_num);
+    }
+    p->host = p->block->colo_cache;
+}
+
+void multifd_colo_process_recv(MultiFDRecvParams *p)
+{
+    if (!migration_incoming_in_colo_state()) {
+        for (int i = 0; i < p->normal_num; i++) {
+            void *guest = p->block->host + p->normal[i];
+            void *cache = p->host + p->normal[i];
+            memcpy(guest, cache, multifd_ram_page_size());
+        }
+    }
+    p->host = p->block->host;
+}
diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
new file mode 100644
index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
--- /dev/null
+++ b/migration/multifd-colo.h
@@ -0,0 +1,26 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * multifd colo header
+ *
+ * Copyright (c) Lukas Straub <lukasstraub2@web.de>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
+#define QEMU_MIGRATION_MULTIFD_COLO_H
+
+#ifdef CONFIG_REPLICATION
+
+void multifd_colo_prepare_recv(MultiFDRecvParams *p);
+void multifd_colo_process_recv(MultiFDRecvParams *p);
+
+#else
+
+static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
+static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
+
+#endif
+#endif
diff --git a/migration/multifd.c b/migration/multifd.c
index 8e71171fb7a17726ba7eb0705e293c41e8aa32ec..6c85acec3bac134e85cfcee0d32057134f5af8d1 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -29,6 +29,7 @@
 #include "qemu-file.h"
 #include "trace.h"
 #include "multifd.h"
+#include "multifd-colo.h"
 #include "threadinfo.h"
 #include "options.h"
 #include "qemu/yank.h"
@@ -1269,7 +1270,18 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
 {
     int ret;
 
+    if (migrate_colo()) {
+        multifd_colo_prepare_recv(p);
+    }
+
     ret = multifd_recv_state->ops->recv(p, errp);
+    if (ret != 0) {
+        return ret;
+    }
+
+    if (migrate_colo()) {
+        multifd_colo_process_recv(p);
+    }
 
     return ret;
 }
diff --git a/migration/multifd.h b/migration/multifd.h
index 9b6d81e7ede024f05d4cd235de95e73840d0bbc4..7036f438fade1baed2442bfdcf8b5d6397c4a448 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -280,6 +280,7 @@ typedef struct {
     /* ramblock */
     RAMBlock *block;
     /* ramblock host address */
+    /* or points to the corresponding address in the colo cache */
     uint8_t *host;
     /* buffers to recv */
     struct iovec *iov;

-- 
2.39.5
Re: [PATCH v2 4/8] multifd: Add COLO support
Posted by Peter Xu 2 weeks, 3 days ago
On Sat, Jan 17, 2026 at 03:09:11PM +0100, Lukas Straub wrote:
> Like in the normal ram_load() path, put the received pages into the
> colo cache and mark the pages in the bitmap so that they will be
> flushed to the guest later.
> 
> Multifd with COLO is useful to reduce the VM pause time during checkpointing
> for latency sensitive workloads. In such workloads the worst-case latency
> is especially important.
> 
> Also, multifd migration is the preferred way to do migration nowadays and this
> allows to use multifd compression with COLO.
> 
> Benchmark:
> Cluster nodes
>  - Intel Xenon E5-2630 v3
>  - 48Gb RAM
>  - 10G Ethernet
> Guest
>  - Windows Server 2016
>  - 6Gb RAM
>  - 4 cores
> Workload
>  - Upload a file to the guest with SMB to simulate moderate
>    memory dirtying
>  - Measure the memory transfer time portion of each checkpoint
>  - 600ms COLO checkpoint interval
> 
> Results
> Plain
>  idle mean: 4.50ms 99per: 10.33ms
>  load mean: 24.30ms 99per: 78.05ms
> Multifd-4
>  idle mean: 6.48ms 99per: 10.41ms
>  load mean: 14.12ms 99per: 31.27ms

Thanks for the numbers.  They're persuasive, at least at first glance.

That said, one major question: multifd should only help with throughput
when the CPU is the bottleneck on the sending side, and in your case it's a
10Gbps NIC.  Normally any decent CPU should be able to push close to 10Gbps
even without multifd.

In my previous experience, multifd only shows a difference when the hosts
have at least 25Gbps+ of bandwidth available.

Maybe you turned on compression already?  If so, it's worth stating the
compression method chosen / parameters.

> 
> Evaluation
> While multifd has slightly higher latency when the guest idles, it is
> 10ms faster under load and more importantly it's worst case latency is
> less than 1/2 of plain under load as can be seen in the 99. Percentile.
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> ---
>  MAINTAINERS              |  1 +
>  migration/meson.build    |  2 +-
>  migration/multifd-colo.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
>  migration/multifd-colo.h | 26 +++++++++++++++++++++++++
>  migration/multifd.c      | 12 ++++++++++++
>  migration/multifd.h      |  1 +
>  6 files changed, 90 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 563804345fec68ee72793dbb7c1b7e5be4c32083..dbb217255c2cf35dc0ce971c2021b130fac5469b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3837,6 +3837,7 @@ COLO Framework
>  M: Lukas Straub <lukasstraub2@web.de>
>  S: Maintained
>  F: migration/colo*
> +F: migration/multifd-colo.*
>  F: include/migration/colo.h
>  F: include/migration/failover.h
>  F: docs/COLO-FT.txt
> diff --git a/migration/meson.build b/migration/meson.build
> index 16909d54c5110fc5d8187fd3a68c4a5b08b59ea7..1e59fe4f1f0bbfffed90df38e8f39fa87bceb9b9 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -40,7 +40,7 @@ system_ss.add(files(
>  ), gnutls, zlib)
>  
>  if get_option('replication').allowed()
> -  system_ss.add(files('colo-failover.c', 'colo.c'))
> +  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
>  else
>    system_ss.add(files('colo-stubs.c'))
>  endif
> diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..d8d98e79b12ed52c41f341052a682d7786e221b5
> --- /dev/null
> +++ b/migration/multifd-colo.c
> @@ -0,0 +1,49 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * multifd colo implementation
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "exec/target_page.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "ram.h"
> +#include "multifd.h"
> +#include "options.h"
> +#include "io/channel-socket.h"
> +#include "migration/colo.h"
> +#include "multifd-colo.h"
> +#include "system/ramblock.h"
> +
> +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
> +{
> +    assert(p->block->colo_cache);
> +
> +    /*
> +     * While we're still in precopy state (not yet in colo state), we copy
> +     * received pages to both guest and cache. No need to set dirty bits,
> +     * since guest and cache memory are in sync.
> +     */
> +    if (migration_incoming_in_colo_state()) {
> +        colo_record_bitmap(p->block, p->normal, p->normal_num);
> +    }
> +    p->host = p->block->colo_cache;

I should have mentioned it while reviewing the previous version, anyway..

IMHO it would be better to have one place setting p->host instead of
overwriting it.

So instead of hooking before ->recv(), we should do it in
multifd_ram_unfill_packet(), moving the p->host update to the end of the
function and hooking it there with COLO (so that you can still record the
bitmaps, only after normal[] is filled).

Another thing, which might be more important: you seem to have ignored
zero[], but I think you need it.  zero[] keeps all pages that are zeros now
(but may not have been zeros before).  So IMHO you'll need to record them as
dirty too in COLO's dest bitmap, otherwise you may hit hard-to-debug RAM
corruption.
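
For illustration, something along these lines (hypothetical helper name, and
assuming the existing zero[]/zero_num fields of MultiFDRecvParams), hooked at
the end of multifd_ram_unfill_packet():

    static void multifd_colo_unfill_hook(MultiFDRecvParams *p)
    {
        assert(p->block->colo_cache);

        if (migration_incoming_in_colo_state()) {
            /* non-zero pages received into the cache become dirty */
            colo_record_bitmap(p->block, p->normal, p->normal_num);
            /* pages that are zero now may not have been zero before,
             * so record them as dirty too */
            colo_record_bitmap(p->block, p->zero, p->zero_num);
        }
        /* the single place that redirects p->host into the colo cache */
        p->host = p->block->colo_cache;
    }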

> +}
> +
> +void multifd_colo_process_recv(MultiFDRecvParams *p)
> +{
> +    if (!migration_incoming_in_colo_state()) {
> +        for (int i = 0; i < p->normal_num; i++) {
> +            void *guest = p->block->host + p->normal[i];
> +            void *cache = p->host + p->normal[i];
> +            memcpy(guest, cache, multifd_ram_page_size());
> +        }
> +    }
> +    p->host = p->block->host;

Is resetting the pointer required?  If not, we can skip it.

> +}
> diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
> --- /dev/null
> +++ b/migration/multifd-colo.h
> @@ -0,0 +1,26 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * multifd colo header
> + *
> + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
> +#define QEMU_MIGRATION_MULTIFD_COLO_H
> +
> +#ifdef CONFIG_REPLICATION
> +
> +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
> +void multifd_colo_process_recv(MultiFDRecvParams *p);
> +
> +#else
> +
> +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
> +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
> +
> +#endif
> +#endif
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 8e71171fb7a17726ba7eb0705e293c41e8aa32ec..6c85acec3bac134e85cfcee0d32057134f5af8d1 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -29,6 +29,7 @@
>  #include "qemu-file.h"
>  #include "trace.h"
>  #include "multifd.h"
> +#include "multifd-colo.h"
>  #include "threadinfo.h"
>  #include "options.h"
>  #include "qemu/yank.h"
> @@ -1269,7 +1270,18 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
>  {
>      int ret;
>  
> +    if (migrate_colo()) {
> +        multifd_colo_prepare_recv(p);
> +    }
> +
>      ret = multifd_recv_state->ops->recv(p, errp);
> +    if (ret != 0) {
> +        return ret;
> +    }
> +
> +    if (migrate_colo()) {
> +        multifd_colo_process_recv(p);
> +    }
>  
>      return ret;
>  }
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 9b6d81e7ede024f05d4cd235de95e73840d0bbc4..7036f438fade1baed2442bfdcf8b5d6397c4a448 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -280,6 +280,7 @@ typedef struct {
>      /* ramblock */
>      RAMBlock *block;
>      /* ramblock host address */
> +    /* or points to the corresponding address in the colo cache */

Nit: we can merge it into a single /* ... */ comment, with some wording changes:

       /*
        * Normally, it points to ramblock's host address.  When COLO
        * enabled, it points to the mirror cache for the ramblock.
        */

>      uint8_t *host;
>      /* buffers to recv */
>      struct iovec *iov;
> 
> -- 
> 2.39.5
> 

-- 
Peter Xu
Re: [PATCH v2 4/8] multifd: Add COLO support
Posted by Lukas Straub 2 weeks, 2 days ago
On Tue, 20 Jan 2026 12:13:58 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Sat, Jan 17, 2026 at 03:09:11PM +0100, Lukas Straub wrote:
> > Like in the normal ram_load() path, put the received pages into the
> > colo cache and mark the pages in the bitmap so that they will be
> > flushed to the guest later.
> > 
> > Multifd with COLO is useful to reduce the VM pause time during checkpointing
> > for latency sensitive workloads. In such workloads the worst-case latency
> > is especially important.
> > 
> > Also, multifd migration is the preferred way to do migration nowadays and this
> > allows to use multifd compression with COLO.
> > 
> > Benchmark:
> > Cluster nodes
> >  - Intel Xenon E5-2630 v3
> >  - 48Gb RAM
> >  - 10G Ethernet
> > Guest
> >  - Windows Server 2016
> >  - 6Gb RAM
> >  - 4 cores
> > Workload
> >  - Upload a file to the guest with SMB to simulate moderate
> >    memory dirtying
> >  - Measure the memory transfer time portion of each checkpoint
> >  - 600ms COLO checkpoint interval
> > 
> > Results
> > Plain
> >  idle mean: 4.50ms 99per: 10.33ms
> >  load mean: 24.30ms 99per: 78.05ms
> > Multifd-4
> >  idle mean: 6.48ms 99per: 10.41ms
> >  load mean: 14.12ms 99per: 31.27ms  
> 
> Thanks for the numbers.  They're persuasive at least from 1st look.
> 
> Said that, one major question is, multifd should only help with throughput
> when cpu is a bottleneck sending, in your case it's 10Gbps NIC.  Normally
> any decent cpu should be able to push closer to 10Gbps even without
> multifd.
> 
> Per my previous experiences, multifd can only show a difference when the
> hosts have at least 25GBps+ bandwidth available.
> 
> Maybe you turned on compression already?  If so, worth stating the
> compressor methods chosen / parameters.

No, it's just the old Haswell CPU I guess. My clients also use it on
embedded platforms with embedded/mobile CPUs, so not necessarily the
fastest CPUs.

Btw, I forgot to mention, it's already worth it for the precopy phase as it
helps with convergence. It also helps with my other patch that sends dirty
RAM in the time between checkpoints.

> 
> > 
> > Evaluation
> > While multifd has slightly higher latency when the guest idles, it is
> > 10ms faster under load and more importantly it's worst case latency is
> > less than 1/2 of plain under load as can be seen in the 99. Percentile.
> > 
> > Signed-off-by: Juan Quintela <quintela@redhat.com>
> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
> > ---
> >  MAINTAINERS              |  1 +
> >  migration/meson.build    |  2 +-
> >  migration/multifd-colo.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  migration/multifd-colo.h | 26 +++++++++++++++++++++++++
> >  migration/multifd.c      | 12 ++++++++++++
> >  migration/multifd.h      |  1 +
> >  6 files changed, 90 insertions(+), 1 deletion(-)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 563804345fec68ee72793dbb7c1b7e5be4c32083..dbb217255c2cf35dc0ce971c2021b130fac5469b 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -3837,6 +3837,7 @@ COLO Framework
> >  M: Lukas Straub <lukasstraub2@web.de>
> >  S: Maintained
> >  F: migration/colo*
> > +F: migration/multifd-colo.*
> >  F: include/migration/colo.h
> >  F: include/migration/failover.h
> >  F: docs/COLO-FT.txt
> > diff --git a/migration/meson.build b/migration/meson.build
> > index 16909d54c5110fc5d8187fd3a68c4a5b08b59ea7..1e59fe4f1f0bbfffed90df38e8f39fa87bceb9b9 100644
> > --- a/migration/meson.build
> > +++ b/migration/meson.build
> > @@ -40,7 +40,7 @@ system_ss.add(files(
> >  ), gnutls, zlib)
> >  
> >  if get_option('replication').allowed()
> > -  system_ss.add(files('colo-failover.c', 'colo.c'))
> > +  system_ss.add(files('colo-failover.c', 'colo.c', 'multifd-colo.c'))
> >  else
> >    system_ss.add(files('colo-stubs.c'))
> >  endif
> > diff --git a/migration/multifd-colo.c b/migration/multifd-colo.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..d8d98e79b12ed52c41f341052a682d7786e221b5
> > --- /dev/null
> > +++ b/migration/multifd-colo.c
> > @@ -0,0 +1,49 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * multifd colo implementation
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "exec/target_page.h"
> > +#include "qemu/error-report.h"
> > +#include "qapi/error.h"
> > +#include "ram.h"
> > +#include "multifd.h"
> > +#include "options.h"
> > +#include "io/channel-socket.h"
> > +#include "migration/colo.h"
> > +#include "multifd-colo.h"
> > +#include "system/ramblock.h"
> > +
> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p)
> > +{
> > +    assert(p->block->colo_cache);
> > +
> > +    /*
> > +     * While we're still in precopy state (not yet in colo state), we copy
> > +     * received pages to both guest and cache. No need to set dirty bits,
> > +     * since guest and cache memory are in sync.
> > +     */
> > +    if (migration_incoming_in_colo_state()) {
> > +        colo_record_bitmap(p->block, p->normal, p->normal_num);
> > +    }
> > +    p->host = p->block->colo_cache;  
> 
> I should have mentioned it while reviewing the previous version, anyway..
> 
> IMHO it would be better to have one place setting p->host instead of
> overwritting it.
> 
> So instead of hooking before ->recv(), we should do it in
> multifd_ram_unfill_packet(), moving the p->host update to the end of
> function and hook it there with COLO (so that you can still record the
> bitmaps, only after normal[]).

Okay, will fix that in the next version.

> 
> Another thing, which might be more important: you seem to have ignored
> zero[], but I think you need it.  zero[] keeps all pages that are zeros
> (which may not used to be zeros).  So IMHO you'll need to record them as
> dirty too in COLO's dest bitmap, otherwise you may hit hard to debug RAM
> corruptions.

Yes, that is a good catch. I missed that during forward porting.

> 
> > +}
> > +
> > +void multifd_colo_process_recv(MultiFDRecvParams *p)
> > +{
> > +    if (!migration_incoming_in_colo_state()) {
> > +        for (int i = 0; i < p->normal_num; i++) {
> > +            void *guest = p->block->host + p->normal[i];
> > +            void *cache = p->host + p->normal[i];
> > +            memcpy(guest, cache, multifd_ram_page_size());
> > +        }
> > +    }
> > +    p->host = p->block->host;  
> 
> Is resetting the pointer required?  If not, we can skip it.

Not that I can see; I will remove this in the next version.

> 
> > +}
> > diff --git a/migration/multifd-colo.h b/migration/multifd-colo.h
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..82eaf3f48c47de2f090f9de52f9d57a337d4754a
> > --- /dev/null
> > +++ b/migration/multifd-colo.h
> > @@ -0,0 +1,26 @@
> > +/*
> > + * SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * multifd colo header
> > + *
> > + * Copyright (c) Lukas Straub <lukasstraub2@web.de>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#ifndef QEMU_MIGRATION_MULTIFD_COLO_H
> > +#define QEMU_MIGRATION_MULTIFD_COLO_H
> > +
> > +#ifdef CONFIG_REPLICATION
> > +
> > +void multifd_colo_prepare_recv(MultiFDRecvParams *p);
> > +void multifd_colo_process_recv(MultiFDRecvParams *p);
> > +
> > +#else
> > +
> > +static inline void multifd_colo_prepare_recv(MultiFDRecvParams *p) {}
> > +static inline void multifd_colo_process_recv(MultiFDRecvParams *p) {}
> > +
> > +#endif
> > +#endif
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 8e71171fb7a17726ba7eb0705e293c41e8aa32ec..6c85acec3bac134e85cfcee0d32057134f5af8d1 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -29,6 +29,7 @@
> >  #include "qemu-file.h"
> >  #include "trace.h"
> >  #include "multifd.h"
> > +#include "multifd-colo.h"
> >  #include "threadinfo.h"
> >  #include "options.h"
> >  #include "qemu/yank.h"
> > @@ -1269,7 +1270,18 @@ static int multifd_ram_state_recv(MultiFDRecvParams *p, Error **errp)
> >  {
> >      int ret;
> >  
> > +    if (migrate_colo()) {
> > +        multifd_colo_prepare_recv(p);
> > +    }
> > +
> >      ret = multifd_recv_state->ops->recv(p, errp);
> > +    if (ret != 0) {
> > +        return ret;
> > +    }
> > +
> > +    if (migrate_colo()) {
> > +        multifd_colo_process_recv(p);
> > +    }
> >  
> >      return ret;
> >  }
> > diff --git a/migration/multifd.h b/migration/multifd.h
> > index 9b6d81e7ede024f05d4cd235de95e73840d0bbc4..7036f438fade1baed2442bfdcf8b5d6397c4a448 100644
> > --- a/migration/multifd.h
> > +++ b/migration/multifd.h
> > @@ -280,6 +280,7 @@ typedef struct {
> >      /* ramblock */
> >      RAMBlock *block;
> >      /* ramblock host address */
> > +    /* or points to the corresponding address in the colo cache */  
> 
> Nit: we can merge it with /* ... */, and some wording change:
> 
>        /*
>         * Normally, it points to ramblock's host address.  When COLO
>         * enabled, it points to the mirror cache for the ramblock.
>         */

Okay, will fix this.

> 
> >      uint8_t *host;
> >      /* buffers to recv */
> >      struct iovec *iov;
> > 
> > -- 
> > 2.39.5
> >   
> 

Re: [PATCH v2 4/8] multifd: Add COLO support
Posted by Daniel P. Berrangé 2 weeks, 3 days ago
On Tue, Jan 20, 2026 at 12:13:58PM -0500, Peter Xu wrote:
> On Sat, Jan 17, 2026 at 03:09:11PM +0100, Lukas Straub wrote:
> > Like in the normal ram_load() path, put the received pages into the
> > colo cache and mark the pages in the bitmap so that they will be
> > flushed to the guest later.
> > 
> > Multifd with COLO is useful to reduce the VM pause time during checkpointing
> > for latency sensitive workloads. In such workloads the worst-case latency
> > is especially important.
> > 
> > Also, multifd migration is the preferred way to do migration nowadays and this
> > allows to use multifd compression with COLO.
> > 
> > Benchmark:
> > Cluster nodes
> >  - Intel Xenon E5-2630 v3
> >  - 48Gb RAM
> >  - 10G Ethernet
> > Guest
> >  - Windows Server 2016
> >  - 6Gb RAM
> >  - 4 cores
> > Workload
> >  - Upload a file to the guest with SMB to simulate moderate
> >    memory dirtying
> >  - Measure the memory transfer time portion of each checkpoint
> >  - 600ms COLO checkpoint interval
> > 
> > Results
> > Plain
> >  idle mean: 4.50ms 99per: 10.33ms
> >  load mean: 24.30ms 99per: 78.05ms
> > Multifd-4
> >  idle mean: 6.48ms 99per: 10.41ms
> >  load mean: 14.12ms 99per: 31.27ms
> 
> Thanks for the numbers.  They're persuasive at least from 1st look.
> 
> Said that, one major question is, multifd should only help with throughput
> when cpu is a bottleneck sending, in your case it's 10Gbps NIC.  Normally
> any decent cpu should be able to push closer to 10Gbps even without
> multifd.

That assumes the CPUs used by migration are otherwise idle though. If the
host is busy running guest workloads, only small timeslices may be available
for use by migration threads. Using multifd would better utilize what's
available if multiple host CPUs have partial availability.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH v2 4/8] multifd: Add COLO support
Posted by Peter Xu 2 weeks, 3 days ago
On Tue, Jan 20, 2026 at 06:05:15PM +0000, Daniel P. Berrangé wrote:
> On Tue, Jan 20, 2026 at 12:13:58PM -0500, Peter Xu wrote:
> > On Sat, Jan 17, 2026 at 03:09:11PM +0100, Lukas Straub wrote:
> > > Like in the normal ram_load() path, put the received pages into the
> > > colo cache and mark the pages in the bitmap so that they will be
> > > flushed to the guest later.
> > > 
> > > Multifd with COLO is useful to reduce the VM pause time during checkpointing
> > > for latency sensitive workloads. In such workloads the worst-case latency
> > > is especially important.
> > > 
> > > Also, multifd migration is the preferred way to do migration nowadays and this
> > > allows to use multifd compression with COLO.
> > > 
> > > Benchmark:
> > > Cluster nodes
> > >  - Intel Xenon E5-2630 v3
> > >  - 48Gb RAM
> > >  - 10G Ethernet
> > > Guest
> > >  - Windows Server 2016
> > >  - 6Gb RAM
> > >  - 4 cores
> > > Workload
> > >  - Upload a file to the guest with SMB to simulate moderate
> > >    memory dirtying
> > >  - Measure the memory transfer time portion of each checkpoint
> > >  - 600ms COLO checkpoint interval
> > > 
> > > Results
> > > Plain
> > >  idle mean: 4.50ms 99per: 10.33ms
> > >  load mean: 24.30ms 99per: 78.05ms
> > > Multifd-4
> > >  idle mean: 6.48ms 99per: 10.41ms
> > >  load mean: 14.12ms 99per: 31.27ms
> > 
> > Thanks for the numbers.  They're persuasive at least from 1st look.
> > 
> > Said that, one major question is, multifd should only help with throughput
> > when cpu is a bottleneck sending, in your case it's 10Gbps NIC.  Normally
> > any decent cpu should be able to push closer to 10Gbps even without
> > multifd.
> 
> That assumes the CPUs used by migration are otherwise idle though. If the
> host is busy running guest workloads, only small timeslices may be available
> for use by migration threads. Using multifd would better utilize what's
> available if multiple host CPUs have partial availability.

Hmm, I'm not sure this was the case when the test above was run.  I rarely
see a host's CPUs being completely occupied.  Say, on a 16-core system that
would mean ~1600% CPU utilization.

I think it's because when a host is hosting VMs, we should normally have
some CPU resources reserved for host housekeeping.  Otherwise I'm not sure
how to guarantee general availability of the host... and IIUC it may also
affect the guest.

Here, IMHO as long as there's >100% CPU resource free on this host (e.g. out
of 1600% on a 16-core system), enabling multifd or not shouldn't matter much
when the NIC is 10Gbps.

An old but decent processor should be able to push 10~15Gbps, and a new
processor ~25Gbps or more, with 100% of one CPU.

That's because the scheduler will schedule whatever thread (either the
migration thread alone, or the multifd threads) onto whatever core is still
free (or cores that have free cycles).

When all CPUs are occupied, IMHO multifd shouldn't help much either...
maybe >1 threads make it easier to get scheduled (hence more time slices
from the scheduler), but I believe that's not the major use case for
multifd; it should really be used when there are plenty of CPU resources.

Thanks,

-- 
Peter Xu