[PATCH 1/3] migration/colo: Deprecate COLO migration framework

Peter Xu posted 3 patches 3 weeks, 4 days ago
Maintainers: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>
[PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 3 weeks, 4 days ago
COLO was broken for QEMU releases 10.0/10.1 without anyone noticing.  One
reason might be that we don't have a unit test for COLO (which we
explicitly require now for any new migration feature).  The other reason
is likely that there are simply no more active COLO users, at least based
on the latest development of QEMU.

I don't remember seeing any really active COLO development in the past
few years.

Meanwhile, the last email from the COLO migration framework maintainer
(Hailiang Zhang) to qemu-devel was in Dec 2021, a patch proposing an email
address change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).

We've discussed this for a while; see the latest discussions here (our
thoughts on deprecating the COLO framework may go back even earlier, but
still):

https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de

Let's make it partly official and put COLO on the deprecation list.  If
anyone cares about COLO and is deploying it, please send an email to
qemu-devel to discuss.

Otherwise, let's try to save some energy for the maintainers and
developers who are looking after QEMU.  Let's save the work if we don't
even know what the work is for.

Cc: Lukáš Doktor <ldoktor@redhat.com>
Cc: Juan Quintela <quintela@trasno.org>
Cc: Dr. David Alan Gilbert <dave@treblig.org>
Cc: Zhang Chen <zhangckid@gmail.com>
Cc: zhanghailiang@xfusion.com
Cc: Li Zhijian <lizhijian@fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 docs/about/deprecated.rst | 6 ++++++
 qapi/migration.json       | 5 ++---
 migration/options.c       | 4 ++++
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 7abb3dab59..b499b2acb0 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -580,3 +580,9 @@ command documentation for details on the ``fdset`` usage.
 
 The ``zero-blocks`` capability was part of the block migration which
 doesn't exist anymore since it was removed in QEMU v9.1.
+
+COLO migration framework (since 11.0)
+'''''''''''''''''''''''''''''''''''''
+
+To be removed with no replacement, as the COLO migration framework doesn't
+seem to have any active user for a while.
diff --git a/qapi/migration.json b/qapi/migration.json
index 201dedd982..3c868efe38 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -531,8 +531,7 @@
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
-# @deprecated: Member @zero-blocks is deprecated as being part of
-#     block migration which was already removed.
+# @deprecated: Member @zero-blocks and @x-colo are deprecated.
 #
 # Since: 1.2
 ##
@@ -540,7 +539,7 @@
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge',
            { 'name': 'zero-blocks', 'features': [ 'deprecated' ] },
            'events', 'postcopy-ram',
-           { 'name': 'x-colo', 'features': [ 'unstable' ] },
+           { 'name': 'x-colo', 'features': [ 'unstable', 'deprecated' ] },
            'release-ram',
            'return-path', 'pause-before-switchover', 'multifd',
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
diff --git a/migration/options.c b/migration/options.c
index 9a5a39c886..318850ba94 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -580,6 +580,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
         warn_report("zero-blocks capability is deprecated");
     }
 
+    if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
+        warn_report("COLO migration framework is deprecated");
+    }
+
 #ifndef CONFIG_REPLICATION
     if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
         error_setg(errp, "QEMU compiled without replication module"
-- 
2.50.1
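
For illustration (not part of the patch): once the change above is applied,
the deprecation would surface to a QMP client roughly like this, using the
standard migrate-set-capabilities command:

    { "execute": "migrate-set-capabilities",
      "arguments": { "capabilities": [
          { "capability": "x-colo", "state": true } ] } }

Assuming QEMU was built with replication support, the command still
succeeds as before; the new warn_report() call additionally prints
"warning: COLO migration framework is deprecated" to stderr (prefixed
with the binary name).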


Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Markus Armbruster 3 weeks, 4 days ago
Peter Xu <peterx@redhat.com> writes:

> COLO was broken for QEMU release 10.0/10.1 without anyone noticed.

We could arguably drop this right away.  I'm not demanding we do, just
pointing out.

First, COLO is marked 'unstable' in the QAPI schema:

* MigrationCapability member x-colo:

    # @unstable: Members @x-colo and @x-ignore-shared are experimental.

* MigrationParameter and MigrationParameters member x-checkpoint-delay:

    # @unstable: Members @x-checkpoint-delay and
    #     @x-vcpu-dirty-limit-period are experimental.

* Command x-colo-lost-heartbeat:

    # @unstable: This command is experimental.

There's more COLO stuff we neglected to mark, e.g. MigrationStatus
member @colo, event COLO_EXIT, commands xen-colo-do-checkpoint,
query-colo-status.  We should clean that up.  More on that below.

Second, it's been broken for two releases, our deprecation grace period.
In my opinion, "broken" is an even stronger notice than "deprecated".

>                                                                     One
> reason might be that we don't have an unit test for COLO (which we
> explicitly require now for any new migration feature).  The other reason
> should be that there are just no more active COLO users, at least based on
> the latest development of QEMU.
>
> I don't remember seeing anything really active in the past few years in
> COLO development.
>
> Meanwhile, COLO migration framework maintainer (Hailiang Zhang)'s last
> email to qemu-devel is in Dec 2021 where the patch proposed an email
> change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).
>
> We've discussed this for a while, see latest discussions here (our thoughts
> of deprecating COLO framework might be earlier than that, but still):
>
> https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
> https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
>
> Let's make it partly official and put COLO into deprecation list.  If
> anyone cares about COLO and is deploying it, please send an email to
> qemu-devel to discuss.
>
> Otherwise, let's try to save some energy for either maintainers or
> developers who is looking after QEMU. Let's save the work if we don't even
> know what the work is for.
>
> Cc: Lukáš Doktor <ldoktor@redhat.com>
> Cc: Juan Quintela <quintela@trasno.org>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Cc: Zhang Chen <zhangckid@gmail.com>
> Cc: zhanghailiang@xfusion.com
> Cc: Li Zhijian <lizhijian@fujitsu.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  docs/about/deprecated.rst | 6 ++++++
>  qapi/migration.json       | 5 ++---
>  migration/options.c       | 4 ++++
>  3 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 7abb3dab59..b499b2acb0 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -580,3 +580,9 @@ command documentation for details on the ``fdset`` usage.
>  
>  The ``zero-blocks`` capability was part of the block migration which
>  doesn't exist anymore since it was removed in QEMU v9.1.
> +
> +COLO migration framework (since 11.0)
> +'''''''''''''''''''''''''''''''''''''
> +
> +To be removed with no replacement, as the COLO migration framework doesn't
> +seem to have any active user for a while.
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 201dedd982..3c868efe38 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -531,8 +531,7 @@
>  #
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>  #
> -# @deprecated: Member @zero-blocks is deprecated as being part of
> -#     block migration which was already removed.
> +# @deprecated: Member @zero-blocks and @x-colo are deprecated.
>  #
>  # Since: 1.2
>  ##
> @@ -540,7 +539,7 @@
>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge',
>             { 'name': 'zero-blocks', 'features': [ 'deprecated' ] },
>             'events', 'postcopy-ram',
> -           { 'name': 'x-colo', 'features': [ 'unstable' ] },
> +           { 'name': 'x-colo', 'features': [ 'unstable', 'deprecated' ] },
>             'release-ram',
>             'return-path', 'pause-before-switchover', 'multifd',
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',

Issues / doubts:

1. We delete the text why @zero-blocks is deprecated.  Harmless; the
next patch drops @zero-blocks entirely.  Better: swap the patches.

2. The text for @x-colo is lacking.  Suggest something like "Member
@x-colo is deprecated without replacement."

3. Does it make sense to keep x-colo @unstable?

4. Shouldn't we mark *all* the COLO interfaces the same way?
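
For illustration, marking one more COLO interface the same way could look
roughly like the sketch below, assuming x-colo-lost-heartbeat in
qapi/migration.json currently carries only the 'unstable' feature (doc
comment abbreviated, any 'if' conditional omitted):

    ##
    # @x-colo-lost-heartbeat:
    #
    # ... existing description unchanged ...
    #
    # Features:
    #
    # @unstable: This command is experimental.
    #
    # @deprecated: This command is deprecated without replacement.
    ##
    { 'command': 'x-colo-lost-heartbeat',
      'features': [ 'unstable', 'deprecated' ] }

As with the hunk above, the 'deprecated' feature would then also show up
in query-qmp-schema output, so management applications can detect it
programmatically.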

> diff --git a/migration/options.c b/migration/options.c
> index 9a5a39c886..318850ba94 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -580,6 +580,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>          warn_report("zero-blocks capability is deprecated");
>      }
>  
> +    if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
> +        warn_report("COLO migration framework is deprecated");
> +    }
> +
>  #ifndef CONFIG_REPLICATION
>      if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
>          error_setg(errp, "QEMU compiled without replication module"
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 3 weeks, 4 days ago
On Thu, Jan 15, 2026 at 06:56:49AM +0100, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > COLO was broken for QEMU release 10.0/10.1 without anyone noticed.
> 
> We could arguably drop this right away.  I'm not demanding we do, just
> pointing out.
> 
> First, COLO is marked 'unstable' in the QAPI schema:
> 
> * MigrationCapability member x-colo:
> 
>     # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> 
> * MigrationParameter and MigrationParameters member x-checkpoint-delay:
> 
>     # @unstable: Members @x-checkpoint-delay and
>     #     @x-vcpu-dirty-limit-period are experimental.
> 
> * Command x-colo-lost-heartbeat:
> 
>     # @unstable: This command is experimental.
> 
> There's more COLO stuff we neglected to mark, e.g. MigrationStatus
> member @colo, event COLO_EXIT, commands xen-colo-do-checkpoint,
> query-colo-status.  We should clean that up.  More on that below.
> 
> Second, it's been broken for two releases, our deprecation grace period.
> In my opinion, "broken" is even stronger notice than "deprecated".

I agree.

> 
> >                                                                     One
> > reason might be that we don't have an unit test for COLO (which we
> > explicitly require now for any new migration feature).  The other reason
> > should be that there are just no more active COLO users, at least based on
> > the latest development of QEMU.
> >
> > I don't remember seeing anything really active in the past few years in
> > COLO development.
> >
> > Meanwhile, COLO migration framework maintainer (Hailiang Zhang)'s last
> > email to qemu-devel is in Dec 2021 where the patch proposed an email
> > change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).
> >
> > We've discussed this for a while, see latest discussions here (our thoughts
> > of deprecating COLO framework might be earlier than that, but still):
> >
> > https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
> > https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
> >
> > Let's make it partly official and put COLO into deprecation list.  If
> > anyone cares about COLO and is deploying it, please send an email to
> > qemu-devel to discuss.
> >
> > Otherwise, let's try to save some energy for either maintainers or
> > developers who is looking after QEMU. Let's save the work if we don't even
> > know what the work is for.
> >
> > Cc: Lukáš Doktor <ldoktor@redhat.com>
> > Cc: Juan Quintela <quintela@trasno.org>
> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
> > Cc: Zhang Chen <zhangckid@gmail.com>
> > Cc: zhanghailiang@xfusion.com
> > Cc: Li Zhijian <lizhijian@fujitsu.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  docs/about/deprecated.rst | 6 ++++++
> >  qapi/migration.json       | 5 ++---
> >  migration/options.c       | 4 ++++
> >  3 files changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> > index 7abb3dab59..b499b2acb0 100644
> > --- a/docs/about/deprecated.rst
> > +++ b/docs/about/deprecated.rst
> > @@ -580,3 +580,9 @@ command documentation for details on the ``fdset`` usage.
> >  
> >  The ``zero-blocks`` capability was part of the block migration which
> >  doesn't exist anymore since it was removed in QEMU v9.1.
> > +
> > +COLO migration framework (since 11.0)
> > +'''''''''''''''''''''''''''''''''''''
> > +
> > +To be removed with no replacement, as the COLO migration framework doesn't
> > +seem to have any active user for a while.
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 201dedd982..3c868efe38 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -531,8 +531,7 @@
> >  #
> >  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> >  #
> > -# @deprecated: Member @zero-blocks is deprecated as being part of
> > -#     block migration which was already removed.
> > +# @deprecated: Member @zero-blocks and @x-colo are deprecated.
> >  #
> >  # Since: 1.2
> >  ##
> > @@ -540,7 +539,7 @@
> >    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge',
> >             { 'name': 'zero-blocks', 'features': [ 'deprecated' ] },
> >             'events', 'postcopy-ram',
> > -           { 'name': 'x-colo', 'features': [ 'unstable' ] },
> > +           { 'name': 'x-colo', 'features': [ 'unstable', 'deprecated' ] },
> >             'release-ram',
> >             'return-path', 'pause-before-switchover', 'multifd',
> >             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> 
> Issues / doubts:
> 
> 1. We delete the text why @zero-blocks is deprecated.  Harmless; the
> next patch drops @zero-blocks entirely.  Better: swap the patches.

Will do.

> 
> 2. The text for @x-colo is lacking.  Suggest something like "Member
> @x-colo" is deprecated without replacement."
> 
> 3. Does it make sense to keep x-colo @unstable?
> 
> 4. Shouldn't we mark *all* the COLO interfaces the same way?

All questions are fair asks.  For issue 4, it means we will need to add
the new tag to COLO if we have the deprecation window.

Let me try to propose removal of COLO in 11.0 directly and see if there'll
be objections.

> 
> > diff --git a/migration/options.c b/migration/options.c
> > index 9a5a39c886..318850ba94 100644
> > --- a/migration/options.c
> > +++ b/migration/options.c
> > @@ -580,6 +580,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
> >          warn_report("zero-blocks capability is deprecated");
> >      }
> >  
> > +    if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
> > +        warn_report("COLO migration framework is deprecated");
> > +    }
> > +
> >  #ifndef CONFIG_REPLICATION
> >      if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
> >          error_setg(errp, "QEMU compiled without replication module"
> 

-- 
Peter Xu


Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Dr. David Alan Gilbert 3 weeks, 4 days ago
* Peter Xu (peterx@redhat.com) wrote:
> COLO was broken for QEMU release 10.0/10.1 without anyone noticed.  One
> reason might be that we don't have an unit test for COLO (which we
> explicitly require now for any new migration feature).  The other reason
> should be that there are just no more active COLO users, at least based on
> the latest development of QEMU.
> 
> I don't remember seeing anything really active in the past few years in
> COLO development.
> 
> Meanwhile, COLO migration framework maintainer (Hailiang Zhang)'s last
> email to qemu-devel is in Dec 2021 where the patch proposed an email
> change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).
> 
> We've discussed this for a while, see latest discussions here (our thoughts
> of deprecating COLO framework might be earlier than that, but still):
> 
> https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
> https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
> 
> Let's make it partly official and put COLO into deprecation list.  If
> anyone cares about COLO and is deploying it, please send an email to
> qemu-devel to discuss.

A shame, but it probably makes sense; it was always quite tricky to get
going.

Dave

> Otherwise, let's try to save some energy for either maintainers or
> developers who is looking after QEMU. Let's save the work if we don't even
> know what the work is for.
> 
> Cc: Lukáš Doktor <ldoktor@redhat.com>
> Cc: Juan Quintela <quintela@trasno.org>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Cc: Zhang Chen <zhangckid@gmail.com>
> Cc: zhanghailiang@xfusion.com
> Cc: Li Zhijian <lizhijian@fujitsu.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  docs/about/deprecated.rst | 6 ++++++
>  qapi/migration.json       | 5 ++---
>  migration/options.c       | 4 ++++
>  3 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 7abb3dab59..b499b2acb0 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -580,3 +580,9 @@ command documentation for details on the ``fdset`` usage.
>  
>  The ``zero-blocks`` capability was part of the block migration which
>  doesn't exist anymore since it was removed in QEMU v9.1.
> +
> +COLO migration framework (since 11.0)
> +'''''''''''''''''''''''''''''''''''''
> +
> +To be removed with no replacement, as the COLO migration framework doesn't
> +seem to have any active user for a while.
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 201dedd982..3c868efe38 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -531,8 +531,7 @@
>  #
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>  #
> -# @deprecated: Member @zero-blocks is deprecated as being part of
> -#     block migration which was already removed.
> +# @deprecated: Member @zero-blocks and @x-colo are deprecated.
>  #
>  # Since: 1.2
>  ##
> @@ -540,7 +539,7 @@
>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge',
>             { 'name': 'zero-blocks', 'features': [ 'deprecated' ] },
>             'events', 'postcopy-ram',
> -           { 'name': 'x-colo', 'features': [ 'unstable' ] },
> +           { 'name': 'x-colo', 'features': [ 'unstable', 'deprecated' ] },
>             'release-ram',
>             'return-path', 'pause-before-switchover', 'multifd',
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> diff --git a/migration/options.c b/migration/options.c
> index 9a5a39c886..318850ba94 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -580,6 +580,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>          warn_report("zero-blocks capability is deprecated");
>      }
>  
> +    if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
> +        warn_report("COLO migration framework is deprecated");
> +    }
> +
>  #ifndef CONFIG_REPLICATION
>      if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
>          error_setg(errp, "QEMU compiled without replication module"
> -- 
> 2.50.1
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 3 weeks, 4 days ago
On Wed, Jan 14, 2026 at 02:56:57PM -0500, Peter Xu wrote:
> COLO was broken for QEMU release 10.0/10.1 without anyone noticed.  One
> reason might be that we don't have an unit test for COLO (which we
> explicitly require now for any new migration feature).  The other reason
> should be that there are just no more active COLO users, at least based on
> the latest development of QEMU.
> 
> I don't remember seeing anything really active in the past few years in
> COLO development.
> 
> Meanwhile, COLO migration framework maintainer (Hailiang Zhang)'s last
> email to qemu-devel is in Dec 2021 where the patch proposed an email
> change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).
> 
> We've discussed this for a while, see latest discussions here (our thoughts
> of deprecating COLO framework might be earlier than that, but still):
> 
> https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
> https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
> 
> Let's make it partly official and put COLO into deprecation list.  If
> anyone cares about COLO and is deploying it, please send an email to
> qemu-devel to discuss.
> 
> Otherwise, let's try to save some energy for either maintainers or
> developers who is looking after QEMU. Let's save the work if we don't even
> know what the work is for.
> 
> Cc: Lukáš Doktor <ldoktor@redhat.com>

My apologies, I copied the wrong email.

Cc: Lukas Straub <lukasstraub2@web.de>

> Cc: Juan Quintela <quintela@trasno.org>
> Cc: Dr. David Alan Gilbert <dave@treblig.org>
> Cc: Zhang Chen <zhangckid@gmail.com>
> Cc: zhanghailiang@xfusion.com
> Cc: Li Zhijian <lizhijian@fujitsu.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  docs/about/deprecated.rst | 6 ++++++
>  qapi/migration.json       | 5 ++---
>  migration/options.c       | 4 ++++
>  3 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 7abb3dab59..b499b2acb0 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -580,3 +580,9 @@ command documentation for details on the ``fdset`` usage.
>  
>  The ``zero-blocks`` capability was part of the block migration which
>  doesn't exist anymore since it was removed in QEMU v9.1.
> +
> +COLO migration framework (since 11.0)
> +'''''''''''''''''''''''''''''''''''''
> +
> +To be removed with no replacement, as the COLO migration framework doesn't
> +seem to have any active user for a while.
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 201dedd982..3c868efe38 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -531,8 +531,7 @@
>  #
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>  #
> -# @deprecated: Member @zero-blocks is deprecated as being part of
> -#     block migration which was already removed.
> +# @deprecated: Member @zero-blocks and @x-colo are deprecated.
>  #
>  # Since: 1.2
>  ##
> @@ -540,7 +539,7 @@
>    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge',
>             { 'name': 'zero-blocks', 'features': [ 'deprecated' ] },
>             'events', 'postcopy-ram',
> -           { 'name': 'x-colo', 'features': [ 'unstable' ] },
> +           { 'name': 'x-colo', 'features': [ 'unstable', 'deprecated' ] },
>             'release-ram',
>             'return-path', 'pause-before-switchover', 'multifd',
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> diff --git a/migration/options.c b/migration/options.c
> index 9a5a39c886..318850ba94 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -580,6 +580,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>          warn_report("zero-blocks capability is deprecated");
>      }
>  
> +    if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
> +        warn_report("COLO migration framework is deprecated");
> +    }
> +
>  #ifndef CONFIG_REPLICATION
>      if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
>          error_setg(errp, "QEMU compiled without replication module"
> -- 
> 2.50.1
> 

-- 
Peter Xu


Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Lukas Straub 3 weeks, 3 days ago
On Wed, 14 Jan 2026 15:11:55 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Wed, Jan 14, 2026 at 02:56:57PM -0500, Peter Xu wrote:
> > COLO was broken for QEMU release 10.0/10.1 without anyone noticed.  One
> > reason might be that we don't have an unit test for COLO (which we
> > explicitly require now for any new migration feature).  The other reason
> > should be that there are just no more active COLO users, at least based on
> > the latest development of QEMU.
> > 
> > I don't remember seeing anything really active in the past few years in
> > COLO development.
> > 
> > Meanwhile, COLO migration framework maintainer (Hailiang Zhang)'s last
> > email to qemu-devel is in Dec 2021 where the patch proposed an email
> > change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).
> > 
> > We've discussed this for a while, see latest discussions here (our thoughts
> > of deprecating COLO framework might be earlier than that, but still):
> > 
> > https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
> > https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
> > 
> > Let's make it partly official and put COLO into deprecation list.  If
> > anyone cares about COLO and is deploying it, please send an email to
> > qemu-devel to discuss.
> > 
> > Otherwise, let's try to save some energy for either maintainers or
> > developers who is looking after QEMU. Let's save the work if we don't even
> > know what the work is for.
> > 
> > Cc: Lukáš Doktor <ldoktor@redhat.com>  
> 
> My apologize, I copied the wrong email.
> 
> Cc: Lukas Straub <lukasstraub2@web.de>

Nack.

This code has users, as explained in my other email:
https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464

Regards,
Lukas Straub

> 
> > Cc: Juan Quintela <quintela@trasno.org>
> > Cc: Dr. David Alan Gilbert <dave@treblig.org>
> > Cc: Zhang Chen <zhangckid@gmail.com>
> > Cc: zhanghailiang@xfusion.com
> > Cc: Li Zhijian <lizhijian@fujitsu.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  docs/about/deprecated.rst | 6 ++++++
> >  qapi/migration.json       | 5 ++---
> >  migration/options.c       | 4 ++++
> >  3 files changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> > index 7abb3dab59..b499b2acb0 100644
> > --- a/docs/about/deprecated.rst
> > +++ b/docs/about/deprecated.rst
> > @@ -580,3 +580,9 @@ command documentation for details on the ``fdset`` usage.
> >  
> >  The ``zero-blocks`` capability was part of the block migration which
> >  doesn't exist anymore since it was removed in QEMU v9.1.
> > +
> > +COLO migration framework (since 11.0)
> > +'''''''''''''''''''''''''''''''''''''
> > +
> > +To be removed with no replacement, as the COLO migration framework doesn't
> > +seem to have any active user for a while.
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 201dedd982..3c868efe38 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -531,8 +531,7 @@
> >  #
> >  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> >  #
> > -# @deprecated: Member @zero-blocks is deprecated as being part of
> > -#     block migration which was already removed.
> > +# @deprecated: Member @zero-blocks and @x-colo are deprecated.
> >  #
> >  # Since: 1.2
> >  ##
> > @@ -540,7 +539,7 @@
> >    'data': ['xbzrle', 'rdma-pin-all', 'auto-converge',
> >             { 'name': 'zero-blocks', 'features': [ 'deprecated' ] },
> >             'events', 'postcopy-ram',
> > -           { 'name': 'x-colo', 'features': [ 'unstable' ] },
> > +           { 'name': 'x-colo', 'features': [ 'unstable', 'deprecated' ] },
> >             'release-ram',
> >             'return-path', 'pause-before-switchover', 'multifd',
> >             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> > diff --git a/migration/options.c b/migration/options.c
> > index 9a5a39c886..318850ba94 100644
> > --- a/migration/options.c
> > +++ b/migration/options.c
> > @@ -580,6 +580,10 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
> >          warn_report("zero-blocks capability is deprecated");
> >      }
> >  
> > +    if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
> > +        warn_report("COLO migration framework is deprecated");
> > +    }
> > +
> >  #ifndef CONFIG_REPLICATION
> >      if (new_caps[MIGRATION_CAPABILITY_X_COLO]) {
> >          error_setg(errp, "QEMU compiled without replication module"
> > -- 
> > 2.50.1
> >   
> 

Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Markus Armbruster 3 weeks, 3 days ago
Lukas Straub <lukasstraub2@web.de> writes:

> On Wed, 14 Jan 2026 15:11:55 -0500
> Peter Xu <peterx@redhat.com> wrote:
>
>> On Wed, Jan 14, 2026 at 02:56:57PM -0500, Peter Xu wrote:
>> > COLO was broken for QEMU release 10.0/10.1 without anyone noticed.  One
>> > reason might be that we don't have an unit test for COLO (which we
>> > explicitly require now for any new migration feature).  The other reason
>> > should be that there are just no more active COLO users, at least based on
>> > the latest development of QEMU.
>> > 
>> > I don't remember seeing anything really active in the past few years in
>> > COLO development.
>> > 
>> > Meanwhile, COLO migration framework maintainer (Hailiang Zhang)'s last
>> > email to qemu-devel is in Dec 2021 where the patch proposed an email
>> > change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).
>> > 
>> > We've discussed this for a while, see latest discussions here (our thoughts
>> > of deprecating COLO framework might be earlier than that, but still):
>> > 
>> > https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
>> > https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
>> > 
>> > Let's make it partly official and put COLO into deprecation list.  If
>> > anyone cares about COLO and is deploying it, please send an email to
>> > qemu-devel to discuss.
>> > 
>> > Otherwise, let's try to save some energy for either maintainers or
>> > developers who is looking after QEMU. Let's save the work if we don't even
>> > know what the work is for.
>> > 
>> > Cc: Lukáš Doktor <ldoktor@redhat.com>  
>> 
>> My apologize, I copied the wrong email.
>> 
>> Cc: Lukas Straub <lukasstraub2@web.de>
>
> Nack.
>
> This code has users, as explained in my other email:
> https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464

Code being useful is not enough.  We must have people to maintain it
adequately.  This has not been the case for COLO in years.

Deprecating a feature with intent to remove it is not a death sentence.
It's a *suspended* death sentence: if somebody steps up to maintain it,
we can revert the deprecation, or extend the grace period to give them a
chance.

I think we should deprecate COLO now to send a clear distress signal.
The deprecation notice should explain it doesn't work, and will be
removed unless people step up to fix it and to maintain it.  This will
ensure progress one way or the other.  Doing nothing now virtually
ensures we'll have the same discussion again later.
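
For illustration, such a notice could look roughly like this in
docs/about/deprecated.rst, reusing the heading from the patch (the wording
is only a sketch):

    COLO migration framework (since 11.0)
    '''''''''''''''''''''''''''''''''''''

    The COLO migration framework is currently broken and has had no
    active maintainer for years.  It will be removed with no replacement
    unless people step up to fix it and to maintain it.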

"Broken for two releases without anyone noticing" and "maintainer absent
for more than four years" doesn't exacltly inspire hope, though.  We
should seriously consider removing it right away.

Lukas, can you give us hope?

[...]
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Zhang Chen 3 weeks, 3 days ago
On Fri, Jan 16, 2026 at 2:26 PM Markus Armbruster <armbru@redhat.com> wrote:
>
> Lukas Straub <lukasstraub2@web.de> writes:
>
> > On Wed, 14 Jan 2026 15:11:55 -0500
> > Peter Xu <peterx@redhat.com> wrote:
> >
> >> On Wed, Jan 14, 2026 at 02:56:57PM -0500, Peter Xu wrote:
> >> > COLO was broken for QEMU release 10.0/10.1 without anyone noticed.  One
> >> > reason might be that we don't have an unit test for COLO (which we
> >> > explicitly require now for any new migration feature).  The other reason
> >> > should be that there are just no more active COLO users, at least based on
> >> > the latest development of QEMU.
> >> >
> >> > I don't remember seeing anything really active in the past few years in
> >> > COLO development.
> >> >
> >> > Meanwhile, COLO migration framework maintainer (Hailiang Zhang)'s last
> >> > email to qemu-devel is in Dec 2021 where the patch proposed an email
> >> > change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).
> >> >
> >> > We've discussed this for a while, see latest discussions here (our thoughts
> >> > of deprecating COLO framework might be earlier than that, but still):
> >> >
> >> > https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
> >> > https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
> >> >
> >> > Let's make it partly official and put COLO into deprecation list.  If
> >> > anyone cares about COLO and is deploying it, please send an email to
> >> > qemu-devel to discuss.
> >> >
> >> > Otherwise, let's try to save some energy for either maintainers or
> >> > developers who is looking after QEMU. Let's save the work if we don't even
> >> > know what the work is for.
> >> >
> >> > Cc: Lukáš Doktor <ldoktor@redhat.com>
> >>
> >> My apologize, I copied the wrong email.
> >>
> >> Cc: Lukas Straub <lukasstraub2@web.de>
> >
> > Nack.
> >
> > This code has users, as explained in my other email:
> > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
>
> Code being useful is not enough.  We must have people to maintain it
> adequately.  This has not been the case for COLO in years.
>
> Deprecating a feature with intent to remove it is not a death sentence.
> It's a *suspended* death sentence: if somebody steps up to maintain it,
> we can revert the deprecation, or extend the grace period to give them a
> chance.
>
> I think we should deprecate COLO now to send a clear distress signal.
> The deprecation notice should explain it doesn't work, and will be
> removed unless people step up to fix it and to maintain it.  This will
> ensure progress one way or the other.  Doing nothing now virtually
> ensures we'll have the same discussion again later.
>
> "Broken for two releases without anyone noticing" and "maintainer absent
> for more than four years" doesn't exacltly inspire hope, though.  We
> should seriously consider removing it right away.
>
> Lukas, can you give us hope?
>

Hi Markus,
Maybe you missed something?
I think Lukas indicated in his previous emails that he is ready to
maintain this code:
https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464

Thanks
Chen

> [...]
>
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Markus Armbruster 3 weeks, 3 days ago
Zhang Chen <zhangckid@gmail.com> writes:

> On Fri, Jan 16, 2026 at 2:26 PM Markus Armbruster <armbru@redhat.com> wrote:
>>
>> Lukas Straub <lukasstraub2@web.de> writes:
>>
>> > On Wed, 14 Jan 2026 15:11:55 -0500
>> > Peter Xu <peterx@redhat.com> wrote:
>> >
>> >> On Wed, Jan 14, 2026 at 02:56:57PM -0500, Peter Xu wrote:
>> >> > COLO was broken for QEMU release 10.0/10.1 without anyone noticed.  One
>> >> > reason might be that we don't have an unit test for COLO (which we
>> >> > explicitly require now for any new migration feature).  The other reason
>> >> > should be that there are just no more active COLO users, at least based on
>> >> > the latest development of QEMU.
>> >> >
>> >> > I don't remember seeing anything really active in the past few years in
>> >> > COLO development.
>> >> >
>> >> > Meanwhile, COLO migration framework maintainer (Hailiang Zhang)'s last
>> >> > email to qemu-devel is in Dec 2021 where the patch proposed an email
>> >> > change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).
>> >> >
>> >> > We've discussed this for a while, see latest discussions here (our thoughts
>> >> > of deprecating COLO framework might be earlier than that, but still):
>> >> >
>> >> > https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
>> >> > https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
>> >> >
>> >> > Let's make it partly official and put COLO into deprecation list.  If
>> >> > anyone cares about COLO and is deploying it, please send an email to
>> >> > qemu-devel to discuss.
>> >> >
>> >> > Otherwise, let's try to save some energy for either maintainers or
>> >> > developers who is looking after QEMU. Let's save the work if we don't even
>> >> > know what the work is for.
>> >> >
>> >> > Cc: Lukáš Doktor <ldoktor@redhat.com>
>> >>
>> >> My apologize, I copied the wrong email.
>> >>
>> >> Cc: Lukas Straub <lukasstraub2@web.de>
>> >
>> > Nack.
>> >
>> > This code has users, as explained in my other email:
>> > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
>>
>> Code being useful is not enough.  We must have people to maintain it
>> adequately.  This has not been the case for COLO in years.
>>
>> Deprecating a feature with intent to remove it is not a death sentence.
>> It's a *suspended* death sentence: if somebody steps up to maintain it,
>> we can revert the deprecation, or extend the grace period to give them a
>> chance.
>>
>> I think we should deprecate COLO now to send a clear distress signal.
>> The deprecation notice should explain it doesn't work, and will be
>> removed unless people step up to fix it and to maintain it.  This will
>> ensure progress one way or the other.  Doing nothing now virtually
>> ensures we'll have the same discussion again later.
>>
>> "Broken for two releases without anyone noticing" and "maintainer absent
>> for more than four years" doesn't exacltly inspire hope, though.  We
>> should seriously consider removing it right away.
>>
>> Lukas, can you give us hope?
>>
>
> Hi Markus,
> Maybe you missed something?
> I think Lukas is ready to maintain this code in his previous emails.
> https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464

Patch to MAINTAINERS or it didn't happen ;)
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 3 weeks, 3 days ago
On Fri, Jan 16, 2026 at 10:41:28AM +0100, Markus Armbruster wrote:
> Zhang Chen <zhangckid@gmail.com> writes:
> 
> > On Fri, Jan 16, 2026 at 2:26 PM Markus Armbruster <armbru@redhat.com> wrote:
> >>
> >> Lukas Straub <lukasstraub2@web.de> writes:
> >>
> >> > On Wed, 14 Jan 2026 15:11:55 -0500
> >> > Peter Xu <peterx@redhat.com> wrote:
> >> >
> >> >> On Wed, Jan 14, 2026 at 02:56:57PM -0500, Peter Xu wrote:
> >> >> > COLO was broken for QEMU release 10.0/10.1 without anyone noticed.  One
> >> >> > reason might be that we don't have an unit test for COLO (which we
> >> >> > explicitly require now for any new migration feature).  The other reason
> >> >> > should be that there are just no more active COLO users, at least based on
> >> >> > the latest development of QEMU.
> >> >> >
> >> >> > I don't remember seeing anything really active in the past few years in
> >> >> > COLO development.
> >> >> >
> >> >> > Meanwhile, COLO migration framework maintainer (Hailiang Zhang)'s last
> >> >> > email to qemu-devel is in Dec 2021 where the patch proposed an email
> >> >> > change (<20211214075424.6920-1-zhanghailiang@xfusion.com>).
> >> >> >
> >> >> > We've discussed this for a while, see latest discussions here (our thoughts
> >> >> > of deprecating COLO framework might be earlier than that, but still):
> >> >> >
> >> >> > https://lore.kernel.org/r/aQu6bDAA7hnIPg-y@x1.local/
> >> >> > https://lore.kernel.org/r/20251230-colo_unit_test_multifd-v1-0-f9734bc74c71@web.de
> >> >> >
> >> >> > Let's make it partly official and put COLO into deprecation list.  If
> >> >> > anyone cares about COLO and is deploying it, please send an email to
> >> >> > qemu-devel to discuss.
> >> >> >
> >> >> > Otherwise, let's try to save some energy for either maintainers or
> >> >> > developers who is looking after QEMU. Let's save the work if we don't even
> >> >> > know what the work is for.
> >> >> >
> >> >> > Cc: Lukáš Doktor <ldoktor@redhat.com>
> >> >>
> >> >> My apologize, I copied the wrong email.
> >> >>
> >> >> Cc: Lukas Straub <lukasstraub2@web.de>
> >> >
> >> > Nack.
> >> >
> >> > This code has users, as explained in my other email:
> >> > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> >>
> >> Code being useful is not enough.  We must have people to maintain it
> >> adequately.  This has not been the case for COLO in years.
> >>
> >> Deprecating a feature with intent to remove it is not a death sentence.
> >> It's a *suspended* death sentence: if somebody steps up to maintain it,
> >> we can revert the deprecation, or extend the grace period to give them a
> >> chance.
> >>
> >> I think we should deprecate COLO now to send a clear distress signal.
> >> The deprecation notice should explain it doesn't work, and will be
> >> removed unless people step up to fix it and to maintain it.  This will
> >> ensure progress one way or the other.  Doing nothing now virtually
> >> ensures we'll have the same discussion again later.
> >>
> >> "Broken for two releases without anyone noticing" and "maintainer absent
> >> for more than four years" doesn't exacltly inspire hope, though.  We
> >> should seriously consider removing it right away.
> >>
> >> Lukas, can you give us hope?
> >>
> >
> > Hi Markus,
> > Maybe you missed something?
> > I think Lukas is ready to maintain this code in his previous emails.
> > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> 
> Patch to MAINTAINERS or it didn't happen ;)

I'd even say the MAINTAINERS file is, in many cases, cosmetic.

It is definitely helpful for people doing lookups, or for scripts that
fetch information, but IMHO we need more than a single entry, and in some
sense that entry is less important than the activity behind it.

We need someone who is, first of all, familiar with the code, who spends
time on it, actively replies to the relevant queries upstream, and
provides proper testing / gating to make sure the feature is usable as
stated - whether fully maintained, odd fixes, or otherwise - and more.

I previously asked Lukas to help review the loadvm threadify series [1,2],
which definitely touches COLO, but I didn't really get a response.  That's
also a sign that Lukas may not care enough about COLO, even after I
explicitly pointed out that something might be changing and might be risky.

It's the same with Hailiang: he is also in the MAINTAINERS file, but has
unfortunately not been active over recent years.

Frankly, it was Zhijian and Chen who were always helping in that regard.
Based on all the COLO discussions I've had, I would think either of them
would be more suitable, but that's a promise I can't make for them.  I
also still want to remove it as I proposed, in case that relieves everyone
of the burden.

So an update to the file isn't even enough, if we accept it.  We need
activity corresponding to the file change.  That's also why I still think
we should remove COLO regardless, if this condition doesn't improve during
11.0, as I stated in the other email.

[1] https://lore.kernel.org/qemu-devel/aSSx28slqi1ywg2v@x1.local
[2] https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com

Thanks,

-- 
Peter Xu


Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Markus Armbruster 3 weeks, 3 days ago
Peter Xu <peterx@redhat.com> writes:

> On Fri, Jan 16, 2026 at 10:41:28AM +0100, Markus Armbruster wrote:
>> Zhang Chen <zhangckid@gmail.com> writes:
>> 
>> > On Fri, Jan 16, 2026 at 2:26 PM Markus Armbruster <armbru@redhat.com> wrote:

[...]

>> >> I think we should deprecate COLO now to send a clear distress signal.
>> >> The deprecation notice should explain it doesn't work, and will be
>> >> removed unless people step up to fix it and to maintain it.  This will
>> >> ensure progress one way or the other.  Doing nothing now virtually
>> >> ensures we'll have the same discussion again later.
>> >>
>> >> "Broken for two releases without anyone noticing" and "maintainer absent
>> >> for more than four years" doesn't exacltly inspire hope, though.  We
>> >> should seriously consider removing it right away.
>> >>
>> >> Lukas, can you give us hope?
>> >>
>> >
>> > Hi Markus,
>> > Maybe you missed something?
>> > I think Lukas is ready to maintain this code in his previous emails.
>> > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
>> 
>> Patch to MAINTAINERS or it didn't happen ;)
>
> I'd even say MAINTAINERS file is, in many cases, cosmetic..
>
> It definitely is helpful for people to do lookups or scripts to fetch
> information, but IMHO we need more than one single entry, and in some sense
> that entry is less important than the activities.
>
> We need someone to be first familiar with a piece of code, spend time on
> it, actively reply to the relevant queries upstream, proper testing /
> gating to make sure the feature is usable as stated - either fully
> maintained or odd fixes or others, and more.

Yes, we need a maintainer not just in name, but for real.

(My one-liner was an attempt at a joke)

> I used to request Lukas help reviewing the loadvm threadify series [1,2]
> which definitely touches COLO, I didn't really get a respond.  That's also
> a sign I don't feel like Lucas cares enough about COLO, after I explicitly
> pointing out something might be changing and might be risky.
>
> It's like Hailiang is also in the MAINTAINERS file but Hailiang is
> unfortunately not active anymore recently over the years.

We're bad at updating the MAINTAINERS file when maintainers have
wandered off.

> Frankly, it was Zhijian and Chen that were always helping from that regard.
> I would rather think anyone of both would be more suitable at least from
> all the discussions I had with COLO, but this is a promise I can't do.  I
> also still want to remove it as I proposed, in case it releases everyone.
>
> So an update in the file isn't even enough if we accept it.  We need
> activity corresponding to the file change.  That's also why I still think
> we should remove COLO regardless if 11.0 doesn't improve in this condition,
> as I stated in the other email.

Concur.

> [1] https://lore.kernel.org/qemu-devel/aSSx28slqi1ywg2v@x1.local
> [2] https://lore.kernel.org/all/20251022192612.2737648-1-peterx@redhat.com
>
> Thanks,
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 3 weeks, 3 days ago
On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> Nack.
> 
> This code has users, as explained in my other email:
> https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464

Please then rework that series and consider including the following (I
believe I pointed this out somewhere a long time ago):

- (Working) unit test for COLO migration

- Some form of justification of why multifd needs to be enabled for COLO.
  For example, in your cluster deployment, using multifd can improve XXX
  by YYY.  Please describe the use case and improvements.

- A MAINTAINERS file update to replace Hailiang in his role (unless
  Hailiang replies before you send it)

- Please consider converting COLO-FT.txt to .rst, and maybe we should also
  move it over to devel/migration/, then add an index in the index.rst.

IMHO we should either merge a series that at least covers the above in
11.0, or drop it.

Thanks,

-- 
Peter Xu
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Dr. David Alan Gilbert 3 weeks, 3 days ago
* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > Nack.
> > 
> > This code has users, as explained in my other email:
> > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> 
> Please then rework that series and consider include the following (I
> believe I pointed out a long time ago somewhere..):
> 

> - Some form of justification of why multifd needs to be enabled for COLO.
>   For example, in your cluster deployment, using multifd can improve XXX
>   by YYY.  Please describe the use case and improvements.

That one is pretty easy; since COLO is regularly taking snapshots, the
faster the snapshotting, the less overhead there is.

Lukas: Given COLO has a bunch of different features (e.g. the block
replication, the clever network comparison, etc.), do you know which ones
are used in the setups you are aware of?

I'd guess the tricky part of a test would be the network side; I'm
not too sure how you'd set that up in a test.

Dave

-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Daniel P. Berrangé 3 weeks, 3 days ago
On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > Nack.
> > > 
> > > This code has users, as explained in my other email:
> > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > 
> > Please then rework that series and consider include the following (I
> > believe I pointed out a long time ago somewhere..):
> > 
> 
> > - Some form of justification of why multifd needs to be enabled for COLO.
> >   For example, in your cluster deployment, using multifd can improve XXX
> >   by YYY.  Please describe the use case and improvements.
> 
> That one is pretty easy; since COLO is regularly taking snapshots, the faster
> the snapshoting the less overhead there is.

Also, if we ever want to be able to deprecate non-multifd migration, we
need to ensure multifd migration has a superset of its functionality.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 3 weeks, 3 days ago
On Fri, Jan 16, 2026 at 09:46:43AM +0000, Daniel P. Berrangé wrote:
> On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > > Nack.
> > > > 
> > > > This code has users, as explained in my other email:
> > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > > 
> > > Please then rework that series and consider include the following (I
> > > believe I pointed out a long time ago somewhere..):
> > > 
> > 
> > > - Some form of justification of why multifd needs to be enabled for COLO.
> > >   For example, in your cluster deployment, using multifd can improve XXX
> > >   by YYY.  Please describe the use case and improvements.
> > 
> > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > the snapshoting the less overhead there is.
> 
> Also if we ever want to be able to deprecate the non-multifd migration,
> then we need to ensure multifd migration has the super-set of functionality.

IIUC there's still a long way to go for that, and I'm not yet sure it will
happen.

To achieve it, we'll first need to remove/deprecate the multifd
capability, because as long as it's there people can still set it to OFF.

But before that, we'll need to figure out what to do with features that
are non-trivial to support - at least RDMA (it turns out we decided to
keep RDMA, prior to this COLO discussion) and "fd:" URIs.

I still don't know if we can justify that nobody will be using some handy
streaming tooling with QEMU migration; in that case it'll never work with
multifd, because multifd (even with channels=1) requires two sessions -
there's always the main channel.

So I'd put that aside when considering what we'd do with COLO.  In that
case IIUC COLO is the easy part if we really want to always use multifd.

Thanks,

-- 
Peter Xu


Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Zhang Chen 3 weeks, 3 days ago
On Fri, Jan 16, 2026 at 6:59 AM Dr. David Alan Gilbert <dave@treblig.org> wrote:
>
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > Nack.
> > >
> > > This code has users, as explained in my other email:
> > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> >
> > Please then rework that series and consider include the following (I
> > believe I pointed out a long time ago somewhere..):
> >
>
> > - Some form of justification of why multifd needs to be enabled for COLO.
> >   For example, in your cluster deployment, using multifd can improve XXX
> >   by YYY.  Please describe the use case and improvements.
>
> That one is pretty easy; since COLO is regularly taking snapshots, the faster
> the snapshoting the less overhead there is.
>
> Lukas: Given COLO has a bunch of different features (i.e. the block
> replication, the clever network comparison etc) do you know which ones
> are used in the setups you are aware of?
>
> I'd guess the tricky part of a test would be the network side; I'm
> not too sure how you'd set that in a test.

Hi Dave,

For the COLO network test part, we already have some qtests for that.
The original COLO-proxy functionality was decoupled into several QEMU
netfilter modules: filter-mirror, filter-redirector, filter-rewriter and
colo-compare.  Only colo-compare is COLO-specific.  COLO connects all the
general modules with chardev sockets to complete the functionality.
The current status is that we already have qtests for filter-mirror and
filter-redirector, like qemu/tests/qtest/test-filter-mirror.c.

If this discussion ultimately decides to retain COLO, I can cover the COLO
network test case.

Thanks
Chen

>
> Dave
>
> --
>  -----Open up your eyes, open up your mind, open up your code -------
> / Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
> \        dave @ treblig.org |                               | In Hex /
>  \ _________________________|_____ http://www.treblig.org   |_______/
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 3 weeks, 3 days ago
On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > Nack.
> > > 
> > > This code has users, as explained in my other email:
> > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > 
> > Please then rework that series and consider include the following (I
> > believe I pointed out a long time ago somewhere..):
> > 
> 
> > - Some form of justification of why multifd needs to be enabled for COLO.
> >   For example, in your cluster deployment, using multifd can improve XXX
> >   by YYY.  Please describe the use case and improvements.
> 
> That one is pretty easy; since COLO is regularly taking snapshots, the faster
> the snapshoting the less overhead there is.

Thanks for chiming in, Dave.  I can explain why I want to request some
numbers.

Firstly, numbers normally prove it's used in a real system - that it's at
least being used and seriously tested.

Secondly, per my very limited understanding of COLO... the two VMs should
in most cases already be in sync while both sides generate the same
network packets.

Another sync (where multifd can start to take effect) is only needed when
there are packet mismatches, but IIUC that should be rare.  I don't know
how rare it is; it would be good if Lukas could share some of those
numbers from his deployment to help us understand COLO better, if we need
to keep it.

IIUC, the critical path of COLO shouldn't be migration on its own?  It
should be when heartbeat gets lost; that normally should happen when two
VMs are in sync.  In this path, I don't see how multifd helps..  because
there's no migration happening, only the src recording what has changed.
Hence I think some number with description of the measurements may help us
understand how important multifd is to COLO.

Supporting multifd will cause new COLO functions to inject into core
migration code paths (even if not much..). I want to make sure such (new)
complexity is justified. I also want to avoid introducing a feature only
because "we have XXX, then let's support XXX in COLO too, maybe some day
it'll be useful".

After these days, I found removing code is sometimes harder than writting
new..

Thanks,

> 
> Lukas: Given COLO has a bunch of different features (i.e. the block
> replication, the clever network comparison etc) do you know which ones
> are used in the setups you are aware of?
> 
> I'd guess the tricky part of a test would be the network side; I'm
> not too sure how you'd set that in a test.

-- 
Peter Xu
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Lukas Straub 3 weeks, 1 day ago
On Thu, 15 Jan 2026 18:38:51 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:  
> > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:  
> > > > Nack.
> > > > 
> > > > This code has users, as explained in my other email:
> > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464  
> > > 
> > > Please then rework that series and consider include the following (I
> > > believe I pointed out a long time ago somewhere..):
> > >   
> >   
> > > - Some form of justification of why multifd needs to be enabled for COLO.
> > >   For example, in your cluster deployment, using multifd can improve XXX
> > >   by YYY.  Please describe the use case and improvements.  
> > 
> > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > the snapshoting the less overhead there is.  
> 
> Thanks for chiming in, Dave.  I can explain why I want to request for some
> numbers.
> 
> Firstly, numbers normally proves it's used in a real system.  It's at least
> being used and seriously tested.
> 
> Secondly, per my very limited understanding to COLO... the two VMs in most
> cases should be in-sync state already when both sides generate the same
> network packets.
> 
> Another sync (where multifd can start to take effect) is only needed when
> there're packets misalignments, but IIUC it should be rare.  I don't know
> how rare it is, it would be good if Lukas could introduce some of those
> numbers in his deployment to help us understand COLO better if we'll need
> to keep it.

It really depends on the workload and whether you want to tune for
throughput or latency.

You need to do a checkpoint eventually, and the more time passes between
checkpoints, the more dirty memory you have to transfer during the
checkpoint.

Also keep in mind that the guest is stopped during checkpoints: even if we
kept running the guest, we could not release the mismatched packets, since
that would expose a guest state to the outside world that is not yet
replicated to the secondary.

So migration performance is actually the most important part of COLO, to
keep the checkpoints as short as possible.

I have quite a few more performance and cleanup patches on my hands,
for example to transfer dirty memory between checkpoints.

> 
> IIUC, the critical path of COLO shouldn't be migration on its own?  It
> should be when heartbeat gets lost; that normally should happen when two
> VMs are in sync.  In this path, I don't see how multifd helps..  because
> there's no migration happening, only the src recording what has changed.
> Hence I think some number with description of the measurements may help us
> understand how important multifd is to COLO.
> 
> Supporting multifd will cause new COLO functions to inject into core
> migration code paths (even if not much..). I want to make sure such (new)
> complexity is justified. I also want to avoid introducing a feature only
> because "we have XXX, then let's support XXX in COLO too, maybe some day
> it'll be useful".

What COLO needs from migration at the low level:

Primary/Outgoing side:

Not much actually, we just need a way to incrementally send the
dirtied memory and the full device state.
Also, we ensure that migration never actually finishes since we will
never do a switchover. For example we never set
RAMState::last_stage with COLO.

Secondary/Incoming side:

colo cache:
Since the secondary always needs to be ready to take over (even during
checkpointing), we can not write the received ram pages directly into guest
ram, to avoid ending up with half of the old and half of the new contents.
So we redirect the received ram pages into the colo cache, which is
basically a mirror of the primary side ram.
It also simplifies the primary side, since from its point of view it's
just a normal migration target.  So the primary side doesn't have to care
about pages dirtied on the secondary, for example.

Dirty Bitmap:
With COLO we also need a dirty bitmap on the incoming side to track
1. pages dirtied by the secondary guest
2. pages dirtied by the primary guest (incoming ram pages)
In the last step during the checkpointing, this bitmap is then used
to overwrite the guest ram with the colo cache so the secondary guest
is in sync with the primary guest.
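
In pseudo-C, that last flush step is roughly (just a sketch of the idea,
not the actual colo_flush_ram_cache() code):

    #include "qemu/osdep.h"
    #include "qemu/bitops.h"

    /*
     * Sketch: walk the secondary-side bitmap of pages dirtied by either
     * guest and overwrite guest ram with the colo cache copy, so the SVM
     * ends up identical to the PVM at the end of the checkpoint.
     */
    static void colo_flush_sketch(unsigned long *bmap, unsigned long npages,
                                  uint8_t *guest_ram,
                                  const uint8_t *colo_cache, size_t page_size)
    {
        unsigned long page;

        for (page = find_first_bit(bmap, npages);
             page < npages;
             page = find_next_bit(bmap, npages, page + 1)) {
            memcpy(guest_ram + page * page_size,
                   colo_cache + page * page_size, page_size);
            clear_bit(page, bmap);
        }
    }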

All this individually is very little code as you can see from my
multifd patch. Just something to keep in mind I guess.


At the high level we have the COLO framework outgoing and incoming
threads, which just tell the migration code to:
- send all ram pages (qemu_savevm_live_state()) on the outgoing side,
  paired with qemu_loadvm_state_main() on the incoming side;
- send the device state (qemu_save_device_state()), paired with writing
  that stream to a buffer on the incoming side;
- and finally flush the colo cache and load the device state on the
  incoming side.
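
In pseudo-code, one checkpoint on the incoming side is then roughly the
following (an illustrative sketch only; error handling, locking and the
exact helper signatures are omitted):

    colo_receive_check_message(COLO_MESSAGE_VMSTATE_SEND)  # PVM starts a checkpoint
    qemu_loadvm_state_main()            # RAM pages land in the colo cache
    size = colo_receive_message_value(COLO_MESSAGE_VMSTATE_SIZE)
    qemu_get_buffer(buf, size)          # buffer the device state, don't apply yet
    colo_flush_ram_cache()              # colo cache -> guest RAM via the dirty bitmap
    qemu_load_device_state(fb)          # now apply the buffered device state
    colo_send_message(COLO_MESSAGE_VMSTATE_LOADED)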

And of course we coordinate with the colo block replication and
colo-compare.

Best regards,
Lukas Straub

> 
> After these days, I found removing code is sometimes harder than writting
> new..
> 
> Thanks,
> 
> > 
> > Lukas: Given COLO has a bunch of different features (i.e. the block
> > replication, the clever network comparison etc) do you know which ones
> > are used in the setups you are aware of?
> > 
> > I'd guess the tricky part of a test would be the network side; I'm
> > not too sure how you'd set that in a test.  
> 

Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 2 weeks, 6 days ago
On Sat, Jan 17, 2026 at 08:49:13PM +0100, Lukas Straub wrote:
> On Thu, 15 Jan 2026 18:38:51 -0500
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:  
> > > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:  
> > > > > Nack.
> > > > > 
> > > > > This code has users, as explained in my other email:
> > > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464  
> > > > 
> > > > Please then rework that series and consider include the following (I
> > > > believe I pointed out a long time ago somewhere..):
> > > >   
> > >   
> > > > - Some form of justification of why multifd needs to be enabled for COLO.
> > > >   For example, in your cluster deployment, using multifd can improve XXX
> > > >   by YYY.  Please describe the use case and improvements.  
> > > 
> > > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > > the snapshoting the less overhead there is.  
> > 
> > Thanks for chiming in, Dave.  I can explain why I want to request for some
> > numbers.
> > 
> > Firstly, numbers normally proves it's used in a real system.  It's at least
> > being used and seriously tested.
> > 
> > Secondly, per my very limited understanding to COLO... the two VMs in most
> > cases should be in-sync state already when both sides generate the same
> > network packets.
> > 
> > Another sync (where multifd can start to take effect) is only needed when
> > there're packets misalignments, but IIUC it should be rare.  I don't know
> > how rare it is, it would be good if Lukas could introduce some of those
> > numbers in his deployment to help us understand COLO better if we'll need
> > to keep it.
> 
> It really depends on the workload and if you want to tune for
> throughput or latency.

Thanks for all the answers from all of you.

If we decide to keep COLO, it looks like I'll have no choice but to
understand it better, as long as it still has anything to do with
migration..  I'll leave some more questions / comments at the end.

> 
> You need to do a checkpoint eventually and the more time passes between
> checkpoints the more dirty memory you have to transfer during the
> checkpoint.
> 
> Also keep in mind that the guest is stopped during checkpoints. Because
> even if we continue running the guest, we can not release the mismatched
> packets since that would expose a state of the guest to the outside
> world that is not yet replicated to the secondary.

Yes, this makes sense.  However, it is also a very confusing part of COLO.

When I said "I was expecting migration to not be the hot path", one reason
is that I believe a COLO checkpoint (or say, when migration happens) will
introduce a larger downtime than normal migration, because this process
transfers RAM with both VMs stopped.

You helped explain why that large downtime is needed, thanks.  However, it
then means that either (1) a packet misalignment, or (2) the periodic
timer, will kick off a checkpoint..

I don't know whether COLO services care about such a relatively large
downtime, especially since it doesn't happen just once, but periodically,
at least every tens of seconds (assuming that with periodic checkpoints,
packet misalignment is rare).

> 
> So the migration performance is actually the most important part in
> COLO to keep the checkpoints as short as possible.

IIUC, when a heartbeat is lost on the PVM _during_ a checkpoint sync, the
SVM can only roll back to the last checkpoint.  Would this be good enough
in reality?  It means that if there's a TCP transaction in flight, it may
break anyway.  x-checkpoint-delay / periodic checkpoints definitely make
this more likely to happen.

> 
> I have quite a few more performance and cleanup patches on my hands,
> for example to transfer dirty memory between checkpoints.
> 
> > 
> > IIUC, the critical path of COLO shouldn't be migration on its own?  It
> > should be when heartbeat gets lost; that normally should happen when two
> > VMs are in sync.  In this path, I don't see how multifd helps..  because
> > there's no migration happening, only the src recording what has changed.
> > Hence I think some number with description of the measurements may help us
> > understand how important multifd is to COLO.
> > 
> > Supporting multifd will cause new COLO functions to inject into core
> > migration code paths (even if not much..). I want to make sure such (new)
> > complexity is justified. I also want to avoid introducing a feature only
> > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > it'll be useful".
> 
> What COLO needs from migration at the low level:
> 
> Primary/Outgoing side:
> 
> Not much actually, we just need a way to incrementally send the
> dirtied memory and the full device state.
> Also, we ensure that migration never actually finishes since we will
> never do a switchover. For example we never set
> RAMState::last_stage with COLO.
> 
> Secondary/Incoming side:
> 
> colo cache:
> Since the secondary always needs to be ready to take over (even during
> checkpointing), we can not write the received ram pages directly to
> the guest ram to prevent having half of the old and half of the new
> contents.
> So we redirect the received ram pages to the colo cache. This is
> basically a mirror of the primary side ram.
> It also simplifies the primary side since from it's point of view it's
> just a normal migration target. So primary side doesn't have to care
> about dirtied pages on the secondary for example.
> 
> Dirty Bitmap:
> With COLO we also need a dirty bitmap on the incoming side to track
> 1. pages dirtied by the secondary guest
> 2. pages dirtied by the primary guest (incoming ram pages)
> In the last step during the checkpointing, this bitmap is then used
> to overwrite the guest ram with the colo cache so the secondary guest
> is in sync with the primary guest.
> 
> All this individually is very little code as you can see from my
> multifd patch. Just something to keep in mind I guess.
> 
> 
> At the high level we have the COLO framework outgoing and incoming
> threads which just tell the migration code to:
> Send all ram pages (qemu_savevm_live_state()) on the outgoing side
> paired with a qemu_loadvm_state_main on the incoming side.
> Send the device state (qemu_save_device_state()) paired with writing
> that stream to a buffer on the incoming side.
> And finally flusing the colo cache and loading the device state on the
> incoming side.
> 
> And of course we coordinate with the colo block replication and
> colo-compare.

Thank you.  Maybe you should generalize some of those explanations and put
them into docs/devel/migration/ somewhere.  I think much of this is not
mentioned in the docs on how COLO works internally.

Let me ask some more questions while I'm reading COLO today:

- For each checkpoint (colo_do_checkpoint_transaction()), COLO will
  do the following:

    bql_lock()
    vm_stop_force_state(RUN_STATE_COLO)     # stop vm
    bql_unlock()

    ...
  
    bql_lock()
    qemu_save_device_state()                # into a temp buffer fb
    bql_unlock()

    ...

    qemu_savevm_state_complete_precopy()    # send RAM, directly to the wire
    qemu_put_buffer(fb)                     # push temp buffer fb to wire

    ...

    bql_lock()
    vm_start()                              # start vm
    bql_unlock()

  A few questions that I didn't ask previously:

  - If the VM is stopped anyway, why put the device state into a temp
    buffer, instead of using what we already have for the precopy phase, or
    just pushing everything directly to the wire?

  - The above operation frequently releases the BQL; why is that needed?
    What happens if (within a window where the BQL is released) someone
    invokes the QMP command "cont", causing the VM to start?  Would COLO be
    broken by it?  Should we hold the BQL for the whole process to avoid
    that?

- Does colo_cache have a size limit, or should we expect the SVM to consume
  double the guest RAM size?  I didn't see where colo_cache is released
  during each sync (e.g. after colo_flush_ram_cache).  I expect that over
  time the SVM will have touched most of its pages, and then the colo_cache
  can consume as much as the guest memory on the SVM.

Thanks,

-- 
Peter Xu
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Lukas Straub 2 weeks, 6 days ago
On Mon, 19 Jan 2026 17:33:25 -0500
Peter Xu <peterx@redhat.com> wrote:

> On Sat, Jan 17, 2026 at 08:49:13PM +0100, Lukas Straub wrote:
> > On Thu, 15 Jan 2026 18:38:51 -0500
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:  
> > > > * Peter Xu (peterx@redhat.com) wrote:    
> > > > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:    
> > > > > > Nack.
> > > > > > 
> > > > > > This code has users, as explained in my other email:
> > > > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464    
> > > > > 
> > > > > Please then rework that series and consider include the following (I
> > > > > believe I pointed out a long time ago somewhere..):
> > > > >     
> > > >     
> > > > > - Some form of justification of why multifd needs to be enabled for COLO.
> > > > >   For example, in your cluster deployment, using multifd can improve XXX
> > > > >   by YYY.  Please describe the use case and improvements.    
> > > > 
> > > > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > > > the snapshoting the less overhead there is.    
> > > 
> > > Thanks for chiming in, Dave.  I can explain why I want to request for some
> > > numbers.
> > > 
> > > Firstly, numbers normally proves it's used in a real system.  It's at least
> > > being used and seriously tested.
> > > 
> > > Secondly, per my very limited understanding to COLO... the two VMs in most
> > > cases should be in-sync state already when both sides generate the same
> > > network packets.
> > > 
> > > Another sync (where multifd can start to take effect) is only needed when
> > > there're packets misalignments, but IIUC it should be rare.  I don't know
> > > how rare it is, it would be good if Lukas could introduce some of those
> > > numbers in his deployment to help us understand COLO better if we'll need
> > > to keep it.  
> > 
> > It really depends on the workload and if you want to tune for
> > throughput or latency.  
> 
> Thanks for all the answers from all of you.
> 
> If we decide to keep COLO, looks like I'll have no choice but understand it
> better, as long as it still has anything to do with migration..  I'll leave
> some more questions / comments at the end.
> 
> > 
> > You need to do a checkpoint eventually and the more time passes between
> > checkpoints the more dirty memory you have to transfer during the
> > checkpoint.
> > 
> > Also keep in mind that the guest is stopped during checkpoints. Because
> > even if we continue running the guest, we can not release the mismatched
> > packets since that would expose a state of the guest to the outside
> > world that is not yet replicated to the secondary.  
> 
> Yes this makes sense.  However it is also the very confusing part of COLO.
> 
> When I said "I was expecting migration to not be the hot path", one reason
> is I believe COLO checkpoint (or say, when migration happens) will
> introduce a larger downtime than normal migration, because this process
> transfers RAM with both VMs stopped.
> 
> You helped explain why that large downtime is needed, thanks.  However then
> it means either (1) packet misalignment, or (2) periodical timer kickoff,
> either of them will kickoff checkpoint..

Yes, it could be optimized so that we don't stop the guest for the periodic
checkpoints.

> 
> I don't know if COLO services care about such relatively large downtime,
> especially it is not happening once, but periodically for every tens of
> seconds at least (assuming when periodically then packet misalignment is
> rare).
> 

If you want to tune for latency, you go for something like a 500ms
checkpoint interval.


The alternative way to do fault tolerance is micro-checkpointing, where
only the primary guest runs while you buffer all sent packets.  Then at
every checkpoint you transfer all ram and device state to the secondary,
and only then release the buffered network packets.
So in this approach every packet is delayed by checkpoint interval +
checkpoint downtime, and you use a checkpoint interval of something like
30-100ms.

Obviously, COLO is a much better approach because only a few packets
observe a delay.

> > 
> > So the migration performance is actually the most important part in
> > COLO to keep the checkpoints as short as possible.  
> 
> IIUC when a heartbeat will be lost on PVM _during_ sync checkpoints, then
> SVM can only rollback to the last time checkpoint.  Would this be good
> enough in reality?  It means if there's a TCP transaction then it may broke
> anyway.  x-checkpoint-delay / periodic checkpoints definitely make this
> more possible to happen.

We only release the mismatched packets after the ram and device state are
fully sent to the secondary, because then the secondary is in the state
that generated these mismatched packets and can take over.

> 
> > 
> > I have quite a few more performance and cleanup patches on my hands,
> > for example to transfer dirty memory between checkpoints.
> >   
> > > 
> > > IIUC, the critical path of COLO shouldn't be migration on its own?  It
> > > should be when heartbeat gets lost; that normally should happen when two
> > > VMs are in sync.  In this path, I don't see how multifd helps..  because
> > > there's no migration happening, only the src recording what has changed.
> > > Hence I think some number with description of the measurements may help us
> > > understand how important multifd is to COLO.
> > > 
> > > Supporting multifd will cause new COLO functions to inject into core
> > > migration code paths (even if not much..). I want to make sure such (new)
> > > complexity is justified. I also want to avoid introducing a feature only
> > > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > > it'll be useful".  
> > 
> > What COLO needs from migration at the low level:
> > 
> > Primary/Outgoing side:
> > 
> > Not much actually, we just need a way to incrementally send the
> > dirtied memory and the full device state.
> > Also, we ensure that migration never actually finishes since we will
> > never do a switchover. For example we never set
> > RAMState::last_stage with COLO.
> > 
> > Secondary/Incoming side:
> > 
> > colo cache:
> > Since the secondary always needs to be ready to take over (even during
> > checkpointing), we can not write the received ram pages directly to
> > the guest ram to prevent having half of the old and half of the new
> > contents.
> > So we redirect the received ram pages to the colo cache. This is
> > basically a mirror of the primary side ram.
> > It also simplifies the primary side since from it's point of view it's
> > just a normal migration target. So primary side doesn't have to care
> > about dirtied pages on the secondary for example.
> > 
> > Dirty Bitmap:
> > With COLO we also need a dirty bitmap on the incoming side to track
> > 1. pages dirtied by the secondary guest
> > 2. pages dirtied by the primary guest (incoming ram pages)
> > In the last step during the checkpointing, this bitmap is then used
> > to overwrite the guest ram with the colo cache so the secondary guest
> > is in sync with the primary guest.
> > 
> > All this individually is very little code as you can see from my
> > multifd patch. Just something to keep in mind I guess.
> > 
> > 
> > At the high level we have the COLO framework outgoing and incoming
> > threads which just tell the migration code to:
> > Send all ram pages (qemu_savevm_live_state()) on the outgoing side
> > paired with a qemu_loadvm_state_main on the incoming side.
> > Send the device state (qemu_save_device_state()) paired with writing
> > that stream to a buffer on the incoming side.
> > And finally flusing the colo cache and loading the device state on the
> > incoming side.
> > 
> > And of course we coordinate with the colo block replication and
> > colo-compare.  
> 
> Thank you.  Maybe you should generalize some of the explanations and put it
> into docs/devel/migration/ somewhere.  I think many of them are not
> mentioned in the doc on how COLO works internally.
> 
> Let me ask some more questions while I'm reading COLO today:
> 
> - For each of the checkpoint (colo_do_checkpoint_transaction()), COLO will
>   do the following:
> 
>     bql_lock()
>     vm_stop_force_state(RUN_STATE_COLO)     # stop vm
>     bql_unlock()
> 
>     ...
>   
>     bql_lock()
>     qemu_save_device_state()                # into a temp buffer fb
>     bql_unlock()
> 
>     ...
> 
>     qemu_savevm_state_complete_precopy()    # send RAM, directly to the wire
>     qemu_put_buffer(fb)                     # push temp buffer fb to wire
> 
>     ...
> 
>     bql_lock()
>     vm_start()                              # start vm
>     bql_unlock()
> 
>   A few questions that I didn't ask previously:
> 
>   - If VM is stopped anyway, why putting the device states into a temp
>     buffer, instead of using what we already have for precopy phase, or
>     just push everything directly to the wire?

Actually we only do that to get the size of the device state and send
the size out-of-band, since we can not use qemu_load_device_state()
directly on the secondary side and look for the in-band EOF.
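
I.e., on the primary side it is roughly the following (a sketch only;
buffered_size() and buffered_data() are made-up placeholders for reading
back the buffer, the other helper names are approximate):

    qemu_save_device_state(fb)                    # device state into buffer fb
    size = buffered_size(fb)                      # placeholder: bytes now in fb
    colo_send_message_value(COLO_MESSAGE_VMSTATE_SIZE, size)
    qemu_put_buffer(sf, buffered_data(fb), size)  # then the bytes themselves

so the secondary knows exactly how many device-state bytes to read into its
buffer before it starts applying anything.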

> 
>   - Above operation frequently releases BQL, why is it needed?  What
>     happens if (within the window where BQL released) someone invoked QMP
>     command "cont" and causing VM to start? Would COLO be broken with it?
>     Should we take BQL for the whole process to avoid it?

We need to release the BQL because block replication on the secondary side,
and colo-compare and the netfilters on the primary side, need the main loop
to make progress.

Issuing a "cont" during a checkpoint will probably break it, yes.

> 
> - Does colo_cache has an limitation, or should we expect SVM to double
>   consume the guest RAM size?  As I didn't see where colo_cache will be
>   released during each sync (e.g. after colo_flush_ram_cache).  I am
>   expecting over time SVM will have most of the pages touched, then the
>   colo_cache can consume the same as guest mem on SVM.

Yes, the secondary side consumes twice the guest ram size.  That is one
disadvantage of this approach.

I guess we could do some kind of copy-on-write mapping for the secondary
guest ram.  But even then it's hard to keep the ram overhead bounded in
size.

Best regards,
Lukas Straub

> 
> Thanks,
> 

Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 2 weeks, 6 days ago
On Tue, Jan 20, 2026 at 12:48:47PM +0100, Lukas Straub wrote:
> On Mon, 19 Jan 2026 17:33:25 -0500
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Sat, Jan 17, 2026 at 08:49:13PM +0100, Lukas Straub wrote:
> > > On Thu, 15 Jan 2026 18:38:51 -0500
> > > Peter Xu <peterx@redhat.com> wrote:
> > >   
> > > > On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:  
> > > > > * Peter Xu (peterx@redhat.com) wrote:    
> > > > > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:    
> > > > > > > Nack.
> > > > > > > 
> > > > > > > This code has users, as explained in my other email:
> > > > > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464    
> > > > > > 
> > > > > > Please then rework that series and consider include the following (I
> > > > > > believe I pointed out a long time ago somewhere..):
> > > > > >     
> > > > >     
> > > > > > - Some form of justification of why multifd needs to be enabled for COLO.
> > > > > >   For example, in your cluster deployment, using multifd can improve XXX
> > > > > >   by YYY.  Please describe the use case and improvements.    
> > > > > 
> > > > > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > > > > the snapshoting the less overhead there is.    
> > > > 
> > > > Thanks for chiming in, Dave.  I can explain why I want to request for some
> > > > numbers.
> > > > 
> > > > Firstly, numbers normally proves it's used in a real system.  It's at least
> > > > being used and seriously tested.
> > > > 
> > > > Secondly, per my very limited understanding to COLO... the two VMs in most
> > > > cases should be in-sync state already when both sides generate the same
> > > > network packets.
> > > > 
> > > > Another sync (where multifd can start to take effect) is only needed when
> > > > there're packets misalignments, but IIUC it should be rare.  I don't know
> > > > how rare it is, it would be good if Lukas could introduce some of those
> > > > numbers in his deployment to help us understand COLO better if we'll need
> > > > to keep it.  
> > > 
> > > It really depends on the workload and if you want to tune for
> > > throughput or latency.  
> > 
> > Thanks for all the answers from all of you.
> > 
> > If we decide to keep COLO, looks like I'll have no choice but understand it
> > better, as long as it still has anything to do with migration..  I'll leave
> > some more questions / comments at the end.
> > 
> > > 
> > > You need to do a checkpoint eventually and the more time passes between
> > > checkpoints the more dirty memory you have to transfer during the
> > > checkpoint.
> > > 
> > > Also keep in mind that the guest is stopped during checkpoints. Because
> > > even if we continue running the guest, we can not release the mismatched
> > > packets since that would expose a state of the guest to the outside
> > > world that is not yet replicated to the secondary.  
> > 
> > Yes this makes sense.  However it is also the very confusing part of COLO.
> > 
> > When I said "I was expecting migration to not be the hot path", one reason
> > is I believe COLO checkpoint (or say, when migration happens) will
> > introduce a larger downtime than normal migration, because this process
> > transfers RAM with both VMs stopped.
> > 
> > You helped explain why that large downtime is needed, thanks.  However then
> > it means either (1) packet misalignment, or (2) periodical timer kickoff,
> > either of them will kickoff checkpoint..
> 
> Yes, it could be optimized so we don't stop the guest for the periodical
> checkpoints.

Likely we must stop it at least to savevm the non-RAM state.  But I get
your point.  Yes, I think it might be a good idea to try to keep in sync
even without an explicit checkpoint request, almost like what live precopy
does with RAM to shrink the downtime.

> 
> > 
> > I don't know if COLO services care about such relatively large downtime,
> > especially it is not happening once, but periodically for every tens of
> > seconds at least (assuming when periodically then packet misalignment is
> > rare).
> > 
> 
> If you want to tune for latency you go for like 500ms checkpoint
> interval.
> 
> 
> The alternative way to do fault tolerance is micro checkpointing where
> only the primary guest runs while you buffer all sent packets. Then
> every checkpoint you transfer all ram and device state to the secondary
> and only then release all network packets.
> So in this approach every packet is delayed by checkpoint interval +
> checkpoint downtime and you use a checkpoint interval of like 30-100ms.
> 
> Obviously, COLO is a much better approach because only few packets
> observe a delay.
> 
> > > 
> > > So the migration performance is actually the most important part in
> > > COLO to keep the checkpoints as short as possible.  
> > 
> > IIUC when a heartbeat will be lost on PVM _during_ sync checkpoints, then
> > SVM can only rollback to the last time checkpoint.  Would this be good
> > enough in reality?  It means if there's a TCP transaction then it may broke
> > anyway.  x-checkpoint-delay / periodic checkpoints definitely make this
> > more possible to happen.
> 
> We only release the mismatched packets after the ram and device state
> is fully sent to the secondary. Because then the secondary is in the
> state that generated these mismatched packets and can take over.

My question was more about how COLO failover works (or whether it works at
all) if a failure happens exactly during checkpointing (aka, while
migration is happening).

First of all, if the failure happens on the SVM, IIUC it's not a problem,
because the PVM has all the latest data.

The problem lies more in the case where the failure happens on the PVM.  In
this case, the SVM only contains the previous checkpoint results, maybe
plus something on top of that snapshot, as the SVM kept running after the
previous checkpoint.

So the failure can happen at different spots:

  (1) Failure happens _before_ applying the new checkpoint, that is, while
      receiving the checkpoint from the src when, for example, the PVM host
      goes down or the channel is shut down.

      This one looks "okay"; IIUC what will happen is that the SVM keeps
      running, but as described above it only contains the previous version
      of the PVM snapshot, plus whatever the SVM updated on top, which may
      not match the PVM's data:

           (1.a) if the checkpoint was triggered by x-checkpoint-delay:
           lower risk, possibly still in sync with the src

           (1.b) if the checkpoint was triggered by a colo-compare
           notification of packet misalignment: I believe this may cause
           service interruptions, and it means the SVM will not be able to
           completely replace the PVM in some cases.

  (2) Failure happens _after_ starting to apply the new checkpoint, but
      _before_ the whole checkpoint is applied.

      To be explicit, consider qemu_load_device_state() failing in the
      middle of colo_incoming_process_checkpoint().  It means the SVM has
      applied only part of the PVM's checkpoint, which I think means the
      SVM is completely corrupted.

Here either (1.b) or (2) seems fatal to me for the whole high-level design.
Periodic syncs with x-checkpoint-delay can make this easier to happen, i.e.
larger windows for critical failures.  That's also why I find it confusing
that COLO prefers more checkpoints - while it helps sync things up, it
enlarges the high-risk window and the overall overhead.

> 
> > 
> > > 
> > > I have quite a few more performance and cleanup patches on my hands,
> > > for example to transfer dirty memory between checkpoints.
> > >   
> > > > 
> > > > IIUC, the critical path of COLO shouldn't be migration on its own?  It
> > > > should be when heartbeat gets lost; that normally should happen when two
> > > > VMs are in sync.  In this path, I don't see how multifd helps..  because
> > > > there's no migration happening, only the src recording what has changed.
> > > > Hence I think some number with description of the measurements may help us
> > > > understand how important multifd is to COLO.
> > > > 
> > > > Supporting multifd will cause new COLO functions to inject into core
> > > > migration code paths (even if not much..). I want to make sure such (new)
> > > > complexity is justified. I also want to avoid introducing a feature only
> > > > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > > > it'll be useful".  
> > > 
> > > What COLO needs from migration at the low level:
> > > 
> > > Primary/Outgoing side:
> > > 
> > > Not much actually, we just need a way to incrementally send the
> > > dirtied memory and the full device state.
> > > Also, we ensure that migration never actually finishes since we will
> > > never do a switchover. For example we never set
> > > RAMState::last_stage with COLO.
> > > 
> > > Secondary/Incoming side:
> > > 
> > > colo cache:
> > > Since the secondary always needs to be ready to take over (even during
> > > checkpointing), we can not write the received ram pages directly to
> > > the guest ram to prevent having half of the old and half of the new
> > > contents.
> > > So we redirect the received ram pages to the colo cache. This is
> > > basically a mirror of the primary side ram.
> > > It also simplifies the primary side since from it's point of view it's
> > > just a normal migration target. So primary side doesn't have to care
> > > about dirtied pages on the secondary for example.
> > > 
> > > Dirty Bitmap:
> > > With COLO we also need a dirty bitmap on the incoming side to track
> > > 1. pages dirtied by the secondary guest
> > > 2. pages dirtied by the primary guest (incoming ram pages)
> > > In the last step during the checkpointing, this bitmap is then used
> > > to overwrite the guest ram with the colo cache so the secondary guest
> > > is in sync with the primary guest.
> > > 
> > > All this individually is very little code as you can see from my
> > > multifd patch. Just something to keep in mind I guess.
> > > 
> > > 
> > > At the high level we have the COLO framework outgoing and incoming
> > > threads which just tell the migration code to:
> > > Send all ram pages (qemu_savevm_live_state()) on the outgoing side
> > > paired with a qemu_loadvm_state_main on the incoming side.
> > > Send the device state (qemu_save_device_state()) paired with writing
> > > that stream to a buffer on the incoming side.
> > > And finally flusing the colo cache and loading the device state on the
> > > incoming side.
> > > 
> > > And of course we coordinate with the colo block replication and
> > > colo-compare.  
> > 
> > Thank you.  Maybe you should generalize some of the explanations and put it
> > into docs/devel/migration/ somewhere.  I think many of them are not
> > mentioned in the doc on how COLO works internally.
> > 
> > Let me ask some more questions while I'm reading COLO today:
> > 
> > - For each of the checkpoint (colo_do_checkpoint_transaction()), COLO will
> >   do the following:
> > 
> >     bql_lock()
> >     vm_stop_force_state(RUN_STATE_COLO)     # stop vm
> >     bql_unlock()
> > 
> >     ...
> >   
> >     bql_lock()
> >     qemu_save_device_state()                # into a temp buffer fb
> >     bql_unlock()
> > 
> >     ...
> > 
> >     qemu_savevm_state_complete_precopy()    # send RAM, directly to the wire
> >     qemu_put_buffer(fb)                     # push temp buffer fb to wire
> > 
> >     ...
> > 
> >     bql_lock()
> >     vm_start()                              # start vm
> >     bql_unlock()
> > 
> >   A few questions that I didn't ask previously:
> > 
> >   - If VM is stopped anyway, why putting the device states into a temp
> >     buffer, instead of using what we already have for precopy phase, or
> >     just push everything directly to the wire?
> 
> Actually we only do that to get the size of the device state and send
> the size out-of-band, since we can not use qemu_load_device_state()
> directly on the secondary side and look for the in-band EOF.

I also don't understand why the size is needed..

Currently the streaming protocol for COLO is:

  - ...
  - COLO_MESSAGE_VMSTATE_SEND
  - RAM data
  - EOF
  - COLO_MESSAGE_VMSTATE_SIZE
  - non-RAM data
  - EOF

My question is: why can't we do this instead?

  - ...
  - COLO_MESSAGE_VMSTATE_SEND
  - RAM data
  - non-RAM data
  - EOF

If the VM is stopped during the whole process anyway..

Here the RAM/non-RAM data are all vmstates, and logically they could also
be loaded in one shot of a vmstate load loop.

> 
> > 
> >   - Above operation frequently releases BQL, why is it needed?  What
> >     happens if (within the window where BQL released) someone invoked QMP
> >     command "cont" and causing VM to start? Would COLO be broken with it?
> >     Should we take BQL for the whole process to avoid it?
> 
> We need to release the BQL because block replication on the secondary side and
> colo-compare and netfilters on the primary side need the main loop to
> make progress.

Do we need it to make progress before vm_start(), though?  If we take the
BQL once and only release it after vm_start(), would it work?

I didn't see anything being checked in colo_do_checkpoint_transaction(),
after vm_stop() + replication_do_checkpoint_all(), and before vm_start()..

> 
> Issuing a cont during a checkpoint will probably break it yes.

Feel free to send a patch if you think it's a concern.  It's OK to me even
without one, if mgmt has full control of it, so I'll leave it to you to
decide as I'm not a COLO user after all.

> 
> > 
> > - Does colo_cache has an limitation, or should we expect SVM to double
> >   consume the guest RAM size?  As I didn't see where colo_cache will be
> >   released during each sync (e.g. after colo_flush_ram_cache).  I am
> >   expecting over time SVM will have most of the pages touched, then the
> >   colo_cache can consume the same as guest mem on SVM.
> 
> Yes, the secondary side consumes twice the guest ram size. That is one
> disadvantage of this approach.
> 
> I guess we could do some kind of copy on write mapping for the
> secondary guest ram. But even then it's hard to make the ram overhead
> bounded in size.

It's ok, though this also sounds like something worth documenting; it's
very high-level knowledge a user should know when considering COLO as an
HA solution.

Thanks,

-- 
Peter Xu
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Dr. David Alan Gilbert 2 weeks, 6 days ago
* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Jan 20, 2026 at 12:48:47PM +0100, Lukas Straub wrote:
> > On Mon, 19 Jan 2026 17:33:25 -0500
> > Peter Xu <peterx@redhat.com> wrote:
> > 
> > > On Sat, Jan 17, 2026 at 08:49:13PM +0100, Lukas Straub wrote:
> > > > On Thu, 15 Jan 2026 18:38:51 -0500
> > > > Peter Xu <peterx@redhat.com> wrote:
> > > >   
> > > > > On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:  
> > > > > > * Peter Xu (peterx@redhat.com) wrote:    
> > > > > > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:    
> > > > > > > > Nack.
> > > > > > > > 
> > > > > > > > This code has users, as explained in my other email:
> > > > > > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464    
> > > > > > > 
> > > > > > > Please then rework that series and consider include the following (I
> > > > > > > believe I pointed out a long time ago somewhere..):
> > > > > > >     
> > > > > >     
> > > > > > > - Some form of justification of why multifd needs to be enabled for COLO.
> > > > > > >   For example, in your cluster deployment, using multifd can improve XXX
> > > > > > >   by YYY.  Please describe the use case and improvements.    
> > > > > > 
> > > > > > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > > > > > the snapshoting the less overhead there is.    
> > > > > 
> > > > > Thanks for chiming in, Dave.  I can explain why I want to request for some
> > > > > numbers.
> > > > > 
> > > > > Firstly, numbers normally proves it's used in a real system.  It's at least
> > > > > being used and seriously tested.
> > > > > 
> > > > > Secondly, per my very limited understanding to COLO... the two VMs in most
> > > > > cases should be in-sync state already when both sides generate the same
> > > > > network packets.
> > > > > 
> > > > > Another sync (where multifd can start to take effect) is only needed when
> > > > > there're packets misalignments, but IIUC it should be rare.  I don't know
> > > > > how rare it is, it would be good if Lukas could introduce some of those
> > > > > numbers in his deployment to help us understand COLO better if we'll need
> > > > > to keep it.  
> > > > 
> > > > It really depends on the workload and if you want to tune for
> > > > throughput or latency.  
> > > 
> > > Thanks for all the answers from all of you.
> > > 
> > > If we decide to keep COLO, looks like I'll have no choice but understand it
> > > better, as long as it still has anything to do with migration..  I'll leave
> > > some more questions / comments at the end.
> > > 
> > > > 
> > > > You need to do a checkpoint eventually and the more time passes between
> > > > checkpoints the more dirty memory you have to transfer during the
> > > > checkpoint.
> > > > 
> > > > Also keep in mind that the guest is stopped during checkpoints. Because
> > > > even if we continue running the guest, we can not release the mismatched
> > > > packets since that would expose a state of the guest to the outside
> > > > world that is not yet replicated to the secondary.  
> > > 
> > > Yes this makes sense.  However it is also the very confusing part of COLO.
> > > 
> > > When I said "I was expecting migration to not be the hot path", one reason
> > > is I believe COLO checkpoint (or say, when migration happens) will
> > > introduce a larger downtime than normal migration, because this process
> > > transfers RAM with both VMs stopped.
> > > 
> > > You helped explain why that large downtime is needed, thanks.  However then
> > > it means either (1) packet misalignment, or (2) periodical timer kickoff,
> > > either of them will kickoff checkpoint..
> > 
> > Yes, it could be optimized so we don't stop the guest for the periodical
> > checkpoints.
> 
> Likely we must stop it at least to savevm on non-rams.  But I get your
> point.  Yes, I think it might be good idea to try to keep in sync even
> without an explicit checkpoint request, almost like what live precopy does
> with RAM to shrink the downtime.
> 
> > 
> > > 
> > > I don't know if COLO services care about such relatively large downtime,
> > > especially it is not happening once, but periodically for every tens of
> > > seconds at least (assuming when periodically then packet misalignment is
> > > rare).
> > > 
> > 
> > If you want to tune for latency you go for like 500ms checkpoint
> > interval.
> > 
> > 
> > The alternative way to do fault tolerance is micro checkpointing where
> > only the primary guest runs while you buffer all sent packets. Then
> > every checkpoint you transfer all ram and device state to the secondary
> > and only then release all network packets.
> > So in this approach every packet is delayed by checkpoint interval +
> > checkpoint downtime and you use a checkpoint interval of like 30-100ms.
> > 
> > Obviously, COLO is a much better approach because only few packets
> > observe a delay.
> > 
> > > > 
> > > > So the migration performance is actually the most important part in
> > > > COLO to keep the checkpoints as short as possible.  
> > > 
> > > IIUC when a heartbeat will be lost on PVM _during_ sync checkpoints, then
> > > SVM can only rollback to the last time checkpoint.  Would this be good
> > > enough in reality?  It means if there's a TCP transaction then it may broke
> > > anyway.  x-checkpoint-delay / periodic checkpoints definitely make this
> > > more possible to happen.
> > 
> > We only release the mismatched packets after the ram and device state
> > is fully sent to the secondary. Because then the secondary is in the
> > state that generated these mismatched packets and can take over.
> 
> My question was more about how COLO failover works (or work at all?) if a
> failure happens exactly during checkpointing (aka, migration happening).
> 
> First of all, if the failure happens on SVM, IIUC it's not a problem,
> because PVM has all the latest data.
> 
> The problem lies more in the case where the failure happened in PVM. In
> this case, SVM only contains the previous checkpoint results, maybe plus
> something on top of that snapshot, as SVM kept running after the previous
> checkpoint.
> 
> So the failure can happen at different spots:
> 
>   (1) Failure happens _before_ applying the new checkpoint, that is, while
>       receiving the checkpoint from src and for example the PVM host is
>       down, channel shutdown.
> 
>       This one looks "okay", IIUC what will happen is SVM will keep running
>       but then as I described above it only contains the previous version
>       of the PVM snapshot, plus something SVM updated which may not match
>       with PVM's data:
> 
>            (1.a) if checkpoint triggered because of x-checkpoint-delay,
>            lower risk, possibly still in sync with src
> 
>            (1.b) if checkpoint triggered by colo-compare notification of
>            packet misalignment, I believe this may cause service
>            interruptions and it means SVM will not be able to competely

No, that's ok - the colo-compare mismatch triggers the need for a checkpoint;
but if the PVM dies during the creation of that checkpoint, it's the same as
if the PVM had never started making the checkpoint; the SVM just takes over.
But the important thing is that the packet that caused the miscompare can't
be released until after the hosts are in sync again.

> 
>   (2) Failure happens _after_ applying the new checkpoint, but _before_ the
>       whole checkpoint is applied.
> 
>       To be explicit, consider qemu_load_device_state() when the process of
>       colo_incoming_process_checkpoint() failed.  It means SVM applied
>       partial of PVM's checkpoint, I think it should mean PVM is completely
>       corrupted.

As long as the SVM has got the entire checkpoint, then it *can* apply it all
and carry on from that point.

> Here either (1.b) or (2) seems fatal to me on the whole high level design.
> Periodical syncs with x-checkpoint-delay can make this easier to happen, so
> larger windows of critical failures.  That's also why I think it's
> confusing COLO prefers more checkpoints - while it helps sync things up, it
> enlarges high risk window and overall overhead.

No, there should be no point at which a failure leaves the SVM without a checkpoint
that it can apply to take over.

> > > > I have quite a few more performance and cleanup patches on my hands,
> > > > for example to transfer dirty memory between checkpoints.
> > > >   
> > > > > 
> > > > > IIUC, the critical path of COLO shouldn't be migration on its own?  It
> > > > > should be when heartbeat gets lost; that normally should happen when two
> > > > > VMs are in sync.  In this path, I don't see how multifd helps..  because
> > > > > there's no migration happening, only the src recording what has changed.
> > > > > Hence I think some number with description of the measurements may help us
> > > > > understand how important multifd is to COLO.
> > > > > 
> > > > > Supporting multifd will cause new COLO functions to inject into core
> > > > > migration code paths (even if not much..). I want to make sure such (new)
> > > > > complexity is justified. I also want to avoid introducing a feature only
> > > > > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > > > > it'll be useful".  
> > > > 
> > > > What COLO needs from migration at the low level:
> > > > 
> > > > Primary/Outgoing side:
> > > > 
> > > > Not much actually, we just need a way to incrementally send the
> > > > dirtied memory and the full device state.
> > > > Also, we ensure that migration never actually finishes since we will
> > > > never do a switchover. For example we never set
> > > > RAMState::last_stage with COLO.
> > > > 
> > > > Secondary/Incoming side:
> > > > 
> > > > colo cache:
> > > > Since the secondary always needs to be ready to take over (even during
> > > > checkpointing), we can not write the received ram pages directly to
> > > > the guest ram to prevent having half of the old and half of the new
> > > > contents.
> > > > So we redirect the received ram pages to the colo cache. This is
> > > > basically a mirror of the primary side ram.
> > > > It also simplifies the primary side since from it's point of view it's
> > > > just a normal migration target. So primary side doesn't have to care
> > > > about dirtied pages on the secondary for example.
> > > > 
> > > > Dirty Bitmap:
> > > > With COLO we also need a dirty bitmap on the incoming side to track
> > > > 1. pages dirtied by the secondary guest
> > > > 2. pages dirtied by the primary guest (incoming ram pages)
> > > > In the last step during the checkpointing, this bitmap is then used
> > > > to overwrite the guest ram with the colo cache so the secondary guest
> > > > is in sync with the primary guest.
> > > > 
> > > > All this individually is very little code as you can see from my
> > > > multifd patch. Just something to keep in mind I guess.
> > > > 
> > > > 
> > > > At the high level we have the COLO framework outgoing and incoming
> > > > threads which just tell the migration code to:
> > > > Send all ram pages (qemu_savevm_live_state()) on the outgoing side
> > > > paired with a qemu_loadvm_state_main on the incoming side.
> > > > Send the device state (qemu_save_device_state()) paired with writing
> > > > that stream to a buffer on the incoming side.
> > > > And finally flusing the colo cache and loading the device state on the
> > > > incoming side.
> > > > 
> > > > And of course we coordinate with the colo block replication and
> > > > colo-compare.  
> > > 
> > > Thank you.  Maybe you should generalize some of the explanations and put it
> > > into docs/devel/migration/ somewhere.  I think many of them are not
> > > mentioned in the doc on how COLO works internally.
> > > 
> > > Let me ask some more questions while I'm reading COLO today:
> > > 
> > > - For each of the checkpoint (colo_do_checkpoint_transaction()), COLO will
> > >   do the following:
> > > 
> > >     bql_lock()
> > >     vm_stop_force_state(RUN_STATE_COLO)     # stop vm
> > >     bql_unlock()
> > > 
> > >     ...
> > >   
> > >     bql_lock()
> > >     qemu_save_device_state()                # into a temp buffer fb
> > >     bql_unlock()
> > > 
> > >     ...
> > > 
> > >     qemu_savevm_state_complete_precopy()    # send RAM, directly to the wire
> > >     qemu_put_buffer(fb)                     # push temp buffer fb to wire
> > > 
> > >     ...
> > > 
> > >     bql_lock()
> > >     vm_start()                              # start vm
> > >     bql_unlock()
> > > 
> > >   A few questions that I didn't ask previously:
> > > 
> > >   - If VM is stopped anyway, why putting the device states into a temp
> > >     buffer, instead of using what we already have for precopy phase, or
> > >     just push everything directly to the wire?
> > 
> > Actually we only do that to get the size of the device state and send
> > the size out-of-band, since we can not use qemu_load_device_state()
> > directly on the secondary side and look for the in-band EOF.
> 
> I also don't understand why the size is needed..
> 
> Currently the streaming protocol for COLO is:
> 
>   - ...
>   - COLO_MESSAGE_VMSTATE_SEND
>   - RAM data
>   - EOF
>   - COLO_MESSAGE_VMSTATE_SIZE
>   - non-RAM data
>   - EOF
> 
> My question is about, why can't we do this instead?
> 
>   - ...
>   - COLO_MESSAGE_VMSTATE_SEND
>   - RAM data
>   - non-RAM data
>   - EOF
> 
> If the VM is stoppped during the whole process anyway..
> 
> Here RAM/non-RAM data all are vmstates, and logically can also be loaded in
> one shot of a vmstate load loop.

You might be able to; in that case you would have to stream the 
entire thing into a buffer on the secondary rather than applying the
RAM updates to the colo cache.

> 
> > 
> > > 
> > >   - Above operation frequently releases BQL, why is it needed?  What
> > >     happens if (within the window where BQL released) someone invoked QMP
> > >     command "cont" and causing VM to start? Would COLO be broken with it?
> > >     Should we take BQL for the whole process to avoid it?
> > 
> > We need to release the BQL because block replication on the secondary side and
> > colo-compare and netfilters on the primary side need the main loop to
> > make progress.
> 
> Do we need it to make progress before vm_start(), though?  If we take BQL
> once and release it once only after vm_start(), would it work?
> 
> I didn't see anything being checked in colo_do_checkpoint_transaction(),
> after vm_stop() + replication_do_checkpoint_all(), and before vm_start()..
> 
> > 
> > Issuing a cont during a checkpoint will probably break it yes.
> 
> Feel free to send a patch if you think it's a concern.  Ok to me even if
> without, if mgmt has full control of it, so I'll leave it to you to decide
> as I'm not a colo user after all.
> 
> > 
> > > 
> > > - Does colo_cache has an limitation, or should we expect SVM to double
> > >   consume the guest RAM size?  As I didn't see where colo_cache will be
> > >   released during each sync (e.g. after colo_flush_ram_cache).  I am
> > >   expecting over time SVM will have most of the pages touched, then the
> > >   colo_cache can consume the same as guest mem on SVM.
> > 
> > Yes, the secondary side consumes twice the guest ram size. That is one
> > disadvantage of this approach.
> > 
> > I guess we could do some kind of copy on write mapping for the
> > secondary guest ram. But even then it's hard to make the ram overhead
> > bounded in size.
> 
> It's ok, though this sounds also like something good to be documented, it's
> a very high level knowledge an user should know when considering COLO as an
> HA solution.

The thought of using userfaultfd-write had floated around at some point as
a way to optimise this.

Dave

> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 2 weeks, 5 days ago
On Tue, Jan 20, 2026 at 07:04:09PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Jan 20, 2026 at 12:48:47PM +0100, Lukas Straub wrote:
> > > On Mon, 19 Jan 2026 17:33:25 -0500
> > > Peter Xu <peterx@redhat.com> wrote:
> > > 
> > > > On Sat, Jan 17, 2026 at 08:49:13PM +0100, Lukas Straub wrote:
> > > > > On Thu, 15 Jan 2026 18:38:51 -0500
> > > > > Peter Xu <peterx@redhat.com> wrote:
> > > > >   
> > > > > > On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:  
> > > > > > > * Peter Xu (peterx@redhat.com) wrote:    
> > > > > > > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:    
> > > > > > > > > Nack.
> > > > > > > > > 
> > > > > > > > > This code has users, as explained in my other email:
> > > > > > > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464    
> > > > > > > > 
> > > > > > > > Please then rework that series and consider include the following (I
> > > > > > > > believe I pointed out a long time ago somewhere..):
> > > > > > > >     
> > > > > > >     
> > > > > > > > - Some form of justification of why multifd needs to be enabled for COLO.
> > > > > > > >   For example, in your cluster deployment, using multifd can improve XXX
> > > > > > > >   by YYY.  Please describe the use case and improvements.    
> > > > > > > 
> > > > > > > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > > > > > > the snapshoting the less overhead there is.    
> > > > > > 
> > > > > > Thanks for chiming in, Dave.  I can explain why I want to request for some
> > > > > > numbers.
> > > > > > 
> > > > > > Firstly, numbers normally proves it's used in a real system.  It's at least
> > > > > > being used and seriously tested.
> > > > > > 
> > > > > > Secondly, per my very limited understanding to COLO... the two VMs in most
> > > > > > cases should be in-sync state already when both sides generate the same
> > > > > > network packets.
> > > > > > 
> > > > > > Another sync (where multifd can start to take effect) is only needed when
> > > > > > there're packets misalignments, but IIUC it should be rare.  I don't know
> > > > > > how rare it is, it would be good if Lukas could introduce some of those
> > > > > > numbers in his deployment to help us understand COLO better if we'll need
> > > > > > to keep it.  
> > > > > 
> > > > > It really depends on the workload and if you want to tune for
> > > > > throughput or latency.  
> > > > 
> > > > Thanks for all the answers from all of you.
> > > > 
> > > > If we decide to keep COLO, looks like I'll have no choice but understand it
> > > > better, as long as it still has anything to do with migration..  I'll leave
> > > > some more questions / comments at the end.
> > > > 
> > > > > 
> > > > > You need to do a checkpoint eventually and the more time passes between
> > > > > checkpoints the more dirty memory you have to transfer during the
> > > > > checkpoint.
> > > > > 
> > > > > Also keep in mind that the guest is stopped during checkpoints. Because
> > > > > even if we continue running the guest, we can not release the mismatched
> > > > > packets since that would expose a state of the guest to the outside
> > > > > world that is not yet replicated to the secondary.  
> > > > 
> > > > Yes this makes sense.  However it is also the very confusing part of COLO.
> > > > 
> > > > When I said "I was expecting migration to not be the hot path", one reason
> > > > is I believe COLO checkpoint (or say, when migration happens) will
> > > > introduce a larger downtime than normal migration, because this process
> > > > transfers RAM with both VMs stopped.
> > > > 
> > > > You helped explain why that large downtime is needed, thanks.  However then
> > > > it means either (1) packet misalignment, or (2) periodical timer kickoff,
> > > > either of them will kickoff checkpoint..
> > > 
> > > Yes, it could be optimized so we don't stop the guest for the periodical
> > > checkpoints.
> > 
> > Likely we must stop it at least to savevm on non-rams.  But I get your
> > point.  Yes, I think it might be good idea to try to keep in sync even
> > without an explicit checkpoint request, almost like what live precopy does
> > with RAM to shrink the downtime.
> > 
> > > 
> > > > 
> > > > I don't know if COLO services care about such relatively large downtime,
> > > > especially it is not happening once, but periodically for every tens of
> > > > seconds at least (assuming when periodically then packet misalignment is
> > > > rare).
> > > > 
> > > 
> > > If you want to tune for latency you go for like 500ms checkpoint
> > > interval.
> > > 
> > > 
> > > The alternative way to do fault tolerance is micro checkpointing where
> > > only the primary guest runs while you buffer all sent packets. Then
> > > every checkpoint you transfer all ram and device state to the secondary
> > > and only then release all network packets.
> > > So in this approach every packet is delayed by checkpoint interval +
> > > checkpoint downtime and you use a checkpoint interval of like 30-100ms.
> > > 
> > > Obviously, COLO is a much better approach because only few packets
> > > observe a delay.
> > > 
> > > > > 
> > > > > So the migration performance is actually the most important part in
> > > > > COLO to keep the checkpoints as short as possible.  
> > > > 
> > > > IIUC when a heartbeat will be lost on PVM _during_ sync checkpoints, then
> > > > SVM can only rollback to the last time checkpoint.  Would this be good
> > > > enough in reality?  It means if there's a TCP transaction then it may broke
> > > > anyway.  x-checkpoint-delay / periodic checkpoints definitely make this
> > > > more possible to happen.
> > > 
> > > We only release the mismatched packets after the ram and device state
> > > is fully sent to the secondary. Because then the secondary is in the
> > > state that generated these mismatched packets and can take over.
> > 
> > My question was more about how COLO failover works (or work at all?) if a
> > failure happens exactly during checkpointing (aka, migration happening).
> > 
> > First of all, if the failure happens on SVM, IIUC it's not a problem,
> > because PVM has all the latest data.
> > 
> > The problem lies more in the case where the failure happened in PVM. In
> > this case, SVM only contains the previous checkpoint results, maybe plus
> > something on top of that snapshot, as SVM kept running after the previous
> > checkpoint.
> > 
> > So the failure can happen at different spots:
> > 
> >   (1) Failure happens _before_ applying the new checkpoint, that is, while
> >       receiving the checkpoint from src and for example the PVM host is
> >       down, channel shutdown.
> > 
> >       This one looks "okay", IIUC what will happen is SVM will keep running
> >       but then as I described above it only contains the previous version
> >       of the PVM snapshot, plus something SVM updated which may not match
> >       with PVM's data:
> > 
> >            (1.a) if checkpoint triggered because of x-checkpoint-delay,
> >            lower risk, possibly still in sync with src
> > 
> >            (1.b) if checkpoint triggered by colo-compare notification of
> >            packet misalignment, I believe this may cause service
> >            interruptions and it means SVM will not be able to competely
> 
> No, that's ok - the colo-compare mismatch triggers the need for a checkpoint;
> but if the PVM dies during the creation of that checkpoint, it's the same as
> if the PVM had never started making the checkpoint; the SVM just takes over.
> But the important thing is that the packet that caused the micompare can't
> be released until after the hosts are in sync again.
> 
> > 
> >   (2) Failure happens _after_ applying the new checkpoint, but _before_ the
> >       whole checkpoint is applied.
> > 
> >       To be explicit, consider qemu_load_device_state() when the process of
> >       colo_incoming_process_checkpoint() failed.  It means SVM applied
> >       partial of PVM's checkpoint, I think it should mean PVM is completely
> >       corrupted.
> 
> As long as the SVM has got the entire checkpoint, then it *can* apply it all
> and carry on from that point.

Does it mean we assert() that qemu_load_device_state() will always succeed
for COLO syncs?

Logically post_load() can invoke anything and I'm not sure if something can
start to fail, but I confess I don't know an existing device that can
trigger it.

Lukas told me something was broken with the pc machine type, though, related
to post_load() not being re-entrant.  I think a failure might be possible
when post_load() depends on some device state (which the guest driver can
change between two checkpoint loads), but that's still only theoretical.  So
maybe we can indeed assert it here.

> 
> > Here either (1.b) or (2) seems fatal to me on the whole high level design.
> > Periodical syncs with x-checkpoint-delay can make this easier to happen, so
> > larger windows of critical failures.  That's also why I think it's
> > confusing COLO prefers more checkpoints - while it helps sync things up, it
> > enlarges high risk window and overall overhead.
> 
> No, there should be no point at which a failure leaves the SVM without a checkpoint
> that it can apply to take over.
> 
> > > > > I have quite a few more performance and cleanup patches on my hands,
> > > > > for example to transfer dirty memory between checkpoints.
> > > > >   
> > > > > > 
> > > > > > IIUC, the critical path of COLO shouldn't be migration on its own?  It
> > > > > > should be when heartbeat gets lost; that normally should happen when two
> > > > > > VMs are in sync.  In this path, I don't see how multifd helps..  because
> > > > > > there's no migration happening, only the src recording what has changed.
> > > > > > Hence I think some number with description of the measurements may help us
> > > > > > understand how important multifd is to COLO.
> > > > > > 
> > > > > > Supporting multifd will cause new COLO functions to inject into core
> > > > > > migration code paths (even if not much..). I want to make sure such (new)
> > > > > > complexity is justified. I also want to avoid introducing a feature only
> > > > > > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > > > > > it'll be useful".  
> > > > > 
> > > > > What COLO needs from migration at the low level:
> > > > > 
> > > > > Primary/Outgoing side:
> > > > > 
> > > > > Not much actually, we just need a way to incrementally send the
> > > > > dirtied memory and the full device state.
> > > > > Also, we ensure that migration never actually finishes since we will
> > > > > never do a switchover. For example we never set
> > > > > RAMState::last_stage with COLO.
> > > > > 
> > > > > Secondary/Incoming side:
> > > > > 
> > > > > colo cache:
> > > > > Since the secondary always needs to be ready to take over (even during
> > > > > checkpointing), we can not write the received ram pages directly to
> > > > > the guest ram to prevent having half of the old and half of the new
> > > > > contents.
> > > > > So we redirect the received ram pages to the colo cache. This is
> > > > > basically a mirror of the primary side ram.
> > > > > It also simplifies the primary side since from it's point of view it's
> > > > > just a normal migration target. So primary side doesn't have to care
> > > > > about dirtied pages on the secondary for example.
> > > > > 
> > > > > Dirty Bitmap:
> > > > > With COLO we also need a dirty bitmap on the incoming side to track
> > > > > 1. pages dirtied by the secondary guest
> > > > > 2. pages dirtied by the primary guest (incoming ram pages)
> > > > > In the last step during the checkpointing, this bitmap is then used
> > > > > to overwrite the guest ram with the colo cache so the secondary guest
> > > > > is in sync with the primary guest.
> > > > > 
> > > > > All this individually is very little code as you can see from my
> > > > > multifd patch. Just something to keep in mind I guess.
> > > > > 
> > > > > 
> > > > > At the high level we have the COLO framework outgoing and incoming
> > > > > threads which just tell the migration code to:
> > > > > Send all ram pages (qemu_savevm_live_state()) on the outgoing side
> > > > > paired with a qemu_loadvm_state_main on the incoming side.
> > > > > Send the device state (qemu_save_device_state()) paired with writing
> > > > > that stream to a buffer on the incoming side.
> > > > > And finally flusing the colo cache and loading the device state on the
> > > > > incoming side.
> > > > > 
> > > > > And of course we coordinate with the colo block replication and
> > > > > colo-compare.  
> > > > 
> > > > Thank you.  Maybe you should generalize some of the explanations and put it
> > > > into docs/devel/migration/ somewhere.  I think many of them are not
> > > > mentioned in the doc on how COLO works internally.
> > > > 
> > > > Let me ask some more questions while I'm reading COLO today:
> > > > 
> > > > - For each of the checkpoint (colo_do_checkpoint_transaction()), COLO will
> > > >   do the following:
> > > > 
> > > >     bql_lock()
> > > >     vm_stop_force_state(RUN_STATE_COLO)     # stop vm
> > > >     bql_unlock()
> > > > 
> > > >     ...
> > > >   
> > > >     bql_lock()
> > > >     qemu_save_device_state()                # into a temp buffer fb
> > > >     bql_unlock()
> > > > 
> > > >     ...
> > > > 
> > > >     qemu_savevm_state_complete_precopy()    # send RAM, directly to the wire
> > > >     qemu_put_buffer(fb)                     # push temp buffer fb to wire
> > > > 
> > > >     ...
> > > > 
> > > >     bql_lock()
> > > >     vm_start()                              # start vm
> > > >     bql_unlock()
> > > > 
> > > >   A few questions that I didn't ask previously:
> > > > 
> > > >   - If VM is stopped anyway, why putting the device states into a temp
> > > >     buffer, instead of using what we already have for precopy phase, or
> > > >     just push everything directly to the wire?
> > > 
> > > Actually we only do that to get the size of the device state and send
> > > the size out-of-band, since we can not use qemu_load_device_state()
> > > directly on the secondary side and look for the in-band EOF.
> > 
> > I also don't understand why the size is needed..
> > 
> > Currently the streaming protocol for COLO is:
> > 
> >   - ...
> >   - COLO_MESSAGE_VMSTATE_SEND
> >   - RAM data
> >   - EOF
> >   - COLO_MESSAGE_VMSTATE_SIZE
> >   - non-RAM data
> >   - EOF
> > 
> > My question is about, why can't we do this instead?
> > 
> >   - ...
> >   - COLO_MESSAGE_VMSTATE_SEND
> >   - RAM data

[1]

> >   - non-RAM data
> >   - EOF
> > 
> > If the VM is stoppped during the whole process anyway..
> > 
> > Here RAM/non-RAM data all are vmstates, and logically can also be loaded in
> > one shot of a vmstate load loop.
> 
> You might be able to; in that case you would have to stream the 
> entire thing into a buffer on the secondary rather than applying the
> RAM updates to the colo cache.

I thought the colo cache already provides such buffering when receiving at
[1] above?  Then we flush the colo cache like before (scanning the SVM
bitmap and flushing only those pages present in the colo cache).

If something goes wrong (e.g. the channel breaks while receiving the non-RAM
device states), the SVM can simply drop the whole colo cache, as the latest
checkpoint isn't complete.
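
For concreteness, the flush I mean above is conceptually just the following
(a toy sketch with a made-up block layout; the real code walks QEMU's
RAMBlock list and the migration bitmaps instead):

  #include <stdint.h>
  #include <string.h>

  #define TOY_PAGE_SIZE 4096ULL

  typedef struct ToyBlock {
      uint8_t *host;         /* SVM guest RAM */
      uint8_t *colo_cache;   /* mirror of the PVM's RAM */
      uint64_t *bitmap;      /* pages dirtied since the last checkpoint */
      uint64_t npages;
  } ToyBlock;

  /* Overwrite every dirty page of guest RAM with the colo cache copy so
   * the SVM ends up identical to the PVM at this checkpoint. */
  static void toy_flush_colo_cache(ToyBlock *b)
  {
      for (uint64_t i = 0; i < b->npages; i++) {
          if (b->bitmap[i / 64] & (1ULL << (i % 64))) {
              memcpy(b->host + i * TOY_PAGE_SIZE,
                     b->colo_cache + i * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
              b->bitmap[i / 64] &= ~(1ULL << (i % 64));
          }
      }
  }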

> 
> > 
> > > 
> > > > 
> > > >   - Above operation frequently releases BQL, why is it needed?  What
> > > >     happens if (within the window where BQL released) someone invoked QMP
> > > >     command "cont" and causing VM to start? Would COLO be broken with it?
> > > >     Should we take BQL for the whole process to avoid it?
> > > 
> > > We need to release the BQL because block replication on the secondary side and
> > > colo-compare and netfilters on the primary side need the main loop to
> > > make progress.
> > 
> > Do we need it to make progress before vm_start(), though?  If we take BQL
> > once and release it once only after vm_start(), would it work?
> > 
> > I didn't see anything being checked in colo_do_checkpoint_transaction(),
> > after vm_stop() + replication_do_checkpoint_all(), and before vm_start()..
> > 
> > > 
> > > Issuing a cont during a checkpoint will probably break it yes.
> > 
> > Feel free to send a patch if you think it's a concern.  Ok to me even if
> > without, if mgmt has full control of it, so I'll leave it to you to decide
> > as I'm not a colo user after all.
> > 
> > > 
> > > > 
> > > > - Does colo_cache has an limitation, or should we expect SVM to double
> > > >   consume the guest RAM size?  As I didn't see where colo_cache will be
> > > >   released during each sync (e.g. after colo_flush_ram_cache).  I am
> > > >   expecting over time SVM will have most of the pages touched, then the
> > > >   colo_cache can consume the same as guest mem on SVM.
> > > 
> > > Yes, the secondary side consumes twice the guest ram size. That is one
> > > disadvantage of this approach.
> > > 
> > > I guess we could do some kind of copy on write mapping for the
> > > secondary guest ram. But even then it's hard to make the ram overhead
> > > bounded in size.
> > 
> > It's ok, though this sounds also like something good to be documented, it's
> > a very high level knowledge an user should know when considering COLO as an
> > HA solution.
> 
> The thought of using userfaultfd-write had floated around at some time
> as ways to optimise this.

It's an interesting idea.  Yes, it looks workable, but as Lukas said, it is
still unbounded.

One idea to provide a strict bound (rough sketch below):

  - the admin sets a buffer limiting the extra pages to remember on the SVM;
    it should be much smaller than total guest mem, but the admin should
    make sure that with a proper x-checkpoint-delay the limit won't be hit
    in 99.99% of cases,

  - if the limit is hit, both VMs need to pause (initiated by the SVM), and
    the SVM needs to explicitly request a checkpoint from the src,

  - the VMs can only start again once the two VMs are in sync again
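
A toy sketch of the bookkeeping (all names made up here, not QEMU code):

  #include <stdbool.h>
  #include <stdint.h>

  typedef struct SvmPageBudget {
      uint64_t saved_pages;   /* pages copied aside since the last sync */
      uint64_t limit;         /* admin-configured bound, << guest mem   */
  } SvmPageBudget;

  /* Charged once per page whose pre-write contents the SVM copies aside.
   * Returns false when the budget is exhausted, i.e. when the caller
   * should pause the SVM and request an immediate checkpoint from the
   * PVM; both sides resume only after that sync completes. */
  static bool svm_page_budget_charge(SvmPageBudget *b)
  {
      return ++b->saved_pages < b->limit;
  }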

Thanks,

-- 
Peter Xu
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Dr. David Alan Gilbert 2 weeks, 5 days ago
* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Jan 20, 2026 at 07:04:09PM +0000, Dr. David Alan Gilbert wrote:

<snip>

> > >   (2) Failure happens _after_ applying the new checkpoint, but _before_ the
> > >       whole checkpoint is applied.
> > > 
> > >       To be explicit, consider qemu_load_device_state() when the process of
> > >       colo_incoming_process_checkpoint() failed.  It means SVM applied
> > >       partial of PVM's checkpoint, I think it should mean PVM is completely
> > >       corrupted.
> > 
> > As long as the SVM has got the entire checkpoint, then it *can* apply it all
> > and carry on from that point.
> 
> Does it mean we assert() that qemu_load_device_state() will always success
> for COLO syncs?

Not sure; I'd expect if that load fails then the SVM fails; if that happens
on a periodic checkpoint then the PVM should carry on.
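
I.e. something of this shape on the secondary (just a sketch of the idea,
not a claim about the current colo.c error path):

  /* Apply the buffered checkpoint.  A failure here should only take down
   * the SVM; the PVM keeps running from its own state. */
  int ret = qemu_load_device_state(fb);
  if (ret < 0) {
      error_report("COLO: loading checkpoint device state failed (%d)", ret);
      exit(EXIT_FAILURE);
  }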

> Logically post_load() can invoke anything and I'm not sure if something can
> start to fail, but I confess I don't know an existing device that can
> trigger it.

Like a postcopy, it shouldn't fail unless there's an underlying failure
(e.g. storage died)

> Lukas told me something was broken though with pc machine type, on
> post_load() not re-entrant.  I think it might be possible though when
> post_load() is relevant to some device states (that guest driver can change
> between two checkpoint loads), but that's still only theoretical.  So maybe
> we can indeed assert it here.

I don't understand that non re-entrant bit?

> > 
> > > Here either (1.b) or (2) seems fatal to me on the whole high level design.
> > > Periodical syncs with x-checkpoint-delay can make this easier to happen, so
> > > larger windows of critical failures.  That's also why I think it's
> > > confusing COLO prefers more checkpoints - while it helps sync things up, it
> > > enlarges high risk window and overall overhead.
> > 
> > No, there should be no point at which a failure leaves the SVM without a checkpoint
> > that it can apply to take over.
> > 
> > > > > > I have quite a few more performance and cleanup patches on my hands,
> > > > > > for example to transfer dirty memory between checkpoints.
> > > > > >   
> > > > > > > 
> > > > > > > IIUC, the critical path of COLO shouldn't be migration on its own?  It
> > > > > > > should be when heartbeat gets lost; that normally should happen when two
> > > > > > > VMs are in sync.  In this path, I don't see how multifd helps..  because
> > > > > > > there's no migration happening, only the src recording what has changed.
> > > > > > > Hence I think some number with description of the measurements may help us
> > > > > > > understand how important multifd is to COLO.
> > > > > > > 
> > > > > > > Supporting multifd will cause new COLO functions to inject into core
> > > > > > > migration code paths (even if not much..). I want to make sure such (new)
> > > > > > > complexity is justified. I also want to avoid introducing a feature only
> > > > > > > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > > > > > > it'll be useful".  
> > > > > > 
> > > > > > What COLO needs from migration at the low level:
> > > > > > 
> > > > > > Primary/Outgoing side:
> > > > > > 
> > > > > > Not much actually, we just need a way to incrementally send the
> > > > > > dirtied memory and the full device state.
> > > > > > Also, we ensure that migration never actually finishes since we will
> > > > > > never do a switchover. For example we never set
> > > > > > RAMState::last_stage with COLO.
> > > > > > 
> > > > > > Secondary/Incoming side:
> > > > > > 
> > > > > > colo cache:
> > > > > > Since the secondary always needs to be ready to take over (even during
> > > > > > checkpointing), we can not write the received ram pages directly to
> > > > > > the guest ram to prevent having half of the old and half of the new
> > > > > > contents.
> > > > > > So we redirect the received ram pages to the colo cache. This is
> > > > > > basically a mirror of the primary side ram.
> > > > > > It also simplifies the primary side since from it's point of view it's
> > > > > > just a normal migration target. So primary side doesn't have to care
> > > > > > about dirtied pages on the secondary for example.
> > > > > > 
> > > > > > Dirty Bitmap:
> > > > > > With COLO we also need a dirty bitmap on the incoming side to track
> > > > > > 1. pages dirtied by the secondary guest
> > > > > > 2. pages dirtied by the primary guest (incoming ram pages)
> > > > > > In the last step during the checkpointing, this bitmap is then used
> > > > > > to overwrite the guest ram with the colo cache so the secondary guest
> > > > > > is in sync with the primary guest.
> > > > > > 
> > > > > > All this individually is very little code as you can see from my
> > > > > > multifd patch. Just something to keep in mind I guess.
> > > > > > 
> > > > > > 
> > > > > > At the high level we have the COLO framework outgoing and incoming
> > > > > > threads which just tell the migration code to:
> > > > > > Send all ram pages (qemu_savevm_live_state()) on the outgoing side
> > > > > > paired with a qemu_loadvm_state_main on the incoming side.
> > > > > > Send the device state (qemu_save_device_state()) paired with writing
> > > > > > that stream to a buffer on the incoming side.
> > > > > > And finally flusing the colo cache and loading the device state on the
> > > > > > incoming side.
> > > > > > 
> > > > > > And of course we coordinate with the colo block replication and
> > > > > > colo-compare.  
> > > > > 
> > > > > Thank you.  Maybe you should generalize some of the explanations and put it
> > > > > into docs/devel/migration/ somewhere.  I think many of them are not
> > > > > mentioned in the doc on how COLO works internally.
> > > > > 
> > > > > Let me ask some more questions while I'm reading COLO today:
> > > > > 
> > > > > - For each of the checkpoint (colo_do_checkpoint_transaction()), COLO will
> > > > >   do the following:
> > > > > 
> > > > >     bql_lock()
> > > > >     vm_stop_force_state(RUN_STATE_COLO)     # stop vm
> > > > >     bql_unlock()
> > > > > 
> > > > >     ...
> > > > >   
> > > > >     bql_lock()
> > > > >     qemu_save_device_state()                # into a temp buffer fb
> > > > >     bql_unlock()
> > > > > 
> > > > >     ...
> > > > > 
> > > > >     qemu_savevm_state_complete_precopy()    # send RAM, directly to the wire
> > > > >     qemu_put_buffer(fb)                     # push temp buffer fb to wire
> > > > > 
> > > > >     ...
> > > > > 
> > > > >     bql_lock()
> > > > >     vm_start()                              # start vm
> > > > >     bql_unlock()
> > > > > 
> > > > >   A few questions that I didn't ask previously:
> > > > > 
> > > > >   - If VM is stopped anyway, why putting the device states into a temp
> > > > >     buffer, instead of using what we already have for precopy phase, or
> > > > >     just push everything directly to the wire?
> > > > 
> > > > Actually we only do that to get the size of the device state and send
> > > > the size out-of-band, since we can not use qemu_load_device_state()
> > > > directly on the secondary side and look for the in-band EOF.
> > > 
> > > I also don't understand why the size is needed..
> > > 
> > > Currently the streaming protocol for COLO is:
> > > 
> > >   - ...
> > >   - COLO_MESSAGE_VMSTATE_SEND
> > >   - RAM data
> > >   - EOF
> > >   - COLO_MESSAGE_VMSTATE_SIZE
> > >   - non-RAM data
> > >   - EOF
> > > 
> > > My question is about, why can't we do this instead?
> > > 
> > >   - ...
> > >   - COLO_MESSAGE_VMSTATE_SEND
> > >   - RAM data
> 
> [1]
> 
> > >   - non-RAM data
> > >   - EOF
> > > 
> > > If the VM is stoppped during the whole process anyway..
> > > 
> > > Here RAM/non-RAM data all are vmstates, and logically can also be loaded in
> > > one shot of a vmstate load loop.
> > 
> > You might be able to; in that case you would have to stream the 
> > entire thing into a buffer on the secondary rather than applying the
> > RAM updates to the colo cache.
> 
> I thought the colo cache is already such a buffering when receiving at [1]
> above?  Then we need to flush the colo cache (including scan the SVM bitmap
> and only flush those pages in colo cache) like before.
> 
> If something went wrong (e.g. channel broken during receiving non-ram
> device states), SVM can directly drop all colo cache as the latest
> checkpoint isn't complete.

Oh, I think I've remembered why it's necessary to split it into RAM and
non-RAM: you can't parse a non-RAM stream and know when you've hit an EOF
flag in the stream, especially for stuff that's open-coded (like some of
virtio).  So there's no way to do a 'load until EOF' into a simple RAM
buffer; you need to be given an explicit size to know how much to expect.

You could do it for the RAM, but you'd need to write a protocol parser
to follow the stream to watch for the EOF.  It's actually harder with multifd;
how would you make a temporary buffer with multiple streams like that?
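
For reference, this is roughly what the out-of-band size buys the secondary
(a simplified sketch; the two helpers marked "hypothetical" are stand-ins,
not the exact colo.c API):

  /* Secondary side, once COLO_MESSAGE_VMSTATE_SIZE has arrived. */
  uint64_t size = colo_receive_vmstate_size(from_src);    /* hypothetical */
  uint8_t *buf = g_malloc(size);

  /* We can read exactly 'size' bytes without having to parse the
   * device-state stream itself in search of an EOF marker. */
  if (qemu_get_buffer(from_src, buf, size) != size) {
      g_free(buf);    /* channel broke mid-checkpoint: SVM carries on */
      return;
  }

  /* Only with the whole checkpoint in hand do we apply it. */
  QEMUFile *fb = qemu_file_from_memory(buf, size);         /* hypothetical */
  qemu_load_device_state(fb);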

> > The thought of using userfaultfd-write had floated around at some time
> > as ways to optimise this.
> 
> It's an interesting idea. Yes it looks working, but as Lukas said, it looks
> still unbounded.
> 
> One idea to provide a strict bound:
> 
>   - admin sets a proper buffer to limit the extra pages to remember on SVM,
>     should be much smaller than total guest mem, but admin should make sure
>     in 99.99% cases it won't hit the limit with a proper x-checkpoint-delay,
> 
>   - if limit triggered, both VMs needs to pause (initiated by SVM), SVM
>     needs to explicitly request a checkpoint to src,
> 
>   - VMs can only start again after two VMs sync again

Right, that should be doable with a userfault-write.

Dave

> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 2 weeks, 5 days ago
On Wed, Jan 21, 2026 at 01:25:32AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Jan 20, 2026 at 07:04:09PM +0000, Dr. David Alan Gilbert wrote:
> 
> <snip>
> 
> > > >   (2) Failure happens _after_ applying the new checkpoint, but _before_ the
> > > >       whole checkpoint is applied.
> > > > 
> > > >       To be explicit, consider qemu_load_device_state() when the process of
> > > >       colo_incoming_process_checkpoint() failed.  It means SVM applied
> > > >       partial of PVM's checkpoint, I think it should mean PVM is completely
> > > >       corrupted.
> > > 
> > > As long as the SVM has got the entire checkpoint, then it *can* apply it all
> > > and carry on from that point.
> > 
> > Does it mean we assert() that qemu_load_device_state() will always success
> > for COLO syncs?
> 
> Not sure; I'd expect if that load fails then the SVM fails; if that happens
> on a periodic checkpoint then the PVM should carry on.

Hmm right, if qemu_load_device_state() failed, likely PVM is still alive.

> 
> > Logically post_load() can invoke anything and I'm not sure if something can
> > start to fail, but I confess I don't know an existing device that can
> > trigger it.
> 
> Like a postcopy, it shouldn't fail unless there's an underlying failure
> (e.g. storage died)

Postcopy can definitely fail at post_load()..  Actually Juraj just fixed it
for 10.2 here, so postcopy can now fail properly while saving/loading device
states (we used to hang):

https://lore.kernel.org/r/20251103183301.3840862-1-jmarcin@redhat.com

The two major causes of postcopy vmstate load failure that I hit (while
looking at bugs after you left; I wish you were still here!):

(1) KVM put() failures due to kernel version mismatch, or,

(2) virtio post_load() failures due to e.g. virtio feature unsupported.

Both of them fall into the "unsupported dest kernel version" realm, though,
so indeed they may not affect COLO, as I expect a COLO deployment to run the
same kernel on both hosts.

> 
> > Lukas told me something was broken though with pc machine type, on
> > post_load() not re-entrant.  I think it might be possible though when
> > post_load() is relevant to some device states (that guest driver can change
> > between two checkpoint loads), but that's still only theoretical.  So maybe
> > we can indeed assert it here.
> 
> I don't understand that non re-entrant bit?

It may not be the exact wording; the message is here:

https://lore.kernel.org/r/20260115233500.26fd1628@penguin

        There is a bug in the emulated ahci disk controller which crashes
        when it's vmstate is loaded more than once.

I was expecting it to be a post_load() issue, because normal scalar vmstates
should be fine with being loaded more than once.  I didn't look deeper.

> 
> > > 
> > > > Here either (1.b) or (2) seems fatal to me on the whole high level design.
> > > > Periodical syncs with x-checkpoint-delay can make this easier to happen, so
> > > > larger windows of critical failures.  That's also why I think it's
> > > > confusing COLO prefers more checkpoints - while it helps sync things up, it
> > > > enlarges high risk window and overall overhead.
> > > 
> > > No, there should be no point at which a failure leaves the SVM without a checkpoint
> > > that it can apply to take over.
> > > 
> > > > > > > I have quite a few more performance and cleanup patches on my hands,
> > > > > > > for example to transfer dirty memory between checkpoints.
> > > > > > >   
> > > > > > > > 
> > > > > > > > IIUC, the critical path of COLO shouldn't be migration on its own?  It
> > > > > > > > should be when heartbeat gets lost; that normally should happen when two
> > > > > > > > VMs are in sync.  In this path, I don't see how multifd helps..  because
> > > > > > > > there's no migration happening, only the src recording what has changed.
> > > > > > > > Hence I think some number with description of the measurements may help us
> > > > > > > > understand how important multifd is to COLO.
> > > > > > > > 
> > > > > > > > Supporting multifd will cause new COLO functions to inject into core
> > > > > > > > migration code paths (even if not much..). I want to make sure such (new)
> > > > > > > > complexity is justified. I also want to avoid introducing a feature only
> > > > > > > > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > > > > > > > it'll be useful".  
> > > > > > > 
> > > > > > > What COLO needs from migration at the low level:
> > > > > > > 
> > > > > > > Primary/Outgoing side:
> > > > > > > 
> > > > > > > Not much actually, we just need a way to incrementally send the
> > > > > > > dirtied memory and the full device state.
> > > > > > > Also, we ensure that migration never actually finishes since we will
> > > > > > > never do a switchover. For example we never set
> > > > > > > RAMState::last_stage with COLO.
> > > > > > > 
> > > > > > > Secondary/Incoming side:
> > > > > > > 
> > > > > > > colo cache:
> > > > > > > Since the secondary always needs to be ready to take over (even during
> > > > > > > checkpointing), we can not write the received ram pages directly to
> > > > > > > the guest ram to prevent having half of the old and half of the new
> > > > > > > contents.
> > > > > > > So we redirect the received ram pages to the colo cache. This is
> > > > > > > basically a mirror of the primary side ram.
> > > > > > > It also simplifies the primary side since from it's point of view it's
> > > > > > > just a normal migration target. So primary side doesn't have to care
> > > > > > > about dirtied pages on the secondary for example.
> > > > > > > 
> > > > > > > Dirty Bitmap:
> > > > > > > With COLO we also need a dirty bitmap on the incoming side to track
> > > > > > > 1. pages dirtied by the secondary guest
> > > > > > > 2. pages dirtied by the primary guest (incoming ram pages)
> > > > > > > In the last step during the checkpointing, this bitmap is then used
> > > > > > > to overwrite the guest ram with the colo cache so the secondary guest
> > > > > > > is in sync with the primary guest.
> > > > > > > 
> > > > > > > All this individually is very little code as you can see from my
> > > > > > > multifd patch. Just something to keep in mind I guess.
> > > > > > > 
> > > > > > > 
> > > > > > > At the high level we have the COLO framework outgoing and incoming
> > > > > > > threads which just tell the migration code to:
> > > > > > > Send all ram pages (qemu_savevm_live_state()) on the outgoing side
> > > > > > > paired with a qemu_loadvm_state_main on the incoming side.
> > > > > > > Send the device state (qemu_save_device_state()) paired with writing
> > > > > > > that stream to a buffer on the incoming side.
> > > > > > > And finally flusing the colo cache and loading the device state on the
> > > > > > > incoming side.
> > > > > > > 
> > > > > > > And of course we coordinate with the colo block replication and
> > > > > > > colo-compare.  
> > > > > > 
> > > > > > Thank you.  Maybe you should generalize some of the explanations and put it
> > > > > > into docs/devel/migration/ somewhere.  I think many of them are not
> > > > > > mentioned in the doc on how COLO works internally.
> > > > > > 
> > > > > > Let me ask some more questions while I'm reading COLO today:
> > > > > > 
> > > > > > - For each of the checkpoint (colo_do_checkpoint_transaction()), COLO will
> > > > > >   do the following:
> > > > > > 
> > > > > >     bql_lock()
> > > > > >     vm_stop_force_state(RUN_STATE_COLO)     # stop vm
> > > > > >     bql_unlock()
> > > > > > 
> > > > > >     ...
> > > > > >   
> > > > > >     bql_lock()
> > > > > >     qemu_save_device_state()                # into a temp buffer fb
> > > > > >     bql_unlock()
> > > > > > 
> > > > > >     ...
> > > > > > 
> > > > > >     qemu_savevm_state_complete_precopy()    # send RAM, directly to the wire
> > > > > >     qemu_put_buffer(fb)                     # push temp buffer fb to wire
> > > > > > 
> > > > > >     ...
> > > > > > 
> > > > > >     bql_lock()
> > > > > >     vm_start()                              # start vm
> > > > > >     bql_unlock()
> > > > > > 
> > > > > >   A few questions that I didn't ask previously:
> > > > > > 
> > > > > >   - If VM is stopped anyway, why putting the device states into a temp
> > > > > >     buffer, instead of using what we already have for precopy phase, or
> > > > > >     just push everything directly to the wire?
> > > > > 
> > > > > Actually we only do that to get the size of the device state and send
> > > > > the size out-of-band, since we can not use qemu_load_device_state()
> > > > > directly on the secondary side and look for the in-band EOF.
> > > > 
> > > > I also don't understand why the size is needed..
> > > > 
> > > > Currently the streaming protocol for COLO is:
> > > > 
> > > >   - ...
> > > >   - COLO_MESSAGE_VMSTATE_SEND
> > > >   - RAM data
> > > >   - EOF
> > > >   - COLO_MESSAGE_VMSTATE_SIZE
> > > >   - non-RAM data
> > > >   - EOF
> > > > 
> > > > My question is about, why can't we do this instead?
> > > > 
> > > >   - ...
> > > >   - COLO_MESSAGE_VMSTATE_SEND
> > > >   - RAM data
> > 
> > [1]
> > 
> > > >   - non-RAM data
> > > >   - EOF
> > > > 
> > > > If the VM is stoppped during the whole process anyway..
> > > > 
> > > > Here RAM/non-RAM data all are vmstates, and logically can also be loaded in
> > > > one shot of a vmstate load loop.
> > > 
> > > You might be able to; in that case you would have to stream the 
> > > entire thing into a buffer on the secondary rather than applying the
> > > RAM updates to the colo cache.
> > 
> > I thought the colo cache is already such a buffering when receiving at [1]
> > above?  Then we need to flush the colo cache (including scan the SVM bitmap
> > and only flush those pages in colo cache) like before.
> > 
> > If something went wrong (e.g. channel broken during receiving non-ram
> > device states), SVM can directly drop all colo cache as the latest
> > checkpoint isn't complete.
> 
> Oh, I think I've remembered why it's necessary to split it into RAM and non-RAM;
> you can't parse a non-RAM stream and know when you've got an EOF flag in the stream;
> especially for stuff that's open coded (like some of virtio);   so there's

Shouldn't customized get()/put() at least still be wrapped within a
QEMU_VM_SECTION_FULL section?

> no way to write a 'load until EOF' into a simple RAM buffer; you need to be
> given an explicit size to know how much to expect.
> 
> You could do it for the RAM, but you'd need to write a protocol parser
> to follow the stream to watch for the EOF.  It's actuallly harder with multifd;
> how would you make a temporary buffer with multiple streams like that?

My understanding is that postcopy must need a buffer because page requests
need to keep working even while loading vmstates.  I don't see that being
required for COLO, though..

I'll try to see if I can change COLO to use the generic precopy way of
dumping vmstate, then I'll know if I missed something, and what I've
missed..

Thanks,

-- 
Peter Xu
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Dr. David Alan Gilbert 2 weeks, 5 days ago
* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Jan 21, 2026 at 01:25:32AM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Tue, Jan 20, 2026 at 07:04:09PM +0000, Dr. David Alan Gilbert wrote:
> > 
> > <snip>
> > 
> > > > >   (2) Failure happens _after_ applying the new checkpoint, but _before_ the
> > > > >       whole checkpoint is applied.
> > > > > 
> > > > >       To be explicit, consider qemu_load_device_state() when the process of
> > > > >       colo_incoming_process_checkpoint() failed.  It means SVM applied
> > > > >       partial of PVM's checkpoint, I think it should mean PVM is completely
> > > > >       corrupted.
> > > > 
> > > > As long as the SVM has got the entire checkpoint, then it *can* apply it all
> > > > and carry on from that point.
> > > 
> > > Does it mean we assert() that qemu_load_device_state() will always success
> > > for COLO syncs?
> > 
> > Not sure; I'd expect if that load fails then the SVM fails; if that happens
> > on a periodic checkpoint then the PVM should carry on.
> 
> Hmm right, if qemu_load_device_state() failed, likely PVM is still alive.
> 
> > 
> > > Logically post_load() can invoke anything and I'm not sure if something can
> > > start to fail, but I confess I don't know an existing device that can
> > > trigger it.
> > 
> > Like a postcopy, it shouldn't fail unless there's an underlying failure
> > (e.g. storage died)
> 
> Postcopy can definitely fail at post_load()..  Actually Juraj just fixed it
> for 10.2 here so postcopy can now fail properly while save/load device
> states (we used to hang):
> 
> https://lore.kernel.org/r/20251103183301.3840862-1-jmarcin@redhat.com

Ah good.

> The two major causes that can fail postcopy vmstate load that I hit (while
> looking at bugs after you left; I wished you are still here!):
> 
> (1) KVM put() failures due to kernel version mismatch, or,
> 
> (2) virtio post_load() failures due to e.g. virtio feature unsupported.
> 
> Both of them fall into "unsupported dest kernel version" realm, though, so
> indeed it may not affect COLO, as I expect COLO should have two hosts to
> run the same kernel.

Right.

> > > Lukas told me something was broken though with pc machine type, on
> > > post_load() not re-entrant.  I think it might be possible though when
> > > post_load() is relevant to some device states (that guest driver can change
> > > between two checkpoint loads), but that's still only theoretical.  So maybe
> > > we can indeed assert it here.
> > 
> > I don't understand that non re-entrant bit?
> 
> It may not be the exact wording, the message is here:
> 
> https://lore.kernel.org/r/20260115233500.26fd1628@penguin
> 
>         There is a bug in the emulated ahci disk controller which crashes
>         when it's vmstate is loaded more than once.
> 
> I was expecting it's a post_load() because normal scalar vmstates should be
> fine to be loaded more than once.  I didn't look deeper.

Oh I see, multiple calls to post_load() rather than calls nested inside each
other; yeah, that makes sense - some things aren't expecting that.
But again, you're likely to find that out pretty quickly either way; it's not
something that is made worse by regular checkpointing.

<snip>

> > Oh, I think I've remembered why it's necessary to split it into RAM and non-RAM;
> > you can't parse a non-RAM stream and know when you've got an EOF flag in the stream;
> > especially for stuff that's open coded (like some of virtio);   so there's
> 
> Shouldn't customized get()/put() will at least still be wrapped with a
> QEMU_VM_SECTION_FULL section?

Yes - but the VM_SECTION wrapper doesn't tell you how long the data in the
section is; you have to walk your vmstate structures, decoding the data
(and possibly doing magic get()/put()'s) and at the end hoping
you hit a VMS_END (which I added just to spot screwups in this process).
So there's no way to 'read the whole of a VM_SECTION' - because you don't
know you've hit the end until you've decoded it.
(And some of those get() calls are open-coded list storage, which look
something like

  do {
      x = get();          /* read the next element header/flags */
      if (x & flag) {
          break;          /* end-of-list marker */
      }

      /* read more data for this element */
  } while (...);

so on those you're really hoping you hit the flag.)
I did turn some get()/put()'s into vmstate a while back; but those
open-coded loops are really hard, there's a lot of variation.

> > no way to write a 'load until EOF' into a simple RAM buffer; you need to be
> > given an explicit size to know how much to expect.
> > 
> > You could do it for the RAM, but you'd need to write a protocol parser
> > to follow the stream to watch for the EOF.  It's actuallly harder with multifd;
> > how would you make a temporary buffer with multiple streams like that?
> 
> My understanding is postcopy must need a buffer because postcopy needs page
> request to work even during loading vmstates.  I don't see it required for
> COLO, though..

Right that's true for postcopy; but then the only way to load the stream into
that buffer is to load it all at once because of the vmstate problem above.
(and because in the original postcopy we needed the original fd free
for page requests; you might be able to avoid that with multifd now)

> I'll try to see if I can change COLO to use the generic precopy way of
> dumping vmstate, then I'll know if I missed something, and what I've
> missed..

Dave

> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 2 weeks, 4 days ago
On Wed, Jan 21, 2026 at 05:31:32PM +0000, Dr. David Alan Gilbert wrote:
> Right that's true for postcopy; but then the only way to load the stream into
> that buffer is to load it all at once because of the vmstate problem above.
> (and because in the original postcopy we needed the original fd free
> for page requests; you might be able to avoid that with multifd now)

Only now have I recognized that COLO wants to make sure the checkpoint is
either applied completely or not at all.

So the special part is that COLO does loadvm on top of a running VM, and
meanwhile COLO may decide not to do the loadvm at all if the checkpoint
wasn't received correctly.

And yes, to cache all device states with current section header definition
in the stream, we'll likely need a size.  We can still parse the stream as
you pointed out previously, but I agree a special SIZE header still makes
sense.

I suppose that answers my question indeed, thanks!

-- 
Peter Xu
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Dr. David Alan Gilbert 2 weeks, 4 days ago
* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Jan 21, 2026 at 05:31:32PM +0000, Dr. David Alan Gilbert wrote:
> > Right that's true for postcopy; but then the only way to load the stream into
> > that buffer is to load it all at once because of the vmstate problem above.
> > (and because in the original postcopy we needed the original fd free
> > for page requests; you might be able to avoid that with multifd now)
> 
> Only until now, I recognized that COLO wants to make sure the checkpoint is
> either completely applied or none applied.
> 
> So the specialty is COLO does loadvm on top of a running VM, meanwhile COLO
> may decide to not loadvm afterwards if checkpoint wasn't correctly
> received.

Oh yes; because if your secondary is running happily, your primary could
fail while it was sending you a new snapshot - and then the secondary
has to be able to carry on.

> And yes, to cache all device states with current section header definition
> in the stream, we'll likely need a size.  We can still parse the stream as
> you pointed out previously, but I agree a special SIZE header still makes
> sense.

Well, you *can't* parse the stream, as I said, because of the get()/put()
stuff - not without rewriting that.

Dave

> I suppose that answers my question indeed, thanks!
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Peter Xu 2 weeks, 4 days ago
On Wed, Jan 21, 2026 at 09:31:29PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Wed, Jan 21, 2026 at 05:31:32PM +0000, Dr. David Alan Gilbert wrote:
> > > Right that's true for postcopy; but then the only way to load the stream into
> > > that buffer is to load it all at once because of the vmstate problem above.
> > > (and because in the original postcopy we needed the original fd free
> > > for page requests; you might be able to avoid that with multifd now)
> > 
> > Only until now, I recognized that COLO wants to make sure the checkpoint is
> > either completely applied or none applied.
> > 
> > So the specialty is COLO does loadvm on top of a running VM, meanwhile COLO
> > may decide to not loadvm afterwards if checkpoint wasn't correctly
> > received.
> 
> Oh yes; because if your secondary is running happily, your primary could
> fail while it was sending you a new snapshot - and then the secondary
> has to be able to carry on.
> 
> > And yes, to cache all device states with current section header definition
> > in the stream, we'll likely need a size.  We can still parse the stream as
> > you pointed out previously, but I agree a special SIZE header still makes
> > sense.
> 
> Well, *can't* parse the stream as I said; because of the get()/put() stuff
> without rewriting that.

Ah, yes..

-- 
Peter Xu
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Lukas Straub 3 weeks, 1 day ago
On Sat, 17 Jan 2026 20:49:13 +0100
Lukas Straub <lukasstraub2@web.de> wrote:

> On Thu, 15 Jan 2026 18:38:51 -0500
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:  
> > > * Peter Xu (peterx@redhat.com) wrote:    
> > > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:    
> > > > > Nack.
> > > > > 
> > > > > This code has users, as explained in my other email:
> > > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464    
> > > > 
> > > > Please then rework that series and consider include the following (I
> > > > believe I pointed out a long time ago somewhere..):
> > > >     
> > >     
> > > > - Some form of justification of why multifd needs to be enabled for COLO.
> > > >   For example, in your cluster deployment, using multifd can improve XXX
> > > >   by YYY.  Please describe the use case and improvements.    
> > > 
> > > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > > the snapshoting the less overhead there is.    
> > 
> > Thanks for chiming in, Dave.  I can explain why I want to request for some
> > numbers.
> > 
> > Firstly, numbers normally proves it's used in a real system.  It's at least
> > being used and seriously tested.
> > 
> > Secondly, per my very limited understanding to COLO... the two VMs in most
> > cases should be in-sync state already when both sides generate the same
> > network packets.
> > 
> > Another sync (where multifd can start to take effect) is only needed when
> > there're packets misalignments, but IIUC it should be rare.  I don't know
> > how rare it is, it would be good if Lukas could introduce some of those
> > numbers in his deployment to help us understand COLO better if we'll need
> > to keep it.  
> 
> It really depends on the workload and if you want to tune for
> throughput or latency.
> 
> You need to do a checkpoint eventually and the more time passes between
> checkpoints the more dirty memory you have to transfer during the
> checkpoint.
> 
> Also keep in mind that the guest is stopped during checkpoints. Because
> even if we continue running the guest, we can not release the mismatched
> packets since that would expose a state of the guest to the outside
> world that is not yet replicated to the secondary.
> 
> So the migration performance is actually the most important part in
> COLO to keep the checkpoints as short as possible.
> 
> I have quite a few more performance and cleanup patches on my hands,
> for example to transfer dirty memory between checkpoints.
> 
> > 
> > IIUC, the critical path of COLO shouldn't be migration on its own?  It
> > should be when heartbeat gets lost; that normally should happen when two
> > VMs are in sync.  In this path, I don't see how multifd helps..  because
> > there's no migration happening, only the src recording what has changed.
> > Hence I think some number with description of the measurements may help us
> > understand how important multifd is to COLO.
> > 
> > Supporting multifd will cause new COLO functions to inject into core
> > migration code paths (even if not much..). I want to make sure such (new)
> > complexity is justified. I also want to avoid introducing a feature only
> > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > it'll be useful".  
> 
> What COLO needs from migration at the low level:
> 
> Primary/Outgoing side:
> 
> Not much actually, we just need a way to incrementally send the
> dirtied memory and the full device state.
> Also, we ensure that migration never actually finishes since we will
> never do a switchover. For example we never set
> RAMState::last_stage with COLO.
> 
> Secondary/Incoming side:
> 
> colo cache:
> Since the secondary always needs to be ready to take over (even during
> checkpointing), we can not write the received ram pages directly to
> the guest ram to prevent having half of the old and half of the new
> contents.
> So we redirect the received ram pages to the colo cache. This is
> basically a mirror of the primary side ram.
> It also simplifies the primary side since from it's point of view it's
> just a normal migration target. So primary side doesn't have to care
> about dirtied pages on the secondary for example.
> 
> Dirty Bitmap:
> With COLO we also need a dirty bitmap on the incoming side to track
> 1. pages dirtied by the secondary guest
> 2. pages dirtied by the primary guest (incoming ram pages)
> In the last step during the checkpointing, this bitmap is then used
> to overwrite the guest ram with the colo cache so the secondary guest
> is in sync with the primary guest.
> 
> All this individually is very little code as you can see from my
> multifd patch. Just something to keep in mind I guess.

PS:
Also when the primary or secondary dies, from qemu's point of view the
migration socket(s) starts blocking. So the migration code needs to be
able to recover from such a hanging/blocking socket. This works fine
right now with yank.

> 
> 
> At the high level we have the COLO framework outgoing and incoming
> threads which just tell the migration code to:
> Send all ram pages (qemu_savevm_live_state()) on the outgoing side
> paired with a qemu_loadvm_state_main on the incoming side.
> Send the device state (qemu_save_device_state()) paired with writing
> that stream to a buffer on the incoming side.
> And finally flusing the colo cache and loading the device state on the
> incoming side.
> 
> And of course we coordinate with the colo block replication and
> colo-compare.
> 
> Best regards,
> Lukas Straub
> 
> > 
> > After these days, I found removing code is sometimes harder than writting
> > new..
> > 
> > Thanks,
> >   
> > > 
> > > Lukas: Given COLO has a bunch of different features (i.e. the block
> > > replication, the clever network comparison etc) do you know which ones
> > > are used in the setups you are aware of?
> > > 
> > > I'd guess the tricky part of a test would be the network side; I'm
> > > not too sure how you'd set that in a test.    
> >   
> 

Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Zhang Chen 3 weeks, 3 days ago
On Fri, Jan 16, 2026 at 7:39 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > > Nack.
> > > >
> > > > This code has users, as explained in my other email:
> > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > >
> > > Please then rework that series and consider include the following (I
> > > believe I pointed out a long time ago somewhere..):
> > >
> >
> > > - Some form of justification of why multifd needs to be enabled for COLO.
> > >   For example, in your cluster deployment, using multifd can improve XXX
> > >   by YYY.  Please describe the use case and improvements.
> >
> > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > the snapshoting the less overhead there is.
>
> Thanks for chiming in, Dave.  I can explain why I want to request for some
> numbers.
>
> Firstly, numbers normally proves it's used in a real system.  It's at least
> being used and seriously tested.
>

Agree.

> Secondly, per my very limited understanding of COLO... the two VMs in most
> cases should already be in sync when both sides generate the same
> network packets.

In most cases, you are right. But all FT/HA systems are designed for the
rare cases.

>
> Another sync (where multifd can start to take effect) is only needed when
> there are packet misalignments, but IIUC it should be rare.  I don't know
> how rare it is, it would be good if Lukas could introduce some of those
> numbers in his deployment to help us understand COLO better if we'll need
> to keep it.

I haven't tested the multifd part yet, but let me introduce the background.
The COLO system includes 2 ways to trigger a live migration: network-compare
triggered and periodic execution (maybe every 10s). It means COLO VM
performance depends on the live migration VM stop time; maybe multifd can
help with this, Lukas?
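
(To make that concrete: a checkpoint is requested either when colo-compare
reports a packet mismatch or when the periodic timer expires. A purely
illustrative sketch of that decision loop follows; the helper names are
made up and 10s is just the example interval mentioned above:)

#include <stdbool.h>
#include <time.h>
#include <unistd.h>

#define PERIODIC_CHECKPOINT_SECS 10     /* example interval */

bool colo_compare_mismatch(void);       /* assumed: notification from colo-compare */
void colo_do_checkpoint(void);          /* the live-migration based sync */

void colo_checkpoint_loop(void)
{
    time_t last = time(NULL);

    for (;;) {
        if (colo_compare_mismatch() ||
            time(NULL) - last >= PERIODIC_CHECKPOINT_SECS) {
            colo_do_checkpoint();
            last = time(NULL);
        }
        usleep(1000);                   /* avoid busy-waiting in this toy loop */
    }
}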


>
> IIUC, the critical path of COLO shouldn't be migration on its own?  It
> should be when heartbeat gets lost; that normally should happen when two
> VMs are in sync.  In this path, I don't see how multifd helps..  because
> there's no migration happening, only the src recording what has changed.
> Hence I think some number with description of the measurements may help us
> understand how important multifd is to COLO.
>

Yes, after failover, the secondary VM runs without migration.

> Supporting multifd will cause new COLO functions to be injected into core
> migration code paths (even if not much..). I want to make sure such (new)
> complexity is justified. I also want to avoid introducing a feature only
> because "we have XXX, then let's support XXX in COLO too, maybe some day
> it'll be useful".
>
> After these days, I found removing code is sometimes harder than writing
> new..

Agree. As Lukas said, some customers don't follow upstream code (or run a
version from 2 releases ago) for COLO.
Because FT/HA users focus on system availability, an upgrade is a high
risk for them.
I think the main reason COLO was broken for QEMU releases 10.0/10.1 is the
lack of a test case (Lukas has WIP on this).

Thanks
Chen


>
> Thanks,
>
> >
> > Lukas: Given COLO has a bunch of different features (i.e. the block
> > replication, the clever network comparison etc) do you know which ones
> > are used in the setups you are aware of?
> >
> > I'd guess the tricky part of a test would be the network side; I'm
> > not too sure how you'd set that in a test.
>
> --
> Peter Xu
>
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Dr. David Alan Gilbert 3 weeks, 3 days ago
* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > > Nack.
> > > > 
> > > > This code has users, as explained in my other email:
> > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > > 
> > > Please then rework that series and consider including the following (I
> > > believe I pointed out a long time ago somewhere..):
> > > 
> > 
> > > - Some form of justification of why multifd needs to be enabled for COLO.
> > >   For example, in your cluster deployment, using multifd can improve XXX
> > >   by YYY.  Please describe the use case and improvements.
> > 
> > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > the snapshotting, the less overhead there is.
> 
> Thanks for chiming in, Dave.  I can explain why I want to request for some
> numbers.
> 
> Firstly, numbers normally prove it's used in a real system.  It's at least
> being used and seriously tested.

Fair.

> Secondly, per my very limited understanding of COLO... the two VMs in most
> cases should already be in sync when both sides generate the same
> network packets.

(It's about a decade since I did any serious Colo, so I'll try and remember)

> Another sync (where multifd can start to take effect) is only needed when
> there are packet misalignments, but IIUC it should be rare.  I don't know
> how rare it is, it would be good if Lukas could introduce some of those
> numbers in his deployment to help us understand COLO better if we'll need
> to keep it.

In reality misalignments are actually pretty common - although it's
very workload dependent.  Any randomness in the order of execution in a multi-threaded
guest for example, or when a timer arrives etc can change the packet generation.
The migration time then becomes a latency issue before you can
transmit the mismatched packet once it's detected.

I think you still need to send a regular stream of snapshots even without
having *yet* received a packet difference.  Now, I'm trying to remember the
reasoning; for a start, if you leave it too long between snapshots, the
migration snapshot gets larger (which I think needs to be stored in RAM on
the dest?), and the chance of them getting a packet difference from
randomness also increases.
I seem to remember there were clever schemes to pick the optimal snapshot
schedule.

> IIUC, the critical path of COLO shouldn't be migration on its own?  It
> should be when heartbeat gets lost; that normally should happen when two
> VMs are in sync.  In this path, I don't see how multifd helps..  because
> there's no migration happening, only the src recording what has changed.
> Hence I think some number with description of the measurements may help us
> understand how important multifd is to COLO.

There's more than one critical path:
  a) Time to recovery when one host fails
  b) Overhead when both hosts are happy.

> Supporting multifd will cause new COLO functions to be injected into core
> migration code paths (even if not much..). I want to make sure such (new)
> complexity is justified. I also want to avoid introducing a feature only
> because "we have XXX, then let's support XXX in COLO too, maybe some day
> it'll be useful".

I can't remember where the COLO code got into the main migration paths;
is that the reception side storing the received differences somewhere else?

> After these days, I found removing code is sometimes harder than writing
> new..

Haha yes.

Dave

> Thanks,
> 
> > 
> > Lukas: Given COLO has a bunch of different features (i.e. the block
> > replication, the clever network comparison etc) do you know which ones
> > are used in the setups you are aware of?
> > 
> > I'd guess the tricky part of a test would be the network side; I'm
> > not too sure how you'd set that in a test.
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
Re: [PATCH 1/3] migration/colo: Deprecate COLO migration framework
Posted by Zhang Chen 3 weeks, 3 days ago
On Fri, Jan 16, 2026 at 8:37 AM Dr. David Alan Gilbert <dave@treblig.org> wrote:
>
> * Peter Xu (peterx@redhat.com) wrote:
> > On Thu, Jan 15, 2026 at 10:59:47PM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (peterx@redhat.com) wrote:
> > > > On Thu, Jan 15, 2026 at 10:49:29PM +0100, Lukas Straub wrote:
> > > > > Nack.
> > > > >
> > > > > This code has users, as explained in my other email:
> > > > > https://lore.kernel.org/qemu-devel/20260115224516.7f0309ba@penguin/T/#mc99839451d6841366619c4ec0d5af5264e2f6464
> > > >
> > > > Please then rework that series and consider including the following (I
> > > > believe I pointed out a long time ago somewhere..):
> > > >
> > >
> > > > - Some form of justification of why multifd needs to be enabled for COLO.
> > > >   For example, in your cluster deployment, using multifd can improve XXX
> > > >   by YYY.  Please describe the use case and improvements.
> > >
> > > That one is pretty easy; since COLO is regularly taking snapshots, the faster
> > > the snapshotting, the less overhead there is.
> >
> > Thanks for chiming in, Dave.  I can explain why I want to request for some
> > numbers.
> >
> > Firstly, numbers normally prove it's used in a real system.  It's at least
> > being used and seriously tested.
>
> Fair.
>
> > Secondly, per my very limited understanding of COLO... the two VMs in most
> > cases should already be in sync when both sides generate the same
> > network packets.
>
> (It's about a decade since I did any serious Colo, so I'll try and remember)

Haha, that was a pleasant time~
I already explained the background in the previous email.

>
> > Another sync (where multifd can start to take effect) is only needed when
> > there are packet misalignments, but IIUC it should be rare.  I don't know
> > how rare it is, it would be good if Lukas could introduce some of those
> > numbers in his deployment to help us understand COLO better if we'll need
> > to keep it.
>
> In reality misalignments are actually pretty common - although it's
> very workload dependent.  Any randomness in the order of execution in a multi-threaded
> guest for example, or when a timer arrives etc can change the packet generation.
> The migration time then becomes a latency issue before you can
> transmit the mismatched packet once it's detected.
>
> I think you still need to send a regular stream of snapshots even without
> having *yet* received a packet difference.  Now, I'm trying to remember the
> reasoning; for a start, if you leave it too long between snapshots, the
> migration snapshot gets larger (which I think needs to be stored in RAM on
> the dest?), and the chance of them getting a packet difference from
> randomness also increases.
> I seem to remember there were clever schemes to pick the optimal snapshot
> schedule.

Basically correct, as I explained in the previous email.
We cannot expect to go without migration for an extended period of time.
Even if the application's results are consistent, that cannot guarantee that
two independently running guest kernels will behave completely identically.

>
> > IIUC, the critical path of COLO shouldn't be migration on its own?  It
> > should be when heartbeat gets lost; that normally should happen when two
> > VMs are in sync.  In this path, I don't see how multifd helps..  because
> > there's no migration happening, only the src recording what has changed.
> > Hence I think some number with description of the measurements may help us
> > understand how important multifd is to COLO.
>
> There's more than one critical path:
>   a) Time to recovery when one host fails
>   b) Overhead when both hosts are happy.
>
> > Supporting multifd will cause new COLO functions to be injected into core
> > migration code paths (even if not much..). I want to make sure such (new)
> > complexity is justified. I also want to avoid introducing a feature only
> > because "we have XXX, then let's support XXX in COLO too, maybe some day
> > it'll be useful".
>
> I can't remember where the COLO code got into the main migration paths;
> is that the reception side storing the received differences somewhere else?
>

Yes. The COLO secondary has a buffer to store the primary VMstate,
and loads it when the checkpoint is triggered.
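
(Roughly, and purely as an illustration, that buffering looks like the
following; the names below are made up, not QEMU's real code:)

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static uint8_t *vmstate_buf;            /* primary's device state, buffered */
static size_t   vmstate_len;

/* Incoming side: stash the received device-state stream instead of
 * applying it right away.  Error handling omitted. */
void colo_buffer_vmstate(const uint8_t *data, size_t len)
{
    vmstate_buf = realloc(vmstate_buf, len);
    memcpy(vmstate_buf, data, len);
    vmstate_len = len;
}

void load_device_state(const uint8_t *data, size_t len);  /* illustrative */

/* Checkpoint: only now load the buffered device state into the secondary. */
void colo_apply_vmstate_at_checkpoint(void)
{
    load_device_state(vmstate_buf, vmstate_len);
}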

Thanks
Chen

> > After these days, I found removing code is sometimes harder than writing
> > new..
>
> Haha yes.
>
> Dave
>
> > Thanks,
> >
> > >
> > > Lukas: Given COLO has a bunch of different features (i.e. the block
> > > replication, the clever network comparison etc) do you know which ones
> > > are used in the setups you are aware of?
> > >
> > > I'd guess the tricky part of a test would be the network side; I'm
> > > not too sure how you'd set that in a test.
> >
> > --
> > Peter Xu
> >
> --
>  -----Open up your eyes, open up your mind, open up your code -------
> / Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
> \        dave @ treblig.org |                               | In Hex /
>  \ _________________________|_____ http://www.treblig.org   |_______/