[PATCH v4] migration: Allow user to specify available switchover bandwidth

Peter Xu posted 1 patch 1 year, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20231010221922.40638-1-peterx@redhat.com
Maintainers: Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Leonardo Bras <leobras@redhat.com>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>
qapi/migration.json            | 34 +++++++++++++++++++++++++++++++++-
migration/migration.h          |  2 +-
migration/options.h            |  1 +
migration/migration-hmp-cmds.c | 14 ++++++++++++++
migration/migration.c          | 24 +++++++++++++++++++++---
migration/options.c            | 28 ++++++++++++++++++++++++++++
migration/trace-events         |  2 +-
7 files changed, 99 insertions(+), 6 deletions(-)
[PATCH v4] migration: Allow user to specify available switchover bandwidth
Posted by Peter Xu 1 year, 1 month ago
Migration bandwidth is a very important value to live migration.  It's
because it's one of the major factors that we'll make decision on when to
switchover to destination in a precopy process.

This value is currently estimated by QEMU during the whole live migration
process by monitoring how fast we were sending the data.  This can be the
most accurate bandwidth if in the ideal world, where we're always feeding
unlimited data to the migration channel, and then it'll be limited to the
bandwidth that is available.

However in reality it may be very different, e.g., over a 10Gbps network we
can see query-migrate showing migration bandwidth of only a few tens of
MB/s just because there are plenty of other things the migration thread
might be doing.  For example, the migration thread can be busy scanning
zero pages, or it can be fetching dirty bitmap from other external dirty
sources (like vhost or KVM).  It means we may not be pushing data as much
as possible to migration channel, so the bandwidth estimated from "how many
data we sent in the channel" can be dramatically inaccurate sometimes.

With that, the decision to switchover will be affected, by assuming that we
may not be able to switchover at all with such a low bandwidth, but in
reality we can.

The migration may not even converge at all with the downtime specified,
with that wrong estimation of bandwidth, keeping iterations forever with a
low estimation of bandwidth.

The issue is QEMU itself may not be able to avoid those uncertainties on
measuing the real "available migration bandwidth".  At least not something
I can think of so far.

One way to fix this is when the user is fully aware of the available
bandwidth, then we can allow the user to help providing an accurate value.

For example, if the user has a dedicated channel of 10Gbps for migration
for this specific VM, the user can specify this bandwidth so QEMU can
always do the calculation based on this fact, trusting the user as long as
specified.  It may not be the exact bandwidth when switching over (in which
case qemu will push migration data as fast as possible), but much better
than QEMU trying to wildly guess, especially when very wrong.

A new parameter "avail-switchover-bandwidth" is introduced just for this.
So when the user specified this parameter, instead of trusting the
estimated value from QEMU itself (based on the QEMUFile send speed), it
trusts the user more by using this value to decide when to switchover,
assuming that we'll have such bandwidth available then.

Note that specifying this value will not throttle the bandwidth for
switchover yet, so QEMU will always use the full bandwidth possible for
sending switchover data, assuming that should always be the most important
way to use the network at that time.

This can resolve issues like "unconvergence migration" which is caused by
hilarious low "migration bandwidth" detected for whatever reason.

Reported-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
v4:
- Rebase to master, with duplicated documentations
---
 qapi/migration.json            | 34 +++++++++++++++++++++++++++++++++-
 migration/migration.h          |  2 +-
 migration/options.h            |  1 +
 migration/migration-hmp-cmds.c | 14 ++++++++++++++
 migration/migration.c          | 24 +++++++++++++++++++++---
 migration/options.c            | 28 ++++++++++++++++++++++++++++
 migration/trace-events         |  2 +-
 7 files changed, 99 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..0c897a99b1 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -759,6 +759,16 @@
 # @max-bandwidth: to set maximum speed for migration.  maximum speed
 #     in bytes per second.  (Since 2.8)
 #
+# @avail-switchover-bandwidth: to set the available bandwidth that
+#     migration can use during switchover phase.  NOTE!  This does not
+#     limit the bandwidth during switchover, but only for calculations when
+#     making decisions to switchover.  By default, this value is zero,
+#     which means QEMU will estimate the bandwidth automatically.  This can
+#     be set when the estimated value is not accurate, while the user is
+#     able to guarantee such bandwidth is available when switching over.
+#     When specified correctly, this can make the switchover decision much
+#     more accurate.  (Since 8.2)
+#
 # @downtime-limit: set maximum tolerated downtime for migration.
 #     maximum downtime in milliseconds (Since 2.8)
 #
@@ -840,7 +850,7 @@
            'cpu-throttle-initial', 'cpu-throttle-increment',
            'cpu-throttle-tailslow',
            'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
-           'downtime-limit',
+           'avail-switchover-bandwidth', 'downtime-limit',
            { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
            'block-incremental',
            'multifd-channels',
@@ -925,6 +935,16 @@
 # @max-bandwidth: to set maximum speed for migration.  maximum speed
 #     in bytes per second.  (Since 2.8)
 #
+# @avail-switchover-bandwidth: to set the available bandwidth that
+#     migration can use during switchover phase.  NOTE!  This does not
+#     limit the bandwidth during switchover, but only for calculations when
+#     making decisions to switchover.  By default, this value is zero,
+#     which means QEMU will estimate the bandwidth automatically.  This can
+#     be set when the estimated value is not accurate, while the user is
+#     able to guarantee such bandwidth is available when switching over.
+#     When specified correctly, this can make the switchover decision much
+#     more accurate.  (Since 8.2)
+#
 # @downtime-limit: set maximum tolerated downtime for migration.
 #     maximum downtime in milliseconds (Since 2.8)
 #
@@ -1018,6 +1038,7 @@
             '*tls-hostname': 'StrOrNull',
             '*tls-authz': 'StrOrNull',
             '*max-bandwidth': 'size',
+            '*avail-switchover-bandwidth': 'size',
             '*downtime-limit': 'uint64',
             '*x-checkpoint-delay': { 'type': 'uint32',
                                      'features': [ 'unstable' ] },
@@ -1128,6 +1149,16 @@
 # @max-bandwidth: to set maximum speed for migration.  maximum speed
 #     in bytes per second.  (Since 2.8)
 #
+# @avail-switchover-bandwidth: to set the available bandwidth that
+#     migration can use during switchover phase.  NOTE!  This does not
+#     limit the bandwidth during switchover, but only for calculations when
+#     making decisions to switchover.  By default, this value is zero,
+#     which means QEMU will estimate the bandwidth automatically.  This can
+#     be set when the estimated value is not accurate, while the user is
+#     able to guarantee such bandwidth is available when switching over.
+#     When specified correctly, this can make the switchover decision much
+#     more accurate.  (Since 8.2)
+#
 # @downtime-limit: set maximum tolerated downtime for migration.
 #     maximum downtime in milliseconds (Since 2.8)
 #
@@ -1218,6 +1249,7 @@
             '*tls-hostname': 'str',
             '*tls-authz': 'str',
             '*max-bandwidth': 'size',
+            '*avail-switchover-bandwidth': 'size',
             '*downtime-limit': 'uint64',
             '*x-checkpoint-delay': { 'type': 'uint32',
                                      'features': [ 'unstable' ] },
diff --git a/migration/migration.h b/migration/migration.h
index 972597f4de..475f442fe6 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -294,7 +294,7 @@ struct MigrationState {
     /*
      * The final stage happens when the remaining data is smaller than
      * this threshold; it's calculated from the requested downtime and
-     * measured bandwidth
+     * measured bandwidth, or avail-switchover-bandwidth if specified.
      */
     int64_t threshold_size;
 
diff --git a/migration/options.h b/migration/options.h
index 045e2a41a2..93ee938ab8 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -80,6 +80,7 @@ int migrate_decompress_threads(void);
 uint64_t migrate_downtime_limit(void);
 uint8_t migrate_max_cpu_throttle(void);
 uint64_t migrate_max_bandwidth(void);
+uint64_t migrate_avail_switchover_bandwidth(void);
 uint64_t migrate_max_postcopy_bandwidth(void);
 int migrate_multifd_channels(void);
 MultiFDCompression migrate_multifd_compression(void);
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index c115ef2d23..2291b1b9b1 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -321,6 +321,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
             MigrationParameter_str(MIGRATION_PARAMETER_MAX_BANDWIDTH),
             params->max_bandwidth);
+        assert(params->has_avail_switchover_bandwidth);
+        monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
+            MigrationParameter_str(MIGRATION_PARAMETER_AVAIL_SWITCHOVER_BANDWIDTH),
+            params->avail_switchover_bandwidth);
         assert(params->has_downtime_limit);
         monitor_printf(mon, "%s: %" PRIu64 " ms\n",
             MigrationParameter_str(MIGRATION_PARAMETER_DOWNTIME_LIMIT),
@@ -574,6 +578,16 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         }
         p->max_bandwidth = valuebw;
         break;
+    case MIGRATION_PARAMETER_AVAIL_SWITCHOVER_BANDWIDTH:
+        p->has_avail_switchover_bandwidth = true;
+        ret = qemu_strtosz_MiB(valuestr, NULL, &valuebw);
+        if (ret < 0 || valuebw > INT64_MAX
+            || (size_t)valuebw != valuebw) {
+            error_setg(&err, "Invalid size %s", valuestr);
+            break;
+        }
+        p->avail_switchover_bandwidth = valuebw;
+        break;
     case MIGRATION_PARAMETER_DOWNTIME_LIMIT:
         p->has_downtime_limit = true;
         visit_type_size(v, param, &p->downtime_limit, &err);
diff --git a/migration/migration.c b/migration/migration.c
index 585d3c8f55..2ec4a761cc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2667,17 +2667,33 @@ static void migration_update_counters(MigrationState *s,
 {
     uint64_t transferred, transferred_pages, time_spent;
     uint64_t current_bytes; /* bytes transferred since the beginning */
+    uint64_t switchover_bw;
+    /* Expected bandwidth when switching over to destination QEMU */
+    double expected_bw_per_ms;
     double bandwidth;
 
     if (current_time < s->iteration_start_time + BUFFER_DELAY) {
         return;
     }
 
+    switchover_bw = migrate_avail_switchover_bandwidth();
     current_bytes = migration_transferred_bytes(s->to_dst_file);
     transferred = current_bytes - s->iteration_initial_bytes;
     time_spent = current_time - s->iteration_start_time;
     bandwidth = (double)transferred / time_spent;
-    s->threshold_size = bandwidth * migrate_downtime_limit();
+
+    if (switchover_bw) {
+        /*
+         * If the user specified a switchover bandwidth, let's trust the
+         * user so that can be more accurate than what we estimated.
+         */
+        expected_bw_per_ms = switchover_bw / 1000;
+    } else {
+        /* If the user doesn't specify bandwidth, we use the estimated */
+        expected_bw_per_ms = bandwidth;
+    }
+
+    s->threshold_size = expected_bw_per_ms * migrate_downtime_limit();
 
     s->mbps = (((double) transferred * 8.0) /
                ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
@@ -2694,7 +2710,7 @@ static void migration_update_counters(MigrationState *s,
     if (stat64_get(&mig_stats.dirty_pages_rate) &&
         transferred > 10000) {
         s->expected_downtime =
-            stat64_get(&mig_stats.dirty_bytes_last_sync) / bandwidth;
+            stat64_get(&mig_stats.dirty_bytes_last_sync) / expected_bw_per_ms;
     }
 
     migration_rate_reset(s->to_dst_file);
@@ -2702,7 +2718,9 @@ static void migration_update_counters(MigrationState *s,
     update_iteration_initial_status(s);
 
     trace_migrate_transferred(transferred, time_spent,
-                              bandwidth, s->threshold_size);
+                              /* Both in unit bytes/ms */
+                              bandwidth, switchover_bw / 1000,
+                              s->threshold_size);
 }
 
 static bool migration_can_switchover(MigrationState *s)
diff --git a/migration/options.c b/migration/options.c
index 1d1e1321b0..da87652791 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -125,6 +125,8 @@ Property migration_properties[] = {
                       parameters.cpu_throttle_tailslow, false),
     DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
                       parameters.max_bandwidth, MAX_THROTTLE),
+    DEFINE_PROP_SIZE("avail-switchover-bandwidth", MigrationState,
+                      parameters.avail_switchover_bandwidth, 0),
     DEFINE_PROP_UINT64("x-downtime-limit", MigrationState,
                       parameters.downtime_limit,
                       DEFAULT_MIGRATE_SET_DOWNTIME),
@@ -780,6 +782,13 @@ uint64_t migrate_max_bandwidth(void)
     return s->parameters.max_bandwidth;
 }
 
+uint64_t migrate_avail_switchover_bandwidth(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->parameters.avail_switchover_bandwidth;
+}
+
 uint64_t migrate_max_postcopy_bandwidth(void)
 {
     MigrationState *s = migrate_get_current();
@@ -917,6 +926,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
                                  s->parameters.tls_authz : "");
     params->has_max_bandwidth = true;
     params->max_bandwidth = s->parameters.max_bandwidth;
+    params->has_avail_switchover_bandwidth = true;
+    params->avail_switchover_bandwidth = s->parameters.avail_switchover_bandwidth;
     params->has_downtime_limit = true;
     params->downtime_limit = s->parameters.downtime_limit;
     params->has_x_checkpoint_delay = true;
@@ -1056,6 +1067,15 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
         return false;
     }
 
+    if (params->has_avail_switchover_bandwidth &&
+        (params->avail_switchover_bandwidth > SIZE_MAX)) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                   "avail_switchover_bandwidth",
+                   "an integer in the range of 0 to "stringify(SIZE_MAX)
+                   " bytes/second");
+        return false;
+    }
+
     if (params->has_downtime_limit &&
         (params->downtime_limit > MAX_MIGRATE_DOWNTIME)) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
@@ -1225,6 +1245,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
         dest->max_bandwidth = params->max_bandwidth;
     }
 
+    if (params->has_avail_switchover_bandwidth) {
+        dest->avail_switchover_bandwidth = params->avail_switchover_bandwidth;
+    }
+
     if (params->has_downtime_limit) {
         dest->downtime_limit = params->downtime_limit;
     }
@@ -1341,6 +1365,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
         }
     }
 
+    if (params->has_avail_switchover_bandwidth) {
+        s->parameters.avail_switchover_bandwidth = params->avail_switchover_bandwidth;
+    }
+
     if (params->has_downtime_limit) {
         s->parameters.downtime_limit = params->downtime_limit;
     }
diff --git a/migration/trace-events b/migration/trace-events
index 002abe3a4e..2d266bf087 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -186,7 +186,7 @@ source_return_path_thread_shut(uint32_t val) "0x%x"
 source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 source_return_path_thread_switchover_acked(void) ""
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
-migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
+migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
 process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 postcopy_preempt_enabled(bool value) "%d"
-- 
2.41.0
Re: [PATCH v4] migration: Allow user to specify available switchover bandwidth
Posted by Markus Armbruster 1 year, 1 month ago
Peter Xu <peterx@redhat.com> writes:

> Migration bandwidth is a very important value to live migration.  It's
> because it's one of the major factors that we'll make decision on when to
> switchover to destination in a precopy process.
>
> This value is currently estimated by QEMU during the whole live migration
> process by monitoring how fast we were sending the data.  This can be the
> most accurate bandwidth if in the ideal world, where we're always feeding
> unlimited data to the migration channel, and then it'll be limited to the
> bandwidth that is available.
>
> However in reality it may be very different, e.g., over a 10Gbps network we
> can see query-migrate showing migration bandwidth of only a few tens of
> MB/s just because there are plenty of other things the migration thread
> might be doing.  For example, the migration thread can be busy scanning
> zero pages, or it can be fetching dirty bitmap from other external dirty
> sources (like vhost or KVM).  It means we may not be pushing data as much
> as possible to migration channel, so the bandwidth estimated from "how many
> data we sent in the channel" can be dramatically inaccurate sometimes.

how much data we've sent to the channel

>
> With that, the decision to switchover will be affected, by assuming that we
> may not be able to switchover at all with such a low bandwidth, but in
> reality we can.
>
> The migration may not even converge at all with the downtime specified,
> with that wrong estimation of bandwidth, keeping iterations forever with a

iterating forever

> low estimation of bandwidth.
>
> The issue is QEMU itself may not be able to avoid those uncertainties on
> measuing the real "available migration bandwidth".  At least not something
> I can think of so far.
>
> One way to fix this is when the user is fully aware of the available
> bandwidth, then we can allow the user to help providing an accurate value.
>
> For example, if the user has a dedicated channel of 10Gbps for migration
> for this specific VM, the user can specify this bandwidth so QEMU can
> always do the calculation based on this fact, trusting the user as long as
> specified.  It may not be the exact bandwidth when switching over (in which
> case qemu will push migration data as fast as possible), but much better
> than QEMU trying to wildly guess, especially when very wrong.
>
> A new parameter "avail-switchover-bandwidth" is introduced just for this.
> So when the user specified this parameter, instead of trusting the
> estimated value from QEMU itself (based on the QEMUFile send speed), it
> trusts the user more by using this value to decide when to switchover,
> assuming that we'll have such bandwidth available then.
>
> Note that specifying this value will not throttle the bandwidth for
> switchover yet, so QEMU will always use the full bandwidth possible for
> sending switchover data, assuming that should always be the most important
> way to use the network at that time.
>
> This can resolve issues like "unconvergence migration" which is caused by
> hilarious low "migration bandwidth" detected for whatever reason.

"unconvergence" isn't a word :)

Suggest "like migration not converging, because the automatically
detected migration bandwidth is hilariously low for whatever reason."

Appreciate the thorough explanation!

>
> Reported-by: Zhiyi Guo <zhguo@redhat.com>
> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> v4:
> - Rebase to master, with duplicated documentations
> ---
>  qapi/migration.json            | 34 +++++++++++++++++++++++++++++++++-
>  migration/migration.h          |  2 +-
>  migration/options.h            |  1 +
>  migration/migration-hmp-cmds.c | 14 ++++++++++++++
>  migration/migration.c          | 24 +++++++++++++++++++++---
>  migration/options.c            | 28 ++++++++++++++++++++++++++++
>  migration/trace-events         |  2 +-
>  7 files changed, 99 insertions(+), 6 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8843e74b59..0c897a99b1 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -759,6 +759,16 @@
>  # @max-bandwidth: to set maximum speed for migration.  maximum speed
>  #     in bytes per second.  (Since 2.8)
>  #
> +# @avail-switchover-bandwidth: to set the available bandwidth that
> +#     migration can use during switchover phase.  NOTE!  This does not
> +#     limit the bandwidth during switchover, but only for calculations when
> +#     making decisions to switchover.  By default, this value is zero,
> +#     which means QEMU will estimate the bandwidth automatically.  This can
> +#     be set when the estimated value is not accurate, while the user is
> +#     able to guarantee such bandwidth is available when switching over.
> +#     When specified correctly, this can make the switchover decision much
> +#     more accurate.  (Since 8.2)

We tend to eschew abbreviations in QAPI schema identifiers.
available-switchover-bandwidth is a mouthful, though.  What do you
think?

> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -840,7 +850,7 @@
>             'cpu-throttle-initial', 'cpu-throttle-increment',
>             'cpu-throttle-tailslow',
>             'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
> -           'downtime-limit',
> +           'avail-switchover-bandwidth', 'downtime-limit',
>             { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
>             'block-incremental',
>             'multifd-channels',
> @@ -925,6 +935,16 @@
>  # @max-bandwidth: to set maximum speed for migration.  maximum speed
>  #     in bytes per second.  (Since 2.8)
>  #
> +# @avail-switchover-bandwidth: to set the available bandwidth that
> +#     migration can use during switchover phase.  NOTE!  This does not
> +#     limit the bandwidth during switchover, but only for calculations when
> +#     making decisions to switchover.  By default, this value is zero,
> +#     which means QEMU will estimate the bandwidth automatically.  This can
> +#     be set when the estimated value is not accurate, while the user is
> +#     able to guarantee such bandwidth is available when switching over.
> +#     When specified correctly, this can make the switchover decision much
> +#     more accurate.  (Since 8.2)
> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -1018,6 +1038,7 @@
>              '*tls-hostname': 'StrOrNull',
>              '*tls-authz': 'StrOrNull',
>              '*max-bandwidth': 'size',
> +            '*avail-switchover-bandwidth': 'size',
>              '*downtime-limit': 'uint64',
>              '*x-checkpoint-delay': { 'type': 'uint32',
>                                       'features': [ 'unstable' ] },
> @@ -1128,6 +1149,16 @@
>  # @max-bandwidth: to set maximum speed for migration.  maximum speed
>  #     in bytes per second.  (Since 2.8)
>  #
> +# @avail-switchover-bandwidth: to set the available bandwidth that
> +#     migration can use during switchover phase.  NOTE!  This does not
> +#     limit the bandwidth during switchover, but only for calculations when
> +#     making decisions to switchover.  By default, this value is zero,
> +#     which means QEMU will estimate the bandwidth automatically.  This can
> +#     be set when the estimated value is not accurate, while the user is
> +#     able to guarantee such bandwidth is available when switching over.
> +#     When specified correctly, this can make the switchover decision much
> +#     more accurate.  (Since 8.2)
> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -1218,6 +1249,7 @@
>              '*tls-hostname': 'str',
>              '*tls-authz': 'str',
>              '*max-bandwidth': 'size',
> +            '*avail-switchover-bandwidth': 'size',
>              '*downtime-limit': 'uint64',
>              '*x-checkpoint-delay': { 'type': 'uint32',
>                                       'features': [ 'unstable' ] },

Regardless:
Acked-by: Markus Armbruster <armbru@redhat.com>

[...]
Re: [PATCH v4] migration: Allow user to specify available switchover bandwidth
Posted by Peter Xu 1 year, 1 month ago
On Tue, Oct 17, 2023 at 04:12:40PM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Migration bandwidth is a very important value to live migration.  It's
> > because it's one of the major factors that we'll make decision on when to
> > switchover to destination in a precopy process.
> >
> > This value is currently estimated by QEMU during the whole live migration
> > process by monitoring how fast we were sending the data.  This can be the
> > most accurate bandwidth if in the ideal world, where we're always feeding
> > unlimited data to the migration channel, and then it'll be limited to the
> > bandwidth that is available.
> >
> > However in reality it may be very different, e.g., over a 10Gbps network we
> > can see query-migrate showing migration bandwidth of only a few tens of
> > MB/s just because there are plenty of other things the migration thread
> > might be doing.  For example, the migration thread can be busy scanning
> > zero pages, or it can be fetching dirty bitmap from other external dirty
> > sources (like vhost or KVM).  It means we may not be pushing data as much
> > as possible to migration channel, so the bandwidth estimated from "how many
> > data we sent in the channel" can be dramatically inaccurate sometimes.
> 
> how much data we've sent to the channel
> 
> >
> > With that, the decision to switchover will be affected, by assuming that we
> > may not be able to switchover at all with such a low bandwidth, but in
> > reality we can.
> >
> > The migration may not even converge at all with the downtime specified,
> > with that wrong estimation of bandwidth, keeping iterations forever with a
> 
> iterating forever
> 
> > low estimation of bandwidth.
> >
> > The issue is QEMU itself may not be able to avoid those uncertainties on
> > measuing the real "available migration bandwidth".  At least not something
> > I can think of so far.
> >
> > One way to fix this is when the user is fully aware of the available
> > bandwidth, then we can allow the user to help providing an accurate value.
> >
> > For example, if the user has a dedicated channel of 10Gbps for migration
> > for this specific VM, the user can specify this bandwidth so QEMU can
> > always do the calculation based on this fact, trusting the user as long as
> > specified.  It may not be the exact bandwidth when switching over (in which
> > case qemu will push migration data as fast as possible), but much better
> > than QEMU trying to wildly guess, especially when very wrong.
> >
> > A new parameter "avail-switchover-bandwidth" is introduced just for this.
> > So when the user specified this parameter, instead of trusting the
> > estimated value from QEMU itself (based on the QEMUFile send speed), it
> > trusts the user more by using this value to decide when to switchover,
> > assuming that we'll have such bandwidth available then.
> >
> > Note that specifying this value will not throttle the bandwidth for
> > switchover yet, so QEMU will always use the full bandwidth possible for
> > sending switchover data, assuming that should always be the most important
> > way to use the network at that time.
> >
> > This can resolve issues like "unconvergence migration" which is caused by
> > hilarious low "migration bandwidth" detected for whatever reason.
> 
> "unconvergence" isn't a word :)
> 
> Suggest "like migration not converging, because the automatically
> detected migration bandwidth is hilariously low for whatever reason."
> 
> Appreciate the thorough explanation!

Thanks for reviewing!

The patch is already in today's migration pull, so unfortunately no planned
repost for now.  I'll amend the commit message and collect the ACK if I'll
need to redo it.

> 
> >
> > Reported-by: Zhiyi Guo <zhguo@redhat.com>
> > Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > v4:
> > - Rebase to master, with duplicated documentations
> > ---
> >  qapi/migration.json            | 34 +++++++++++++++++++++++++++++++++-
> >  migration/migration.h          |  2 +-
> >  migration/options.h            |  1 +
> >  migration/migration-hmp-cmds.c | 14 ++++++++++++++
> >  migration/migration.c          | 24 +++++++++++++++++++++---
> >  migration/options.c            | 28 ++++++++++++++++++++++++++++
> >  migration/trace-events         |  2 +-
> >  7 files changed, 99 insertions(+), 6 deletions(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 8843e74b59..0c897a99b1 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -759,6 +759,16 @@
> >  # @max-bandwidth: to set maximum speed for migration.  maximum speed
> >  #     in bytes per second.  (Since 2.8)
> >  #
> > +# @avail-switchover-bandwidth: to set the available bandwidth that
> > +#     migration can use during switchover phase.  NOTE!  This does not
> > +#     limit the bandwidth during switchover, but only for calculations when
> > +#     making decisions to switchover.  By default, this value is zero,
> > +#     which means QEMU will estimate the bandwidth automatically.  This can
> > +#     be set when the estimated value is not accurate, while the user is
> > +#     able to guarantee such bandwidth is available when switching over.
> > +#     When specified correctly, this can make the switchover decision much
> > +#     more accurate.  (Since 8.2)
> 
> We tend to eschew abbreviations in QAPI schema identifiers.
> available-switchover-bandwidth is a mouthful, though.  What do you
> think?

The named changed in the past versions, and IIRC avail-switchover-bandwidth
is something we came up with last.. assuming a trade-off between length and
sane meanings.  I don't have anything better to come up.. :)

Please shoot if you have better suggestions.  We still have three weeks to
8.2 soft freeze.

Thanks,

-- 
Peter Xu
Re: [PATCH v4] migration: Allow user to specify available switchover bandwidth
Posted by Markus Armbruster 1 year, 1 month ago
Peter Xu <peterx@redhat.com> writes:

> On Tue, Oct 17, 2023 at 04:12:40PM +0200, Markus Armbruster wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Migration bandwidth is a very important value to live migration.  It's
>> > because it's one of the major factors that we'll make decision on when to
>> > switchover to destination in a precopy process.
>> >
>> > This value is currently estimated by QEMU during the whole live migration
>> > process by monitoring how fast we were sending the data.  This can be the
>> > most accurate bandwidth if in the ideal world, where we're always feeding
>> > unlimited data to the migration channel, and then it'll be limited to the
>> > bandwidth that is available.
>> >
>> > However in reality it may be very different, e.g., over a 10Gbps network we
>> > can see query-migrate showing migration bandwidth of only a few tens of
>> > MB/s just because there are plenty of other things the migration thread
>> > might be doing.  For example, the migration thread can be busy scanning
>> > zero pages, or it can be fetching dirty bitmap from other external dirty
>> > sources (like vhost or KVM).  It means we may not be pushing data as much
>> > as possible to migration channel, so the bandwidth estimated from "how many
>> > data we sent in the channel" can be dramatically inaccurate sometimes.
>> 
>> how much data we've sent to the channel
>> 
>> >
>> > With that, the decision to switchover will be affected, by assuming that we
>> > may not be able to switchover at all with such a low bandwidth, but in
>> > reality we can.
>> >
>> > The migration may not even converge at all with the downtime specified,
>> > with that wrong estimation of bandwidth, keeping iterations forever with a
>> 
>> iterating forever
>> 
>> > low estimation of bandwidth.
>> >
>> > The issue is QEMU itself may not be able to avoid those uncertainties on
>> > measuing the real "available migration bandwidth".  At least not something
>> > I can think of so far.
>> >
>> > One way to fix this is when the user is fully aware of the available
>> > bandwidth, then we can allow the user to help providing an accurate value.
>> >
>> > For example, if the user has a dedicated channel of 10Gbps for migration
>> > for this specific VM, the user can specify this bandwidth so QEMU can
>> > always do the calculation based on this fact, trusting the user as long as
>> > specified.  It may not be the exact bandwidth when switching over (in which
>> > case qemu will push migration data as fast as possible), but much better
>> > than QEMU trying to wildly guess, especially when very wrong.
>> >
>> > A new parameter "avail-switchover-bandwidth" is introduced just for this.
>> > So when the user specified this parameter, instead of trusting the
>> > estimated value from QEMU itself (based on the QEMUFile send speed), it
>> > trusts the user more by using this value to decide when to switchover,
>> > assuming that we'll have such bandwidth available then.
>> >
>> > Note that specifying this value will not throttle the bandwidth for
>> > switchover yet, so QEMU will always use the full bandwidth possible for
>> > sending switchover data, assuming that should always be the most important
>> > way to use the network at that time.
>> >
>> > This can resolve issues like "unconvergence migration" which is caused by
>> > hilarious low "migration bandwidth" detected for whatever reason.
>> 
>> "unconvergence" isn't a word :)
>> 
>> Suggest "like migration not converging, because the automatically
>> detected migration bandwidth is hilariously low for whatever reason."
>> 
>> Appreciate the thorough explanation!
>
> Thanks for reviewing!
>
> The patch is already in today's migration pull, so unfortunately no planned
> repost for now.  I'll amend the commit message and collect the ACK if I'll
> need to redo it.

Didn't see the PR, and didn't expect it so soon.

>> > Reported-by: Zhiyi Guo <zhguo@redhat.com>
>> > Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> > v4:
>> > - Rebase to master, with duplicated documentations
>> > ---
>> >  qapi/migration.json            | 34 +++++++++++++++++++++++++++++++++-
>> >  migration/migration.h          |  2 +-
>> >  migration/options.h            |  1 +
>> >  migration/migration-hmp-cmds.c | 14 ++++++++++++++
>> >  migration/migration.c          | 24 +++++++++++++++++++++---
>> >  migration/options.c            | 28 ++++++++++++++++++++++++++++
>> >  migration/trace-events         |  2 +-
>> >  7 files changed, 99 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/qapi/migration.json b/qapi/migration.json
>> > index 8843e74b59..0c897a99b1 100644
>> > --- a/qapi/migration.json
>> > +++ b/qapi/migration.json
>> > @@ -759,6 +759,16 @@
>> >  # @max-bandwidth: to set maximum speed for migration.  maximum speed
>> >  #     in bytes per second.  (Since 2.8)
>> >  #
>> > +# @avail-switchover-bandwidth: to set the available bandwidth that
>> > +#     migration can use during switchover phase.  NOTE!  This does not
>> > +#     limit the bandwidth during switchover, but only for calculations when
>> > +#     making decisions to switchover.  By default, this value is zero,
>> > +#     which means QEMU will estimate the bandwidth automatically.  This can
>> > +#     be set when the estimated value is not accurate, while the user is
>> > +#     able to guarantee such bandwidth is available when switching over.
>> > +#     When specified correctly, this can make the switchover decision much
>> > +#     more accurate.  (Since 8.2)
>> 
>> We tend to eschew abbreviations in QAPI schema identifiers.
>> available-switchover-bandwidth is a mouthful, though.  What do you
>> think?
>
> The named changed in the past versions, and IIRC avail-switchover-bandwidth
> is something we came up with last.. assuming a trade-off between length and
> sane meanings.  I don't have anything better to come up.. :)
>
> Please shoot if you have better suggestions.  We still have three weeks to
> 8.2 soft freeze.

I just had a similar conversation with Vladimir for "[PATCH 2/4] qapi:
introduce device-sync-config".  Quoting myself:

    In the words of Captain Barbossa, it's "more what you'd call
    'guidelines' than actual rules."

    I didn't come up with the "avoid abbreviations" stylistic guideline.  I
    inherited it.

    I do like consistent style.  I don't like excessively long names.
    Sometimes these likes conflict, and we need to pick.

For what it's worth, there's precedence for "avail" in the schema.

Let's move on.
Re: [PATCH v4] migration: Allow user to specify available switchover bandwidth
Posted by Juan Quintela 1 year, 1 month ago
Markus Armbruster <armbru@redhat.com> wrote:
> Peter Xu <peterx@redhat.com> writes:

[..]

>>> "unconvergence" isn't a word :)
>>> 
>>> Suggest "like migration not converging, because the automatically
>>> detected migration bandwidth is hilariously low for whatever reason."
>>> 
>>> Appreciate the thorough explanation!
>>
>> Thanks for reviewing!
>>
>> The patch is already in today's migration pull, so unfortunately no planned
>> repost for now.  I'll amend the commit message and collect the ACK if I'll
>> need to redo it.
>
> Didn't see the PR, and didn't expect it so soon.

Sorry.

When a series is at v4 reviewed I tend to be faster pulling.
For new series, I try to let them in the list for a couple of days to
let people comment.

Later, Juan.
Re: [PATCH v4] migration: Allow user to specify available switchover bandwidth
Posted by Juan Quintela 1 year, 1 month ago
Peter Xu <peterx@redhat.com> wrote:
> Migration bandwidth is a very important value to live migration.  It's
> because it's one of the major factors that we'll make decision on when to
> switchover to destination in a precopy process.
>
> This value is currently estimated by QEMU during the whole live migration
> process by monitoring how fast we were sending the data.  This can be the
> most accurate bandwidth if in the ideal world, where we're always feeding
> unlimited data to the migration channel, and then it'll be limited to the
> bandwidth that is available.
>
> However in reality it may be very different, e.g., over a 10Gbps network we
> can see query-migrate showing migration bandwidth of only a few tens of
> MB/s just because there are plenty of other things the migration thread
> might be doing.  For example, the migration thread can be busy scanning
> zero pages, or it can be fetching dirty bitmap from other external dirty
> sources (like vhost or KVM).  It means we may not be pushing data as much
> as possible to migration channel, so the bandwidth estimated from "how many
> data we sent in the channel" can be dramatically inaccurate sometimes.
>
> With that, the decision to switchover will be affected, by assuming that we
> may not be able to switchover at all with such a low bandwidth, but in
> reality we can.
>
> The migration may not even converge at all with the downtime specified,
> with that wrong estimation of bandwidth, keeping iterations forever with a
> low estimation of bandwidth.
>
> The issue is QEMU itself may not be able to avoid those uncertainties on
> measuing the real "available migration bandwidth".  At least not something
> I can think of so far.
>
> One way to fix this is when the user is fully aware of the available
> bandwidth, then we can allow the user to help providing an accurate value.
>
> For example, if the user has a dedicated channel of 10Gbps for migration
> for this specific VM, the user can specify this bandwidth so QEMU can
> always do the calculation based on this fact, trusting the user as long as
> specified.  It may not be the exact bandwidth when switching over (in which
> case qemu will push migration data as fast as possible), but much better
> than QEMU trying to wildly guess, especially when very wrong.
>
> A new parameter "avail-switchover-bandwidth" is introduced just for this.
> So when the user specified this parameter, instead of trusting the
> estimated value from QEMU itself (based on the QEMUFile send speed), it
> trusts the user more by using this value to decide when to switchover,
> assuming that we'll have such bandwidth available then.
>
> Note that specifying this value will not throttle the bandwidth for
> switchover yet, so QEMU will always use the full bandwidth possible for
> sending switchover data, assuming that should always be the most important
> way to use the network at that time.
>
> This can resolve issues like "unconvergence migration" which is caused by
> hilarious low "migration bandwidth" detected for whatever reason.
>
> Reported-by: Zhiyi Guo <zhguo@redhat.com>
> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

> diff --git a/migration/options.h b/migration/options.h
> index 045e2a41a2..93ee938ab8 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -80,6 +80,7 @@ int migrate_decompress_threads(void);
>  uint64_t migrate_downtime_limit(void);
>  uint8_t migrate_max_cpu_throttle(void);
>  uint64_t migrate_max_bandwidth(void);
> +uint64_t migrate_avail_switchover_bandwidth(void);
>  uint64_t migrate_max_postcopy_bandwidth(void);
>  int migrate_multifd_channels(void);
>  MultiFDCompression migrate_multifd_compression(void);

This list of functions is alphabetically sorted, doing that during merge.