[PATCH v2] migration/calc-dirty-rate: millisecond-granularity period

Andrei Gudkov via posted 1 patch 9 months, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/8ddb0d40d143f77aab8f602bd494e01e5fa01614.1691161009.git.gudkov.andrei@huawei.com
Maintainers: Hyman Huang <yong.huang@smartx.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras <leobras@redhat.com>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>
There is a newer version of this series
qapi/migration.json   | 14 ++++++--
migration/dirtyrate.h | 12 ++++---
migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
3 files changed, 67 insertions(+), 40 deletions(-)
[PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by Andrei Gudkov via 9 months, 2 weeks ago
Introduces alternative argument calc-time-ms, which is the
the same as calc-time but accepts millisecond value.
Millisecond granularity allows to make predictions whether
migration will succeed or not. To do this, calculate dirty
rate with calc-time-ms set to max allowed downtime, convert
measured rate into volume of dirtied memory, and divide by
network throughput. If the value is lower than max allowed
downtime, then migration will converge.

Measurement results for single thread randomly writing to
a 1/4/24GiB memory region:

+--------------+-----------------------------------------------+
| calc-time-ms |                dirty rate MiB/s               |
|              +----------------+---------------+--------------+
|              | theoretical    | page-sampling | dirty-bitmap |
|              | (at 3M wr/sec) |               |              |
+--------------+----------------+---------------+--------------+
|                             1GiB                             |
+--------------+----------------+---------------+--------------+
|          100 |           6996 |          7100 |         3192 |
|          200 |           4606 |          4660 |         2655 |
|          300 |           3305 |          3280 |         2371 |
|          400 |           2534 |          2525 |         2154 |
|          500 |           2041 |          2044 |         1871 |
|          750 |           1365 |          1341 |         1358 |
|         1000 |           1024 |          1052 |         1025 |
|         1500 |            683 |           678 |          684 |
|         2000 |            512 |           507 |          513 |
+--------------+----------------+---------------+--------------+
|                             4GiB                             |
+--------------+----------------+---------------+--------------+
|          100 |          10232 |          8880 |         4070 |
|          200 |           8954 |          8049 |         3195 |
|          300 |           7889 |          7193 |         2881 |
|          400 |           6996 |          6530 |         2700 |
|          500 |           6245 |          5772 |         2312 |
|          750 |           4829 |          4586 |         2465 |
|         1000 |           3865 |          3780 |         2178 |
|         1500 |           2694 |          2633 |         2004 |
|         2000 |           2041 |          2031 |         1789 |
+--------------+----------------+---------------+--------------+
|                             24GiB                            |
+--------------+----------------+---------------+--------------+
|          100 |          11495 |          8640 |         5597 |
|          200 |          11226 |          8616 |         3527 |
|          300 |          10965 |          8386 |         2355 |
|          400 |          10713 |          8370 |         2179 |
|          500 |          10469 |          8196 |         2098 |
|          750 |           9890 |          7885 |         2556 |
|         1000 |           9354 |          7506 |         2084 |
|         1500 |           8397 |          6944 |         2075 |
|         2000 |           7574 |          6402 |         2062 |
+--------------+----------------+---------------+--------------+

Theoretical values are computed according to the following formula:
size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
where size is in bytes, time is in seconds, and wps is number of
writes per second.

Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
---
 qapi/migration.json   | 14 ++++++--
 migration/dirtyrate.h | 12 ++++---
 migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
 3 files changed, 67 insertions(+), 40 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..82493d6a57 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1849,7 +1849,11 @@
 # @start-time: start time in units of second for calculation
 #
 # @calc-time: time period for which dirty page rate was measured
-#     (in seconds)
+#     (rounded down to seconds).
+#
+# @calc-time-ms: actual time period for which dirty page rate was
+#     measured (in milliseconds).  Value may be larger than requested
+#     time period due to measurement overhead.
 #
 # @sample-pages: number of sampled pages per GiB of guest memory.
 #     Valid only in page-sampling mode (Since 6.1)
@@ -1866,6 +1870,7 @@
            'status': 'DirtyRateStatus',
            'start-time': 'int64',
            'calc-time': 'int64',
+           'calc-time-ms': 'int64',
            'sample-pages': 'uint64',
            'mode': 'DirtyRateMeasureMode',
            '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
@@ -1908,6 +1913,10 @@
 #     dirty during @calc-time period, further writes to this page will
 #     not increase dirty page rate anymore.
 #
+# @calc-time-ms: the same as @calc-time but in milliseconds.  These
+#    two arguments are mutually exclusive.  Exactly one of them must
+#    be specified. (Since 8.1)
+#
 # @sample-pages: number of sampled pages per each GiB of guest memory.
 #     Default value is 512.  For 4KiB guest pages this corresponds to
 #     sampling ratio of 0.2%.  This argument is used only in page
@@ -1925,7 +1934,8 @@
 #                                                 'sample-pages': 512} }
 # <- { "return": {} }
 ##
-{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
+{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
+                                         '*calc-time-ms': 'int64',
                                          '*sample-pages': 'int',
                                          '*mode': 'DirtyRateMeasureMode'} }
 
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index 594a5c0bb6..869c060941 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -31,10 +31,12 @@
 #define MIN_RAMBLOCK_SIZE                         128
 
 /*
- * Take 1s as minimum time for calculation duration
+ * Allowed range for dirty page rate calculation (in milliseconds).
+ * Lower limit relates to the smallest realistic downtime it
+ * makes sense to impose on migration.
  */
-#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
-#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
+#define MIN_CALC_TIME_MS                          50
+#define MAX_CALC_TIME_MS                       60000
 
 /*
  * Take 1/16 pages in 1G as the maxmum sample page count
@@ -44,7 +46,7 @@
 
 struct DirtyRateConfig {
     uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
-    int64_t sample_period_seconds; /* time duration between two sampling */
+    int64_t calc_time_ms; /* desired calculation time (in milliseconds) */
     DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
 };
 
@@ -73,7 +75,7 @@ typedef struct SampleVMStat {
 struct DirtyRateStat {
     int64_t dirty_rate; /* dirty rate in MB/s */
     int64_t start_time; /* calculation start time in units of second */
-    int64_t calc_time; /* time duration of two sampling in units of second */
+    int64_t calc_time_ms; /* actual calculation time (in milliseconds) */
     uint64_t sample_pages; /* sample pages per GB */
     union {
         SampleVMStat page_sampling;
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 84f1b0fb20..90fb336329 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
         msec = current_time - initial_time;
     } else {
         g_usleep((msec + initial_time - current_time) * 1000);
+        /* g_usleep may overshoot */
+        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
     }
 
     return msec;
@@ -77,9 +79,12 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
 {
     uint64_t increased_dirty_pages =
         dirty_pages.end_pages - dirty_pages.start_pages;
-    uint64_t memory_size_MiB = qemu_target_pages_to_MiB(increased_dirty_pages);
-
-    return memory_size_MiB * 1000 / calc_time_ms;
+    /*
+     * multiply by 1000ms/s _before_ converting down to megabytes
+     * to avoid losing precision
+     */
+    return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
+        calc_time_ms;
 }
 
 void global_dirty_log_change(unsigned int flag, bool start)
@@ -183,10 +188,9 @@ retry:
     return duration;
 }
 
-static bool is_sample_period_valid(int64_t sec)
+static bool is_calc_time_valid(int64_t msec)
 {
-    if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
-        sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
+    if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
         return false;
     }
 
@@ -219,7 +223,8 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
 
     info->status = CalculatingState;
     info->start_time = DirtyStat.start_time;
-    info->calc_time = DirtyStat.calc_time;
+    info->calc_time_ms = DirtyStat.calc_time_ms;
+    info->calc_time = DirtyStat.calc_time_ms / 1000;
     info->sample_pages = DirtyStat.sample_pages;
     info->mode = dirtyrate_mode;
 
@@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
 {
     DirtyStat.dirty_rate = -1;
     DirtyStat.start_time = start_time;
-    DirtyStat.calc_time = config.sample_period_seconds;
+    DirtyStat.calc_time_ms = config.calc_time_ms;
     DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
 
     switch (config.mode) {
@@ -568,7 +573,6 @@ static inline void dirtyrate_manual_reset_protect(void)
 
 static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
 {
-    int64_t msec = 0;
     int64_t start_time;
     DirtyPageRecord dirty_pages;
 
@@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
     start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     DirtyStat.start_time = start_time / 1000;
 
-    msec = config.sample_period_seconds * 1000;
-    msec = dirty_stat_wait(msec, start_time);
-    DirtyStat.calc_time = msec / 1000;
+    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms, start_time);
 
     /*
      * do two things.
@@ -609,12 +611,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
 
     record_dirtypages_bitmap(&dirty_pages, false);
 
-    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
+    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
+                                                  DirtyStat.calc_time_ms);
 }
 
 static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
 {
-    int64_t duration;
     uint64_t dirtyrate = 0;
     uint64_t dirtyrate_sum = 0;
     int i = 0;
@@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
     DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
 
     /* calculate vcpu dirtyrate */
-    duration = vcpu_calculate_dirtyrate(config.sample_period_seconds * 1000,
-                                        &DirtyStat.dirty_ring,
-                                        GLOBAL_DIRTY_DIRTY_RATE,
-                                        true);
-
-    DirtyStat.calc_time = duration / 1000;
+    DirtyStat.calc_time_ms = vcpu_calculate_dirtyrate(config.calc_time_ms,
+                                                      &DirtyStat.dirty_ring,
+                                                      GLOBAL_DIRTY_DIRTY_RATE,
+                                                      true);
 
     /* calculate vm dirtyrate */
     for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
@@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
 {
     struct RamblockDirtyInfo *block_dinfo = NULL;
     int block_count = 0;
-    int64_t msec = 0;
     int64_t initial_time;
 
     rcu_read_lock();
@@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
     }
     rcu_read_unlock();
 
-    msec = config.sample_period_seconds * 1000;
-    msec = dirty_stat_wait(msec, initial_time);
+    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
+                                             initial_time);
     DirtyStat.start_time = initial_time / 1000;
-    DirtyStat.calc_time = msec / 1000;
 
     rcu_read_lock();
     if (!compare_page_hash_info(block_dinfo, block_count)) {
         goto out;
     }
 
-    update_dirtyrate(msec);
+    update_dirtyrate(DirtyStat.calc_time_ms);
 
 out:
     rcu_read_unlock();
@@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
     return NULL;
 }
 
-void qmp_calc_dirty_rate(int64_t calc_time,
+void qmp_calc_dirty_rate(bool has_calc_time,
+                         int64_t calc_time,
+                         bool has_calc_time_ms,
+                         int64_t calc_time_ms,
                          bool has_sample_pages,
                          int64_t sample_pages,
                          bool has_mode,
@@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
         return;
     }
 
-    if (!is_sample_period_valid(calc_time)) {
-        error_setg(errp, "calc-time is out of range[%d, %d].",
-                         MIN_FETCH_DIRTYRATE_TIME_SEC,
-                         MAX_FETCH_DIRTYRATE_TIME_SEC);
+    if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
+        error_setg(errp, "Exactly one of calc-time and calc-time-ms must"
+                         " be specified");
+        return;
+    }
+    if (has_calc_time) {
+        /*
+         * The worst thing that can happen due to overflow is that
+         * invalid value will become valid.
+         */
+        calc_time_ms = calc_time * 1000;
+    }
+    if (!is_calc_time_valid(calc_time_ms)) {
+        error_setg(errp, "Calculation time is out of range[%dms, %dms].",
+                         MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
         return;
     }
 
@@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
         return;
     }
 
-    config.sample_period_seconds = calc_time;
+    config.calc_time_ms = calc_time_ms;
     config.sample_pages_per_gigabytes = sample_pages;
     config.mode = mode;
 
@@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict)
         mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
     }
 
-    qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
-                        mode, &err);
+    qmp_calc_dirty_rate(true, sec, /* calc_time */
+                        false, 0, /* calc_time_ms */
+                        has_sample_pages, sample_pages,
+                        true, mode,
+                        &err);
     if (err) {
         hmp_handle_error(mon, err);
         return;
-- 
2.30.2
Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by Yong Huang 9 months, 1 week ago
On Fri, Aug 4, 2023 at 11:03 PM Andrei Gudkov <gudkov.andrei@huawei.com>
wrote:

> Introduces alternative argument calc-time-ms, which is the
> the same as calc-time but accepts millisecond value.
> Millisecond granularity allows to make predictions whether
> migration will succeed or not. To do this, calculate dirty
> rate with calc-time-ms set to max allowed downtime, convert
> measured rate into volume of dirtied memory, and divide by
> network throughput. If the value is lower than max allowed
>
Not for the patch, I'm just curious about how the predication
decides the network throughput, I mean QEMU predicts
if migration will converge based on how fast it sends the data,
not the actual bandwidth of the interface, which one the
prediction uses?

> downtime, then migration will converge.
>
> Measurement results for single thread randomly writing to
> a 1/4/24GiB memory region:
>
> +--------------+-----------------------------------------------+
> | calc-time-ms |                dirty rate MiB/s               |
> |              +----------------+---------------+--------------+
> |              | theoretical    | page-sampling | dirty-bitmap |
> |              | (at 3M wr/sec) |               |              |
> +--------------+----------------+---------------+--------------+
> |                             1GiB                             |
> +--------------+----------------+---------------+--------------+
> |          100 |           6996 |          7100 |         3192 |
> |          200 |           4606 |          4660 |         2655 |
> |          300 |           3305 |          3280 |         2371 |
> |          400 |           2534 |          2525 |         2154 |
> |          500 |           2041 |          2044 |         1871 |
> |          750 |           1365 |          1341 |         1358 |
> |         1000 |           1024 |          1052 |         1025 |
> |         1500 |            683 |           678 |          684 |
> |         2000 |            512 |           507 |          513 |
> +--------------+----------------+---------------+--------------+
> |                             4GiB                             |
> +--------------+----------------+---------------+--------------+
> |          100 |          10232 |          8880 |         4070 |
> |          200 |           8954 |          8049 |         3195 |
> |          300 |           7889 |          7193 |         2881 |
> |          400 |           6996 |          6530 |         2700 |
> |          500 |           6245 |          5772 |         2312 |
> |          750 |           4829 |          4586 |         2465 |
> |         1000 |           3865 |          3780 |         2178 |
> |         1500 |           2694 |          2633 |         2004 |
> |         2000 |           2041 |          2031 |         1789 |
> +--------------+----------------+---------------+--------------+
> |                             24GiB                            |
> +--------------+----------------+---------------+--------------+
> |          100 |          11495 |          8640 |         5597 |
> |          200 |          11226 |          8616 |         3527 |
> |          300 |          10965 |          8386 |         2355 |
> |          400 |          10713 |          8370 |         2179 |
> |          500 |          10469 |          8196 |         2098 |
> |          750 |           9890 |          7885 |         2556 |
> |         1000 |           9354 |          7506 |         2084 |
> |         1500 |           8397 |          6944 |         2075 |
> |         2000 |           7574 |          6402 |         2062 |
> +--------------+----------------+---------------+--------------+
>
> Theoretical values are computed according to the following formula:
> size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> where size is in bytes, time is in seconds, and wps is number of
> writes per second.
>
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> ---
>  qapi/migration.json   | 14 ++++++--
>  migration/dirtyrate.h | 12 ++++---
>  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
>  3 files changed, 67 insertions(+), 40 deletions(-)
>
> [...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8843e74b59..82493d6a57 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1849,7 +1849,11 @@
>  # @start-time: start time in units of second for calculation
>  #
>  # @calc-time: time period for which dirty page rate was measured
> -#     (in seconds)
> +#     (rounded down to seconds).
>
Does there need an extra comment to emphasize that calc-time shows
zero if the calc-time-ms is lower than 1000?

> +#
> +# @calc-time-ms: actual time period for which dirty page rate was
> +#     measured (in milliseconds).  Value may be larger than requested
> +#     time period due to measurement overhead.
>  #
>  # @sample-pages: number of sampled pages per GiB of guest memory.
>  #     Valid only in page-sampling mode (Since 6.1)
> @@ -1866,6 +1870,7 @@
>             'status': 'DirtyRateStatus',
>             'start-time': 'int64',
>             'calc-time': 'int64',
> +           'calc-time-ms': 'int64',
>             'sample-pages': 'uint64',
>             'mode': 'DirtyRateMeasureMode',
>             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> @@ -1908,6 +1913,10 @@
>  #     dirty during @calc-time period, further writes to this page will
>  #     not increase dirty page rate anymore.
>  #
> +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> +#    two arguments are mutually exclusive.  Exactly one of them must
> +#    be specified. (Since 8.1)
> +#
>  # @sample-pages: number of sampled pages per each GiB of guest memory.
>  #     Default value is 512.  For 4KiB guest pages this corresponds to
>  #     sampling ratio of 0.2%.  This argument is used only in page
> @@ -1925,7 +1934,8 @@
>  #                                                 'sample-pages': 512} }
>  # <- { "return": {} }
>  ##
> -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> +                                         '*calc-time-ms': 'int64',
>                                           '*sample-pages': 'int',
>                                           '*mode': 'DirtyRateMeasureMode'}
> }
>
> diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> index 594a5c0bb6..869c060941 100644
> --- a/migration/dirtyrate.h
> +++ b/migration/dirtyrate.h
> @@ -31,10 +31,12 @@
>  #define MIN_RAMBLOCK_SIZE                         128
>
>  /*
> - * Take 1s as minimum time for calculation duration
> + * Allowed range for dirty page rate calculation (in milliseconds).
> + * Lower limit relates to the smallest realistic downtime it
> + * makes sense to impose on migration.
>   */
> -#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
> -#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
> +#define MIN_CALC_TIME_MS                          50
> +#define MAX_CALC_TIME_MS                       60000
>
>  /*
>   * Take 1/16 pages in 1G as the maxmum sample page count
> @@ -44,7 +46,7 @@
>
>  struct DirtyRateConfig {
>      uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> -    int64_t sample_period_seconds; /* time duration between two sampling
> */
> +    int64_t calc_time_ms; /* desired calculation time (in milliseconds) */
>      DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
>  };
>
> @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
>  struct DirtyRateStat {
>      int64_t dirty_rate; /* dirty rate in MB/s */
>      int64_t start_time; /* calculation start time in units of second */
> -    int64_t calc_time; /* time duration of two sampling in units of
> second */
> +    int64_t calc_time_ms; /* actual calculation time (in milliseconds) */
>      uint64_t sample_pages; /* sample pages per GB */
>      union {
>          SampleVMStat page_sampling;

diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index 84f1b0fb20..90fb336329 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t
> initial_time)
>          msec = current_time - initial_time;
>      } else {
>          g_usleep((msec + initial_time - current_time) * 1000);
> +        /* g_usleep may overshoot */
> +        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
>
The optimization could be a standalone commit along with the content
below(see the following comment)?

>      }
>
>      return msec;
> @@ -77,9 +79,12 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord
> dirty_pages,
>  {
>      uint64_t increased_dirty_pages =
>          dirty_pages.end_pages - dirty_pages.start_pages;
> -    uint64_t memory_size_MiB =
> qemu_target_pages_to_MiB(increased_dirty_pages);
> -
> -    return memory_size_MiB * 1000 / calc_time_ms;
> +    /*
> +     * multiply by 1000ms/s _before_ converting down to megabytes
> +     * to avoid losing precision
> +     */
> +    return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
> +        calc_time_ms;
>
Code optimization, could be in a standalone commit.

>  }
>
>  void global_dirty_log_change(unsigned int flag, bool start)
> @@ -183,10 +188,9 @@ retry:
>      return duration;
>  }
>
> -static bool is_sample_period_valid(int64_t sec)
> +static bool is_calc_time_valid(int64_t msec)
>  {
> -    if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> -        sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
> +    if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
>          return false;
>      }
>
> @@ -219,7 +223,8 @@ static struct DirtyRateInfo
> *query_dirty_rate_info(void)
>
>      info->status = CalculatingState;
>      info->start_time = DirtyStat.start_time;
> -    info->calc_time = DirtyStat.calc_time;
> +    info->calc_time_ms = DirtyStat.calc_time_ms;
> +    info->calc_time = DirtyStat.calc_time_ms / 1000;
>      info->sample_pages = DirtyStat.sample_pages;
>      info->mode = dirtyrate_mode;
>
> @@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
>  {
>      DirtyStat.dirty_rate = -1;
>      DirtyStat.start_time = start_time;
> -    DirtyStat.calc_time = config.sample_period_seconds;
> +    DirtyStat.calc_time_ms = config.calc_time_ms;
>      DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
>
>      switch (config.mode) {
> @@ -568,7 +573,6 @@ static inline void dirtyrate_manual_reset_protect(void)
>
>  static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig
> config)
>  {
> -    int64_t msec = 0;
>      int64_t start_time;
>      DirtyPageRecord dirty_pages;
>
> @@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> DirtyRateConfig config)
>      start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      DirtyStat.start_time = start_time / 1000;
>
> -    msec = config.sample_period_seconds * 1000;
> -    msec = dirty_stat_wait(msec, start_time);
> -    DirtyStat.calc_time = msec / 1000;
> +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> start_time);
>
>      /*
>       * do two things.
> @@ -609,12 +611,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> DirtyRateConfig config)
>
>      record_dirtypages_bitmap(&dirty_pages, false);
>
> -    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> +    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
> +                                                  DirtyStat.calc_time_ms);
>  }
>
>  static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
>  {
> -    int64_t duration;
>      uint64_t dirtyrate = 0;
>      uint64_t dirtyrate_sum = 0;
>      int i = 0;
> @@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct
> DirtyRateConfig config)
>      DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
>
>      /* calculate vcpu dirtyrate */
> -    duration = vcpu_calculate_dirtyrate(config.sample_period_seconds *
> 1000,
> -                                        &DirtyStat.dirty_ring,
> -                                        GLOBAL_DIRTY_DIRTY_RATE,
> -                                        true);
> -
> -    DirtyStat.calc_time = duration / 1000;
> +    DirtyStat.calc_time_ms = vcpu_calculate_dirtyrate(config.calc_time_ms,
> +
> &DirtyStat.dirty_ring,
> +
> GLOBAL_DIRTY_DIRTY_RATE,
> +                                                      true);
>
>      /* calculate vm dirtyrate */
>      for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> @@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct
> DirtyRateConfig config)
>  {
>      struct RamblockDirtyInfo *block_dinfo = NULL;
>      int block_count = 0;
> -    int64_t msec = 0;
>      int64_t initial_time;
>
>      rcu_read_lock();
> @@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct
> DirtyRateConfig config)
>      }
>      rcu_read_unlock();
>
> -    msec = config.sample_period_seconds * 1000;
> -    msec = dirty_stat_wait(msec, initial_time);
> +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> +                                             initial_time);
>      DirtyStat.start_time = initial_time / 1000;
> -    DirtyStat.calc_time = msec / 1000;
>
>      rcu_read_lock();
>      if (!compare_page_hash_info(block_dinfo, block_count)) {
>          goto out;
>      }
>
> -    update_dirtyrate(msec);
> +    update_dirtyrate(DirtyStat.calc_time_ms);
>
>  out:
>      rcu_read_unlock();
> @@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
>      return NULL;
>  }
>
> -void qmp_calc_dirty_rate(int64_t calc_time,
> +void qmp_calc_dirty_rate(bool has_calc_time,
> +                         int64_t calc_time,
> +                         bool has_calc_time_ms,
> +                         int64_t calc_time_ms,
>                           bool has_sample_pages,
>                           int64_t sample_pages,
>                           bool has_mode,
> @@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
>          return;
>      }
>
> -    if (!is_sample_period_valid(calc_time)) {
> -        error_setg(errp, "calc-time is out of range[%d, %d].",
> -                         MIN_FETCH_DIRTYRATE_TIME_SEC,
> -                         MAX_FETCH_DIRTYRATE_TIME_SEC);
> +    if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
> +        error_setg(errp, "Exactly one of calc-time and calc-time-ms must"
> +                         " be specified");
> +        return;
> +    }
> +    if (has_calc_time) {
> +        /*
> +         * The worst thing that can happen due to overflow is that
> +         * invalid value will become valid.
> +         */
> +        calc_time_ms = calc_time * 1000;
> +    }
> +    if (!is_calc_time_valid(calc_time_ms)) {
> +        error_setg(errp, "Calculation time is out of range[%dms, %dms].",
> +                         MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
>          return;
>      }
>
> @@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
>          return;
>      }
>
> -    config.sample_period_seconds = calc_time;
> +    config.calc_time_ms = calc_time_ms;
>      config.sample_pages_per_gigabytes = sample_pages;
>      config.mode = mode;
>
> @@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict
> *qdict)
>          mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
>      }
>
> -    qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
> -                        mode, &err);
> +    qmp_calc_dirty_rate(true, sec, /* calc_time */
> +                        false, 0, /* calc_time_ms */
> +                        has_sample_pages, sample_pages,
> +                        true, mode,
> +                        &err);
>      if (err) {
>          hmp_handle_error(mon, err);
>          return;
> --
> 2.30.2
>
> The patch set works for me, and I'm inclined to split it into two commits
as I point out above.

Thanks

Yong

-- 
Best regards
Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by gudkov.andrei--- via 9 months, 1 week ago
On Sun, Aug 06, 2023 at 02:16:34PM +0800, Yong Huang wrote:
> On Fri, Aug 4, 2023 at 11:03 PM Andrei Gudkov <gudkov.andrei@huawei.com>
> wrote:
> 
> > Introduces alternative argument calc-time-ms, which is the
> > the same as calc-time but accepts millisecond value.
> > Millisecond granularity allows to make predictions whether
> > migration will succeed or not. To do this, calculate dirty
> > rate with calc-time-ms set to max allowed downtime, convert
> > measured rate into volume of dirtied memory, and divide by
> > network throughput. If the value is lower than max allowed
> >
> Not for the patch, I'm just curious about how the predication
> decides the network throughput, I mean QEMU predicts
> if migration will converge based on how fast it sends the data,
> not the actual bandwidth of the interface, which one the
> prediction uses?
> 
Currently I use network nominal bandwidth, e.g. 1gbps. It would
be nice, of course, to use measured throughput since it can take
into account network headers overhead (as Wang Lei previously
mentioned), compression, etc., but it looks too complicated to
implement outside of migration process.

> > downtime, then migration will converge.
> >
> > Measurement results for single thread randomly writing to
> > a 1/4/24GiB memory region:
> >
> > +--------------+-----------------------------------------------+
> > | calc-time-ms |                dirty rate MiB/s               |
> > |              +----------------+---------------+--------------+
> > |              | theoretical    | page-sampling | dirty-bitmap |
> > |              | (at 3M wr/sec) |               |              |
> > +--------------+----------------+---------------+--------------+
> > |                             1GiB                             |
> > +--------------+----------------+---------------+--------------+
> > |          100 |           6996 |          7100 |         3192 |
> > |          200 |           4606 |          4660 |         2655 |
> > |          300 |           3305 |          3280 |         2371 |
> > |          400 |           2534 |          2525 |         2154 |
> > |          500 |           2041 |          2044 |         1871 |
> > |          750 |           1365 |          1341 |         1358 |
> > |         1000 |           1024 |          1052 |         1025 |
> > |         1500 |            683 |           678 |          684 |
> > |         2000 |            512 |           507 |          513 |
> > +--------------+----------------+---------------+--------------+
> > |                             4GiB                             |
> > +--------------+----------------+---------------+--------------+
> > |          100 |          10232 |          8880 |         4070 |
> > |          200 |           8954 |          8049 |         3195 |
> > |          300 |           7889 |          7193 |         2881 |
> > |          400 |           6996 |          6530 |         2700 |
> > |          500 |           6245 |          5772 |         2312 |
> > |          750 |           4829 |          4586 |         2465 |
> > |         1000 |           3865 |          3780 |         2178 |
> > |         1500 |           2694 |          2633 |         2004 |
> > |         2000 |           2041 |          2031 |         1789 |
> > +--------------+----------------+---------------+--------------+
> > |                             24GiB                            |
> > +--------------+----------------+---------------+--------------+
> > |          100 |          11495 |          8640 |         5597 |
> > |          200 |          11226 |          8616 |         3527 |
> > |          300 |          10965 |          8386 |         2355 |
> > |          400 |          10713 |          8370 |         2179 |
> > |          500 |          10469 |          8196 |         2098 |
> > |          750 |           9890 |          7885 |         2556 |
> > |         1000 |           9354 |          7506 |         2084 |
> > |         1500 |           8397 |          6944 |         2075 |
> > |         2000 |           7574 |          6402 |         2062 |
> > +--------------+----------------+---------------+--------------+
> >
> > Theoretical values are computed according to the following formula:
> > size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> > where size is in bytes, time is in seconds, and wps is number of
> > writes per second.
> >
> > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > ---
> >  qapi/migration.json   | 14 ++++++--
> >  migration/dirtyrate.h | 12 ++++---
> >  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> >  3 files changed, 67 insertions(+), 40 deletions(-)
> >
> > [...]
> 
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 8843e74b59..82493d6a57 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1849,7 +1849,11 @@
> >  # @start-time: start time in units of second for calculation
> >  #
> >  # @calc-time: time period for which dirty page rate was measured
> > -#     (in seconds)
> > +#     (rounded down to seconds).
> >
> Does there need an extra comment to emphasize that calc-time shows
> zero if the calc-time-ms is lower than 1000?
> 
> > +#
> > +# @calc-time-ms: actual time period for which dirty page rate was
> > +#     measured (in milliseconds).  Value may be larger than requested
> > +#     time period due to measurement overhead.
> >  #
> >  # @sample-pages: number of sampled pages per GiB of guest memory.
> >  #     Valid only in page-sampling mode (Since 6.1)
> > @@ -1866,6 +1870,7 @@
> >             'status': 'DirtyRateStatus',
> >             'start-time': 'int64',
> >             'calc-time': 'int64',
> > +           'calc-time-ms': 'int64',
> >             'sample-pages': 'uint64',
> >             'mode': 'DirtyRateMeasureMode',
> >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > @@ -1908,6 +1913,10 @@
> >  #     dirty during @calc-time period, further writes to this page will
> >  #     not increase dirty page rate anymore.
> >  #
> > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > +#    two arguments are mutually exclusive.  Exactly one of them must
> > +#    be specified. (Since 8.1)
> > +#
> >  # @sample-pages: number of sampled pages per each GiB of guest memory.
> >  #     Default value is 512.  For 4KiB guest pages this corresponds to
> >  #     sampling ratio of 0.2%.  This argument is used only in page
> > @@ -1925,7 +1934,8 @@
> >  #                                                 'sample-pages': 512} }
> >  # <- { "return": {} }
> >  ##
> > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > +                                         '*calc-time-ms': 'int64',
> >                                           '*sample-pages': 'int',
> >                                           '*mode': 'DirtyRateMeasureMode'}
> > }
> >
> > diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> > index 594a5c0bb6..869c060941 100644
> > --- a/migration/dirtyrate.h
> > +++ b/migration/dirtyrate.h
> > @@ -31,10 +31,12 @@
> >  #define MIN_RAMBLOCK_SIZE                         128
> >
> >  /*
> > - * Take 1s as minimum time for calculation duration
> > + * Allowed range for dirty page rate calculation (in milliseconds).
> > + * Lower limit relates to the smallest realistic downtime it
> > + * makes sense to impose on migration.
> >   */
> > -#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
> > -#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
> > +#define MIN_CALC_TIME_MS                          50
> > +#define MAX_CALC_TIME_MS                       60000
> >
> >  /*
> >   * Take 1/16 pages in 1G as the maxmum sample page count
> > @@ -44,7 +46,7 @@
> >
> >  struct DirtyRateConfig {
> >      uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> > -    int64_t sample_period_seconds; /* time duration between two sampling
> > */
> > +    int64_t calc_time_ms; /* desired calculation time (in milliseconds) */
> >      DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
> >  };
> >
> > @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
> >  struct DirtyRateStat {
> >      int64_t dirty_rate; /* dirty rate in MB/s */
> >      int64_t start_time; /* calculation start time in units of second */
> > -    int64_t calc_time; /* time duration of two sampling in units of
> > second */
> > +    int64_t calc_time_ms; /* actual calculation time (in milliseconds) */
> >      uint64_t sample_pages; /* sample pages per GB */
> >      union {
> >          SampleVMStat page_sampling;
> 
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> > index 84f1b0fb20..90fb336329 100644
> > --- a/migration/dirtyrate.c
> > +++ b/migration/dirtyrate.c
> > @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t
> > initial_time)
> >          msec = current_time - initial_time;
> >      } else {
> >          g_usleep((msec + initial_time - current_time) * 1000);
> > +        /* g_usleep may overshoot */
> > +        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
> >
> The optimization could be a standalone commit along with the content
> below(see the following comment)?
> 
OK, I will move it into separate commit.

> >      }
> >
> >      return msec;
> > @@ -77,9 +79,12 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord
> > dirty_pages,
> >  {
> >      uint64_t increased_dirty_pages =
> >          dirty_pages.end_pages - dirty_pages.start_pages;
> > -    uint64_t memory_size_MiB =
> > qemu_target_pages_to_MiB(increased_dirty_pages);
> > -
> > -    return memory_size_MiB * 1000 / calc_time_ms;
> > +    /*
> > +     * multiply by 1000ms/s _before_ converting down to megabytes
> > +     * to avoid losing precision
> > +     */
> > +    return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
> > +        calc_time_ms;
> >
> Code optimization, could be in a standalone commit.
> 
OK, but note that it is an important optimization. Imagine that
calc_time_ms=100 and increased_dirty_pages=1000. If we compute without
this optimization, we will get only 1000/(2^8)*1000/100=30. With
optmization: 1000*1000/(2^8)/100=39.

> >  }
> >
> >  void global_dirty_log_change(unsigned int flag, bool start)
> > @@ -183,10 +188,9 @@ retry:
> >      return duration;
> >  }
> >
> > -static bool is_sample_period_valid(int64_t sec)
> > +static bool is_calc_time_valid(int64_t msec)
> >  {
> > -    if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> > -        sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
> > +    if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
> >          return false;
> >      }
> >
> > @@ -219,7 +223,8 @@ static struct DirtyRateInfo
> > *query_dirty_rate_info(void)
> >
> >      info->status = CalculatingState;
> >      info->start_time = DirtyStat.start_time;
> > -    info->calc_time = DirtyStat.calc_time;
> > +    info->calc_time_ms = DirtyStat.calc_time_ms;
> > +    info->calc_time = DirtyStat.calc_time_ms / 1000;
> >      info->sample_pages = DirtyStat.sample_pages;
> >      info->mode = dirtyrate_mode;
> >
> > @@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
> >  {
> >      DirtyStat.dirty_rate = -1;
> >      DirtyStat.start_time = start_time;
> > -    DirtyStat.calc_time = config.sample_period_seconds;
> > +    DirtyStat.calc_time_ms = config.calc_time_ms;
> >      DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
> >
> >      switch (config.mode) {
> > @@ -568,7 +573,6 @@ static inline void dirtyrate_manual_reset_protect(void)
> >
> >  static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig
> > config)
> >  {
> > -    int64_t msec = 0;
> >      int64_t start_time;
> >      DirtyPageRecord dirty_pages;
> >
> > @@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> > DirtyRateConfig config)
> >      start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      DirtyStat.start_time = start_time / 1000;
> >
> > -    msec = config.sample_period_seconds * 1000;
> > -    msec = dirty_stat_wait(msec, start_time);
> > -    DirtyStat.calc_time = msec / 1000;
> > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > start_time);
> >
> >      /*
> >       * do two things.
> > @@ -609,12 +611,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> > DirtyRateConfig config)
> >
> >      record_dirtypages_bitmap(&dirty_pages, false);
> >
> > -    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> > +    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
> > +                                                  DirtyStat.calc_time_ms);
> >  }
> >
> >  static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
> >  {
> > -    int64_t duration;
> >      uint64_t dirtyrate = 0;
> >      uint64_t dirtyrate_sum = 0;
> >      int i = 0;
> > @@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct
> > DirtyRateConfig config)
> >      DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
> >
> >      /* calculate vcpu dirtyrate */
> > -    duration = vcpu_calculate_dirtyrate(config.sample_period_seconds *
> > 1000,
> > -                                        &DirtyStat.dirty_ring,
> > -                                        GLOBAL_DIRTY_DIRTY_RATE,
> > -                                        true);
> > -
> > -    DirtyStat.calc_time = duration / 1000;
> > +    DirtyStat.calc_time_ms = vcpu_calculate_dirtyrate(config.calc_time_ms,
> > +
> > &DirtyStat.dirty_ring,
> > +
> > GLOBAL_DIRTY_DIRTY_RATE,
> > +                                                      true);
> >
> >      /* calculate vm dirtyrate */
> >      for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> > @@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct
> > DirtyRateConfig config)
> >  {
> >      struct RamblockDirtyInfo *block_dinfo = NULL;
> >      int block_count = 0;
> > -    int64_t msec = 0;
> >      int64_t initial_time;
> >
> >      rcu_read_lock();
> > @@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct
> > DirtyRateConfig config)
> >      }
> >      rcu_read_unlock();
> >
> > -    msec = config.sample_period_seconds * 1000;
> > -    msec = dirty_stat_wait(msec, initial_time);
> > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > +                                             initial_time);
> >      DirtyStat.start_time = initial_time / 1000;
> > -    DirtyStat.calc_time = msec / 1000;
> >
> >      rcu_read_lock();
> >      if (!compare_page_hash_info(block_dinfo, block_count)) {
> >          goto out;
> >      }
> >
> > -    update_dirtyrate(msec);
> > +    update_dirtyrate(DirtyStat.calc_time_ms);
> >
> >  out:
> >      rcu_read_unlock();
> > @@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
> >      return NULL;
> >  }
> >
> > -void qmp_calc_dirty_rate(int64_t calc_time,
> > +void qmp_calc_dirty_rate(bool has_calc_time,
> > +                         int64_t calc_time,
> > +                         bool has_calc_time_ms,
> > +                         int64_t calc_time_ms,
> >                           bool has_sample_pages,
> >                           int64_t sample_pages,
> >                           bool has_mode,
> > @@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> >          return;
> >      }
> >
> > -    if (!is_sample_period_valid(calc_time)) {
> > -        error_setg(errp, "calc-time is out of range[%d, %d].",
> > -                         MIN_FETCH_DIRTYRATE_TIME_SEC,
> > -                         MAX_FETCH_DIRTYRATE_TIME_SEC);
> > +    if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
> > +        error_setg(errp, "Exactly one of calc-time and calc-time-ms must"
> > +                         " be specified");
> > +        return;
> > +    }
> > +    if (has_calc_time) {
> > +        /*
> > +         * The worst thing that can happen due to overflow is that
> > +         * invalid value will become valid.
> > +         */
> > +        calc_time_ms = calc_time * 1000;
> > +    }
> > +    if (!is_calc_time_valid(calc_time_ms)) {
> > +        error_setg(errp, "Calculation time is out of range[%dms, %dms].",
> > +                         MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
> >          return;
> >      }
> >
> > @@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> >          return;
> >      }
> >
> > -    config.sample_period_seconds = calc_time;
> > +    config.calc_time_ms = calc_time_ms;
> >      config.sample_pages_per_gigabytes = sample_pages;
> >      config.mode = mode;
> >
> > @@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict
> > *qdict)
> >          mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
> >      }
> >
> > -    qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
> > -                        mode, &err);
> > +    qmp_calc_dirty_rate(true, sec, /* calc_time */
> > +                        false, 0, /* calc_time_ms */
> > +                        has_sample_pages, sample_pages,
> > +                        true, mode,
> > +                        &err);
> >      if (err) {
> >          hmp_handle_error(mon, err);
> >          return;
> > --
> > 2.30.2
> >
> > The patch set works for me, and I'm inclined to split it into two commits
> as I point out above.
> 
> Thanks
> 
> Yong
> 
> -- 
> Best regards

Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by Yong Huang 8 months, 3 weeks ago
Hi, Andrei. I'm preparing a patchset for a pull request.

For this patch, would you mind if I?
1. Generate a single patch from the optimization partition.
2. In the patch above, include my comment and my "Reviewed-by".
3. Add it to the queue with other patchsets.

Then you can improve on top at your leisure, Would that work for you?

On Wed, Aug 9, 2023 at 11:03 PM <gudkov.andrei@huawei.com> wrote:

> On Sun, Aug 06, 2023 at 02:16:34PM +0800, Yong Huang wrote:
> > On Fri, Aug 4, 2023 at 11:03 PM Andrei Gudkov <gudkov.andrei@huawei.com>
> > wrote:
> >
> > > Introduces alternative argument calc-time-ms, which is the
> > > the same as calc-time but accepts millisecond value.
> > > Millisecond granularity allows to make predictions whether
> > > migration will succeed or not. To do this, calculate dirty
> > > rate with calc-time-ms set to max allowed downtime, convert
> > > measured rate into volume of dirtied memory, and divide by
> > > network throughput. If the value is lower than max allowed
> > >
> > Not for the patch, I'm just curious about how the predication
> > decides the network throughput, I mean QEMU predicts
> > if migration will converge based on how fast it sends the data,
> > not the actual bandwidth of the interface, which one the
> > prediction uses?
> >
> Currently I use network nominal bandwidth, e.g. 1gbps. It would
> be nice, of course, to use measured throughput since it can take
> into account network headers overhead (as Wang Lei previously
> mentioned), compression, etc., but it looks too complicated to
> implement outside of migration process.
>
> > > downtime, then migration will converge.
> > >
> > > Measurement results for single thread randomly writing to
> > > a 1/4/24GiB memory region:
> > >
> > > +--------------+-----------------------------------------------+
> > > | calc-time-ms |                dirty rate MiB/s               |
> > > |              +----------------+---------------+--------------+
> > > |              | theoretical    | page-sampling | dirty-bitmap |
> > > |              | (at 3M wr/sec) |               |              |
> > > +--------------+----------------+---------------+--------------+
> > > |                             1GiB                             |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |           6996 |          7100 |         3192 |
> > > |          200 |           4606 |          4660 |         2655 |
> > > |          300 |           3305 |          3280 |         2371 |
> > > |          400 |           2534 |          2525 |         2154 |
> > > |          500 |           2041 |          2044 |         1871 |
> > > |          750 |           1365 |          1341 |         1358 |
> > > |         1000 |           1024 |          1052 |         1025 |
> > > |         1500 |            683 |           678 |          684 |
> > > |         2000 |            512 |           507 |          513 |
> > > +--------------+----------------+---------------+--------------+
> > > |                             4GiB                             |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |          10232 |          8880 |         4070 |
> > > |          200 |           8954 |          8049 |         3195 |
> > > |          300 |           7889 |          7193 |         2881 |
> > > |          400 |           6996 |          6530 |         2700 |
> > > |          500 |           6245 |          5772 |         2312 |
> > > |          750 |           4829 |          4586 |         2465 |
> > > |         1000 |           3865 |          3780 |         2178 |
> > > |         1500 |           2694 |          2633 |         2004 |
> > > |         2000 |           2041 |          2031 |         1789 |
> > > +--------------+----------------+---------------+--------------+
> > > |                             24GiB                            |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |          11495 |          8640 |         5597 |
> > > |          200 |          11226 |          8616 |         3527 |
> > > |          300 |          10965 |          8386 |         2355 |
> > > |          400 |          10713 |          8370 |         2179 |
> > > |          500 |          10469 |          8196 |         2098 |
> > > |          750 |           9890 |          7885 |         2556 |
> > > |         1000 |           9354 |          7506 |         2084 |
> > > |         1500 |           8397 |          6944 |         2075 |
> > > |         2000 |           7574 |          6402 |         2062 |
> > > +--------------+----------------+---------------+--------------+
> > >
> > > Theoretical values are computed according to the following formula:
> > > size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> > > where size is in bytes, time is in seconds, and wps is number of
> > > writes per second.
> > >
> > > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > > ---
> > >  qapi/migration.json   | 14 ++++++--
> > >  migration/dirtyrate.h | 12 ++++---
> > >  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> > >  3 files changed, 67 insertions(+), 40 deletions(-)
> > >
> > > [...]
> >
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 8843e74b59..82493d6a57 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -1849,7 +1849,11 @@
> > >  # @start-time: start time in units of second for calculation
> > >  #
> > >  # @calc-time: time period for which dirty page rate was measured
> > > -#     (in seconds)
> > > +#     (rounded down to seconds).
> > >
> > Does there need an extra comment to emphasize that calc-time shows
> > zero if the calc-time-ms is lower than 1000?
> >
> > > +#
> > > +# @calc-time-ms: actual time period for which dirty page rate was
> > > +#     measured (in milliseconds).  Value may be larger than requested
> > > +#     time period due to measurement overhead.
> > >  #
> > >  # @sample-pages: number of sampled pages per GiB of guest memory.
> > >  #     Valid only in page-sampling mode (Since 6.1)
> > > @@ -1866,6 +1870,7 @@
> > >             'status': 'DirtyRateStatus',
> > >             'start-time': 'int64',
> > >             'calc-time': 'int64',
> > > +           'calc-time-ms': 'int64',
> > >             'sample-pages': 'uint64',
> > >             'mode': 'DirtyRateMeasureMode',
> > >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > > @@ -1908,6 +1913,10 @@
> > >  #     dirty during @calc-time period, further writes to this page will
> > >  #     not increase dirty page rate anymore.
> > >  #
> > > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > > +#    two arguments are mutually exclusive.  Exactly one of them must
> > > +#    be specified. (Since 8.1)
> > > +#
> > >  # @sample-pages: number of sampled pages per each GiB of guest memory.
> > >  #     Default value is 512.  For 4KiB guest pages this corresponds to
> > >  #     sampling ratio of 0.2%.  This argument is used only in page
> > > @@ -1925,7 +1934,8 @@
> > >  #                                                 'sample-pages':
> 512} }
> > >  # <- { "return": {} }
> > >  ##
> > > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > > +                                         '*calc-time-ms': 'int64',
> > >                                           '*sample-pages': 'int',
> > >                                           '*mode':
> 'DirtyRateMeasureMode'}
> > > }
> > >
> > > diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> > > index 594a5c0bb6..869c060941 100644
> > > --- a/migration/dirtyrate.h
> > > +++ b/migration/dirtyrate.h
> > > @@ -31,10 +31,12 @@
> > >  #define MIN_RAMBLOCK_SIZE                         128
> > >
> > >  /*
> > > - * Take 1s as minimum time for calculation duration
> > > + * Allowed range for dirty page rate calculation (in milliseconds).
> > > + * Lower limit relates to the smallest realistic downtime it
> > > + * makes sense to impose on migration.
> > >   */
> > > -#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
> > > -#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
> > > +#define MIN_CALC_TIME_MS                          50
> > > +#define MAX_CALC_TIME_MS                       60000
> > >
> > >  /*
> > >   * Take 1/16 pages in 1G as the maxmum sample page count
> > > @@ -44,7 +46,7 @@
> > >
> > >  struct DirtyRateConfig {
> > >      uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> > > -    int64_t sample_period_seconds; /* time duration between two
> sampling
> > > */
> > > +    int64_t calc_time_ms; /* desired calculation time (in
> milliseconds) */
> > >      DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
> > >  };
> > >
> > > @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
> > >  struct DirtyRateStat {
> > >      int64_t dirty_rate; /* dirty rate in MB/s */
> > >      int64_t start_time; /* calculation start time in units of second
> */
> > > -    int64_t calc_time; /* time duration of two sampling in units of
> > > second */
> > > +    int64_t calc_time_ms; /* actual calculation time (in
> milliseconds) */
> > >      uint64_t sample_pages; /* sample pages per GB */
> > >      union {
> > >          SampleVMStat page_sampling;
> >
> > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> > > index 84f1b0fb20..90fb336329 100644
> > > --- a/migration/dirtyrate.c
> > > +++ b/migration/dirtyrate.c
> > > @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t
> > > initial_time)
> > >          msec = current_time - initial_time;
> > >      } else {
> > >          g_usleep((msec + initial_time - current_time) * 1000);
> > > +        /* g_usleep may overshoot */
> > > +        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
> > >
> > The optimization could be a standalone commit along with the content
> > below(see the following comment)?
> >
> OK, I will move it into separate commit.


> > >      }
> > >
> > >      return msec;
> > > @@ -77,9 +79,12 @@ static int64_t
> do_calculate_dirtyrate(DirtyPageRecord
> > > dirty_pages,
> > >  {
> > >      uint64_t increased_dirty_pages =
> > >          dirty_pages.end_pages - dirty_pages.start_pages;
> > > -    uint64_t memory_size_MiB =
> > > qemu_target_pages_to_MiB(increased_dirty_pages);
> > > -
> > > -    return memory_size_MiB * 1000 / calc_time_ms;
> > > +    /*
> > > +     * multiply by 1000ms/s _before_ converting down to megabytes
> > > +     * to avoid losing precision
> > > +     */
> > > +    return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
> > > +        calc_time_ms;
> > >
> > Code optimization, could be in a standalone commit.
> >
> OK, but note that it is an important optimization. Imagine that
> calc_time_ms=100 and increased_dirty_pages=1000. If we compute without
> this optimization, we will get only 1000/(2^8)*1000/100=30. With
> optmization: 1000*1000/(2^8)/100=39.
>
> > >  }
> > >
> > >  void global_dirty_log_change(unsigned int flag, bool start)
> > > @@ -183,10 +188,9 @@ retry:
> > >      return duration;
> > >  }
> > >
> > > -static bool is_sample_period_valid(int64_t sec)
> > > +static bool is_calc_time_valid(int64_t msec)
> > >  {
> > > -    if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> > > -        sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
> > > +    if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
> > >          return false;
> > >      }
> > >
> > > @@ -219,7 +223,8 @@ static struct DirtyRateInfo
> > > *query_dirty_rate_info(void)
> > >
> > >      info->status = CalculatingState;
> > >      info->start_time = DirtyStat.start_time;
> > > -    info->calc_time = DirtyStat.calc_time;
> > > +    info->calc_time_ms = DirtyStat.calc_time_ms;
> > > +    info->calc_time = DirtyStat.calc_time_ms / 1000;
> > >      info->sample_pages = DirtyStat.sample_pages;
> > >      info->mode = dirtyrate_mode;
> > >
> > > @@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
> > >  {
> > >      DirtyStat.dirty_rate = -1;
> > >      DirtyStat.start_time = start_time;
> > > -    DirtyStat.calc_time = config.sample_period_seconds;
> > > +    DirtyStat.calc_time_ms = config.calc_time_ms;
> > >      DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
> > >
> > >      switch (config.mode) {
> > > @@ -568,7 +573,6 @@ static inline void
> dirtyrate_manual_reset_protect(void)
> > >
> > >  static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig
> > > config)
> > >  {
> > > -    int64_t msec = 0;
> > >      int64_t start_time;
> > >      DirtyPageRecord dirty_pages;
> > >
> > > @@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> > > DirtyRateConfig config)
> > >      start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > >      DirtyStat.start_time = start_time / 1000;
> > >
> > > -    msec = config.sample_period_seconds * 1000;
> > > -    msec = dirty_stat_wait(msec, start_time);
> > > -    DirtyStat.calc_time = msec / 1000;
> > > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > > start_time);
> > >
> > >      /*
> > >       * do two things.
> > > @@ -609,12 +611,12 @@ static void
> calculate_dirtyrate_dirty_bitmap(struct
> > > DirtyRateConfig config)
> > >
> > >      record_dirtypages_bitmap(&dirty_pages, false);
> > >
> > > -    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> > > +    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
> > > +
> DirtyStat.calc_time_ms);
> > >  }
> > >
> > >  static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig
> config)
> > >  {
> > > -    int64_t duration;
> > >      uint64_t dirtyrate = 0;
> > >      uint64_t dirtyrate_sum = 0;
> > >      int i = 0;
> > > @@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct
> > > DirtyRateConfig config)
> > >      DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) /
> 1000;
> > >
> > >      /* calculate vcpu dirtyrate */
> > > -    duration = vcpu_calculate_dirtyrate(config.sample_period_seconds *
> > > 1000,
> > > -                                        &DirtyStat.dirty_ring,
> > > -                                        GLOBAL_DIRTY_DIRTY_RATE,
> > > -                                        true);
> > > -
> > > -    DirtyStat.calc_time = duration / 1000;
> > > +    DirtyStat.calc_time_ms =
> vcpu_calculate_dirtyrate(config.calc_time_ms,
> > > +
> > > &DirtyStat.dirty_ring,
> > > +
> > > GLOBAL_DIRTY_DIRTY_RATE,
> > > +                                                      true);
> > >
> > >      /* calculate vm dirtyrate */
> > >      for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> > > @@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct
> > > DirtyRateConfig config)
> > >  {
> > >      struct RamblockDirtyInfo *block_dinfo = NULL;
> > >      int block_count = 0;
> > > -    int64_t msec = 0;
> > >      int64_t initial_time;
> > >
> > >      rcu_read_lock();
> > > @@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct
> > > DirtyRateConfig config)
> > >      }
> > >      rcu_read_unlock();
> > >
> > > -    msec = config.sample_period_seconds * 1000;
> > > -    msec = dirty_stat_wait(msec, initial_time);
> > > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > > +                                             initial_time);
> > >      DirtyStat.start_time = initial_time / 1000;
> > > -    DirtyStat.calc_time = msec / 1000;
> > >
> > >      rcu_read_lock();
> > >      if (!compare_page_hash_info(block_dinfo, block_count)) {
> > >          goto out;
> > >      }
> > >
> > > -    update_dirtyrate(msec);
> > > +    update_dirtyrate(DirtyStat.calc_time_ms);
> > >
> > >  out:
> > >      rcu_read_unlock();
> > > @@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
> > >      return NULL;
> > >  }
> > >
> > > -void qmp_calc_dirty_rate(int64_t calc_time,
> > > +void qmp_calc_dirty_rate(bool has_calc_time,
> > > +                         int64_t calc_time,
> > > +                         bool has_calc_time_ms,
> > > +                         int64_t calc_time_ms,
> > >                           bool has_sample_pages,
> > >                           int64_t sample_pages,
> > >                           bool has_mode,
> > > @@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > >          return;
> > >      }
> > >
> > > -    if (!is_sample_period_valid(calc_time)) {
> > > -        error_setg(errp, "calc-time is out of range[%d, %d].",
> > > -                         MIN_FETCH_DIRTYRATE_TIME_SEC,
> > > -                         MAX_FETCH_DIRTYRATE_TIME_SEC);
> > > +    if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
> > > +        error_setg(errp, "Exactly one of calc-time and calc-time-ms
> must"
> > > +                         " be specified");
> > > +        return;
> > > +    }
> > > +    if (has_calc_time) {
> > > +        /*
> > > +         * The worst thing that can happen due to overflow is that
> > > +         * invalid value will become valid.
> > > +         */
> > > +        calc_time_ms = calc_time * 1000;
> > > +    }
> > > +    if (!is_calc_time_valid(calc_time_ms)) {
> > > +        error_setg(errp, "Calculation time is out of range[%dms,
> %dms].",
> > > +                         MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
> > >          return;
> > >      }
> > >
> > > @@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > >          return;
> > >      }
> > >
> > > -    config.sample_period_seconds = calc_time;
> > > +    config.calc_time_ms = calc_time_ms;
> > >      config.sample_pages_per_gigabytes = sample_pages;
> > >      config.mode = mode;
> > >
> > > @@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict
> > > *qdict)
> > >          mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
> > >      }
> > >
> > > -    qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
> > > -                        mode, &err);
> > > +    qmp_calc_dirty_rate(true, sec, /* calc_time */
> > > +                        false, 0, /* calc_time_ms */
> > > +                        has_sample_pages, sample_pages,
> > > +                        true, mode,
> > > +                        &err);
> > >      if (err) {
> > >          hmp_handle_error(mon, err);
> > >          return;
> > > --
> > > 2.30.2
> > >
> > > The patch set works for me, and I'm inclined to split it into two
> commits
> > as I point out above.
> >
> > Thanks
> >
> > Yong
> >
> > --
> > Best regards
>


-- 
Best regards
Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by gudkov.andrei--- via 8 months, 3 weeks ago
On Sat, Aug 26, 2023 at 04:32:01PM +0800, Yong Huang wrote:
> Hi, Andrei. I'm preparing a patchset for a pull request.
> 
> For this patch, would you mind if I?
> 1. Generate a single patch from the optimization partition.
> 2. In the patch above, include my comment and my "Reviewed-by".
> 3. Add it to the queue with other patchsets.
> 
> Then you can improve on top at your leisure, Would that work for you?
> 
Yes, sure!

Sorry for the delay. I hope to make discussed improvements till the
end of this week.

> On Wed, Aug 9, 2023 at 11:03 PM <gudkov.andrei@huawei.com> wrote:
> 
> > On Sun, Aug 06, 2023 at 02:16:34PM +0800, Yong Huang wrote:
> > > On Fri, Aug 4, 2023 at 11:03 PM Andrei Gudkov <gudkov.andrei@huawei.com>
> > > wrote:
> > >
> > > > Introduces alternative argument calc-time-ms, which is the
> > > > the same as calc-time but accepts millisecond value.
> > > > Millisecond granularity allows to make predictions whether
> > > > migration will succeed or not. To do this, calculate dirty
> > > > rate with calc-time-ms set to max allowed downtime, convert
> > > > measured rate into volume of dirtied memory, and divide by
> > > > network throughput. If the value is lower than max allowed
> > > >
> > > Not for the patch, I'm just curious about how the predication
> > > decides the network throughput, I mean QEMU predicts
> > > if migration will converge based on how fast it sends the data,
> > > not the actual bandwidth of the interface, which one the
> > > prediction uses?
> > >
> > Currently I use network nominal bandwidth, e.g. 1gbps. It would
> > be nice, of course, to use measured throughput since it can take
> > into account network headers overhead (as Wang Lei previously
> > mentioned), compression, etc., but it looks too complicated to
> > implement outside of migration process.
> >
> > > > downtime, then migration will converge.
> > > >
> > > > Measurement results for single thread randomly writing to
> > > > a 1/4/24GiB memory region:
> > > >
> > > > +--------------+-----------------------------------------------+
> > > > | calc-time-ms |                dirty rate MiB/s               |
> > > > |              +----------------+---------------+--------------+
> > > > |              | theoretical    | page-sampling | dirty-bitmap |
> > > > |              | (at 3M wr/sec) |               |              |
> > > > +--------------+----------------+---------------+--------------+
> > > > |                             1GiB                             |
> > > > +--------------+----------------+---------------+--------------+
> > > > |          100 |           6996 |          7100 |         3192 |
> > > > |          200 |           4606 |          4660 |         2655 |
> > > > |          300 |           3305 |          3280 |         2371 |
> > > > |          400 |           2534 |          2525 |         2154 |
> > > > |          500 |           2041 |          2044 |         1871 |
> > > > |          750 |           1365 |          1341 |         1358 |
> > > > |         1000 |           1024 |          1052 |         1025 |
> > > > |         1500 |            683 |           678 |          684 |
> > > > |         2000 |            512 |           507 |          513 |
> > > > +--------------+----------------+---------------+--------------+
> > > > |                             4GiB                             |
> > > > +--------------+----------------+---------------+--------------+
> > > > |          100 |          10232 |          8880 |         4070 |
> > > > |          200 |           8954 |          8049 |         3195 |
> > > > |          300 |           7889 |          7193 |         2881 |
> > > > |          400 |           6996 |          6530 |         2700 |
> > > > |          500 |           6245 |          5772 |         2312 |
> > > > |          750 |           4829 |          4586 |         2465 |
> > > > |         1000 |           3865 |          3780 |         2178 |
> > > > |         1500 |           2694 |          2633 |         2004 |
> > > > |         2000 |           2041 |          2031 |         1789 |
> > > > +--------------+----------------+---------------+--------------+
> > > > |                             24GiB                            |
> > > > +--------------+----------------+---------------+--------------+
> > > > |          100 |          11495 |          8640 |         5597 |
> > > > |          200 |          11226 |          8616 |         3527 |
> > > > |          300 |          10965 |          8386 |         2355 |
> > > > |          400 |          10713 |          8370 |         2179 |
> > > > |          500 |          10469 |          8196 |         2098 |
> > > > |          750 |           9890 |          7885 |         2556 |
> > > > |         1000 |           9354 |          7506 |         2084 |
> > > > |         1500 |           8397 |          6944 |         2075 |
> > > > |         2000 |           7574 |          6402 |         2062 |
> > > > +--------------+----------------+---------------+--------------+
> > > >
> > > > Theoretical values are computed according to the following formula:
> > > > size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> > > > where size is in bytes, time is in seconds, and wps is number of
> > > > writes per second.
> > > >
> > > > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > > > ---
> > > >  qapi/migration.json   | 14 ++++++--
> > > >  migration/dirtyrate.h | 12 ++++---
> > > >  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> > > >  3 files changed, 67 insertions(+), 40 deletions(-)
> > > >
> > > > [...]
> > >
> > > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > > index 8843e74b59..82493d6a57 100644
> > > > --- a/qapi/migration.json
> > > > +++ b/qapi/migration.json
> > > > @@ -1849,7 +1849,11 @@
> > > >  # @start-time: start time in units of second for calculation
> > > >  #
> > > >  # @calc-time: time period for which dirty page rate was measured
> > > > -#     (in seconds)
> > > > +#     (rounded down to seconds).
> > > >
> > > Does there need an extra comment to emphasize that calc-time shows
> > > zero if the calc-time-ms is lower than 1000?
> > >
> > > > +#
> > > > +# @calc-time-ms: actual time period for which dirty page rate was
> > > > +#     measured (in milliseconds).  Value may be larger than requested
> > > > +#     time period due to measurement overhead.
> > > >  #
> > > >  # @sample-pages: number of sampled pages per GiB of guest memory.
> > > >  #     Valid only in page-sampling mode (Since 6.1)
> > > > @@ -1866,6 +1870,7 @@
> > > >             'status': 'DirtyRateStatus',
> > > >             'start-time': 'int64',
> > > >             'calc-time': 'int64',
> > > > +           'calc-time-ms': 'int64',
> > > >             'sample-pages': 'uint64',
> > > >             'mode': 'DirtyRateMeasureMode',
> > > >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > > > @@ -1908,6 +1913,10 @@
> > > >  #     dirty during @calc-time period, further writes to this page will
> > > >  #     not increase dirty page rate anymore.
> > > >  #
> > > > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > > > +#    two arguments are mutually exclusive.  Exactly one of them must
> > > > +#    be specified. (Since 8.1)
> > > > +#
> > > >  # @sample-pages: number of sampled pages per each GiB of guest memory.
> > > >  #     Default value is 512.  For 4KiB guest pages this corresponds to
> > > >  #     sampling ratio of 0.2%.  This argument is used only in page
> > > > @@ -1925,7 +1934,8 @@
> > > >  #                                                 'sample-pages':
> > 512} }
> > > >  # <- { "return": {} }
> > > >  ##
> > > > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > > > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > > > +                                         '*calc-time-ms': 'int64',
> > > >                                           '*sample-pages': 'int',
> > > >                                           '*mode':
> > 'DirtyRateMeasureMode'}
> > > > }
> > > >
> > > > diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> > > > index 594a5c0bb6..869c060941 100644
> > > > --- a/migration/dirtyrate.h
> > > > +++ b/migration/dirtyrate.h
> > > > @@ -31,10 +31,12 @@
> > > >  #define MIN_RAMBLOCK_SIZE                         128
> > > >
> > > >  /*
> > > > - * Take 1s as minimum time for calculation duration
> > > > + * Allowed range for dirty page rate calculation (in milliseconds).
> > > > + * Lower limit relates to the smallest realistic downtime it
> > > > + * makes sense to impose on migration.
> > > >   */
> > > > -#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
> > > > -#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
> > > > +#define MIN_CALC_TIME_MS                          50
> > > > +#define MAX_CALC_TIME_MS                       60000
> > > >
> > > >  /*
> > > >   * Take 1/16 pages in 1G as the maxmum sample page count
> > > > @@ -44,7 +46,7 @@
> > > >
> > > >  struct DirtyRateConfig {
> > > >      uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> > > > -    int64_t sample_period_seconds; /* time duration between two
> > sampling
> > > > */
> > > > +    int64_t calc_time_ms; /* desired calculation time (in
> > milliseconds) */
> > > >      DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
> > > >  };
> > > >
> > > > @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
> > > >  struct DirtyRateStat {
> > > >      int64_t dirty_rate; /* dirty rate in MB/s */
> > > >      int64_t start_time; /* calculation start time in units of second
> > */
> > > > -    int64_t calc_time; /* time duration of two sampling in units of
> > > > second */
> > > > +    int64_t calc_time_ms; /* actual calculation time (in
> > milliseconds) */
> > > >      uint64_t sample_pages; /* sample pages per GB */
> > > >      union {
> > > >          SampleVMStat page_sampling;
> > >
> > > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> > > > index 84f1b0fb20..90fb336329 100644
> > > > --- a/migration/dirtyrate.c
> > > > +++ b/migration/dirtyrate.c
> > > > @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t
> > > > initial_time)
> > > >          msec = current_time - initial_time;
> > > >      } else {
> > > >          g_usleep((msec + initial_time - current_time) * 1000);
> > > > +        /* g_usleep may overshoot */
> > > > +        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
> > > >
> > > The optimization could be a standalone commit along with the content
> > > below(see the following comment)?
> > >
> > OK, I will move it into separate commit.
> 
> 
> > > >      }
> > > >
> > > >      return msec;
> > > > @@ -77,9 +79,12 @@ static int64_t
> > do_calculate_dirtyrate(DirtyPageRecord
> > > > dirty_pages,
> > > >  {
> > > >      uint64_t increased_dirty_pages =
> > > >          dirty_pages.end_pages - dirty_pages.start_pages;
> > > > -    uint64_t memory_size_MiB =
> > > > qemu_target_pages_to_MiB(increased_dirty_pages);
> > > > -
> > > > -    return memory_size_MiB * 1000 / calc_time_ms;
> > > > +    /*
> > > > +     * multiply by 1000ms/s _before_ converting down to megabytes
> > > > +     * to avoid losing precision
> > > > +     */
> > > > +    return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
> > > > +        calc_time_ms;
> > > >
> > > Code optimization, could be in a standalone commit.
> > >
> > OK, but note that it is an important optimization. Imagine that
> > calc_time_ms=100 and increased_dirty_pages=1000. If we compute without
> > this optimization, we will get only 1000/(2^8)*1000/100=30. With
> > optmization: 1000*1000/(2^8)/100=39.
> >
> > > >  }
> > > >
> > > >  void global_dirty_log_change(unsigned int flag, bool start)
> > > > @@ -183,10 +188,9 @@ retry:
> > > >      return duration;
> > > >  }
> > > >
> > > > -static bool is_sample_period_valid(int64_t sec)
> > > > +static bool is_calc_time_valid(int64_t msec)
> > > >  {
> > > > -    if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> > > > -        sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
> > > > +    if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
> > > >          return false;
> > > >      }
> > > >
> > > > @@ -219,7 +223,8 @@ static struct DirtyRateInfo
> > > > *query_dirty_rate_info(void)
> > > >
> > > >      info->status = CalculatingState;
> > > >      info->start_time = DirtyStat.start_time;
> > > > -    info->calc_time = DirtyStat.calc_time;
> > > > +    info->calc_time_ms = DirtyStat.calc_time_ms;
> > > > +    info->calc_time = DirtyStat.calc_time_ms / 1000;
> > > >      info->sample_pages = DirtyStat.sample_pages;
> > > >      info->mode = dirtyrate_mode;
> > > >
> > > > @@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
> > > >  {
> > > >      DirtyStat.dirty_rate = -1;
> > > >      DirtyStat.start_time = start_time;
> > > > -    DirtyStat.calc_time = config.sample_period_seconds;
> > > > +    DirtyStat.calc_time_ms = config.calc_time_ms;
> > > >      DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
> > > >
> > > >      switch (config.mode) {
> > > > @@ -568,7 +573,6 @@ static inline void
> > dirtyrate_manual_reset_protect(void)
> > > >
> > > >  static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig
> > > > config)
> > > >  {
> > > > -    int64_t msec = 0;
> > > >      int64_t start_time;
> > > >      DirtyPageRecord dirty_pages;
> > > >
> > > > @@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> > > > DirtyRateConfig config)
> > > >      start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > >      DirtyStat.start_time = start_time / 1000;
> > > >
> > > > -    msec = config.sample_period_seconds * 1000;
> > > > -    msec = dirty_stat_wait(msec, start_time);
> > > > -    DirtyStat.calc_time = msec / 1000;
> > > > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > > > start_time);
> > > >
> > > >      /*
> > > >       * do two things.
> > > > @@ -609,12 +611,12 @@ static void
> > calculate_dirtyrate_dirty_bitmap(struct
> > > > DirtyRateConfig config)
> > > >
> > > >      record_dirtypages_bitmap(&dirty_pages, false);
> > > >
> > > > -    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> > > > +    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
> > > > +
> > DirtyStat.calc_time_ms);
> > > >  }
> > > >
> > > >  static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig
> > config)
> > > >  {
> > > > -    int64_t duration;
> > > >      uint64_t dirtyrate = 0;
> > > >      uint64_t dirtyrate_sum = 0;
> > > >      int i = 0;
> > > > @@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct
> > > > DirtyRateConfig config)
> > > >      DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) /
> > 1000;
> > > >
> > > >      /* calculate vcpu dirtyrate */
> > > > -    duration = vcpu_calculate_dirtyrate(config.sample_period_seconds *
> > > > 1000,
> > > > -                                        &DirtyStat.dirty_ring,
> > > > -                                        GLOBAL_DIRTY_DIRTY_RATE,
> > > > -                                        true);
> > > > -
> > > > -    DirtyStat.calc_time = duration / 1000;
> > > > +    DirtyStat.calc_time_ms =
> > vcpu_calculate_dirtyrate(config.calc_time_ms,
> > > > +
> > > > &DirtyStat.dirty_ring,
> > > > +
> > > > GLOBAL_DIRTY_DIRTY_RATE,
> > > > +                                                      true);
> > > >
> > > >      /* calculate vm dirtyrate */
> > > >      for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> > > > @@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct
> > > > DirtyRateConfig config)
> > > >  {
> > > >      struct RamblockDirtyInfo *block_dinfo = NULL;
> > > >      int block_count = 0;
> > > > -    int64_t msec = 0;
> > > >      int64_t initial_time;
> > > >
> > > >      rcu_read_lock();
> > > > @@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct
> > > > DirtyRateConfig config)
> > > >      }
> > > >      rcu_read_unlock();
> > > >
> > > > -    msec = config.sample_period_seconds * 1000;
> > > > -    msec = dirty_stat_wait(msec, initial_time);
> > > > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > > > +                                             initial_time);
> > > >      DirtyStat.start_time = initial_time / 1000;
> > > > -    DirtyStat.calc_time = msec / 1000;
> > > >
> > > >      rcu_read_lock();
> > > >      if (!compare_page_hash_info(block_dinfo, block_count)) {
> > > >          goto out;
> > > >      }
> > > >
> > > > -    update_dirtyrate(msec);
> > > > +    update_dirtyrate(DirtyStat.calc_time_ms);
> > > >
> > > >  out:
> > > >      rcu_read_unlock();
> > > > @@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
> > > >      return NULL;
> > > >  }
> > > >
> > > > -void qmp_calc_dirty_rate(int64_t calc_time,
> > > > +void qmp_calc_dirty_rate(bool has_calc_time,
> > > > +                         int64_t calc_time,
> > > > +                         bool has_calc_time_ms,
> > > > +                         int64_t calc_time_ms,
> > > >                           bool has_sample_pages,
> > > >                           int64_t sample_pages,
> > > >                           bool has_mode,
> > > > @@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > > >          return;
> > > >      }
> > > >
> > > > -    if (!is_sample_period_valid(calc_time)) {
> > > > -        error_setg(errp, "calc-time is out of range[%d, %d].",
> > > > -                         MIN_FETCH_DIRTYRATE_TIME_SEC,
> > > > -                         MAX_FETCH_DIRTYRATE_TIME_SEC);
> > > > +    if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
> > > > +        error_setg(errp, "Exactly one of calc-time and calc-time-ms
> > must"
> > > > +                         " be specified");
> > > > +        return;
> > > > +    }
> > > > +    if (has_calc_time) {
> > > > +        /*
> > > > +         * The worst thing that can happen due to overflow is that
> > > > +         * invalid value will become valid.
> > > > +         */
> > > > +        calc_time_ms = calc_time * 1000;
> > > > +    }
> > > > +    if (!is_calc_time_valid(calc_time_ms)) {
> > > > +        error_setg(errp, "Calculation time is out of range[%dms,
> > %dms].",
> > > > +                         MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
> > > >          return;
> > > >      }
> > > >
> > > > @@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > > >          return;
> > > >      }
> > > >
> > > > -    config.sample_period_seconds = calc_time;
> > > > +    config.calc_time_ms = calc_time_ms;
> > > >      config.sample_pages_per_gigabytes = sample_pages;
> > > >      config.mode = mode;
> > > >
> > > > @@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict
> > > > *qdict)
> > > >          mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
> > > >      }
> > > >
> > > > -    qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
> > > > -                        mode, &err);
> > > > +    qmp_calc_dirty_rate(true, sec, /* calc_time */
> > > > +                        false, 0, /* calc_time_ms */
> > > > +                        has_sample_pages, sample_pages,
> > > > +                        true, mode,
> > > > +                        &err);
> > > >      if (err) {
> > > >          hmp_handle_error(mon, err);
> > > >          return;
> > > > --
> > > > 2.30.2
> > > >
> > > > The patch set works for me, and I'm inclined to split it into two
> > commits
> > > as I point out above.
> > >
> > > Thanks
> > >
> > > Yong
> > >
> > > --
> > > Best regards
> >
> 
> 
> -- 
> Best regards

Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by Yong Huang 9 months, 1 week ago
On Wed, Aug 9, 2023 at 11:03 PM <gudkov.andrei@huawei.com> wrote:

> On Sun, Aug 06, 2023 at 02:16:34PM +0800, Yong Huang wrote:
> > On Fri, Aug 4, 2023 at 11:03 PM Andrei Gudkov <gudkov.andrei@huawei.com>
> > wrote:
> >
> > > Introduces alternative argument calc-time-ms, which is the
> > > the same as calc-time but accepts millisecond value.
> > > Millisecond granularity allows to make predictions whether
> > > migration will succeed or not. To do this, calculate dirty
> > > rate with calc-time-ms set to max allowed downtime, convert
> > > measured rate into volume of dirtied memory, and divide by
> > > network throughput. If the value is lower than max allowed
> > >
> > Not for the patch, I'm just curious about how the predication
> > decides the network throughput, I mean QEMU predicts
> > if migration will converge based on how fast it sends the data,
> > not the actual bandwidth of the interface, which one the
> > prediction uses?
> >
> Currently I use network nominal bandwidth, e.g. 1gbps. It would
> be nice, of course, to use measured throughput since it can take
> into account network headers overhead (as Wang Lei previously
> mentioned), compression, etc., but it looks too complicated to
> implement outside of migration process.
>
> > > downtime, then migration will converge.
> > >
> > > Measurement results for single thread randomly writing to
> > > a 1/4/24GiB memory region:
> > >
> > > +--------------+-----------------------------------------------+
> > > | calc-time-ms |                dirty rate MiB/s               |
> > > |              +----------------+---------------+--------------+
> > > |              | theoretical    | page-sampling | dirty-bitmap |
> > > |              | (at 3M wr/sec) |               |              |
> > > +--------------+----------------+---------------+--------------+
> > > |                             1GiB                             |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |           6996 |          7100 |         3192 |
> > > |          200 |           4606 |          4660 |         2655 |
> > > |          300 |           3305 |          3280 |         2371 |
> > > |          400 |           2534 |          2525 |         2154 |
> > > |          500 |           2041 |          2044 |         1871 |
> > > |          750 |           1365 |          1341 |         1358 |
> > > |         1000 |           1024 |          1052 |         1025 |
> > > |         1500 |            683 |           678 |          684 |
> > > |         2000 |            512 |           507 |          513 |
> > > +--------------+----------------+---------------+--------------+
> > > |                             4GiB                             |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |          10232 |          8880 |         4070 |
> > > |          200 |           8954 |          8049 |         3195 |
> > > |          300 |           7889 |          7193 |         2881 |
> > > |          400 |           6996 |          6530 |         2700 |
> > > |          500 |           6245 |          5772 |         2312 |
> > > |          750 |           4829 |          4586 |         2465 |
> > > |         1000 |           3865 |          3780 |         2178 |
> > > |         1500 |           2694 |          2633 |         2004 |
> > > |         2000 |           2041 |          2031 |         1789 |
> > > +--------------+----------------+---------------+--------------+
> > > |                             24GiB                            |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |          11495 |          8640 |         5597 |
> > > |          200 |          11226 |          8616 |         3527 |
> > > |          300 |          10965 |          8386 |         2355 |
> > > |          400 |          10713 |          8370 |         2179 |
> > > |          500 |          10469 |          8196 |         2098 |
> > > |          750 |           9890 |          7885 |         2556 |
> > > |         1000 |           9354 |          7506 |         2084 |
> > > |         1500 |           8397 |          6944 |         2075 |
> > > |         2000 |           7574 |          6402 |         2062 |
> > > +--------------+----------------+---------------+--------------+
> > >
> > > Theoretical values are computed according to the following formula:
> > > size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> > > where size is in bytes, time is in seconds, and wps is number of
> > > writes per second.
> > >
> > > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > > ---
> > >  qapi/migration.json   | 14 ++++++--
> > >  migration/dirtyrate.h | 12 ++++---
> > >  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> > >  3 files changed, 67 insertions(+), 40 deletions(-)
> > >
> > > [...]
> >
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 8843e74b59..82493d6a57 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -1849,7 +1849,11 @@
> > >  # @start-time: start time in units of second for calculation
> > >  #
> > >  # @calc-time: time period for which dirty page rate was measured
> > > -#     (in seconds)
> > > +#     (rounded down to seconds).
> > >
> > Does there need an extra comment to emphasize that calc-time shows
> > zero if the calc-time-ms is lower than 1000?
> >
> > > +#
> > > +# @calc-time-ms: actual time period for which dirty page rate was
> > > +#     measured (in milliseconds).  Value may be larger than requested
> > > +#     time period due to measurement overhead.
> > >  #
> > >  # @sample-pages: number of sampled pages per GiB of guest memory.
> > >  #     Valid only in page-sampling mode (Since 6.1)
> > > @@ -1866,6 +1870,7 @@
> > >             'status': 'DirtyRateStatus',
> > >             'start-time': 'int64',
> > >             'calc-time': 'int64',
> > > +           'calc-time-ms': 'int64',
> > >             'sample-pages': 'uint64',
> > >             'mode': 'DirtyRateMeasureMode',
> > >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > > @@ -1908,6 +1913,10 @@
> > >  #     dirty during @calc-time period, further writes to this page will
> > >  #     not increase dirty page rate anymore.
> > >  #
> > > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > > +#    two arguments are mutually exclusive.  Exactly one of them must
> > > +#    be specified. (Since 8.1)
> > > +#
> > >  # @sample-pages: number of sampled pages per each GiB of guest memory.
> > >  #     Default value is 512.  For 4KiB guest pages this corresponds to
> > >  #     sampling ratio of 0.2%.  This argument is used only in page
> > > @@ -1925,7 +1934,8 @@
> > >  #                                                 'sample-pages':
> 512} }
> > >  # <- { "return": {} }
> > >  ##
> > > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > > +                                         '*calc-time-ms': 'int64',
> > >                                           '*sample-pages': 'int',
> > >                                           '*mode':
> 'DirtyRateMeasureMode'}
> > > }
> > >
> > > diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> > > index 594a5c0bb6..869c060941 100644
> > > --- a/migration/dirtyrate.h
> > > +++ b/migration/dirtyrate.h
> > > @@ -31,10 +31,12 @@
> > >  #define MIN_RAMBLOCK_SIZE                         128
> > >
> > >  /*
> > > - * Take 1s as minimum time for calculation duration
> > > + * Allowed range for dirty page rate calculation (in milliseconds).
> > > + * Lower limit relates to the smallest realistic downtime it
> > > + * makes sense to impose on migration.
> > >   */
> > > -#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
> > > -#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
> > > +#define MIN_CALC_TIME_MS                          50
> > > +#define MAX_CALC_TIME_MS                       60000
> > >
> > >  /*
> > >   * Take 1/16 pages in 1G as the maxmum sample page count
> > > @@ -44,7 +46,7 @@
> > >
> > >  struct DirtyRateConfig {
> > >      uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> > > -    int64_t sample_period_seconds; /* time duration between two
> sampling
> > > */
> > > +    int64_t calc_time_ms; /* desired calculation time (in
> milliseconds) */
> > >      DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
> > >  };
> > >
> > > @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
> > >  struct DirtyRateStat {
> > >      int64_t dirty_rate; /* dirty rate in MB/s */
> > >      int64_t start_time; /* calculation start time in units of second
> */
> > > -    int64_t calc_time; /* time duration of two sampling in units of
> > > second */
> > > +    int64_t calc_time_ms; /* actual calculation time (in
> milliseconds) */
> > >      uint64_t sample_pages; /* sample pages per GB */
> > >      union {
> > >          SampleVMStat page_sampling;
> >
> > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> > > index 84f1b0fb20..90fb336329 100644
> > > --- a/migration/dirtyrate.c
> > > +++ b/migration/dirtyrate.c
> > > @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t
> > > initial_time)
> > >          msec = current_time - initial_time;
> > >      } else {
> > >          g_usleep((msec + initial_time - current_time) * 1000);
> > > +        /* g_usleep may overshoot */
> > > +        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
> > >
> > The optimization could be a standalone commit along with the content
> > below(see the following comment)?
> >
> OK, I will move it into separate commit.
>
> > >      }
> > >
> > >      return msec;
> > > @@ -77,9 +79,12 @@ static int64_t
> do_calculate_dirtyrate(DirtyPageRecord
> > > dirty_pages,
> > >  {
> > >      uint64_t increased_dirty_pages =
> > >          dirty_pages.end_pages - dirty_pages.start_pages;
> > > -    uint64_t memory_size_MiB =
> > > qemu_target_pages_to_MiB(increased_dirty_pages);
> > > -
> > > -    return memory_size_MiB * 1000 / calc_time_ms;
> > > +    /*
> > > +     * multiply by 1000ms/s _before_ converting down to megabytes
> > > +     * to avoid losing precision
> > > +     */
> > > +    return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
> > > +        calc_time_ms;
> > >
> > Code optimization, could be in a standalone commit.
> >
> OK, but note that it is an important optimization. Imagine that
> calc_time_ms=100 and increased_dirty_pages=1000. If we compute without
> this optimization, we will get only 1000/(2^8)*1000/100=30. With
> optmization: 1000*1000/(2^8)/100=39.
>
> Good work, and thanks for this patch. :)


> > >  }
> > >
> > >  void global_dirty_log_change(unsigned int flag, bool start)
> > > @@ -183,10 +188,9 @@ retry:
> > >      return duration;
> > >  }
> > >
> > > -static bool is_sample_period_valid(int64_t sec)
> > > +static bool is_calc_time_valid(int64_t msec)
> > >  {
> > > -    if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> > > -        sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
> > > +    if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
> > >          return false;
> > >      }
> > >
> > > @@ -219,7 +223,8 @@ static struct DirtyRateInfo
> > > *query_dirty_rate_info(void)
> > >
> > >      info->status = CalculatingState;
> > >      info->start_time = DirtyStat.start_time;
> > > -    info->calc_time = DirtyStat.calc_time;
> > > +    info->calc_time_ms = DirtyStat.calc_time_ms;
> > > +    info->calc_time = DirtyStat.calc_time_ms / 1000;
> > >      info->sample_pages = DirtyStat.sample_pages;
> > >      info->mode = dirtyrate_mode;
> > >
> > > @@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
> > >  {
> > >      DirtyStat.dirty_rate = -1;
> > >      DirtyStat.start_time = start_time;
> > > -    DirtyStat.calc_time = config.sample_period_seconds;
> > > +    DirtyStat.calc_time_ms = config.calc_time_ms;
> > >      DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
> > >
> > >      switch (config.mode) {
> > > @@ -568,7 +573,6 @@ static inline void
> dirtyrate_manual_reset_protect(void)
> > >
> > >  static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig
> > > config)
> > >  {
> > > -    int64_t msec = 0;
> > >      int64_t start_time;
> > >      DirtyPageRecord dirty_pages;
> > >
> > > @@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> > > DirtyRateConfig config)
> > >      start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > >      DirtyStat.start_time = start_time / 1000;
> > >
> > > -    msec = config.sample_period_seconds * 1000;
> > > -    msec = dirty_stat_wait(msec, start_time);
> > > -    DirtyStat.calc_time = msec / 1000;
> > > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > > start_time);
> > >
> > >      /*
> > >       * do two things.
> > > @@ -609,12 +611,12 @@ static void
> calculate_dirtyrate_dirty_bitmap(struct
> > > DirtyRateConfig config)
> > >
> > >      record_dirtypages_bitmap(&dirty_pages, false);
> > >
> > > -    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> > > +    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
> > > +
> DirtyStat.calc_time_ms);
> > >  }
> > >
> > >  static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig
> config)
> > >  {
> > > -    int64_t duration;
> > >      uint64_t dirtyrate = 0;
> > >      uint64_t dirtyrate_sum = 0;
> > >      int i = 0;
> > > @@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct
> > > DirtyRateConfig config)
> > >      DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) /
> 1000;
> > >
> > >      /* calculate vcpu dirtyrate */
> > > -    duration = vcpu_calculate_dirtyrate(config.sample_period_seconds *
> > > 1000,
> > > -                                        &DirtyStat.dirty_ring,
> > > -                                        GLOBAL_DIRTY_DIRTY_RATE,
> > > -                                        true);
> > > -
> > > -    DirtyStat.calc_time = duration / 1000;
> > > +    DirtyStat.calc_time_ms =
> vcpu_calculate_dirtyrate(config.calc_time_ms,
> > > +
> > > &DirtyStat.dirty_ring,
> > > +
> > > GLOBAL_DIRTY_DIRTY_RATE,
> > > +                                                      true);
> > >
> > >      /* calculate vm dirtyrate */
> > >      for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> > > @@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct
> > > DirtyRateConfig config)
> > >  {
> > >      struct RamblockDirtyInfo *block_dinfo = NULL;
> > >      int block_count = 0;
> > > -    int64_t msec = 0;
> > >      int64_t initial_time;
> > >
> > >      rcu_read_lock();
> > > @@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct
> > > DirtyRateConfig config)
> > >      }
> > >      rcu_read_unlock();
> > >
> > > -    msec = config.sample_period_seconds * 1000;
> > > -    msec = dirty_stat_wait(msec, initial_time);
> > > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > > +                                             initial_time);
> > >      DirtyStat.start_time = initial_time / 1000;
> > > -    DirtyStat.calc_time = msec / 1000;
> > >
> > >      rcu_read_lock();
> > >      if (!compare_page_hash_info(block_dinfo, block_count)) {
> > >          goto out;
> > >      }
> > >
> > > -    update_dirtyrate(msec);
> > > +    update_dirtyrate(DirtyStat.calc_time_ms);
> > >
> > >  out:
> > >      rcu_read_unlock();
> > > @@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
> > >      return NULL;
> > >  }
> > >
> > > -void qmp_calc_dirty_rate(int64_t calc_time,
> > > +void qmp_calc_dirty_rate(bool has_calc_time,
> > > +                         int64_t calc_time,
> > > +                         bool has_calc_time_ms,
> > > +                         int64_t calc_time_ms,
> > >                           bool has_sample_pages,
> > >                           int64_t sample_pages,
> > >                           bool has_mode,
> > > @@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > >          return;
> > >      }
> > >
> > > -    if (!is_sample_period_valid(calc_time)) {
> > > -        error_setg(errp, "calc-time is out of range[%d, %d].",
> > > -                         MIN_FETCH_DIRTYRATE_TIME_SEC,
> > > -                         MAX_FETCH_DIRTYRATE_TIME_SEC);
> > > +    if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
> > > +        error_setg(errp, "Exactly one of calc-time and calc-time-ms
> must"
> > > +                         " be specified");
> > > +        return;
> > > +    }
> > > +    if (has_calc_time) {
> > > +        /*
> > > +         * The worst thing that can happen due to overflow is that
> > > +         * invalid value will become valid.
> > > +         */
> > > +        calc_time_ms = calc_time * 1000;
> > > +    }
> > > +    if (!is_calc_time_valid(calc_time_ms)) {
> > > +        error_setg(errp, "Calculation time is out of range[%dms,
> %dms].",
> > > +                         MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
> > >          return;
> > >      }
> > >
> > > @@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > >          return;
> > >      }
> > >
> > > -    config.sample_period_seconds = calc_time;
> > > +    config.calc_time_ms = calc_time_ms;
> > >      config.sample_pages_per_gigabytes = sample_pages;
> > >      config.mode = mode;
> > >
> > > @@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict
> > > *qdict)
> > >          mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
> > >      }
> > >
> > > -    qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
> > > -                        mode, &err);
> > > +    qmp_calc_dirty_rate(true, sec, /* calc_time */
> > > +                        false, 0, /* calc_time_ms */
> > > +                        has_sample_pages, sample_pages,
> > > +                        true, mode,
> > > +                        &err);
> > >      if (err) {
> > >          hmp_handle_error(mon, err);
> > >          return;
> > > --
> > > 2.30.2
> > >
> > > The patch set works for me, and I'm inclined to split it into two
> commits
> > as I point out above.
> >
> > Thanks
> >
> > Yong
> >
> > --
> > Best regards
>


-- 
Best regards
Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by Peter Xu 9 months, 1 week ago
On Wed, Aug 09, 2023 at 06:02:52PM +0300, gudkov.andrei@huawei.com wrote:
> > Not for the patch, I'm just curious about how the predication
> > decides the network throughput, I mean QEMU predicts
> > if migration will converge based on how fast it sends the data,
> > not the actual bandwidth of the interface, which one the
> > prediction uses?
> > 
> Currently I use network nominal bandwidth, e.g. 1gbps. It would
> be nice, of course, to use measured throughput since it can take
> into account network headers overhead (as Wang Lei previously
> mentioned), compression, etc., but it looks too complicated to
> implement outside of migration process.

Using line speed is definitely wise, but qemu may be stupid enough to do
wrong calculations and when it happens migration may not switchover..

See this:

https://lore.kernel.org/r/20230803155344.11450-3-peterx@redhat.com

Probably should be used together with your ways of prediction to be even
more accurate on some workloads where mbps reports insane numbers.

Thanks,

-- 
Peter Xu
Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by Markus Armbruster 9 months, 2 weeks ago
Andrei Gudkov <gudkov.andrei@huawei.com> writes:

> Introduces alternative argument calc-time-ms, which is the
> the same as calc-time but accepts millisecond value.
> Millisecond granularity allows to make predictions whether
> migration will succeed or not. To do this, calculate dirty
> rate with calc-time-ms set to max allowed downtime, convert
> measured rate into volume of dirtied memory, and divide by
> network throughput. If the value is lower than max allowed
> downtime, then migration will converge.
>
> Measurement results for single thread randomly writing to
> a 1/4/24GiB memory region:
>
> +--------------+-----------------------------------------------+
> | calc-time-ms |                dirty rate MiB/s               |
> |              +----------------+---------------+--------------+
> |              | theoretical    | page-sampling | dirty-bitmap |
> |              | (at 3M wr/sec) |               |              |
> +--------------+----------------+---------------+--------------+
> |                             1GiB                             |
> +--------------+----------------+---------------+--------------+
> |          100 |           6996 |          7100 |         3192 |
> |          200 |           4606 |          4660 |         2655 |
> |          300 |           3305 |          3280 |         2371 |
> |          400 |           2534 |          2525 |         2154 |
> |          500 |           2041 |          2044 |         1871 |
> |          750 |           1365 |          1341 |         1358 |
> |         1000 |           1024 |          1052 |         1025 |
> |         1500 |            683 |           678 |          684 |
> |         2000 |            512 |           507 |          513 |
> +--------------+----------------+---------------+--------------+
> |                             4GiB                             |
> +--------------+----------------+---------------+--------------+
> |          100 |          10232 |          8880 |         4070 |
> |          200 |           8954 |          8049 |         3195 |
> |          300 |           7889 |          7193 |         2881 |
> |          400 |           6996 |          6530 |         2700 |
> |          500 |           6245 |          5772 |         2312 |
> |          750 |           4829 |          4586 |         2465 |
> |         1000 |           3865 |          3780 |         2178 |
> |         1500 |           2694 |          2633 |         2004 |
> |         2000 |           2041 |          2031 |         1789 |
> +--------------+----------------+---------------+--------------+
> |                             24GiB                            |
> +--------------+----------------+---------------+--------------+
> |          100 |          11495 |          8640 |         5597 |
> |          200 |          11226 |          8616 |         3527 |
> |          300 |          10965 |          8386 |         2355 |
> |          400 |          10713 |          8370 |         2179 |
> |          500 |          10469 |          8196 |         2098 |
> |          750 |           9890 |          7885 |         2556 |
> |         1000 |           9354 |          7506 |         2084 |
> |         1500 |           8397 |          6944 |         2075 |
> |         2000 |           7574 |          6402 |         2062 |
> +--------------+----------------+---------------+--------------+
>
> Theoretical values are computed according to the following formula:
> size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> where size is in bytes, time is in seconds, and wps is number of
> writes per second.
>
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> ---
>  qapi/migration.json   | 14 ++++++--
>  migration/dirtyrate.h | 12 ++++---
>  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
>  3 files changed, 67 insertions(+), 40 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8843e74b59..82493d6a57 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1849,7 +1849,11 @@
>  # @start-time: start time in units of second for calculation
>  #
>  # @calc-time: time period for which dirty page rate was measured
> -#     (in seconds)
> +#     (rounded down to seconds).
> +#
> +# @calc-time-ms: actual time period for which dirty page rate was
> +#     measured (in milliseconds).  Value may be larger than requested
> +#     time period due to measurement overhead.
>  #
>  # @sample-pages: number of sampled pages per GiB of guest memory.
>  #     Valid only in page-sampling mode (Since 6.1)
> @@ -1866,6 +1870,7 @@
>             'status': 'DirtyRateStatus',
>             'start-time': 'int64',
>             'calc-time': 'int64',
> +           'calc-time-ms': 'int64',
>             'sample-pages': 'uint64',
>             'mode': 'DirtyRateMeasureMode',
>             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> @@ -1908,6 +1913,10 @@
>  #     dirty during @calc-time period, further writes to this page will
>  #     not increase dirty page rate anymore.
>  #
> +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> +#    two arguments are mutually exclusive.  Exactly one of them must
> +#    be specified. (Since 8.1)
> +#
>  # @sample-pages: number of sampled pages per each GiB of guest memory.
>  #     Default value is 512.  For 4KiB guest pages this corresponds to
>  #     sampling ratio of 0.2%.  This argument is used only in page
> @@ -1925,7 +1934,8 @@
>  #                                                 'sample-pages': 512} }
>  # <- { "return": {} }
>  ##
> -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> +                                         '*calc-time-ms': 'int64',
>                                           '*sample-pages': 'int',
>                                           '*mode': 'DirtyRateMeasureMode'} }
>  

Having both @calc-time and @calc-time-ms is ugly.

Can we deprecate @calc-time?

I don't like the name @calc-time-ms.  We don't put units in names
elsewhere.

Differently ugly: new member containing the fractional part, i.e. time
in seconds = calc-time + fractional-part / 1000.  With a better name, of
course.

[...]
Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by Yong Huang 9 months, 1 week ago
On Sat, Aug 5, 2023 at 2:05 AM Markus Armbruster <armbru@redhat.com> wrote:

> Andrei Gudkov <gudkov.andrei@huawei.com> writes:
>
> > Introduces alternative argument calc-time-ms, which is the
> > the same as calc-time but accepts millisecond value.
> > Millisecond granularity allows to make predictions whether
> > migration will succeed or not. To do this, calculate dirty
> > rate with calc-time-ms set to max allowed downtime, convert
> > measured rate into volume of dirtied memory, and divide by
> > network throughput. If the value is lower than max allowed
> > downtime, then migration will converge.
> >
> > Measurement results for single thread randomly writing to
> > a 1/4/24GiB memory region:
> >
> > +--------------+-----------------------------------------------+
> > | calc-time-ms |                dirty rate MiB/s               |
> > |              +----------------+---------------+--------------+
> > |              | theoretical    | page-sampling | dirty-bitmap |
> > |              | (at 3M wr/sec) |               |              |
> > +--------------+----------------+---------------+--------------+
> > |                             1GiB                             |
> > +--------------+----------------+---------------+--------------+
> > |          100 |           6996 |          7100 |         3192 |
> > |          200 |           4606 |          4660 |         2655 |
> > |          300 |           3305 |          3280 |         2371 |
> > |          400 |           2534 |          2525 |         2154 |
> > |          500 |           2041 |          2044 |         1871 |
> > |          750 |           1365 |          1341 |         1358 |
> > |         1000 |           1024 |          1052 |         1025 |
> > |         1500 |            683 |           678 |          684 |
> > |         2000 |            512 |           507 |          513 |
> > +--------------+----------------+---------------+--------------+
> > |                             4GiB                             |
> > +--------------+----------------+---------------+--------------+
> > |          100 |          10232 |          8880 |         4070 |
> > |          200 |           8954 |          8049 |         3195 |
> > |          300 |           7889 |          7193 |         2881 |
> > |          400 |           6996 |          6530 |         2700 |
> > |          500 |           6245 |          5772 |         2312 |
> > |          750 |           4829 |          4586 |         2465 |
> > |         1000 |           3865 |          3780 |         2178 |
> > |         1500 |           2694 |          2633 |         2004 |
> > |         2000 |           2041 |          2031 |         1789 |
> > +--------------+----------------+---------------+--------------+
> > |                             24GiB                            |
> > +--------------+----------------+---------------+--------------+
> > |          100 |          11495 |          8640 |         5597 |
> > |          200 |          11226 |          8616 |         3527 |
> > |          300 |          10965 |          8386 |         2355 |
> > |          400 |          10713 |          8370 |         2179 |
> > |          500 |          10469 |          8196 |         2098 |
> > |          750 |           9890 |          7885 |         2556 |
> > |         1000 |           9354 |          7506 |         2084 |
> > |         1500 |           8397 |          6944 |         2075 |
> > |         2000 |           7574 |          6402 |         2062 |
> > +--------------+----------------+---------------+--------------+
> >
> > Theoretical values are computed according to the following formula:
> > size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> > where size is in bytes, time is in seconds, and wps is number of
> > writes per second.
> >
> > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > ---
> >  qapi/migration.json   | 14 ++++++--
> >  migration/dirtyrate.h | 12 ++++---
> >  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> >  3 files changed, 67 insertions(+), 40 deletions(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 8843e74b59..82493d6a57 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1849,7 +1849,11 @@
> >  # @start-time: start time in units of second for calculation
> >  #
> >  # @calc-time: time period for which dirty page rate was measured
> > -#     (in seconds)
> > +#     (rounded down to seconds).
> > +#
> > +# @calc-time-ms: actual time period for which dirty page rate was
> > +#     measured (in milliseconds).  Value may be larger than requested
> > +#     time period due to measurement overhead.
> >  #
> >  # @sample-pages: number of sampled pages per GiB of guest memory.
> >  #     Valid only in page-sampling mode (Since 6.1)
> > @@ -1866,6 +1870,7 @@
> >             'status': 'DirtyRateStatus',
> >             'start-time': 'int64',
> >             'calc-time': 'int64',
> > +           'calc-time-ms': 'int64',
> >             'sample-pages': 'uint64',
> >             'mode': 'DirtyRateMeasureMode',
> >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > @@ -1908,6 +1913,10 @@
> >  #     dirty during @calc-time period, further writes to this page will
> >  #     not increase dirty page rate anymore.
> >  #
> > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > +#    two arguments are mutually exclusive.  Exactly one of them must
> > +#    be specified. (Since 8.1)
> > +#
> >  # @sample-pages: number of sampled pages per each GiB of guest memory.
> >  #     Default value is 512.  For 4KiB guest pages this corresponds to
> >  #     sampling ratio of 0.2%.  This argument is used only in page
> > @@ -1925,7 +1934,8 @@
> >  #                                                 'sample-pages': 512} }
> >  # <- { "return": {} }
> >  ##
> > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > +                                         '*calc-time-ms': 'int64',
> >                                           '*sample-pages': 'int',
> >                                           '*mode':
> 'DirtyRateMeasureMode'} }
> >
>
> Having both @calc-time and @calc-time-ms is ugly.
>
> Can we deprecate @calc-time?
>
Since the upper app Libvirt has used this field to implement
the virDomainStartDirtyRateCalc API unfortunately.
Deprecating this requires the extra patch on Libvirt but no
functional improvement, IMHO, the field could remain untouched.

>
> I don't like the name @calc-time-ms.  We don't put units in names
> elsewhere.
>
> Differently ugly: new member containing the fractional part, i.e. time
> in seconds = calc-time + fractional-part / 1000.  With a better name, of
> course.
>
> [...]
>
>
Thanks,

Yong
-- 
Best regards
Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by gudkov.andrei--- via 9 months, 1 week ago
On Sun, Aug 06, 2023 at 02:31:43PM +0800, Yong Huang wrote:
> On Sat, Aug 5, 2023 at 2:05 AM Markus Armbruster <armbru@redhat.com> wrote:
> 
> > Andrei Gudkov <gudkov.andrei@huawei.com> writes:
> >
> > > Introduces alternative argument calc-time-ms, which is the
> > > the same as calc-time but accepts millisecond value.
> > > Millisecond granularity allows to make predictions whether
> > > migration will succeed or not. To do this, calculate dirty
> > > rate with calc-time-ms set to max allowed downtime, convert
> > > measured rate into volume of dirtied memory, and divide by
> > > network throughput. If the value is lower than max allowed
> > > downtime, then migration will converge.
> > >
> > > Measurement results for single thread randomly writing to
> > > a 1/4/24GiB memory region:
> > >
> > > +--------------+-----------------------------------------------+
> > > | calc-time-ms |                dirty rate MiB/s               |
> > > |              +----------------+---------------+--------------+
> > > |              | theoretical    | page-sampling | dirty-bitmap |
> > > |              | (at 3M wr/sec) |               |              |
> > > +--------------+----------------+---------------+--------------+
> > > |                             1GiB                             |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |           6996 |          7100 |         3192 |
> > > |          200 |           4606 |          4660 |         2655 |
> > > |          300 |           3305 |          3280 |         2371 |
> > > |          400 |           2534 |          2525 |         2154 |
> > > |          500 |           2041 |          2044 |         1871 |
> > > |          750 |           1365 |          1341 |         1358 |
> > > |         1000 |           1024 |          1052 |         1025 |
> > > |         1500 |            683 |           678 |          684 |
> > > |         2000 |            512 |           507 |          513 |
> > > +--------------+----------------+---------------+--------------+
> > > |                             4GiB                             |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |          10232 |          8880 |         4070 |
> > > |          200 |           8954 |          8049 |         3195 |
> > > |          300 |           7889 |          7193 |         2881 |
> > > |          400 |           6996 |          6530 |         2700 |
> > > |          500 |           6245 |          5772 |         2312 |
> > > |          750 |           4829 |          4586 |         2465 |
> > > |         1000 |           3865 |          3780 |         2178 |
> > > |         1500 |           2694 |          2633 |         2004 |
> > > |         2000 |           2041 |          2031 |         1789 |
> > > +--------------+----------------+---------------+--------------+
> > > |                             24GiB                            |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |          11495 |          8640 |         5597 |
> > > |          200 |          11226 |          8616 |         3527 |
> > > |          300 |          10965 |          8386 |         2355 |
> > > |          400 |          10713 |          8370 |         2179 |
> > > |          500 |          10469 |          8196 |         2098 |
> > > |          750 |           9890 |          7885 |         2556 |
> > > |         1000 |           9354 |          7506 |         2084 |
> > > |         1500 |           8397 |          6944 |         2075 |
> > > |         2000 |           7574 |          6402 |         2062 |
> > > +--------------+----------------+---------------+--------------+
> > >
> > > Theoretical values are computed according to the following formula:
> > > size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> > > where size is in bytes, time is in seconds, and wps is number of
> > > writes per second.
> > >
> > > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > > ---
> > >  qapi/migration.json   | 14 ++++++--
> > >  migration/dirtyrate.h | 12 ++++---
> > >  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> > >  3 files changed, 67 insertions(+), 40 deletions(-)
> > >
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 8843e74b59..82493d6a57 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -1849,7 +1849,11 @@
> > >  # @start-time: start time in units of second for calculation
> > >  #
> > >  # @calc-time: time period for which dirty page rate was measured
> > > -#     (in seconds)
> > > +#     (rounded down to seconds).
> > > +#
> > > +# @calc-time-ms: actual time period for which dirty page rate was
> > > +#     measured (in milliseconds).  Value may be larger than requested
> > > +#     time period due to measurement overhead.
> > >  #
> > >  # @sample-pages: number of sampled pages per GiB of guest memory.
> > >  #     Valid only in page-sampling mode (Since 6.1)
> > > @@ -1866,6 +1870,7 @@
> > >             'status': 'DirtyRateStatus',
> > >             'start-time': 'int64',
> > >             'calc-time': 'int64',
> > > +           'calc-time-ms': 'int64',
> > >             'sample-pages': 'uint64',
> > >             'mode': 'DirtyRateMeasureMode',
> > >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > > @@ -1908,6 +1913,10 @@
> > >  #     dirty during @calc-time period, further writes to this page will
> > >  #     not increase dirty page rate anymore.
> > >  #
> > > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > > +#    two arguments are mutually exclusive.  Exactly one of them must
> > > +#    be specified. (Since 8.1)
> > > +#
> > >  # @sample-pages: number of sampled pages per each GiB of guest memory.
> > >  #     Default value is 512.  For 4KiB guest pages this corresponds to
> > >  #     sampling ratio of 0.2%.  This argument is used only in page
> > > @@ -1925,7 +1934,8 @@
> > >  #                                                 'sample-pages': 512} }
> > >  # <- { "return": {} }
> > >  ##
> > > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > > +                                         '*calc-time-ms': 'int64',
> > >                                           '*sample-pages': 'int',
> > >                                           '*mode':
> > 'DirtyRateMeasureMode'} }
> > >
> >
> > Having both @calc-time and @calc-time-ms is ugly.
> >
> > Can we deprecate @calc-time?
> >
> Since the upper app Libvirt has used this field to implement
> the virDomainStartDirtyRateCalc API unfortunately.
> Deprecating this requires the extra patch on Libvirt but no
> functional improvement, IMHO, the field could remain untouched.
> 
> >
> > I don't like the name @calc-time-ms.  We don't put units in names
> > elsewhere.
> >
> > Differently ugly: new member containing the fractional part, i.e. time
> > in seconds = calc-time + fractional-part / 1000.  With a better name, of
> > course.
> >
> > [...]
> >
> >

As another alternative I can propose to add an optional field that
specifies time unit.

Initiate dirty page rate measurements for 300ms period:
{"execute": "calc-dirty-rate",
 "arguments":{"calc-time": 300, "time-unit": "millis"}}

Query dirty rate. Report calc-time in milliseconds:
{"execute": "query-dirty-rate",
 "arguments":{"time-unit": "millis"}}

> Thanks,
> 
> Yong
> -- 
> Best regards

Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by Yong Huang 9 months, 1 week ago
On Wed, Aug 9, 2023 at 9:59 PM <gudkov.andrei@huawei.com> wrote:

> On Sun, Aug 06, 2023 at 02:31:43PM +0800, Yong Huang wrote:
> > On Sat, Aug 5, 2023 at 2:05 AM Markus Armbruster <armbru@redhat.com>
> wrote:
> >
> > > Andrei Gudkov <gudkov.andrei@huawei.com> writes:
> > >
> > > > Introduces alternative argument calc-time-ms, which is the
> > > > the same as calc-time but accepts millisecond value.
> > > > Millisecond granularity allows to make predictions whether
> > > > migration will succeed or not. To do this, calculate dirty
> > > > rate with calc-time-ms set to max allowed downtime, convert
> > > > measured rate into volume of dirtied memory, and divide by
> > > > network throughput. If the value is lower than max allowed
> > > > downtime, then migration will converge.
> > > >
> > > > Measurement results for single thread randomly writing to
> > > > a 1/4/24GiB memory region:
> > > >
> > > > +--------------+-----------------------------------------------+
> > > > | calc-time-ms |                dirty rate MiB/s               |
> > > > |              +----------------+---------------+--------------+
> > > > |              | theoretical    | page-sampling | dirty-bitmap |
> > > > |              | (at 3M wr/sec) |               |              |
> > > > +--------------+----------------+---------------+--------------+
> > > > |                             1GiB                             |
> > > > +--------------+----------------+---------------+--------------+
> > > > |          100 |           6996 |          7100 |         3192 |
> > > > |          200 |           4606 |          4660 |         2655 |
> > > > |          300 |           3305 |          3280 |         2371 |
> > > > |          400 |           2534 |          2525 |         2154 |
> > > > |          500 |           2041 |          2044 |         1871 |
> > > > |          750 |           1365 |          1341 |         1358 |
> > > > |         1000 |           1024 |          1052 |         1025 |
> > > > |         1500 |            683 |           678 |          684 |
> > > > |         2000 |            512 |           507 |          513 |
> > > > +--------------+----------------+---------------+--------------+
> > > > |                             4GiB                             |
> > > > +--------------+----------------+---------------+--------------+
> > > > |          100 |          10232 |          8880 |         4070 |
> > > > |          200 |           8954 |          8049 |         3195 |
> > > > |          300 |           7889 |          7193 |         2881 |
> > > > |          400 |           6996 |          6530 |         2700 |
> > > > |          500 |           6245 |          5772 |         2312 |
> > > > |          750 |           4829 |          4586 |         2465 |
> > > > |         1000 |           3865 |          3780 |         2178 |
> > > > |         1500 |           2694 |          2633 |         2004 |
> > > > |         2000 |           2041 |          2031 |         1789 |
> > > > +--------------+----------------+---------------+--------------+
> > > > |                             24GiB                            |
> > > > +--------------+----------------+---------------+--------------+
> > > > |          100 |          11495 |          8640 |         5597 |
> > > > |          200 |          11226 |          8616 |         3527 |
> > > > |          300 |          10965 |          8386 |         2355 |
> > > > |          400 |          10713 |          8370 |         2179 |
> > > > |          500 |          10469 |          8196 |         2098 |
> > > > |          750 |           9890 |          7885 |         2556 |
> > > > |         1000 |           9354 |          7506 |         2084 |
> > > > |         1500 |           8397 |          6944 |         2075 |
> > > > |         2000 |           7574 |          6402 |         2062 |
> > > > +--------------+----------------+---------------+--------------+
> > > >
> > > > Theoretical values are computed according to the following formula:
> > > > size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> > > > where size is in bytes, time is in seconds, and wps is number of
> > > > writes per second.
> > > >
> > > > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > > > ---
> > > >  qapi/migration.json   | 14 ++++++--
> > > >  migration/dirtyrate.h | 12 ++++---
> > > >  migration/dirtyrate.c | 81
> +++++++++++++++++++++++++------------------
> > > >  3 files changed, 67 insertions(+), 40 deletions(-)
> > > >
> > > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > > index 8843e74b59..82493d6a57 100644
> > > > --- a/qapi/migration.json
> > > > +++ b/qapi/migration.json
> > > > @@ -1849,7 +1849,11 @@
> > > >  # @start-time: start time in units of second for calculation
> > > >  #
> > > >  # @calc-time: time period for which dirty page rate was measured
> > > > -#     (in seconds)
> > > > +#     (rounded down to seconds).
> > > > +#
> > > > +# @calc-time-ms: actual time period for which dirty page rate was
> > > > +#     measured (in milliseconds).  Value may be larger than
> requested
> > > > +#     time period due to measurement overhead.
> > > >  #
> > > >  # @sample-pages: number of sampled pages per GiB of guest memory.
> > > >  #     Valid only in page-sampling mode (Since 6.1)
> > > > @@ -1866,6 +1870,7 @@
> > > >             'status': 'DirtyRateStatus',
> > > >             'start-time': 'int64',
> > > >             'calc-time': 'int64',
> > > > +           'calc-time-ms': 'int64',
> > > >             'sample-pages': 'uint64',
> > > >             'mode': 'DirtyRateMeasureMode',
> > > >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > > > @@ -1908,6 +1913,10 @@
> > > >  #     dirty during @calc-time period, further writes to this page
> will
> > > >  #     not increase dirty page rate anymore.
> > > >  #
> > > > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > > > +#    two arguments are mutually exclusive.  Exactly one of them must
> > > > +#    be specified. (Since 8.1)
> > > > +#
> > > >  # @sample-pages: number of sampled pages per each GiB of guest
> memory.
> > > >  #     Default value is 512.  For 4KiB guest pages this corresponds
> to
> > > >  #     sampling ratio of 0.2%.  This argument is used only in page
> > > > @@ -1925,7 +1934,8 @@
> > > >  #                                                 'sample-pages':
> 512} }
> > > >  # <- { "return": {} }
> > > >  ##
> > > > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > > > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > > > +                                         '*calc-time-ms': 'int64',
> > > >                                           '*sample-pages': 'int',
> > > >                                           '*mode':
> > > 'DirtyRateMeasureMode'} }
> > > >
> > >
> > > Having both @calc-time and @calc-time-ms is ugly.
> > >
> > > Can we deprecate @calc-time?
> > >
> > Since the upper app Libvirt has used this field to implement
> > the virDomainStartDirtyRateCalc API unfortunately.
> > Deprecating this requires the extra patch on Libvirt but no
> > functional improvement, IMHO, the field could remain untouched.
> >
> > >
> > > I don't like the name @calc-time-ms.  We don't put units in names
> > > elsewhere.
> > >
> > > Differently ugly: new member containing the fractional part, i.e. time
> > > in seconds = calc-time + fractional-part / 1000.  With a better name,
> of
> > > course.
> > >
> > > [...]
> > >
> > >
>
> As another alternative I can propose to add an optional field that
> specifies time unit.
>
> Initiate dirty page rate measurements for 300ms period:
> {"execute": "calc-dirty-rate",
>  "arguments":{"calc-time": 300, "time-unit": "millis"}}
>
> Query dirty rate. Report calc-time in milliseconds:
> {"execute": "query-dirty-rate",
>  "arguments":{"time-unit": "millis"}}
>

This sounds good and compatible with the old api. Thanks !

>
> > Thanks,
> >
> > Yong
> > --
> > Best regards
>

yong.

-- 
Best regards
Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Posted by Peter Xu 9 months, 2 weeks ago
On Fri, Aug 04, 2023 at 06:03:27PM +0300, Andrei Gudkov wrote:
> Introduces alternative argument calc-time-ms, which is the
> the same as calc-time but accepts millisecond value.
> Millisecond granularity allows to make predictions whether
> migration will succeed or not. To do this, calculate dirty
> rate with calc-time-ms set to max allowed downtime, convert
> measured rate into volume of dirtied memory, and divide by
> network throughput. If the value is lower than max allowed
> downtime, then migration will converge.
> 
> Measurement results for single thread randomly writing to
> a 1/4/24GiB memory region:
> 
> +--------------+-----------------------------------------------+
> | calc-time-ms |                dirty rate MiB/s               |
> |              +----------------+---------------+--------------+
> |              | theoretical    | page-sampling | dirty-bitmap |
> |              | (at 3M wr/sec) |               |              |
> +--------------+----------------+---------------+--------------+
> |                             1GiB                             |
> +--------------+----------------+---------------+--------------+
> |          100 |           6996 |          7100 |         3192 |
> |          200 |           4606 |          4660 |         2655 |
> |          300 |           3305 |          3280 |         2371 |
> |          400 |           2534 |          2525 |         2154 |
> |          500 |           2041 |          2044 |         1871 |
> |          750 |           1365 |          1341 |         1358 |
> |         1000 |           1024 |          1052 |         1025 |
> |         1500 |            683 |           678 |          684 |
> |         2000 |            512 |           507 |          513 |
> +--------------+----------------+---------------+--------------+
> |                             4GiB                             |
> +--------------+----------------+---------------+--------------+
> |          100 |          10232 |          8880 |         4070 |
> |          200 |           8954 |          8049 |         3195 |
> |          300 |           7889 |          7193 |         2881 |
> |          400 |           6996 |          6530 |         2700 |
> |          500 |           6245 |          5772 |         2312 |
> |          750 |           4829 |          4586 |         2465 |
> |         1000 |           3865 |          3780 |         2178 |
> |         1500 |           2694 |          2633 |         2004 |
> |         2000 |           2041 |          2031 |         1789 |
> +--------------+----------------+---------------+--------------+
> |                             24GiB                            |
> +--------------+----------------+---------------+--------------+
> |          100 |          11495 |          8640 |         5597 |
> |          200 |          11226 |          8616 |         3527 |
> |          300 |          10965 |          8386 |         2355 |
> |          400 |          10713 |          8370 |         2179 |
> |          500 |          10469 |          8196 |         2098 |
> |          750 |           9890 |          7885 |         2556 |
> |         1000 |           9354 |          7506 |         2084 |
> |         1500 |           8397 |          6944 |         2075 |
> |         2000 |           7574 |          6402 |         2062 |
> +--------------+----------------+---------------+--------------+
> 
> Theoretical values are computed according to the following formula:
> size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> where size is in bytes, time is in seconds, and wps is number of
> writes per second.
> 
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>

Acked-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu