From: Andrei Gudkov <gudkov.andrei@huawei.com>
To: qemu-devel@nongnu.org
Subject: [PATCH 2/2] migration/calc-dirty-rate: tool to predict migration time
Date: Tue, 28 Feb 2023 16:16:03 +0300
Message-ID: <839db5c82057f59782cc6156d74ffdc43d512a3d.1677589218.git.gudkov.andrei@huawei.com>
X-Mailer: git-send-email 2.30.2
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
---
 MAINTAINERS                  |   1 +
 scripts/predict_migration.py | 283 +++++++++++++++++++++++++++++++++++
 2 files changed, 284 insertions(+)
 create mode 100644 scripts/predict_migration.py

diff --git a/MAINTAINERS b/MAINTAINERS
index c6e6549f06..2fb5b6298a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3107,6 +3107,7 @@ F: docs/devel/migration.rst
 F: qapi/migration.json
 F: tests/migration/
 F: util/userfaultfd.c
+F: scripts/predict_migration.py
 
 D-Bus
 M: Marc-André Lureau
diff --git a/scripts/predict_migration.py b/scripts/predict_migration.py
new file mode 100644
index 0000000000..c92a97585f
--- /dev/null
+++ b/scripts/predict_migration.py
@@ -0,0 +1,283 @@
+#!/usr/bin/env python3
+#
+# Predicts time required to migrate a VM under a given max downtime constraint.
+#
+# Copyright (c) 2023 HUAWEI TECHNOLOGIES CO.,LTD.
+#
+# Authors:
+#  Andrei Gudkov <gudkov.andrei@huawei.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+
+# Usage:
+#
+# Step 1. Collect dirty page statistics from a live VM:
+# $ scripts/predict_migration.py calc-dirty-rate <host> <port> >dirty.json
+# <...takes 1 minute by default...>
+#
+# Step 2. Run the predictor against the collected data:
+# $ scripts/predict_migration.py predict < dirty.json
+# Downtime> |   125ms |   250ms |   500ms |  1000ms |  5000ms |   unlim |
+# -----------------------------------------------------------------------------
+#  100 Mbps |       - |       - |       - |       - |       - |  16m45s |
+#    1 Gbps |       - |       - |       - |       - |       - |   1m39s |
+#    2 Gbps |       - |       - |       - |       - |   1m55s |     50s |
+#  2.5 Gbps |       - |       - |       - |       - |   1m12s |     40s |
+#    5 Gbps |       - |       - |       - |     29s |     25s |     20s |
+#   10 Gbps |     13s |     13s |     12s |     12s |     12s |     10s |
+#   25 Gbps |      5s |      5s |      5s |      5s |      4s |      4s |
+#   40 Gbps |      3s |      3s |      3s |      3s |      3s |      3s |
+#
+# The latter prints a table with the estimated time it will take to migrate
+# the VM. This time depends on the available network bandwidth and the max
+# allowed downtime. A dash indicates that migration does not converge.
+# The prediction covers only RAM migration, and only in pre-copy mode.
+# Other features, such as compression or local disk migration, are not
+# supported.
+
+
+import sys
+import os
+import math
+import json
+from dataclasses import dataclass
+import asyncio
+import argparse
+
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'python'))
+from qemu.qmp import QMPClient
+
+
+async def calc_dirty_rate(host, port, calc_time, sample_pages):
+    client = QMPClient()
+    try:
+        await client.connect((host, port))
+        args = {
+            'calc-time': calc_time,
+            'sample-pages': sample_pages
+        }
+        await client.execute('calc-dirty-rate', args)
+        await asyncio.sleep(calc_time)
+        while True:
+            data = await client.execute('query-dirty-rate')
+            if data['status'] == 'measuring':
+                await asyncio.sleep(0.5)
+            elif data['status'] == 'measured':
+                return data
+            else:
+                raise ValueError(data['status'])
+    finally:
+        await client.disconnect()
+
+
+class MemoryModel:
+    """
+    Models RAM state during pre-copy migration using calc-dirty-rate results.
+    Its primary function is to estimate how many pages will be dirtied
+    after a given time, starting from a "clean" state.
+    This function is non-linear and saturates at some point.
+    """
+
+    @dataclass
+    class Point:
+        period_millis: float
+        dirty_pages: float
+
+    def __init__(self, data):
+        """
+        :param data: dictionary returned by calc-dirty-rate
+        """
+        self.__points = self.__make_points(data)
+        self.__page_size = data['page-size']
+        self.__num_total_pages = data['n-total-pages']
+        self.__num_zero_pages = data['n-zero-pages'] / \
+            (data['n-sampled-pages'] / data['n-total-pages'])
+
+    def __make_points(self, data):
+        points = list()
+
+        # Add observed points
+        sample_ratio = data['n-sampled-pages'] / data['n-total-pages']
+        for millis, dirty_pages in zip(data['periods'], data['n-dirty-pages']):
+            millis = float(millis)
+            dirty_pages = dirty_pages / sample_ratio
+            points.append(MemoryModel.Point(millis, dirty_pages))
+
+        # Extrapolate function to the left.
+        # Assuming that the function is convex, the worst case is achieved
+        # when the dirty page count immediately jumps to some value at zero
+        # time (infinite slope), and then keeps the same slope as in the
+        # region between the first two observed points: points[0]..points[1]
+        slope, offset = self.__fit_line(points[0], points[1])
+        points.insert(0, MemoryModel.Point(0.0, max(offset, 0.0)))
+
+        # Extrapolate function to the right.
+        # The worst case is achieved when the function has the same slope
+        # as in the last observed region.
+        slope, offset = self.__fit_line(points[-2], points[-1])
+        max_dirty_pages = \
+            data['n-total-pages'] - (data['n-zero-pages'] / sample_ratio)
+        if slope > 0.0:
+            saturation_millis = (max_dirty_pages - offset) / slope
+            points.append(MemoryModel.Point(saturation_millis, max_dirty_pages))
+        points.append(MemoryModel.Point(math.inf, max_dirty_pages))
+
+        return points
+
+    def __fit_line(self, lhs: Point, rhs: Point):
+        slope = (rhs.dirty_pages - lhs.dirty_pages) / \
+            (rhs.period_millis - lhs.period_millis)
+        offset = lhs.dirty_pages - slope * lhs.period_millis
+        return slope, offset
+
+    def page_size(self):
+        """
+        Return page size in bytes
+        """
+        return self.__page_size
+
+    def num_total_pages(self):
+        return self.__num_total_pages
+
+    def num_zero_pages(self):
+        """
+        Estimated total number of zero pages. Assumed to be constant.
+        """
+        return self.__num_zero_pages
+
+    def num_dirty_pages(self, millis):
+        """
+        Estimate the number of dirty pages after a given time, starting from
+        a "clean" state. The estimation is based on piece-wise linear
+        interpolation.
+        """
+        for i in range(len(self.__points)):
+            if self.__points[i].period_millis == millis:
+                return self.__points[i].dirty_pages
+            elif self.__points[i].period_millis > millis:
+                slope, offset = self.__fit_line(self.__points[i-1],
+                                                self.__points[i])
+                return offset + slope * millis
+        raise RuntimeError("unreachable")
+
+
+def predict_migration_time(model, bandwidth, downtime, deadline=3600*1000):
+    """
+    Predict how much time it will take to migrate the VM under the given
+    downtime and deadline constraints.
+
+    :param model: `MemoryModel` object for a given VM
+    :param bandwidth: Bandwidth available for migration [bytes/s]
+    :param downtime: Max allowed downtime [milliseconds]
+    :param deadline: Max total time to migrate VM before timeout [milliseconds]
+    :return: Predicted migration time [milliseconds] or `None`
+             if migration process doesn't converge before given deadline
+    """
+
+    left_zero_pages = model.num_zero_pages()
+    left_normal_pages = model.num_total_pages() - model.num_zero_pages()
+    header_size = 8
+
+    total_millis = 0.0
+    while True:
+        iter_bytes = 0.0
+        iter_bytes += left_normal_pages * (model.page_size() + header_size)
+        iter_bytes += left_zero_pages * header_size
+
+        iter_millis = iter_bytes * 1000.0 / bandwidth
+
+        total_millis += iter_millis
+
+        if iter_millis <= downtime:
+            return int(math.ceil(total_millis))
+        elif total_millis > deadline:
+            return None
+        else:
+            left_zero_pages = 0
+            left_normal_pages = model.num_dirty_pages(iter_millis)
+
+
+def run_predict_cmd(model):
+    @dataclass
+    class ValStr:
+        value: object
+        string: str
+
+    def gbps(value):
+        return ValStr(value*1024*1024*1024/8, f'{value} Gbps')
+
+    def mbps(value):
+        return ValStr(value*1024*1024/8, f'{value} Mbps')
+
+    def dt(millis):
+        if millis is not None:
+            return ValStr(millis, f'{millis}ms')
+        else:
+            return ValStr(math.inf, 'unlim')
+
+    def eta(millis):
+        if millis is not None:
+            seconds = int(math.ceil(millis/1000.0))
+            minutes, seconds = divmod(seconds, 60)
+            s = ''
+            if minutes > 0:
+                s += f'{minutes}m'
+            if len(s) > 0:
+                s += f'{seconds:02d}s'
+            else:
+                s += f'{seconds}s'
+        else:
+            s = '-'
+        return ValStr(millis, s)
+
+
+    bandwidths = [mbps(100), gbps(1), gbps(2), gbps(2.5), gbps(5), gbps(10),
+                  gbps(25), gbps(40)]
+    downtimes = [dt(125), dt(250), dt(500), dt(1000), dt(5000), dt(None)]
+
+    out = ''
+    out += 'Downtime> |'
+    for downtime in downtimes:
+        out += f' {downtime.string:>7} |'
+    print(out)
+
+    print('-'*len(out))
+
+    for bandwidth in bandwidths:
+        print(f'{bandwidth.string:>9} | ', '', end='')
+        for downtime in downtimes:
+            millis = predict_migration_time(model,
+                                            bandwidth.value,
+                                            downtime.value)
+            print(f'{eta(millis).string:>7} | ', '', end='')
+        print()
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    subparsers = parser.add_subparsers(dest='command', required=True)
+
+    parser_cdr = subparsers.add_parser('calc-dirty-rate',
+        help='Collect and print dirty page statistics from live VM')
+    parser_cdr.add_argument('--calc-time', type=int, default=60,
+        help='Calculation time in seconds')
+    parser_cdr.add_argument('--sample-pages', type=int, default=512,
+        help='Number of sampled pages per one gigabyte of RAM')
+    parser_cdr.add_argument('host', metavar='host', type=str, help='QMP host')
+    parser_cdr.add_argument('port', metavar='port', type=int, help='QMP port')
+
+    subparsers.add_parser('predict', help='Predict migration time')
+
+    args = parser.parse_args()
+
+    if args.command == 'calc-dirty-rate':
+        data = asyncio.run(calc_dirty_rate(host=args.host,
+                                           port=args.port,
+                                           calc_time=args.calc_time,
+                                           sample_pages=args.sample_pages))
+        print(json.dumps(data))
+    elif args.command == 'predict':
+        data = json.load(sys.stdin)
+        model = MemoryModel(data)
+        run_predict_cmd(model)
+
+
+if __name__ == '__main__':
+    main()
-- 
2.30.2
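The left/right extrapolation done by `MemoryModel.__make_points` can be tried in isolation. The sketch below is a standalone re-implementation with made-up sample numbers (the names `fit_line`, `first`, `second` and all values are illustrative, not from the patch): it fits a line through two observed `(millis, dirty_pages)` points, takes the intercept at t=0 as the worst-case left extrapolation, and computes when that same slope would reach saturation.

```python
def fit_line(p0, p1):
    """Return (slope, offset) of the line through two (millis, pages) points."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0

# Two hypothetical samples: 1000 dirty pages after 100 ms, 1800 after 200 ms.
first, second = (100.0, 1000.0), (200.0, 1800.0)
slope, offset = fit_line(first, second)

# Left extrapolation: worst case jumps to `offset` pages already at t=0
# (clamped at zero, as in the patch).
point_at_zero = (0.0, max(offset, 0.0))

# Right extrapolation: time at which the last observed slope would dirty
# every non-zero page (saturation), with a made-up page total.
max_dirty_pages = 10_000.0
saturation_ms = (max_dirty_pages - offset) / slope
```

With these numbers the fitted slope is 8 pages/ms with a 200-page intercept, so saturation is reached at (10000 - 200) / 8 = 1225 ms.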
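To experiment with the convergence math outside of QEMU, here is a minimal standalone sketch of the same iterative pre-copy estimate. It is not the patch itself: the dirty-page model is reduced to a hand-written piecewise-linear table (`POINTS`), and every number below is invented for illustration. Each loop iteration sends everything currently dirty, and migration finishes once a single iteration fits within the allowed downtime.

```python
import math

# Toy dirty-page model: (millis, dirty_pages) points, already extrapolated
# to t=0 and to saturation the way MemoryModel would do. Values are made up.
POINTS = [(0.0, 0.0), (1000.0, 50_000.0), (math.inf, 50_000.0)]

def dirty_pages_after(millis):
    """Piecewise-linear estimate of dirty pages after `millis` ms of copying."""
    for (x0, y0), (x1, y1) in zip(POINTS, POINTS[1:]):
        if x0 <= millis <= x1:
            if math.isinf(x1):
                return y1
            slope = (y1 - y0) / (x1 - x0)
            return y0 + slope * (millis - x0)
    raise ValueError(millis)

def predict(total_pages, zero_pages, page_size, bandwidth, downtime,
            deadline=3600 * 1000, header_size=8):
    """Predicted migration time [ms], or None if it misses the deadline."""
    left_zero = zero_pages
    left_normal = total_pages - zero_pages
    total_ms = 0.0
    while True:
        # One pre-copy iteration: each normal page costs page_size plus a
        # per-page header; zero pages cost only the header.
        iter_bytes = left_normal * (page_size + header_size) \
            + left_zero * header_size
        iter_ms = iter_bytes * 1000.0 / bandwidth
        total_ms += iter_ms
        if iter_ms <= downtime:      # final iteration fits into downtime
            return int(math.ceil(total_ms))
        if total_ms > deadline:      # does not converge before the deadline
            return None
        left_zero = 0                # zero pages are sent only once
        left_normal = dirty_pages_after(iter_ms)
```

For example, with 1 GiB of 4 KiB pages, half of them zero, a 10 Gbps link (1.25e9 bytes/s) and a 300 ms downtime budget, `predict(262144, 131072, 4096, 1.25e9, 300)` converges after two iterations; the same VM on a 1 Gbps link (1.25e8 bytes/s) never fits an iteration into 300 ms and returns `None`.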