From: Peter Xu
To: qemu-devel@nongnu.org
Cc: "Dr. David Alan Gilbert", peterx@redhat.com, Alexey Perevalov,
 Fabiano Rosas, Juraj Marcin, Markus Armbruster
Subject: [PATCH 08/13] migration/postcopy: Report fault latencies in blocktime
Date: Tue, 27 May 2025 19:12:43 -0400
Message-ID: <20250527231248.1279174-9-peterx@redhat.com>
In-Reply-To: <20250527231248.1279174-1-peterx@redhat.com>
References: <20250527231248.1279174-1-peterx@redhat.com>

Blocktime so far only cares about the time one vCPU (or the whole system)
got blocked. It would also be helpful if it could report the latency of
page requests, to which workloads can be very sensitive during postcopy.

Blocktime itself is sometimes not very important, especially when one
thinks about KVM async PF support, with which vCPUs are almost never
blocked because the guest OS is smart enough to switch to another task
when a remote fault is needed.

However, latency still matters: even if the guest vCPU keeps running
threads that do not need a remote fault, the workload that accessed the
missing page is still affected.

Add two entries to the report, showing how long it takes to resolve a
remote fault. Mention in the QAPI doc that this is not the real average
fault latency, but only the average over the faults that required a
remote page request.

Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice.
Meanwhile, add the entry checks in qtests for all postcopy tests.

Cc: Markus Armbruster
Cc: Dr. David Alan Gilbert
Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
---
 qapi/migration.json                   | 13 +++++
 migration/migration-hmp-cmds.c        | 70 ++++++++++++++++++---------
 migration/postcopy-ram.c              | 48 ++++++++++++------
 tests/qtest/migration/migration-qmp.c |  3 ++
 4 files changed, 97 insertions(+), 37 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8b9c53595c..8b13cea169 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -236,6 +236,17 @@
 #     This is only present when the postcopy-blocktime migration
 #     capability is enabled. (Since 3.0)
 #
+# @postcopy-latency: average remote page fault latency (in us). Note that
+#     this doesn't include all faults, but only the ones that require a
+#     remote page request. So it should be always bigger than the real
+#     average page fault latency. This is only present when the
+#     postcopy-blocktime migration capability is enabled. (Since 10.1)
+#
+# @postcopy-vcpu-latency: average remote page fault latency per vCPU (in
+#     us). It has the same definition of @postcopy-latency, but instead
+#     this is the per-vCPU statistics. This is only present when the
+#     postcopy-blocktime migration capability is enabled. (Since 10.1)
+#
 # @socket-address: Only used for tcp, to know what the real port is
 #     (Since 4.0)
 #
@@ -275,6 +286,8 @@
            '*blocked-reasons': ['str'],
            '*postcopy-blocktime': 'uint32',
            '*postcopy-vcpu-blocktime': ['uint32'],
+           '*postcopy-latency': 'uint64',
+           '*postcopy-vcpu-latency': ['uint64'],
            '*socket-address': ['SocketAddress'],
            '*dirty-limit-throttle-time-per-round': 'uint64',
            '*dirty-limit-ring-full-time': 'uint64'} }
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 3cf890b887..a18049a7e8 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -52,6 +52,53 @@ static void migration_global_dump(Monitor *mon)
                    ms->clear_bitmap_shift);
 }
 
+static void migration_dump_blocktime(Monitor *mon, MigrationInfo *info)
+{
+    if (info->has_postcopy_blocktime) {
+        monitor_printf(mon, "Postcopy Blocktime (ms): %" PRIu32 "\n",
+                       info->postcopy_blocktime);
+    }
+
+    if (info->has_postcopy_vcpu_blocktime) {
+        uint32List *item = info->postcopy_vcpu_blocktime;
+        int count = 0;
+
+        monitor_printf(mon, "Postcopy vCPU Blocktime (ms): \n [");
+
+        while (item) {
+            monitor_printf(mon, "%"PRIu32", ", item->value);
+            item = item->next;
+            /* Each line 10 vcpu results, newline if there's more */
+            if ((++count % 10 == 0) && item) {
+                monitor_printf(mon, "\n ");
+            }
+        }
+        monitor_printf(mon, "\b\b]\n");
+    }
+
+    if (info->has_postcopy_latency) {
+        monitor_printf(mon, "Postcopy Latency (us): %" PRIu64 "\n",
+                       info->postcopy_latency);
+    }
+
+    if (info->has_postcopy_vcpu_latency) {
+        uint64List *item = info->postcopy_vcpu_latency;
+        int count = 0;
+
+        monitor_printf(mon, "Postcopy vCPU Latencies (us): \n [");
+
+        while (item) {
+            monitor_printf(mon, "%"PRIu64", ", item->value);
+            item = item->next;
+            /* Each line 10 vcpu results, newline if there's more */
+            if ((++count % 10 == 0) && item) {
+                monitor_printf(mon, "\n ");
+            }
+        }
+        monitor_printf(mon, "\b\b]\n");
+    }
+}
+
 void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 {
     bool show_all = qdict_get_try_bool(qdict, "all", false);
@@ -202,28 +249,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
                        info->dirty_limit_ring_full_time);
     }
 
-    if (info->has_postcopy_blocktime) {
-        monitor_printf(mon, "Postcopy Blocktime (ms): %" PRIu32 "\n",
-                       info->postcopy_blocktime);
-    }
-
-    if (info->has_postcopy_vcpu_blocktime) {
-        uint32List *item = info->postcopy_vcpu_blocktime;
-        int count = 0;
-
-        monitor_printf(mon, "Postcopy vCPU Blocktime (ms): \n [");
-
-        while (item) {
-            monitor_printf(mon, "%"PRIu32", ", item->value);
-            item = item->next;
-            /* Each line 10 vcpu results, newline if there's more */
-            if ((++count % 10 == 0) && item) {
-                monitor_printf(mon, "\n ");
-            }
-        }
-        monitor_printf(mon, "\b\b]\n");
-    }
-
+    migration_dump_blocktime(mon, info);
 out:
     qapi_free_MigrationInfo(info);
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 46a8fdb6c2..2aca41b3d7 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -166,20 +166,6 @@ static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     return ctx;
 }
 
-static uint32List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
-{
-    MachineState *ms = MACHINE(qdev_get_machine());
-    uint32List *list = NULL;
-    int i;
-
-    for (i = ms->smp.cpus - 1; i >= 0; i--) {
-        QAPI_LIST_PREPEND(
-            list, (uint32_t)(ctx->vcpu_blocktime_total[i] / 1000));
-    }
-
-    return list;
-}
-
 /*
  * This function just populates MigrationInfo from postcopy's
  * blocktime context. It will not populate MigrationInfo,
@@ -191,15 +177,47 @@ void fill_destination_postcopy_migration_info(MigrationInfo *info)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
+    MachineState *ms = MACHINE(qdev_get_machine());
+    uint64_t latency_total = 0, faults = 0;
+    uint32List *list_blocktime = NULL;
+    uint64List *list_latency = NULL;
+    int i;
 
     if (!bc) {
         return;
     }
 
+    for (i = ms->smp.cpus - 1; i >= 0; i--) {
+        uint64_t latency, total, count;
+
+        /* This is in milliseconds */
+        QAPI_LIST_PREPEND(list_blocktime,
+                          (uint32_t)(bc->vcpu_blocktime_total[i] / 1000));
+
+        /* The rest in microseconds */
+        total = bc->vcpu_blocktime_total[i];
+        latency_total += total;
+        count = bc->vcpu_faults_count[i];
+        faults += count;
+
+        if (count) {
+            latency = total / count;
+        } else {
+            /* No fault detected */
+            latency = 0;
+        }
+
+        QAPI_LIST_PREPEND(list_latency, latency);
+    }
+
     info->has_postcopy_blocktime = true;
     info->postcopy_blocktime = (uint32_t)(bc->total_blocktime / 1000);
     info->has_postcopy_vcpu_blocktime = true;
-    info->postcopy_vcpu_blocktime = get_vcpu_blocktime_list(bc);
+    info->postcopy_vcpu_blocktime = list_blocktime;
+    info->has_postcopy_latency = true;
+    info->postcopy_latency = faults ? (latency_total / faults) : 0;
+    info->has_postcopy_vcpu_latency = true;
+    info->postcopy_vcpu_latency = list_latency;
 }
 
 static uint64_t get_postcopy_total_blocktime(void)
diff --git a/tests/qtest/migration/migration-qmp.c b/tests/qtest/migration/migration-qmp.c
index fb59741b2c..1a5ab2d229 100644
--- a/tests/qtest/migration/migration-qmp.c
+++ b/tests/qtest/migration/migration-qmp.c
@@ -358,6 +358,9 @@ void read_blocktime(QTestState *who)
 
     rsp_return = migrate_query_not_failed(who);
     g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-blocktime"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-latency"));
+    g_assert(qdict_haskey(rsp_return, "postcopy-vcpu-latency"));
     qobject_unref(rsp_return);
 }
 
-- 
2.49.0
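
Note: to sanity-check the arithmetic outside QEMU, the averaging done in
fill_destination_postcopy_migration_info() can be modelled roughly as
below. This is only a sketch under assumptions: vcpu_avg_latency_us(),
fill_latencies_us() and their parameters are made-up names for
illustration, not QEMU functions, and plain arrays stand in for the
per-vCPU totals and fault counts kept in the blocktime context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone model of the averaging in the patch.
 * total_us[i] is the accumulated remote-fault wait of vCPU i (microseconds),
 * faults[i] is the number of remote faults that vCPU took.
 */

/* Average latency of one vCPU; 0 when it never took a remote fault. */
uint64_t vcpu_avg_latency_us(uint64_t total_us, uint64_t faults)
{
    return faults ? total_us / faults : 0;
}

/*
 * Fill per_vcpu[] with each vCPU's average and return the overall average.
 * The overall value is sum(totals) / sum(counts), so every fault carries
 * the same weight; it is not the mean of the per-vCPU averages.
 */
uint64_t fill_latencies_us(const uint64_t *total_us, const uint64_t *faults,
                           uint64_t *per_vcpu, size_t nr_vcpus)
{
    uint64_t latency_total = 0, fault_total = 0;
    size_t i;

    assert(total_us && faults && per_vcpu);

    for (i = 0; i < nr_vcpus; i++) {
        per_vcpu[i] = vcpu_avg_latency_us(total_us[i], faults[i]);
        latency_total += total_us[i];
        fault_total += faults[i];
    }

    /* Guard the division the same way the patch does for zero faults. */
    return fault_total ? latency_total / fault_total : 0;
}
```

With totals {3000, 0, 500} us and fault counts {3, 0, 2}, the per-vCPU
values come out as 1000, 0 and 250 us, and the overall value is
3500 / 5 = 700 us. A vCPU that never faulted reports 0 instead of
dividing by zero.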