From: Jiri Denemark
To: libvir-list@redhat.com
Cc: Peter Xu
Subject: [libvirt PATCH v2 81/81] RFC: qemu: Keep vCPUs paused while migration is in postcopy-paused
Date: Wed, 1 Jun 2022 14:50:21 +0200
Message-Id: <3802648cbaafebaa9ca4b2932a660ac06c66b725.1654087150.git.jdenemar@redhat.com>

QEMU keeps guest CPUs running even in the postcopy-paused migration state
so that processes which already have all the memory pages they need
migrated to the destination can keep running. However, this behavior may
cause unexpected delays in interprocess communication, as some processes
will remain stopped until the migration is recovered and their memory
pages are migrated. So let's make sure all guest CPUs are paused while
postcopy migration is paused.
---

Notes:
    Version 2:
    - new patch
    - this patch does not currently work as QEMU cannot handle the "stop"
      QMP command while in postcopy-paused state...
      the monitor just hangs (see
      https://gitlab.com/qemu-project/qemu/-/issues/1052 )
    - an ideal solution to the QEMU bug would be for QEMU itself to pause
      the CPUs for us and just notify us about it via QMP events
    - but Peter Xu thinks this behavior is actually worse than keeping
      vCPUs running
    - so let's take this patch as a base for discussing what we should be
      doing with vCPUs in postcopy-paused migration state (an illustrative
      sketch of the user-visible effect follows below)
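
The following sketch is for illustration only and is not part of the patch;
the connection URI and domain name are made-up placeholders. Assuming the
destination domain ends up in postcopy-paused, a management client polling
it should now see the domain paused (here checked against the
VIR_DOMAIN_PAUSED_POSTCOPY_FAILED reason queued by the event handler below)
instead of still running:

    /* Minimal sketch: detect an interrupted incoming post-copy migration
     * via the public libvirt API; build with -lvirt. */
    #include <stdio.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpen("qemu:///system");
        virDomainPtr dom = NULL;
        int state;
        int reason;

        if (!conn)
            return 1;

        if ((dom = virDomainLookupByName(conn, "example-guest")) &&
            virDomainGetState(dom, &state, &reason, 0) == 0 &&
            state == VIR_DOMAIN_PAUSED &&
            reason == VIR_DOMAIN_PAUSED_POSTCOPY_FAILED) {
            /* Without this patch the domain would still report a running
             * state here even though the migration is stuck. */
            printf("post-copy migration interrupted, vCPUs are paused\n");
        }

        if (dom)
            virDomainFree(dom);
        virConnectClose(conn);
        return 0;
    }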
vm->def->name); + } + + if (virDomainObjGetState(vm, ¤t) =3D=3D state) { + int eventType =3D -1; + int eventDetail =3D -1; + + if (current =3D=3D reason) { + VIR_DEBUG("Guest CPUs are already in the right state"); + return; + } + + VIR_DEBUG("Fixing domain state reason"); + if (state =3D=3D VIR_DOMAIN_PAUSED) { + eventType =3D VIR_DOMAIN_EVENT_SUSPENDED; + eventDetail =3D qemuDomainPausedReasonToSuspendedEvent(reason); + } else { + eventType =3D VIR_DOMAIN_EVENT_RESUMED; + eventDetail =3D qemuDomainRunningReasonToResumeEvent(reason); + } + virDomainObjSetState(vm, state, reason); + qemuDomainSaveStatus(vm); + virObjectEventStateQueue(driver->domainEventState, + virDomainEventLifecycleNewFromObj(vm, eve= ntType, + eventDe= tail)); + } else if (state =3D=3D VIR_DOMAIN_PAUSED) { + qemuProcessStopCPUs(driver, vm, reason, asyncJob); + } else { + qemuProcessStartCPUs(driver, vm, reason, asyncJob); + } +} + + /* Helper function called while vm is active. */ int qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index fbc0549b34..a1e2d8d171 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -224,6 +224,12 @@ qemuMigrationProcessUnattended(virQEMUDriver *driver, virDomainAsyncJob job, qemuMonitorMigrationStatus status); =20 +void +qemuMigrationUpdatePostcopyCPUState(virDomainObj *vm, + virDomainState state, + int reason, + int asyncJob); + bool qemuMigrationSrcIsAllowed(virQEMUDriver *driver, virDomainObj *vm, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index ad529dabb4..7fff68c0db 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1521,6 +1521,10 @@ qemuProcessHandleMigrationStatus(qemuMonitor *mon G_= GNUC_UNUSED, * Thus we need to handle the event here. 
             qemuMigrationSrcPostcopyFailed(vm);
             qemuDomainSaveStatus(vm);
+        } else if (priv->job.asyncJob == VIR_ASYNC_JOB_MIGRATION_IN) {
+            qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_MIGRATION_CPU_STATE,
+                                   VIR_DOMAIN_PAUSED,
+                                   VIR_DOMAIN_PAUSED_POSTCOPY_FAILED, NULL);
         }
         break;
 
@@ -1547,6 +1551,12 @@ qemuProcessHandleMigrationStatus(qemuMonitor *mon G_GNUC_UNUSED,
             event = virDomainEventLifecycleNewFromObj(vm, eventType, eventDetail);
             qemuDomainSaveStatus(vm);
         }
+
+        if (priv->job.asyncJob == VIR_ASYNC_JOB_MIGRATION_IN) {
+            qemuProcessEventSubmit(vm, QEMU_PROCESS_EVENT_MIGRATION_CPU_STATE,
+                                   VIR_DOMAIN_RUNNING,
+                                   VIR_DOMAIN_RUNNING_POSTCOPY, NULL);
+        }
         break;
 
     case QEMU_MONITOR_MIGRATION_STATUS_COMPLETED:
@@ -3703,10 +3713,32 @@ qemuProcessRecoverMigration(virQEMUDriver *driver,
         if (migStatus == VIR_DOMAIN_JOB_STATUS_POSTCOPY) {
             VIR_DEBUG("Post-copy migration of domain %s still running, it will be handled as unattended",
                       vm->def->name);
+
+            if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_IN &&
+                state == VIR_DOMAIN_PAUSED) {
+                qemuMigrationUpdatePostcopyCPUState(vm, VIR_DOMAIN_RUNNING,
+                                                    VIR_DOMAIN_RUNNING_POSTCOPY,
+                                                    VIR_ASYNC_JOB_NONE);
+            } else {
+                if (state == VIR_DOMAIN_RUNNING)
+                    reason = VIR_DOMAIN_RUNNING_POSTCOPY;
+                else
+                    reason = VIR_DOMAIN_PAUSED_POSTCOPY;
+
+                virDomainObjSetState(vm, state, reason);
+            }
+
             qemuProcessRestoreMigrationJob(vm, job);
             return 0;
         }
 
+        if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_IN &&
+            migStatus == VIR_DOMAIN_JOB_STATUS_POSTCOPY_PAUSED) {
+            qemuMigrationUpdatePostcopyCPUState(vm, VIR_DOMAIN_PAUSED,
+                                                VIR_DOMAIN_PAUSED_POSTCOPY,
+                                                VIR_ASYNC_JOB_NONE);
+        }
+
         if (migStatus != VIR_DOMAIN_JOB_STATUS_HYPERVISOR_COMPLETED) {
             if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_OUT)
                 qemuMigrationSrcPostcopyFailed(vm);
-- 
2.35.1