From: Daniel P. Berrangé
To: devel@lists.libvirt.org
Cc: Daniel P. Berrangé
Subject: [PATCH 23/26] rpc: fix shutdown sequence when preserving state
Date: Wed, 8 Jan 2025 19:42:56 +0000
Message-ID: <20250108194259.1171990-24-berrange@redhat.com>
In-Reply-To: <20250108194259.1171990-1-berrange@redhat.com>
References: <20250108194259.1171990-1-berrange@redhat.com>

The preserving of state (ie running VMs) requires a fully functional
daemon and hypervisor driver. If any part has started shutting down
then saving state may fail, or worse, hang.

The current shutdown sequence does not guarantee safe ordering, as
we synchronize with the state saving thread only after the hypervisor
driver has had its 'shutdownPrepare' callback invoked. In the case of
QEMU this means that worker threads processing monitor events may well
have been stopped.

This implements a full state machine that has a well defined ordering
that an earlier commit documented as the desired semantics.

With this change, nothing will start shutting down if the state saving
thread is still running.

Signed-off-by: Daniel P. Berrangé
---
 src/remote/remote_daemon.c |   4 +-
 src/rpc/virnetdaemon.c     | 103 ++++++++++++++++++++++++++-----------
 2 files changed, 76 insertions(+), 31 deletions(-)

diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c
index e03ee1de5a..f64ecad7f5 100644
--- a/src/remote/remote_daemon.c
+++ b/src/remote/remote_daemon.c
@@ -617,8 +617,8 @@ static void daemonRunStateInit(void *opaque)
                                            NULL,
                                            G_DBUS_SIGNAL_FLAGS_NONE,
                                            handleSystemMessageFunc,
-                                            dmn,
-                                            NULL);
+                                           dmn,
+                                           NULL);
 
     /* Only now accept clients from network */
     virNetDaemonUpdateServices(dmn, true);
diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c
index 7f14c5420a..d1d7bae569 100644
--- a/src/rpc/virnetdaemon.c
+++ b/src/rpc/virnetdaemon.c
@@ -39,6 +39,15 @@
 
 VIR_LOG_INIT("rpc.netdaemon");
 
+typedef enum {
+    VIR_NET_DAEMON_QUIT_NONE,
+    VIR_NET_DAEMON_QUIT_REQUESTED,
+    VIR_NET_DAEMON_QUIT_PRESERVING,
+    VIR_NET_DAEMON_QUIT_READY,
+    VIR_NET_DAEMON_QUIT_WAITING,
+    VIR_NET_DAEMON_QUIT_COMPLETED,
+} virNetDaemonQuitPhase;
+
 #ifndef WIN32
 typedef struct _virNetDaemonSignal virNetDaemonSignal;
 struct _virNetDaemonSignal {
@@ -69,9 +78,8 @@ struct _virNetDaemon {
     virNetDaemonShutdownCallback shutdownPrepareCb;
     virNetDaemonShutdownCallback shutdownWaitCb;
     virThread *shutdownPreserveThread;
-    int finishTimer;
-    bool quit;
-    bool finished;
+    int quitTimer;
+    virNetDaemonQuitPhase quit;
     bool graceful;
     bool execRestart;
     bool running; /* the daemon has reached the running phase */
@@ -414,7 +422,10 @@ virNetDaemonAutoShutdownTimer(int timerid G_GNUC_UNUSED,
 
     if (!dmn->autoShutdownInhibitions) {
         VIR_DEBUG("Automatic shutdown triggered");
-        dmn->quit = true;
+        if (dmn->quit == VIR_NET_DAEMON_QUIT_NONE) {
+            VIR_DEBUG("Requesting daemon shutdown");
+            dmn->quit = VIR_NET_DAEMON_QUIT_REQUESTED;
+        }
     }
 }
 
@@ -705,27 +716,26 @@ daemonShutdownWait(void *opaque)
     bool graceful = false;
 
     virHashForEach(dmn->servers, daemonServerShutdownWait, NULL);
-    if (!dmn->shutdownWaitCb || dmn->shutdownWaitCb() >= 0) {
-        if (dmn->shutdownPreserveThread)
-            virThreadJoin(dmn->shutdownPreserveThread);
-
+    if (!dmn->shutdownWaitCb || dmn->shutdownWaitCb() >= 0)
         graceful = true;
-    }
 
     VIR_WITH_OBJECT_LOCK_GUARD(dmn) {
         dmn->graceful = graceful;
-        virEventUpdateTimeout(dmn->finishTimer, 0);
+        dmn->quit = VIR_NET_DAEMON_QUIT_COMPLETED;
+        virEventUpdateTimeout(dmn->quitTimer, 0);
+        VIR_DEBUG("Shutdown wait completed graceful=%d", graceful);
     }
 }
 
 static void
-virNetDaemonFinishTimer(int timerid G_GNUC_UNUSED,
-                        void *opaque)
+virNetDaemonQuitTimer(int timerid G_GNUC_UNUSED,
+                      void *opaque)
 {
     virNetDaemon *dmn = opaque;
     VIR_LOCK_GUARD lock = virObjectLockGuard(dmn);
 
-    dmn->finished = true;
+    dmn->quit = VIR_NET_DAEMON_QUIT_COMPLETED;
+    VIR_DEBUG("Shutdown wait timed out");
 }
 
 
@@ -742,9 +752,8 @@ virNetDaemonRun(virNetDaemon *dmn)
         goto cleanup;
     }
 
-    dmn->quit = false;
-    dmn->finishTimer = -1;
-    dmn->finished = false;
+    dmn->quit = VIR_NET_DAEMON_QUIT_NONE;
+    dmn->quitTimer = -1;
     dmn->graceful = false;
     dmn->running = true;
 
@@ -753,7 +762,7 @@ virNetDaemonRun(virNetDaemon *dmn)
     virSystemdNotifyReady();
 
     VIR_DEBUG("dmn=%p quit=%d", dmn, dmn->quit);
-    while (!dmn->finished) {
+    while (dmn->quit != VIR_NET_DAEMON_QUIT_COMPLETED) {
         virNetDaemonShutdownTimerUpdate(dmn);
 
         virObjectUnlock(dmn);
@@ -767,17 +776,30 @@ virNetDaemonRun(virNetDaemon *dmn)
         virHashForEach(dmn->servers, daemonServerProcessClients, NULL);
 
         /* don't shutdown services when performing an exec-restart */
-        if (dmn->quit && dmn->execRestart)
+        if (dmn->quit == VIR_NET_DAEMON_QUIT_REQUESTED && dmn->execRestart)
             goto cleanup;
 
-        if (dmn->quit && dmn->finishTimer == -1) {
+        if (dmn->quit == VIR_NET_DAEMON_QUIT_REQUESTED) {
+            VIR_DEBUG("Process quit request");
             virHashForEach(dmn->servers, daemonServerClose, NULL);
+
+            if (dmn->shutdownPreserveThread) {
+                VIR_DEBUG("Shutdown preserve thread running");
+                dmn->quit = VIR_NET_DAEMON_QUIT_PRESERVING;
+            } else {
+                VIR_DEBUG("Ready to shutdown");
+                dmn->quit = VIR_NET_DAEMON_QUIT_READY;
+            }
+        }
+
+        if (dmn->quit == VIR_NET_DAEMON_QUIT_READY) {
+            VIR_DEBUG("Starting shutdown, running prepare");
             if (dmn->shutdownPrepareCb && dmn->shutdownPrepareCb() < 0)
                 break;
 
-            if ((dmn->finishTimer = virEventAddTimeout(30 * 1000,
-                                                       virNetDaemonFinishTimer,
-                                                       dmn, NULL)) < 0) {
+            if ((dmn->quitTimer = virEventAddTimeout(30 * 1000,
+                                                     virNetDaemonQuitTimer,
+                                                     dmn, NULL)) < 0) {
                 VIR_WARN("Failed to register finish timer.");
                 break;
             }
@@ -787,6 +809,9 @@ virNetDaemonRun(virNetDaemon *dmn)
                 VIR_WARN("Failed to register join thread.");
                 break;
             }
+
+            VIR_DEBUG("Waiting for shutdown completion");
+            dmn->quit = VIR_NET_DAEMON_QUIT_WAITING;
         }
     }
 
@@ -808,7 +833,7 @@ virNetDaemonQuit(virNetDaemon *dmn)
     VIR_LOCK_GUARD lock = virObjectLockGuard(dmn);
 
     VIR_DEBUG("Quit requested %p", dmn);
-    dmn->quit = true;
+    dmn->quit = VIR_NET_DAEMON_QUIT_REQUESTED;
 }
 
 
@@ -818,7 +843,7 @@ virNetDaemonQuitExecRestart(virNetDaemon *dmn)
     VIR_LOCK_GUARD lock = virObjectLockGuard(dmn);
 
     VIR_DEBUG("Exec-restart requested %p", dmn);
-    dmn->quit = true;
+    dmn->quit = VIR_NET_DAEMON_QUIT_REQUESTED;
     dmn->execRestart = true;
 }
 
@@ -827,12 +852,21 @@ static void virNetDaemonPreserveWorker(void *opaque)
 {
     virNetDaemon *dmn = opaque;
 
-    VIR_DEBUG("Begin stop dmn=%p", dmn);
+    VIR_DEBUG("Begin preserve dmn=%p", dmn);
 
     dmn->shutdownPreserveCb();
 
-    VIR_DEBUG("Completed stop dmn=%p", dmn);
+    VIR_DEBUG("Completed preserve dmn=%p", dmn);
 
+    VIR_WITH_OBJECT_LOCK_GUARD(dmn) {
+        if (dmn->quit == VIR_NET_DAEMON_QUIT_PRESERVING) {
+            VIR_DEBUG("Marking shutdown as ready");
+            dmn->quit = VIR_NET_DAEMON_QUIT_READY;
+        }
+        g_clear_pointer(&dmn->shutdownPreserveThread, g_free);
+    }
+
+    VIR_DEBUG("End preserve dmn=%p", dmn);
     virObjectUnref(dmn);
 }
 
@@ -840,15 +874,26 @@ static void virNetDaemonPreserveWorker(void *opaque)
 void virNetDaemonPreserve(virNetDaemon *dmn)
 {
     VIR_LOCK_GUARD lock = virObjectLockGuard(dmn);
+    VIR_DEBUG("Preserve state request");
 
-    if (!dmn->shutdownPreserveCb ||
-        dmn->shutdownPreserveThread)
+    if (!dmn->shutdownPreserveCb) {
+        VIR_DEBUG("No preserve callback registered");
         return;
+    }
+    if (dmn->shutdownPreserveThread) {
+        VIR_DEBUG("Preserve state thread already running");
+        return;
+    }
+
+    if (dmn->quit != VIR_NET_DAEMON_QUIT_NONE) {
+        VIR_WARN("Already initiated shutdown sequence, unable to preserve state");
+        return;
+    }
 
     virObjectRef(dmn);
     dmn->shutdownPreserveThread = g_new0(virThread, 1);
 
-    if (virThreadCreateFull(dmn->shutdownPreserveThread, true,
+    if (virThreadCreateFull(dmn->shutdownPreserveThread, false,
                             virNetDaemonPreserveWorker,
                             "daemon-stop", false, dmn) < 0) {
         virObjectUnref(dmn);
-- 
2.47.1
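
The ordering described in the commit message can be sketched in isolation. The following is a simplified, single-threaded model of the phase transitions, not the libvirt code: `SketchDaemon`, the `sketch*` functions and the `preserveRunning` / `prepareCalls` fields are hypothetical stand-ins, and locking, timers, server handling and the real callbacks are left out. The guard in `sketchRequestQuit` is an assumption of this sketch, modelled on the check the patch adds to the auto-shutdown timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Phase names mirror the virNetDaemonQuitPhase enum added by the patch. */
typedef enum {
    QUIT_NONE,
    QUIT_REQUESTED,
    QUIT_PRESERVING,
    QUIT_READY,
    QUIT_WAITING,
    QUIT_COMPLETED,
} QuitPhase;

/* Hypothetical stand-in for the virNetDaemon fields involved. */
typedef struct {
    QuitPhase quit;
    bool preserveRunning; /* models shutdownPreserveThread != NULL */
    int prepareCalls;     /* counts simulated shutdownPrepare calls */
} SketchDaemon;

/* Models a quit request; only acts if no shutdown is in progress
 * (assumption of this sketch, see the lead-in). */
void sketchRequestQuit(SketchDaemon *d)
{
    if (d->quit == QUIT_NONE)
        d->quit = QUIT_REQUESTED;
}

/* Models one pass over the phase checks in the main loop. */
void sketchLoopStep(SketchDaemon *d)
{
    if (d->quit == QUIT_REQUESTED)
        d->quit = d->preserveRunning ? QUIT_PRESERVING : QUIT_READY;

    if (d->quit == QUIT_READY) {
        /* The invariant the patch is after: prepare never runs
         * while state saving is still in flight. */
        assert(!d->preserveRunning);
        d->prepareCalls++;
        d->quit = QUIT_WAITING;
    }
}

/* Models the tail of the preserve worker thread. */
void sketchPreserveDone(SketchDaemon *d)
{
    if (d->quit == QUIT_PRESERVING)
        d->quit = QUIT_READY;
    d->preserveRunning = false;
}

/* Models the shutdown wait finishing or the quit timer firing. */
void sketchWaitDone(SketchDaemon *d)
{
    d->quit = QUIT_COMPLETED;
}
```

With a preserve operation in flight, a quit request parks the model in `QUIT_PRESERVING` and further loop passes do nothing until `sketchPreserveDone()` moves it to `QUIT_READY`; only then does the next pass run the simulated prepare step and enter `QUIT_WAITING`.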