From nobody Mon Dec 15 09:42:39 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.libvirt.org designates 8.43.85.245 as permitted sender) client-ip=8.43.85.245; envelope-from=devel-bounces@lists.libvirt.org; helo=lists.libvirt.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of lists.libvirt.org designates 8.43.85.245 as permitted sender) smtp.mailfrom=devel-bounces@lists.libvirt.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.libvirt.org (lists.libvirt.org [8.43.85.245]) by mx.zohomail.com with SMTPS id 1741800201951479.8617819386184; Wed, 12 Mar 2025 10:23:21 -0700 (PDT) Received: by lists.libvirt.org (Postfix, from userid 996) id D9C9B1A5F; Wed, 12 Mar 2025 13:23:20 -0400 (EDT) Received: from lists.libvirt.org (localhost [IPv6:::1]) by lists.libvirt.org (Postfix) with ESMTP id F3FC21E48; Wed, 12 Mar 2025 13:19:09 -0400 (EDT) Received: by lists.libvirt.org (Postfix, from userid 996) id 75CAC1B13; Wed, 12 Mar 2025 13:19:04 -0400 (EDT) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.libvirt.org (Postfix) with ESMTPS id B5C661CB6 for ; Wed, 12 Mar 2025 13:18:37 -0400 (EDT) Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-450-mTN2wnCxPYGElW_v_mipDQ-1; Wed, 12 Mar 2025 13:18:36 -0400 Received: from mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.111]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 5B26F1955E8C for ; Wed, 12 Mar 2025 17:18:35 +0000 (UTC) Received: from toolbx.redhat.com (unknown [10.42.28.57]) by mx-prod-int-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 5180D18001DE; Wed, 12 Mar 2025 17:18:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on lists.libvirt.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,RCVD_IN_VALIDITY_RPBL_BLOCKED, RCVD_IN_VALIDITY_SAFE_BLOCKED,SPF_HELO_NONE autolearn=unavailable autolearn_force=no version=3.4.4 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1741799917; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/7UOAZblBGXVwCkqV8g/pO3033aD5xM5Q8f/KitgUx8=; b=WvpAh37iX+dRGXUrielrmgEcGas9MiPINSxYMgWNkeKDvwBa41GJPTJcX+4quIET1xyT8C DvyJd3GRpiDB8uHJWhwe+DlzejT6LWJyMpNninzFEcoIfywzL+niE9vK51PdaNQ5t8qKpC PI6PNBpI2SKelFYc9Q4o9n3sOntyGjI= X-MC-Unique: mTN2wnCxPYGElW_v_mipDQ-1 X-Mimecast-MFC-AGG-ID: mTN2wnCxPYGElW_v_mipDQ_1741799915 From: =?UTF-8?q?Daniel=20P=2E=20Berrang=C3=A9?= To: devel@lists.libvirt.org Cc: =?UTF-8?q?Daniel=20P=2E=20Berrang=C3=A9?= Subject: [PATCH v2 19/22] rpc: fix shutdown sequence when preserving state Date: Wed, 12 Mar 2025 17:17:59 +0000 Message-ID: <20250312171802.1854985-20-berrange@redhat.com> In-Reply-To: <20250312171802.1854985-1-berrange@redhat.com> References: <20250312171802.1854985-1-berrange@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.111 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: j29yZBuON9xkol5IbOFbmxLgzW5itXvKxahqR_M1_Ks_1741799915 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Message-ID-Hash: 3DBJTW76HS4X6UNM3GBWHAK4KIBHKAP4 X-Message-ID-Hash: 3DBJTW76HS4X6UNM3GBWHAK4KIBHKAP4 X-MailFrom: berrange@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-config-1; header-match-config-2; header-match-config-3; header-match-devel.lists.libvirt.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.2.2 Precedence: list List-Id: Development discussions about the libvirt library & tools Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1741800203610019100 Content-Type: text/plain; charset="utf-8" The preserving of state (ie running VMs) requires a fully functional daemon and hypervisor driver. If any part has started shutting down then saving state may fail, or worse, hang. The current shutdown sequence does not guarantee safe ordering, as we synchronize with the state saving thread only after the hypervisor driver has had its 'shutdownPrepare' callback invoked. In the case of QEMU this means that worker threads processing monitor events may well have been stopped. This implements a full state machine that has a well defined ordering that an earlier commit documented as the desired semantics. With this change, nothing will start shutting down if the state saving thread is still running. Signed-off-by: Daniel P. Berrang=C3=A9 Reviewed-by: Peter Krempa --- src/rpc/virnetdaemon.c | 107 ++++++++++++++++++++++++++++++----------- 1 file changed, 80 insertions(+), 27 deletions(-) diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c index 469a1d3ae2..53dee60703 100644 --- a/src/rpc/virnetdaemon.c +++ b/src/rpc/virnetdaemon.c @@ -39,6 +39,21 @@ =20 VIR_LOG_INIT("rpc.netdaemon"); =20 +/* The daemon shutdown process will step through + * each state listed below in the order they are + * declared. The 'STOPPING' state may be skipped + * over if the stateStopThread is not required + * for the particular shutdown scenari + */ +typedef enum { + VIR_NET_DAEMON_QUIT_NONE, /* Running normally */ + VIR_NET_DAEMON_QUIT_REQUESTED, /* Daemon shutdown requested */ + VIR_NET_DAEMON_QUIT_STOPPING, /* stateStopThread is running, so shutdo= wn cannot request must be delayed */ + VIR_NET_DAEMON_QUIT_READY, /* Ready to initiate shutdown request by ca= lling shutdownPrepareCb */ + VIR_NET_DAEMON_QUIT_WAITING, /* shutdownWaitCb is running, waiting for= it to finished */ + VIR_NET_DAEMON_QUIT_COMPLETED, /* shutdownWaitCb is finished, event lo= op will now terminate */ +} virNetDaemonQuitPhase; + #ifndef WIN32 typedef struct _virNetDaemonSignal virNetDaemonSignal; struct _virNetDaemonSignal { @@ -69,9 +84,8 @@ struct _virNetDaemon { virNetDaemonLifecycleCallback shutdownPrepareCb; virNetDaemonLifecycleCallback shutdownWaitCb; virThread *stateStopThread; - int finishTimer; - bool quit; - bool finished; + int quitTimer; + virNetDaemonQuitPhase quit; bool graceful; bool execRestart; bool running; /* the daemon has reached the running phase */ @@ -414,7 +428,10 @@ virNetDaemonAutoShutdownTimer(int timerid G_GNUC_UNUSE= D, =20 if (!dmn->autoShutdownInhibitions) { VIR_DEBUG("Automatic shutdown triggered"); - dmn->quit =3D true; + if (dmn->quit =3D=3D VIR_NET_DAEMON_QUIT_NONE) { + VIR_DEBUG("Requesting daemon shutdown"); + dmn->quit =3D VIR_NET_DAEMON_QUIT_REQUESTED; + } } } =20 @@ -709,27 +726,26 @@ daemonShutdownWait(void *opaque) bool graceful =3D false; =20 virHashForEach(dmn->servers, daemonServerShutdownWait, NULL); - if (!dmn->shutdownWaitCb || dmn->shutdownWaitCb() >=3D 0) { - if (dmn->stateStopThread) - virThreadJoin(dmn->stateStopThread); - + if (!dmn->shutdownWaitCb || dmn->shutdownWaitCb() >=3D 0) graceful =3D true; - } =20 VIR_WITH_OBJECT_LOCK_GUARD(dmn) { dmn->graceful =3D graceful; - virEventUpdateTimeout(dmn->finishTimer, 0); + dmn->quit =3D VIR_NET_DAEMON_QUIT_COMPLETED; + virEventUpdateTimeout(dmn->quitTimer, 0); + VIR_DEBUG("Shutdown wait completed graceful=3D%d", graceful); } } =20 static void -virNetDaemonFinishTimer(int timerid G_GNUC_UNUSED, - void *opaque) +virNetDaemonQuitTimer(int timerid G_GNUC_UNUSED, + void *opaque) { virNetDaemon *dmn =3D opaque; VIR_LOCK_GUARD lock =3D virObjectLockGuard(dmn); =20 - dmn->finished =3D true; + dmn->quit =3D VIR_NET_DAEMON_QUIT_COMPLETED; + VIR_DEBUG("Shutdown wait timed out"); } =20 =20 @@ -746,9 +762,8 @@ virNetDaemonRun(virNetDaemon *dmn) goto cleanup; } =20 - dmn->quit =3D false; - dmn->finishTimer =3D -1; - dmn->finished =3D false; + dmn->quit =3D VIR_NET_DAEMON_QUIT_NONE; + dmn->quitTimer =3D -1; dmn->graceful =3D false; dmn->running =3D true; =20 @@ -757,7 +772,7 @@ virNetDaemonRun(virNetDaemon *dmn) virSystemdNotifyReady(); =20 VIR_DEBUG("dmn=3D%p quit=3D%d", dmn, dmn->quit); - while (!dmn->finished) { + while (dmn->quit !=3D VIR_NET_DAEMON_QUIT_COMPLETED) { virNetDaemonShutdownTimerUpdate(dmn); =20 virObjectUnlock(dmn); @@ -771,17 +786,30 @@ virNetDaemonRun(virNetDaemon *dmn) virHashForEach(dmn->servers, daemonServerProcessClients, NULL); =20 /* don't shutdown services when performing an exec-restart */ - if (dmn->quit && dmn->execRestart) + if (dmn->quit =3D=3D VIR_NET_DAEMON_QUIT_REQUESTED && dmn->execRes= tart) goto cleanup; =20 - if (dmn->quit && dmn->finishTimer =3D=3D -1) { + if (dmn->quit =3D=3D VIR_NET_DAEMON_QUIT_REQUESTED) { + VIR_DEBUG("Process quit request"); virHashForEach(dmn->servers, daemonServerClose, NULL); + + if (dmn->stateStopThread) { + VIR_DEBUG("State stop thread running"); + dmn->quit =3D VIR_NET_DAEMON_QUIT_STOPPING; + } else { + VIR_DEBUG("Ready to shutdown"); + dmn->quit =3D VIR_NET_DAEMON_QUIT_READY; + } + } + + if (dmn->quit =3D=3D VIR_NET_DAEMON_QUIT_READY) { + VIR_DEBUG("Starting shutdown, running prepare"); if (dmn->shutdownPrepareCb && dmn->shutdownPrepareCb() < 0) break; =20 - if ((dmn->finishTimer =3D virEventAddTimeout(30 * 1000, - virNetDaemonFinishT= imer, - dmn, NULL)) < 0) { + if ((dmn->quitTimer =3D virEventAddTimeout(30 * 1000, + virNetDaemonQuitTimer, + dmn, NULL)) < 0) { VIR_WARN("Failed to register finish timer."); break; } @@ -791,6 +819,9 @@ virNetDaemonRun(virNetDaemon *dmn) VIR_WARN("Failed to register join thread."); break; } + + VIR_DEBUG("Waiting for shutdown completion"); + dmn->quit =3D VIR_NET_DAEMON_QUIT_WAITING; } } =20 @@ -814,7 +845,8 @@ virNetDaemonQuit(virNetDaemon *dmn) VIR_LOCK_GUARD lock =3D virObjectLockGuard(dmn); =20 VIR_DEBUG("Quit requested %p", dmn); - dmn->quit =3D true; + if (dmn->quit =3D=3D VIR_NET_DAEMON_QUIT_NONE) + dmn->quit =3D VIR_NET_DAEMON_QUIT_REQUESTED; } =20 =20 @@ -824,7 +856,8 @@ virNetDaemonQuitExecRestart(virNetDaemon *dmn) VIR_LOCK_GUARD lock =3D virObjectLockGuard(dmn); =20 VIR_DEBUG("Exec-restart requested %p", dmn); - dmn->quit =3D true; + if (dmn->quit =3D=3D VIR_NET_DAEMON_QUIT_NONE) + dmn->quit =3D VIR_NET_DAEMON_QUIT_REQUESTED; dmn->execRestart =3D true; } =20 @@ -840,6 +873,15 @@ virNetDaemonStopWorker(void *opaque) =20 VIR_DEBUG("Completed stop dmn=3D%p", dmn); =20 + VIR_WITH_OBJECT_LOCK_GUARD(dmn) { + if (dmn->quit =3D=3D VIR_NET_DAEMON_QUIT_STOPPING) { + VIR_DEBUG("Marking shutdown as ready"); + dmn->quit =3D VIR_NET_DAEMON_QUIT_READY; + } + g_clear_pointer(&dmn->stateStopThread, g_free); + } + + VIR_DEBUG("End stop dmn=3D%p", dmn); virObjectUnref(dmn); } =20 @@ -848,15 +890,26 @@ void virNetDaemonStop(virNetDaemon *dmn) { VIR_LOCK_GUARD lock =3D virObjectLockGuard(dmn); + VIR_DEBUG("State stoprequest"); =20 - if (!dmn->stopCb || - dmn->stateStopThread) + if (!dmn->stopCb) { + VIR_DEBUG("No stop callback registered"); return; + } + if (dmn->stateStopThread) { + VIR_DEBUG("State stop thread already running"); + return; + } + + if (dmn->quit !=3D VIR_NET_DAEMON_QUIT_NONE) { + VIR_WARN("Already initiated shutdown sequence, unable to stop stat= e"); + return; + } =20 virObjectRef(dmn); dmn->stateStopThread =3D g_new0(virThread, 1); =20 - if (virThreadCreateFull(dmn->stateStopThread, true, + if (virThreadCreateFull(dmn->stateStopThread, false, virNetDaemonStopWorker, "daemon-stop", false, dmn) < 0) { virObjectUnref(dmn); --=20 2.48.1