From nobody Mon Sep 16 00:05:15 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.libvirt.org designates 8.43.85.245 as permitted sender) client-ip=8.43.85.245; envelope-from=devel-bounces@lists.libvirt.org; helo=lists.libvirt.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of lists.libvirt.org designates 8.43.85.245 as permitted sender) smtp.mailfrom=devel-bounces@lists.libvirt.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.libvirt.org (lists.libvirt.org [8.43.85.245]) by mx.zohomail.com with SMTPS id 1721393251881448.611992170769; Fri, 19 Jul 2024 05:47:31 -0700 (PDT) Received: by lists.libvirt.org (Postfix, from userid 996) id AB5EA9DE; Fri, 19 Jul 2024 08:47:30 -0400 (EDT) Received: from lists.libvirt.org (localhost [IPv6:::1]) by lists.libvirt.org (Postfix) with ESMTP id 2A58C9E1; Fri, 19 Jul 2024 08:47:01 -0400 (EDT) Received: by lists.libvirt.org (Postfix, from userid 996) id 332059D9; Fri, 19 Jul 2024 08:46:57 -0400 (EDT) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.libvirt.org (Postfix) with ESMTPS id 737969B5 for ; Fri, 19 Jul 2024 08:46:56 -0400 (EDT) Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-342-aOiboWT_OVmmMM_XIZzKrA-1; Fri, 19 Jul 2024 08:46:54 -0400 Received: from mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.15]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 393501955D53 for ; Fri, 19 Jul 2024 12:46:53 +0000 (UTC) Received: from maggie.brq.redhat.com (unknown [10.43.3.102]) by mx-prod-int-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 5E4361955D47 for ; Fri, 19 Jul 2024 12:46:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on lists.libvirt.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE autolearn=unavailable autolearn_force=no version=3.4.4 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1721393215; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=vsIGW1L8JCD95YO6VwdptH5ZDlCvmfW6VjwiyH8p7HA=; b=d7DCfkV7prkbVQJwaiATGbSoF4i5yyyMDJE4B6hWDLJczJrkjVcqPlVE2H4CAkVdR5semv YT1/NozVvVB9vo3b3zyxfGHvQv/5XhdKYme+otAYGCVI5f7hlIdNI+v8Pu1XPpHAr53mmr E0LTCiK2tV7AKN6GfnxJGjZQPtCiwh8= X-MC-Unique: aOiboWT_OVmmMM_XIZzKrA-1 From: Michal Privoznik To: devel@lists.libvirt.org Subject: [PATCH] qemuProcessStop: Don't unlock domain until most of the cleanup is done Date: Fri, 19 Jul 2024 14:46:49 +0200 Message-ID: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Message-ID-Hash: C6QPESAFOCLJSCV4VXGIUUMW3MHIVF6U X-Message-ID-Hash: C6QPESAFOCLJSCV4VXGIUUMW3MHIVF6U X-MailFrom: mprivozn@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; header-match-config-1; header-match-config-2; header-match-config-3; header-match-devel.lists.libvirt.org-0; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.2.2 Precedence: list List-Id: Development discussions about the libvirt library & tools Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1721393253515116600 Content-Type: text/plain; charset="utf-8"; x-default="true" Imagine two threads. Thread A is executing qemuProcessStop() and thread B is executing qemuDomainCreateXML(). To make things more interesting, the domain XML passed to qemuDomainCreateXML matches name + UUID of that the virDomainObj that qemuProcessStop() is currently working with. Here's what happens. 1) Thread A locks @vm, enters qemuProcessStop(). 2) Thread B parses given XML, calls virDomainObjListAdd() -> virDomainObjListAdd() -> virDomainObjListAddLocked() -> virDomainObjListFindByUUIDLocked(). Since there's a match on UUID, an virDomainObj object is returned and the thread proceeds to calling virObjectLock(). NB, it's the same object as in thread A. 3) Thread A sets vm->def->id =3D -1; this means that from this point on, virDomainObjIsActive() will return false. 4) Thread A calls qemuDomainObjStopWorker() which unlocks the @vm. 5) Thread B acquires the @vm lock and since virDomainObjIsActive() returns false, it proceeds to calling virDomainObjAssignDef() where vm->def is replaced. 6) Thread B then calls qemuProcessBeginJob() which unlocks the @vm temporarily. 7) Thread A, still in qemuDomainObjStopWorker() acquires @vm lock and proceeds with cleanup. 8) Thread A finds different definition than the one needing cleanup. In my testing I've seen stale pointers, e.g. vm->def->nets[0]->priv was NULL, which lead to a SIGSEGV as there's 'QEMU_DOMAIN_NETWORK_PRIVATE(net)->created' line when cleaning up nets. Your mileage may vary. Even if we did not crash, the plain fact that vm->def is changed in the middle of qemuProcessStop() means we might be cleaning up something else than intended. As a fix, I'm moving all lines that obviously touch vm->def before the domain object is unlocked. That left virHookCall(VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END) nearly next to virHookCall(VIR_HOOK_QEMU_OP_RELEASE, VIR_HOOK_SUBOP_END) which I figured is not something we want. So I've shuffled things a bit more. Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0 Resolves: https://issues.redhat.com/browse/RHEL-49607 Signed-off-by: Michal Privoznik Reviewed-by: Peter Krempa --- src/qemu/qemu_process.c | 122 ++++++++++++++++++++-------------------- 1 file changed, 62 insertions(+), 60 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 25dfd04272..9ea6c678b8 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8530,6 +8530,18 @@ void qemuProcessStop(virQEMUDriver *driver, VIR_QEMU_PROCESS_KILL_FORCE| VIR_QEMU_PROCESS_KILL_NOCHECK)); =20 + vm->pid =3D 0; + + /* now that we know it's stopped call the hook if present */ + if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) { + g_autofree char *xml =3D qemuDomainDefFormatXML(driver, NULL, vm->= def, 0); + + /* we can't stop the operation even if the script raised an error = */ + ignore_value(virHookCall(VIR_HOOK_DRIVER_QEMU, vm->def->name, + VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_= END, + NULL, xml, NULL)); + } + if (priv->agent) { g_clear_pointer(&priv->agent, qemuAgentClose); } @@ -8553,25 +8565,6 @@ void qemuProcessStop(virQEMUDriver *driver, =20 qemuDBusStop(driver, vm); =20 - /* Only after this point we can reset 'priv->beingDestroyed' so that - * there's no point at which the VM could be considered as alive betwe= en - * entering the destroy job and this point where the active "flag" is - * cleared. - */ - vm->def->id =3D -1; - priv->beingDestroyed =3D false; - - /* Wake up anything waiting on domain condition */ - virDomainObjBroadcast(vm); - - /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent - * deadlocks with the per-VM event loop thread. This MUST be done after - * marking the VM as dead */ - qemuDomainObjStopWorker(vm); - - if (!!g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCa= llback) - driver->inhibitCallback(false, driver->inhibitOpaque); - /* Clear network bandwidth */ virDomainClearNetBandwidth(vm->def); =20 @@ -8588,18 +8581,6 @@ void qemuProcessStop(virQEMUDriver *driver, } } =20 - virPortAllocatorRelease(priv->nbdPort); - priv->nbdPort =3D 0; - - if (priv->monConfig) { - if (priv->monConfig->type =3D=3D VIR_DOMAIN_CHR_TYPE_UNIX) - unlink(priv->monConfig->data.nix.path); - g_clear_pointer(&priv->monConfig, virObjectUnref); - } - - /* Remove the master key */ - qemuDomainMasterKeyRemove(priv); - ignore_value(virDomainChrDefForeach(vm->def, false, qemuProcessCleanupChardevDevice, @@ -8609,22 +8590,6 @@ void qemuProcessStop(virQEMUDriver *driver, /* Its namespace is also gone then. */ qemuDomainDestroyNamespace(driver, vm); =20 - virFileDeleteTree(priv->libDir); - virFileDeleteTree(priv->channelTargetDir); - - /* Stop autodestroy in case guest is restarted */ - virCloseCallbacksDomainRemove(vm, NULL, qemuProcessAutoDestroy); - - /* now that we know it's stopped call the hook if present */ - if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) { - g_autofree char *xml =3D qemuDomainDefFormatXML(driver, NULL, vm->= def, 0); - - /* we can't stop the operation even if the script raised an error = */ - ignore_value(virHookCall(VIR_HOOK_DRIVER_QEMU, vm->def->name, - VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_= END, - NULL, xml, NULL)); - } - /* Reset Security Labels unless caller don't want us to */ if (!(flags & VIR_QEMU_PROCESS_STOP_NO_RELABEL)) qemuSecurityRestoreAllLabel(driver, vm, @@ -8672,8 +8637,6 @@ void qemuProcessStop(virQEMUDriver *driver, virResctrlAllocRemove(vm->def->resctrls[i]->alloc); } =20 - qemuProcessRemoveDomainStatus(driver, vm); - /* Remove VNC and Spice ports from port reservation bitmap, but only if they were reserved by the driver (autoport=3Dyes) */ @@ -8706,20 +8669,9 @@ void qemuProcessStop(virQEMUDriver *driver, } } =20 - for (i =3D 0; i < vm->ndeprecations; i++) - g_free(vm->deprecations[i]); - g_clear_pointer(&vm->deprecations, g_free); - vm->ndeprecations =3D 0; - vm->taint =3D 0; - vm->pid =3D 0; - virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason); for (i =3D 0; i < vm->def->niothreadids; i++) vm->def->iothreadids[i]->thread_id =3D 0; =20 - /* clean up a possible backup job */ - if (priv->backup) - qemuBackupJobTerminate(vm, VIR_DOMAIN_JOB_STATUS_CANCELED); - /* Do this explicitly after vm->pid is reset so that security drivers = don't * try to enter the domain's namespace which is non-existent by now as= qemu * is no longer running. */ @@ -8753,6 +8705,56 @@ void qemuProcessStop(virQEMUDriver *driver, =20 qemuSecurityReleaseLabel(driver->securityManager, vm->def); =20 + /* Only after this point we can reset 'priv->beingDestroyed' so that + * there's no point at which the VM could be considered as alive betwe= en + * entering the destroy job and this point where the active "flag" is + * cleared. + */ + vm->def->id =3D -1; + priv->beingDestroyed =3D false; + + /* Wake up anything waiting on domain condition */ + virDomainObjBroadcast(vm); + + /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent + * deadlocks with the per-VM event loop thread. This MUST be done after + * marking the VM as dead */ + qemuDomainObjStopWorker(vm); + + if (!!g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCa= llback) + driver->inhibitCallback(false, driver->inhibitOpaque); + + virPortAllocatorRelease(priv->nbdPort); + priv->nbdPort =3D 0; + + if (priv->monConfig) { + if (priv->monConfig->type =3D=3D VIR_DOMAIN_CHR_TYPE_UNIX) + unlink(priv->monConfig->data.nix.path); + g_clear_pointer(&priv->monConfig, virObjectUnref); + } + + /* Remove the master key */ + qemuDomainMasterKeyRemove(priv); + + virFileDeleteTree(priv->libDir); + virFileDeleteTree(priv->channelTargetDir); + + /* Stop autodestroy in case guest is restarted */ + virCloseCallbacksDomainRemove(vm, NULL, qemuProcessAutoDestroy); + + qemuProcessRemoveDomainStatus(driver, vm); + + for (i =3D 0; i < vm->ndeprecations; i++) + g_free(vm->deprecations[i]); + g_clear_pointer(&vm->deprecations, g_free); + vm->ndeprecations =3D 0; + vm->taint =3D 0; + virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason); + + /* clean up a possible backup job */ + if (priv->backup) + qemuBackupJobTerminate(vm, VIR_DOMAIN_JOB_STATUS_CANCELED); + /* clear all private data entries which are no longer needed */ qemuDomainObjPrivateDataClear(priv); =20 --=20 2.44.2