[PATCH] qemuProcessStop: Don't unlock domain until most of the cleanup is done

Michal Privoznik posted 1 patch 3 months, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/c43f3c198f0876d5dbce3b5198782db0e304874f.1721393209.git.mprivozn@redhat.com
src/qemu/qemu_process.c | 122 ++++++++++++++++++++--------------------
1 file changed, 62 insertions(+), 60 deletions(-)
[PATCH] qemuProcessStop: Don't unlock domain until most of the cleanup is done
Posted by Michal Privoznik 3 months, 1 week ago
Imagine two threads. Thread A is executing qemuProcessStop() and
thread B is executing qemuDomainCreateXML(). To make things more
interesting, the domain XML passed to qemuDomainCreateXML matches
name + UUID of that the virDomainObj that qemuProcessStop() is
currently working with. Here's what happens.

1) Thread A locks @vm, enters qemuProcessStop().

2) Thread B parses given XML, calls virDomainObjListAdd() ->
   virDomainObjListAdd() -> virDomainObjListAddLocked() ->
   virDomainObjListFindByUUIDLocked(). Since there's a match on
   UUID, an virDomainObj object is returned and the thread
   proceeds to calling virObjectLock(). NB, it's the same object
   as in thread A.

3) Thread A sets vm->def->id = -1; this means that from this
   point on, virDomainObjIsActive() will return false.

4) Thread A calls qemuDomainObjStopWorker() which unlocks the
   @vm.

5) Thread B acquires the @vm lock and since
   virDomainObjIsActive() returns false, it proceeds to calling
   virDomainObjAssignDef() where vm->def is replaced.

6) Thread B then calls qemuProcessBeginJob() which unlocks the
   @vm temporarily.

7) Thread A, still in qemuDomainObjStopWorker() acquires @vm lock
   and proceeds with cleanup.

8) Thread A finds different definition than the one needing
   cleanup.

In my testing I've seen stale pointers, e.g.
vm->def->nets[0]->priv was NULL, which lead to a SIGSEGV as
there's  'QEMU_DOMAIN_NETWORK_PRIVATE(net)->created' line when
cleaning up nets. Your mileage may vary.

Even if we did not crash, the plain fact that vm->def is changed
in the middle of qemuProcessStop() means we might be cleaning up
something else than intended.

As a fix, I'm moving all lines that obviously touch vm->def
before the domain object is unlocked. That left
virHookCall(VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END) nearly
next to virHookCall(VIR_HOOK_QEMU_OP_RELEASE,
VIR_HOOK_SUBOP_END) which I figured is not something we want. So
I've shuffled things a bit more.

Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0
Resolves: https://issues.redhat.com/browse/RHEL-49607
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_process.c | 122 ++++++++++++++++++++--------------------
 1 file changed, 62 insertions(+), 60 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 25dfd04272..9ea6c678b8 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8530,6 +8530,18 @@ void qemuProcessStop(virQEMUDriver *driver,
                                  VIR_QEMU_PROCESS_KILL_FORCE|
                                  VIR_QEMU_PROCESS_KILL_NOCHECK));
 
+    vm->pid = 0;
+
+    /* now that we know it's stopped call the hook if present */
+    if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
+        g_autofree char *xml = qemuDomainDefFormatXML(driver, NULL, vm->def, 0);
+
+        /* we can't stop the operation even if the script raised an error */
+        ignore_value(virHookCall(VIR_HOOK_DRIVER_QEMU, vm->def->name,
+                                 VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END,
+                                 NULL, xml, NULL));
+    }
+
     if (priv->agent) {
         g_clear_pointer(&priv->agent, qemuAgentClose);
     }
@@ -8553,25 +8565,6 @@ void qemuProcessStop(virQEMUDriver *driver,
 
     qemuDBusStop(driver, vm);
 
-    /* Only after this point we can reset 'priv->beingDestroyed' so that
-     * there's no point at which the VM could be considered as alive between
-     * entering the destroy job and this point where the active "flag" is
-     * cleared.
-     */
-    vm->def->id = -1;
-    priv->beingDestroyed = false;
-
-    /* Wake up anything waiting on domain condition */
-    virDomainObjBroadcast(vm);
-
-    /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent
-     * deadlocks with the per-VM event loop thread. This MUST be done after
-     * marking the VM as dead */
-    qemuDomainObjStopWorker(vm);
-
-    if (!!g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCallback)
-        driver->inhibitCallback(false, driver->inhibitOpaque);
-
     /* Clear network bandwidth */
     virDomainClearNetBandwidth(vm->def);
 
@@ -8588,18 +8581,6 @@ void qemuProcessStop(virQEMUDriver *driver,
         }
     }
 
-    virPortAllocatorRelease(priv->nbdPort);
-    priv->nbdPort = 0;
-
-    if (priv->monConfig) {
-        if (priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX)
-            unlink(priv->monConfig->data.nix.path);
-        g_clear_pointer(&priv->monConfig, virObjectUnref);
-    }
-
-    /* Remove the master key */
-    qemuDomainMasterKeyRemove(priv);
-
     ignore_value(virDomainChrDefForeach(vm->def,
                                         false,
                                         qemuProcessCleanupChardevDevice,
@@ -8609,22 +8590,6 @@ void qemuProcessStop(virQEMUDriver *driver,
     /* Its namespace is also gone then. */
     qemuDomainDestroyNamespace(driver, vm);
 
-    virFileDeleteTree(priv->libDir);
-    virFileDeleteTree(priv->channelTargetDir);
-
-    /* Stop autodestroy in case guest is restarted */
-    virCloseCallbacksDomainRemove(vm, NULL, qemuProcessAutoDestroy);
-
-    /* now that we know it's stopped call the hook if present */
-    if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
-        g_autofree char *xml = qemuDomainDefFormatXML(driver, NULL, vm->def, 0);
-
-        /* we can't stop the operation even if the script raised an error */
-        ignore_value(virHookCall(VIR_HOOK_DRIVER_QEMU, vm->def->name,
-                                 VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END,
-                                 NULL, xml, NULL));
-    }
-
     /* Reset Security Labels unless caller don't want us to */
     if (!(flags & VIR_QEMU_PROCESS_STOP_NO_RELABEL))
         qemuSecurityRestoreAllLabel(driver, vm,
@@ -8672,8 +8637,6 @@ void qemuProcessStop(virQEMUDriver *driver,
         virResctrlAllocRemove(vm->def->resctrls[i]->alloc);
     }
 
-    qemuProcessRemoveDomainStatus(driver, vm);
-
     /* Remove VNC and Spice ports from port reservation bitmap, but only if
        they were reserved by the driver (autoport=yes)
     */
@@ -8706,20 +8669,9 @@ void qemuProcessStop(virQEMUDriver *driver,
         }
     }
 
-    for (i = 0; i < vm->ndeprecations; i++)
-        g_free(vm->deprecations[i]);
-    g_clear_pointer(&vm->deprecations, g_free);
-    vm->ndeprecations = 0;
-    vm->taint = 0;
-    vm->pid = 0;
-    virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
     for (i = 0; i < vm->def->niothreadids; i++)
         vm->def->iothreadids[i]->thread_id = 0;
 
-    /* clean up a possible backup job */
-    if (priv->backup)
-        qemuBackupJobTerminate(vm, VIR_DOMAIN_JOB_STATUS_CANCELED);
-
     /* Do this explicitly after vm->pid is reset so that security drivers don't
      * try to enter the domain's namespace which is non-existent by now as qemu
      * is no longer running. */
@@ -8753,6 +8705,56 @@ void qemuProcessStop(virQEMUDriver *driver,
 
     qemuSecurityReleaseLabel(driver->securityManager, vm->def);
 
+    /* Only after this point we can reset 'priv->beingDestroyed' so that
+     * there's no point at which the VM could be considered as alive between
+     * entering the destroy job and this point where the active "flag" is
+     * cleared.
+     */
+    vm->def->id = -1;
+    priv->beingDestroyed = false;
+
+    /* Wake up anything waiting on domain condition */
+    virDomainObjBroadcast(vm);
+
+    /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent
+     * deadlocks with the per-VM event loop thread. This MUST be done after
+     * marking the VM as dead */
+    qemuDomainObjStopWorker(vm);
+
+    if (!!g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCallback)
+        driver->inhibitCallback(false, driver->inhibitOpaque);
+
+    virPortAllocatorRelease(priv->nbdPort);
+    priv->nbdPort = 0;
+
+    if (priv->monConfig) {
+        if (priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX)
+            unlink(priv->monConfig->data.nix.path);
+        g_clear_pointer(&priv->monConfig, virObjectUnref);
+    }
+
+    /* Remove the master key */
+    qemuDomainMasterKeyRemove(priv);
+
+    virFileDeleteTree(priv->libDir);
+    virFileDeleteTree(priv->channelTargetDir);
+
+    /* Stop autodestroy in case guest is restarted */
+    virCloseCallbacksDomainRemove(vm, NULL, qemuProcessAutoDestroy);
+
+    qemuProcessRemoveDomainStatus(driver, vm);
+
+    for (i = 0; i < vm->ndeprecations; i++)
+        g_free(vm->deprecations[i]);
+    g_clear_pointer(&vm->deprecations, g_free);
+    vm->ndeprecations = 0;
+    vm->taint = 0;
+    virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
+
+    /* clean up a possible backup job */
+    if (priv->backup)
+        qemuBackupJobTerminate(vm, VIR_DOMAIN_JOB_STATUS_CANCELED);
+
     /* clear all private data entries which are no longer needed */
     qemuDomainObjPrivateDataClear(priv);
 
-- 
2.44.2
Re: [PATCH] qemuProcessStop: Don't unlock domain until most of the cleanup is done
Posted by Peter Krempa 3 months, 1 week ago
On Fri, Jul 19, 2024 at 14:46:49 +0200, Michal Privoznik wrote:
> Imagine two threads. Thread A is executing qemuProcessStop() and
> thread B is executing qemuDomainCreateXML(). To make things more
> interesting, the domain XML passed to qemuDomainCreateXML matches
> name + UUID of that the virDomainObj that qemuProcessStop() is
> currently working with. Here's what happens.
> 
> 1) Thread A locks @vm, enters qemuProcessStop().
> 
> 2) Thread B parses given XML, calls virDomainObjListAdd() ->
>    virDomainObjListAdd() -> virDomainObjListAddLocked() ->
>    virDomainObjListFindByUUIDLocked(). Since there's a match on
>    UUID, an virDomainObj object is returned and the thread
>    proceeds to calling virObjectLock(). NB, it's the same object
>    as in thread A.
> 
> 3) Thread A sets vm->def->id = -1; this means that from this
>    point on, virDomainObjIsActive() will return false.
> 
> 4) Thread A calls qemuDomainObjStopWorker() which unlocks the
>    @vm.
> 
> 5) Thread B acquires the @vm lock and since
>    virDomainObjIsActive() returns false, it proceeds to calling
>    virDomainObjAssignDef() where vm->def is replaced.
> 
> 6) Thread B then calls qemuProcessBeginJob() which unlocks the
>    @vm temporarily.
> 
> 7) Thread A, still in qemuDomainObjStopWorker() acquires @vm lock
>    and proceeds with cleanup.
> 
> 8) Thread A finds different definition than the one needing
>    cleanup.
> 
> In my testing I've seen stale pointers, e.g.
> vm->def->nets[0]->priv was NULL, which lead to a SIGSEGV as
> there's  'QEMU_DOMAIN_NETWORK_PRIVATE(net)->created' line when
> cleaning up nets. Your mileage may vary.
> 
> Even if we did not crash, the plain fact that vm->def is changed
> in the middle of qemuProcessStop() means we might be cleaning up
> something else than intended.

This paragraph is the important bit. The root cause of the problem here
is that 'virDomainObjListAdd' inside 'qemuDomainCreateXML' can modify
'vm->def' whithout holding any _MODIFY-type JOB on the domain object
which we normally require for any modification of 'vm->def' related data.

This wasn't a problem until now as we've relinquished the lock on @vm
only in situations when the @vm object was considered live:

 1) Before the per-VM thread cleanup was added to qemuProcessStop it
    never unlocked
 2) After the per-VM thread cleanup was added, this unlock was done
    prior to setting vm->def->id to '-1'
 3) All other cases are done only when the VM is live.

> As a fix, I'm moving all lines that obviously touch vm->def
> before the domain object is unlocked. That left
> virHookCall(VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END) nearly
> next to virHookCall(VIR_HOOK_QEMU_OP_RELEASE,
> VIR_HOOK_SUBOP_END) which I figured is not something we want. So
> I've shuffled things a bit more.

This feels like a fix for symptoms of 'virDomainObjListAdd' not
honouring the _MODIFY-type job expectation, and we're shuffling code
around so that it doesn't care about the broken expectation.

Since I don't currently have a better idea of how to fix this I'm okay
with this patch given the following conditions:

> Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0

Explain that the above commit inverted the order of setting the VM as
inactive and unlocking thus allowing the above sequence of events to
happen, and,

> Resolves: https://issues.redhat.com/browse/RHEL-49607
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  src/qemu/qemu_process.c | 122 ++++++++++++++++++++--------------------
>  1 file changed, 62 insertions(+), 60 deletions(-)
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 25dfd04272..9ea6c678b8 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -8530,6 +8530,18 @@ void qemuProcessStop(virQEMUDriver *driver,
>                                   VIR_QEMU_PROCESS_KILL_FORCE|
>                                   VIR_QEMU_PROCESS_KILL_NOCHECK));
>  
> +    vm->pid = 0;
> +
> +    /* now that we know it's stopped call the hook if present */
> +    if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
> +        g_autofree char *xml = qemuDomainDefFormatXML(driver, NULL, vm->def, 0);
> +
> +        /* we can't stop the operation even if the script raised an error */
> +        ignore_value(virHookCall(VIR_HOOK_DRIVER_QEMU, vm->def->name,
> +                                 VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END,
> +                                 NULL, xml, NULL));
> +    }
> +
>      if (priv->agent) {
>          g_clear_pointer(&priv->agent, qemuAgentClose);
>      }
> @@ -8553,25 +8565,6 @@ void qemuProcessStop(virQEMUDriver *driver,
>  
>      qemuDBusStop(driver, vm);
>  
> -    /* Only after this point we can reset 'priv->beingDestroyed' so that
> -     * there's no point at which the VM could be considered as alive between
> -     * entering the destroy job and this point where the active "flag" is
> -     * cleared.
> -     */
> -    vm->def->id = -1;
> -    priv->beingDestroyed = false;
> -
> -    /* Wake up anything waiting on domain condition */
> -    virDomainObjBroadcast(vm);
> -
> -    /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent
> -     * deadlocks with the per-VM event loop thread. This MUST be done after
> -     * marking the VM as dead */
> -    qemuDomainObjStopWorker(vm);
> -
> -    if (!!g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCallback)
> -        driver->inhibitCallback(false, driver->inhibitOpaque);
> -
>      /* Clear network bandwidth */
>      virDomainClearNetBandwidth(vm->def);
>  
> @@ -8588,18 +8581,6 @@ void qemuProcessStop(virQEMUDriver *driver,
>          }
>      }
>  
> -    virPortAllocatorRelease(priv->nbdPort);
> -    priv->nbdPort = 0;
> -
> -    if (priv->monConfig) {
> -        if (priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX)
> -            unlink(priv->monConfig->data.nix.path);
> -        g_clear_pointer(&priv->monConfig, virObjectUnref);
> -    }
> -
> -    /* Remove the master key */
> -    qemuDomainMasterKeyRemove(priv);
> -
>      ignore_value(virDomainChrDefForeach(vm->def,
>                                          false,
>                                          qemuProcessCleanupChardevDevice,
> @@ -8609,22 +8590,6 @@ void qemuProcessStop(virQEMUDriver *driver,
>      /* Its namespace is also gone then. */
>      qemuDomainDestroyNamespace(driver, vm);
>  
> -    virFileDeleteTree(priv->libDir);
> -    virFileDeleteTree(priv->channelTargetDir);
> -
> -    /* Stop autodestroy in case guest is restarted */
> -    virCloseCallbacksDomainRemove(vm, NULL, qemuProcessAutoDestroy);
> -
> -    /* now that we know it's stopped call the hook if present */
> -    if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
> -        g_autofree char *xml = qemuDomainDefFormatXML(driver, NULL, vm->def, 0);
> -
> -        /* we can't stop the operation even if the script raised an error */
> -        ignore_value(virHookCall(VIR_HOOK_DRIVER_QEMU, vm->def->name,
> -                                 VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END,
> -                                 NULL, xml, NULL));
> -    }
> -
>      /* Reset Security Labels unless caller don't want us to */
>      if (!(flags & VIR_QEMU_PROCESS_STOP_NO_RELABEL))
>          qemuSecurityRestoreAllLabel(driver, vm,
> @@ -8672,8 +8637,6 @@ void qemuProcessStop(virQEMUDriver *driver,
>          virResctrlAllocRemove(vm->def->resctrls[i]->alloc);
>      }
>  
> -    qemuProcessRemoveDomainStatus(driver, vm);
> -
>      /* Remove VNC and Spice ports from port reservation bitmap, but only if
>         they were reserved by the driver (autoport=yes)
>      */
> @@ -8706,20 +8669,9 @@ void qemuProcessStop(virQEMUDriver *driver,
>          }
>      }
>  
> -    for (i = 0; i < vm->ndeprecations; i++)
> -        g_free(vm->deprecations[i]);
> -    g_clear_pointer(&vm->deprecations, g_free);
> -    vm->ndeprecations = 0;
> -    vm->taint = 0;
> -    vm->pid = 0;
> -    virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
>      for (i = 0; i < vm->def->niothreadids; i++)
>          vm->def->iothreadids[i]->thread_id = 0;
>  
> -    /* clean up a possible backup job */
> -    if (priv->backup)
> -        qemuBackupJobTerminate(vm, VIR_DOMAIN_JOB_STATUS_CANCELED);
> -
>      /* Do this explicitly after vm->pid is reset so that security drivers don't
>       * try to enter the domain's namespace which is non-existent by now as qemu
>       * is no longer running. */
> @@ -8753,6 +8705,56 @@ void qemuProcessStop(virQEMUDriver *driver,
>  
>      qemuSecurityReleaseLabel(driver->securityManager, vm->def);
>  
> +    /* Only after this point we can reset 'priv->beingDestroyed' so that
> +     * there's no point at which the VM could be considered as alive between
> +     * entering the destroy job and this point where the active "flag" is
> +     * cleared.
> +     */
> +    vm->def->id = -1;
> +    priv->beingDestroyed = false;
> +
> +    /* Wake up anything waiting on domain condition */
> +    virDomainObjBroadcast(vm);
> +
> +    /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent
> +     * deadlocks with the per-VM event loop thread. This MUST be done after
> +     * marking the VM as dead */


Extend the comment here stating how virDomainObjAssignDef() in
combination with vm->def->id being -1 can cause update of vm->def and
thus no code below can ever access it any more.

> +    qemuDomainObjStopWorker(vm);
> +
> +    if (!!g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCallback)
> +        driver->inhibitCallback(false, driver->inhibitOpaque);
> +
> +    virPortAllocatorRelease(priv->nbdPort);
> +    priv->nbdPort = 0;
> +
> +    if (priv->monConfig) {
> +        if (priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX)
> +            unlink(priv->monConfig->data.nix.path);
> +        g_clear_pointer(&priv->monConfig, virObjectUnref);
> +    }
> +
> +    /* Remove the master key */
> +    qemuDomainMasterKeyRemove(priv);
> +
> +    virFileDeleteTree(priv->libDir);
> +    virFileDeleteTree(priv->channelTargetDir);
> +
> +    /* Stop autodestroy in case guest is restarted */
> +    virCloseCallbacksDomainRemove(vm, NULL, qemuProcessAutoDestroy);
> +
> +    qemuProcessRemoveDomainStatus(driver, vm);
> +
> +    for (i = 0; i < vm->ndeprecations; i++)
> +        g_free(vm->deprecations[i]);
> +    g_clear_pointer(&vm->deprecations, g_free);
> +    vm->ndeprecations = 0;
> +    vm->taint = 0;
> +    virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
> +
> +    /* clean up a possible backup job */
> +    if (priv->backup)
> +        qemuBackupJobTerminate(vm, VIR_DOMAIN_JOB_STATUS_CANCELED);
> +
>      /* clear all private data entries which are no longer needed */
>      qemuDomainObjPrivateDataClear(priv);
>  
> -- 
> 2.44.2
> 

Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Re: [PATCH] qemuProcessStop: Don't unlock domain until most of the cleanup is done
Posted by Daniel P. Berrangé 3 months, 1 week ago
On Tue, Jul 23, 2024 at 02:41:13PM +0200, Peter Krempa wrote:
> On Fri, Jul 19, 2024 at 14:46:49 +0200, Michal Privoznik wrote:
> > Imagine two threads. Thread A is executing qemuProcessStop() and
> > thread B is executing qemuDomainCreateXML(). To make things more
> > interesting, the domain XML passed to qemuDomainCreateXML matches
> > name + UUID of that the virDomainObj that qemuProcessStop() is
> > currently working with. Here's what happens.
> > 
> > 1) Thread A locks @vm, enters qemuProcessStop().
> > 
> > 2) Thread B parses given XML, calls virDomainObjListAdd() ->
> >    virDomainObjListAdd() -> virDomainObjListAddLocked() ->
> >    virDomainObjListFindByUUIDLocked(). Since there's a match on
> >    UUID, an virDomainObj object is returned and the thread
> >    proceeds to calling virObjectLock(). NB, it's the same object
> >    as in thread A.
> > 
> > 3) Thread A sets vm->def->id = -1; this means that from this
> >    point on, virDomainObjIsActive() will return false.
> > 
> > 4) Thread A calls qemuDomainObjStopWorker() which unlocks the
> >    @vm.
> > 
> > 5) Thread B acquires the @vm lock and since
> >    virDomainObjIsActive() returns false, it proceeds to calling
> >    virDomainObjAssignDef() where vm->def is replaced.
> > 
> > 6) Thread B then calls qemuProcessBeginJob() which unlocks the
> >    @vm temporarily.
> > 
> > 7) Thread A, still in qemuDomainObjStopWorker() acquires @vm lock
> >    and proceeds with cleanup.
> > 
> > 8) Thread A finds different definition than the one needing
> >    cleanup.
> > 
> > In my testing I've seen stale pointers, e.g.
> > vm->def->nets[0]->priv was NULL, which lead to a SIGSEGV as
> > there's  'QEMU_DOMAIN_NETWORK_PRIVATE(net)->created' line when
> > cleaning up nets. Your mileage may vary.
> > 
> > Even if we did not crash, the plain fact that vm->def is changed
> > in the middle of qemuProcessStop() means we might be cleaning up
> > something else than intended.
> 
> This paragraph is the important bit. The root cause of the problem here
> is that 'virDomainObjListAdd' inside 'qemuDomainCreateXML' can modify
> 'vm->def' whithout holding any _MODIFY-type JOB on the domain object
> which we normally require for any modification of 'vm->def' related data.
> 
> This wasn't a problem until now as we've relinquished the lock on @vm
> only in situations when the @vm object was considered live:
> 
>  1) Before the per-VM thread cleanup was added to qemuProcessStop it
>     never unlocked
>  2) After the per-VM thread cleanup was added, this unlock was done
>     prior to setting vm->def->id to '-1'
>  3) All other cases are done only when the VM is live.
> 
> > As a fix, I'm moving all lines that obviously touch vm->def
> > before the domain object is unlocked. That left
> > virHookCall(VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END) nearly
> > next to virHookCall(VIR_HOOK_QEMU_OP_RELEASE,
> > VIR_HOOK_SUBOP_END) which I figured is not something we want. So
> > I've shuffled things a bit more.
> 
> This feels like a fix for symptoms of 'virDomainObjListAdd' not
> honouring the _MODIFY-type job expectation, and we're shuffling code
> around so that it doesn't care about the broken expectation.
> 
> Since I don't currently have a better idea of how to fix this I'm okay
> with this patch given the following conditions:
> 
> > Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0
> 
> Explain that the above commit inverted the order of setting the VM as
> inactive and unlocking thus allowing the above sequence of events to
> happen, and,

Why not just revert that change ? It claimed to be making things
safer, but did the opposite.   Even with this fixup below I'm
pretty uncomfortable with setting 'id = -1' and unlocking the
@vm, before we've done all our cleanup.

> 
> > Resolves: https://issues.redhat.com/browse/RHEL-49607
> > Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> > ---
> >  src/qemu/qemu_process.c | 122 ++++++++++++++++++++--------------------
> >  1 file changed, 62 insertions(+), 60 deletions(-)
> > 
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 25dfd04272..9ea6c678b8 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -8530,6 +8530,18 @@ void qemuProcessStop(virQEMUDriver *driver,
> >                                   VIR_QEMU_PROCESS_KILL_FORCE|
> >                                   VIR_QEMU_PROCESS_KILL_NOCHECK));
> >  
> > +    vm->pid = 0;
> > +
> > +    /* now that we know it's stopped call the hook if present */
> > +    if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
> > +        g_autofree char *xml = qemuDomainDefFormatXML(driver, NULL, vm->def, 0);
> > +
> > +        /* we can't stop the operation even if the script raised an error */
> > +        ignore_value(virHookCall(VIR_HOOK_DRIVER_QEMU, vm->def->name,
> > +                                 VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END,
> > +                                 NULL, xml, NULL));
> > +    }
> > +
> >      if (priv->agent) {
> >          g_clear_pointer(&priv->agent, qemuAgentClose);
> >      }
> > @@ -8553,25 +8565,6 @@ void qemuProcessStop(virQEMUDriver *driver,
> >  
> >      qemuDBusStop(driver, vm);
> >  
> > -    /* Only after this point we can reset 'priv->beingDestroyed' so that
> > -     * there's no point at which the VM could be considered as alive between
> > -     * entering the destroy job and this point where the active "flag" is
> > -     * cleared.
> > -     */
> > -    vm->def->id = -1;
> > -    priv->beingDestroyed = false;
> > -
> > -    /* Wake up anything waiting on domain condition */
> > -    virDomainObjBroadcast(vm);
> > -
> > -    /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent
> > -     * deadlocks with the per-VM event loop thread. This MUST be done after
> > -     * marking the VM as dead */
> > -    qemuDomainObjStopWorker(vm);
> > -
> > -    if (!!g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCallback)
> > -        driver->inhibitCallback(false, driver->inhibitOpaque);
> > -
> >      /* Clear network bandwidth */
> >      virDomainClearNetBandwidth(vm->def);
> >  
> > @@ -8588,18 +8581,6 @@ void qemuProcessStop(virQEMUDriver *driver,
> >          }
> >      }
> >  
> > -    virPortAllocatorRelease(priv->nbdPort);
> > -    priv->nbdPort = 0;
> > -
> > -    if (priv->monConfig) {
> > -        if (priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX)
> > -            unlink(priv->monConfig->data.nix.path);
> > -        g_clear_pointer(&priv->monConfig, virObjectUnref);
> > -    }
> > -
> > -    /* Remove the master key */
> > -    qemuDomainMasterKeyRemove(priv);
> > -
> >      ignore_value(virDomainChrDefForeach(vm->def,
> >                                          false,
> >                                          qemuProcessCleanupChardevDevice,
> > @@ -8609,22 +8590,6 @@ void qemuProcessStop(virQEMUDriver *driver,
> >      /* Its namespace is also gone then. */
> >      qemuDomainDestroyNamespace(driver, vm);
> >  
> > -    virFileDeleteTree(priv->libDir);
> > -    virFileDeleteTree(priv->channelTargetDir);
> > -
> > -    /* Stop autodestroy in case guest is restarted */
> > -    virCloseCallbacksDomainRemove(vm, NULL, qemuProcessAutoDestroy);
> > -
> > -    /* now that we know it's stopped call the hook if present */
> > -    if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
> > -        g_autofree char *xml = qemuDomainDefFormatXML(driver, NULL, vm->def, 0);
> > -
> > -        /* we can't stop the operation even if the script raised an error */
> > -        ignore_value(virHookCall(VIR_HOOK_DRIVER_QEMU, vm->def->name,
> > -                                 VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END,
> > -                                 NULL, xml, NULL));
> > -    }
> > -
> >      /* Reset Security Labels unless caller don't want us to */
> >      if (!(flags & VIR_QEMU_PROCESS_STOP_NO_RELABEL))
> >          qemuSecurityRestoreAllLabel(driver, vm,
> > @@ -8672,8 +8637,6 @@ void qemuProcessStop(virQEMUDriver *driver,
> >          virResctrlAllocRemove(vm->def->resctrls[i]->alloc);
> >      }
> >  
> > -    qemuProcessRemoveDomainStatus(driver, vm);
> > -
> >      /* Remove VNC and Spice ports from port reservation bitmap, but only if
> >         they were reserved by the driver (autoport=yes)
> >      */
> > @@ -8706,20 +8669,9 @@ void qemuProcessStop(virQEMUDriver *driver,
> >          }
> >      }
> >  
> > -    for (i = 0; i < vm->ndeprecations; i++)
> > -        g_free(vm->deprecations[i]);
> > -    g_clear_pointer(&vm->deprecations, g_free);
> > -    vm->ndeprecations = 0;
> > -    vm->taint = 0;
> > -    vm->pid = 0;
> > -    virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
> >      for (i = 0; i < vm->def->niothreadids; i++)
> >          vm->def->iothreadids[i]->thread_id = 0;
> >  
> > -    /* clean up a possible backup job */
> > -    if (priv->backup)
> > -        qemuBackupJobTerminate(vm, VIR_DOMAIN_JOB_STATUS_CANCELED);
> > -
> >      /* Do this explicitly after vm->pid is reset so that security drivers don't
> >       * try to enter the domain's namespace which is non-existent by now as qemu
> >       * is no longer running. */
> > @@ -8753,6 +8705,56 @@ void qemuProcessStop(virQEMUDriver *driver,
> >  
> >      qemuSecurityReleaseLabel(driver->securityManager, vm->def);
> >  
> > +    /* Only after this point we can reset 'priv->beingDestroyed' so that
> > +     * there's no point at which the VM could be considered as alive between
> > +     * entering the destroy job and this point where the active "flag" is
> > +     * cleared.
> > +     */
> > +    vm->def->id = -1;
> > +    priv->beingDestroyed = false;
> > +
> > +    /* Wake up anything waiting on domain condition */
> > +    virDomainObjBroadcast(vm);
> > +
> > +    /* IMPORTANT: qemuDomainObjStopWorker() unlocks @vm in order to prevent
> > +     * deadlocks with the per-VM event loop thread. This MUST be done after
> > +     * marking the VM as dead */
> 
> 
> Extend the comment here stating how virDomainObjAssignDef() in
> combination with vm->def->id being -1 can cause update of vm->def and
> thus no code below can ever access it any more.
> 
> > +    qemuDomainObjStopWorker(vm);
> > +
> > +    if (!!g_atomic_int_dec_and_test(&driver->nactive) && driver->inhibitCallback)
> > +        driver->inhibitCallback(false, driver->inhibitOpaque);
> > +
> > +    virPortAllocatorRelease(priv->nbdPort);
> > +    priv->nbdPort = 0;
> > +
> > +    if (priv->monConfig) {
> > +        if (priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX)
> > +            unlink(priv->monConfig->data.nix.path);
> > +        g_clear_pointer(&priv->monConfig, virObjectUnref);
> > +    }
> > +
> > +    /* Remove the master key */
> > +    qemuDomainMasterKeyRemove(priv);
> > +
> > +    virFileDeleteTree(priv->libDir);
> > +    virFileDeleteTree(priv->channelTargetDir);
> > +
> > +    /* Stop autodestroy in case guest is restarted */
> > +    virCloseCallbacksDomainRemove(vm, NULL, qemuProcessAutoDestroy);
> > +
> > +    qemuProcessRemoveDomainStatus(driver, vm);
> > +
> > +    for (i = 0; i < vm->ndeprecations; i++)
> > +        g_free(vm->deprecations[i]);
> > +    g_clear_pointer(&vm->deprecations, g_free);
> > +    vm->ndeprecations = 0;
> > +    vm->taint = 0;
> > +    virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, reason);
> > +
> > +    /* clean up a possible backup job */
> > +    if (priv->backup)
> > +        qemuBackupJobTerminate(vm, VIR_DOMAIN_JOB_STATUS_CANCELED);
> > +
> >      /* clear all private data entries which are no longer needed */
> >      qemuDomainObjPrivateDataClear(priv);
> >  
> > -- 
> > 2.44.2
> > 
> 
> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH] qemuProcessStop: Don't unlock domain until most of the cleanup is done
Posted by Peter Krempa 3 months, 1 week ago
On Tue, Jul 23, 2024 at 13:51:09 +0100, Daniel P. Berrangé wrote:
> On Tue, Jul 23, 2024 at 02:41:13PM +0200, Peter Krempa wrote:
> > On Fri, Jul 19, 2024 at 14:46:49 +0200, Michal Privoznik wrote:
> > > Imagine two threads. Thread A is executing qemuProcessStop() and
> > > thread B is executing qemuDomainCreateXML(). To make things more
> > > interesting, the domain XML passed to qemuDomainCreateXML matches
> > > name + UUID of that the virDomainObj that qemuProcessStop() is
> > > currently working with. Here's what happens.
> > > 
> > > 1) Thread A locks @vm, enters qemuProcessStop().
> > > 
> > > 2) Thread B parses given XML, calls virDomainObjListAdd() ->
> > >    virDomainObjListAdd() -> virDomainObjListAddLocked() ->
> > >    virDomainObjListFindByUUIDLocked(). Since there's a match on
> > >    UUID, an virDomainObj object is returned and the thread
> > >    proceeds to calling virObjectLock(). NB, it's the same object
> > >    as in thread A.
> > > 
> > > 3) Thread A sets vm->def->id = -1; this means that from this
> > >    point on, virDomainObjIsActive() will return false.
> > > 
> > > 4) Thread A calls qemuDomainObjStopWorker() which unlocks the
> > >    @vm.
> > > 
> > > 5) Thread B acquires the @vm lock and since
> > >    virDomainObjIsActive() returns false, it proceeds to calling
> > >    virDomainObjAssignDef() where vm->def is replaced.
> > > 
> > > 6) Thread B then calls qemuProcessBeginJob() which unlocks the
> > >    @vm temporarily.
> > > 
> > > 7) Thread A, still in qemuDomainObjStopWorker() acquires @vm lock
> > >    and proceeds with cleanup.
> > > 
> > > 8) Thread A finds different definition than the one needing
> > >    cleanup.
> > > 
> > > In my testing I've seen stale pointers, e.g.
> > > vm->def->nets[0]->priv was NULL, which lead to a SIGSEGV as
> > > there's  'QEMU_DOMAIN_NETWORK_PRIVATE(net)->created' line when
> > > cleaning up nets. Your mileage may vary.
> > > 
> > > Even if we did not crash, the plain fact that vm->def is changed
> > > in the middle of qemuProcessStop() means we might be cleaning up
> > > something else than intended.
> > 
> > This paragraph is the important bit. The root cause of the problem here
> > is that 'virDomainObjListAdd' inside 'qemuDomainCreateXML' can modify
> > 'vm->def' whithout holding any _MODIFY-type JOB on the domain object
> > which we normally require for any modification of 'vm->def' related data.
> > 
> > This wasn't a problem until now as we've relinquished the lock on @vm
> > only in situations when the @vm object was considered live:
> > 
> >  1) Before the per-VM thread cleanup was added to qemuProcessStop it
> >     never unlocked
> >  2) After the per-VM thread cleanup was added, this unlock was done
> >     prior to setting vm->def->id to '-1'
> >  3) All other cases are done only when the VM is live.
> > 
> > > As a fix, I'm moving all lines that obviously touch vm->def
> > > before the domain object is unlocked. That left
> > > virHookCall(VIR_HOOK_QEMU_OP_STOPPED, VIR_HOOK_SUBOP_END) nearly
> > > next to virHookCall(VIR_HOOK_QEMU_OP_RELEASE,
> > > VIR_HOOK_SUBOP_END) which I figured is not something we want. So
> > > I've shuffled things a bit more.
> > 
> > This feels like a fix for symptoms of 'virDomainObjListAdd' not
> > honouring the _MODIFY-type job expectation, and we're shuffling code
> > around so that it doesn't care about the broken expectation.
> > 
> > Since I don't currently have a better idea of how to fix this I'm okay
> > with this patch given the following conditions:
> > 
> > > Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0
> > 
> > Explain that the above commit inverted the order of setting the VM as
> > inactive and unlocking thus allowing the above sequence of events to
> > happen, and,
> 
> Why not just revert that change ? It claimed to be making things
> safer, but did the opposite.   Even with this fixup below I'm
> pretty uncomfortable with setting 'id = -1' and unlocking the
> @vm, before we've done all our cleanup.

You'd have to also revert d29e0f3d4a5362d7b33261df1e55896396707de3 which
is the commit actually moving the unlock of the VM object inside
qemuProcessStop after setting id = -1 which actually does fix a
crash on migration.

We can possibly move the qemuDomainObjStopWorker(vm) call as the last
thing in qemuProcessStop, so that everything is cleaned up before
unlocking.

Either way, unlocking inside qemuProcessStop is fragile as we're giving
a chance for races which would not be possible before.
Re: [PATCH] qemuProcessStop: Don't unlock domain until most of the cleanup is done
Posted by Daniel P. Berrangé 3 months, 1 week ago
On Tue, Jul 23, 2024 at 03:06:41PM +0200, Peter Krempa wrote:
> On Tue, Jul 23, 2024 at 13:51:09 +0100, Daniel P. Berrangé wrote:
> > On Tue, Jul 23, 2024 at 02:41:13PM +0200, Peter Krempa wrote:
> > > Since I don't currently have a better idea of how to fix this I'm okay
> > > with this patch given the following conditions:
> > > 
> > > > Fixes: 3865410e7f67ca4ec66e9a905e75f452762a97f0
> > > 
> > > Explain that the above commit inverted the order of setting the VM as
> > > inactive and unlocking thus allowing the above sequence of events to
> > > happen, and,
> > 
> > Why not just revert that change ? It claimed to be making things
> > safer, but did the opposite.   Even with this fixup below I'm
> > pretty uncomfortable with setting 'id = -1' and unlocking the
> > @vm, before we've done all our cleanup.
> 
> You'd have to also revert d29e0f3d4a5362d7b33261df1e55896396707de3 which
> is the commit actually moving the unlock of the VM object inside
> qemuProcessStop after setting id = -1 which actually does fix a
> crash on migration.
> 
> We can possibly move the qemuDomainObjStopWorker(vm) call as the last
> thing in qemuProcessStop, so that everything is cleaned up before
> unlocking.
> 
> Either way, unlocking inside qemuProcessStop is fragile as we're giving
> a chance for races which would not be possible before.

Yeah, the unlocking is pretty sketchy. That arrived back in

  commit 860a999802d3c82538373bb3f314f92a2e258754
  Author: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
  Date:   Thu Jul 23 11:02:59 2020 +0300

    qemu: avoid deadlock in qemuDomainObjStopWorker
    
    We are dropping the only reference here so that the event loop thread
    is going to be exited synchronously. In order to avoid deadlocks we
    need to unlock the VM so that any handler being called can finish
    execution and thus even loop thread be finished too.
    
    Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
    Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
    Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


It is bad design that a mere g_object_unref has side effects in the object
finalizer which synchronize on external code. Finalizers really should only
do self-contained operations that cannot block. ie release resources.
We've got a g_thread_join() in vir_event_thread_finalize which is the root
of our problems.

So I think we need to split the event loop cleanup into two phases.

Right at the start of qemuProcessStop, before doing any cleanup, we could
issue some sort of a "shutdown" call to the event loop, to tell it to
finish processing pending work and stop the event loop. While waiting for
this to be done, we can safely release the mutex.

Then we do the qemuProcessStop as usual, and when it comes time to unref
the event thread object, there won't be any synchronization required, it
will be a plain resource release. Thus we don't need to release the mutex
anymore.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|