[libvirt] [PATCH] create a thread to handle MigrationParamReset to avoid deadlock

Yi Wang posted 1 patch 4 years, 3 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/1577085190-23456-1-git-send-email-wang.yi59@zte.com.cn
src/remote/remote_daemon_dispatch.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
[libvirt] [PATCH] create a thread to handle MigrationParamReset to avoid deadlock
Posted by Yi Wang 4 years, 3 months ago
From: Li XueLei <li.xuelei1@zte.com.cn>

Libvirtd no longer receives external requests, after I do the following.

Steps to reproduce:
1.Virsh tool initiates a non-p2p migrations.
2.After the phase of qemuDomainMigratePerform3 and before the phase of
  qemuDomainMigrateConfirm3, kill the virsh client which initiated the
  migration.

What happens:
    Libvirtd will be blocked in the next call stack because it can't
get the condition variables, and it will not be able to receive new
requests.

Thread 1 (Thread 0x7f36a7b9f8c0 (LWP 27556)):
#0  0x00007f36a45799f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f36a6e7a97e in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f3679c6a1b3 in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007f3679c837de in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007f3679c93c5a in qemuMonitorJSONSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007f3679c786ed in qemuMonitorSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007f3679c67857 in qemuMigrationParamsApply () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7  0x00007f3679c67eff in qemuMigrationParamsReset () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007f3679c5198a in qemuMigrationSrcCleanup () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#9  0x00007f36a6dcd862 in virCloseCallbacksRun () from /lib64/libvirt.so.0
#10 0x00007f3679cc28f6 in qemuConnectClose () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#11 0x00007f36a70a0870 in virConnectDispose () from /lib64/libvirt.so.0
#12 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
#13 0x00007f36a70a55c4 in virConnectClose () from /lib64/libvirt.so.0
#14 0x000056058a206f92 in remoteClientFree ()
#15 0x00007f36a6fa49a0 in virNetServerClientDispose () from /lib64/libvirt.so.0
#16 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
#17 0x00007f36a6e3fd21 in virObjectFreeCallback () from /lib64/libvirt.so.0
#18 0x00007f36a6f9942e in virNetSocketEventFree () from /lib64/libvirt.so.0
#19 0x00007f36a6de8439 in virEventPollCleanupHandles () from /lib64/libvirt.so.0
#20 0x00007f36a6de9ab6 in virEventPollRunOnce () from /lib64/libvirt.so.0
#21 0x00007f36a6de817a in virEventRunDefaultImpl () from /lib64/libvirt.so.0
#22 0x00007f36a6faadb5 in virNetDaemonRun () from /lib64/libvirt.so.0
#23 0x000056058a1c5cc1 in main ()

Here's a patch to solve this bug, that is to create a thread to do
MigrationParamReset.

Signed-off-by: Li XueLei <li.xuelei1@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>

---
 src/remote/remote_daemon_dispatch.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/remote/remote_daemon_dispatch.c b/src/remote/remote_daemon_dispatch.c
index 76c676c..3962ee3 100644
--- a/src/remote/remote_daemon_dispatch.c
+++ b/src/remote/remote_daemon_dispatch.c
@@ -1741,9 +1741,10 @@ remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv)
  * We keep the libvirt connection open until any async
  * jobs have finished, then clean it up elsewhere
  */
-void remoteClientFree(void *data)
+
+static void remoteClientFreeFun(void *opaque)
 {
-    struct daemonClientPrivate *priv = data;
+    struct daemonClientPrivate *priv = opaque;
 
     if (priv->conn)
         virConnectClose(priv->conn);
@@ -1760,7 +1761,16 @@ void remoteClientFree(void *data)
     if (priv->storageConn)
         virConnectClose(priv->storageConn);
 
-    VIR_FREE(priv);
+    VIR_FREE(priv);
+}
+
+void remoteClientFree(void *data)
+{
+    virThread thread;
+
+    if (virThreadCreate(&thread, false, remoteClientFreeFun, data) < 0) {
+        VIR_ERROR("Unable to create remoteClientFreeFun thread");
+    }
 }
 
 
-- 
1.8.3.1


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] create a thread to handle MigrationParamReset to avoid deadlock
Posted by Daniel P. Berrangé 4 years, 3 months ago
On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
> From: Li XueLei <li.xuelei1@zte.com.cn>
> 
> Libvirtd no longer receives external requests, after I do the following.
> 
> Steps to reproduce:
> 1.Virsh tool initiates a non-p2p migrations.
> 2.After the phase of qemuDomainMigratePerform3 and before the phase of
>   qemuDomainMigrateConfirm3, kill the virsh client which initiated the
>   migration.
> 
> What happens:
>     Libvirtd will be blocked in the next call stack because it can't
> get the condition variables, and it will not be able to receive new
> requests.
> 
> Thread 1 (Thread 0x7f36a7b9f8c0 (LWP 27556)):
> #0  0x00007f36a45799f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x00007f36a6e7a97e in virCondWait () from /lib64/libvirt.so.0
> #2  0x00007f3679c6a1b3 in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #3  0x00007f3679c837de in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #4  0x00007f3679c93c5a in qemuMonitorJSONSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #5  0x00007f3679c786ed in qemuMonitorSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #6  0x00007f3679c67857 in qemuMigrationParamsApply () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #7  0x00007f3679c67eff in qemuMigrationParamsReset () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #8  0x00007f3679c5198a in qemuMigrationSrcCleanup () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #9  0x00007f36a6dcd862 in virCloseCallbacksRun () from /lib64/libvirt.so.0
> #10 0x00007f3679cc28f6 in qemuConnectClose () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #11 0x00007f36a70a0870 in virConnectDispose () from /lib64/libvirt.so.0
> #12 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
> #13 0x00007f36a70a55c4 in virConnectClose () from /lib64/libvirt.so.0
> #14 0x000056058a206f92 in remoteClientFree ()
> #15 0x00007f36a6fa49a0 in virNetServerClientDispose () from /lib64/libvirt.so.0
> #16 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
> #17 0x00007f36a6e3fd21 in virObjectFreeCallback () from /lib64/libvirt.so.0
> #18 0x00007f36a6f9942e in virNetSocketEventFree () from /lib64/libvirt.so.0
> #19 0x00007f36a6de8439 in virEventPollCleanupHandles () from /lib64/libvirt.so.0
> #20 0x00007f36a6de9ab6 in virEventPollRunOnce () from /lib64/libvirt.so.0
> #21 0x00007f36a6de817a in virEventRunDefaultImpl () from /lib64/libvirt.so.0
> #22 0x00007f36a6faadb5 in virNetDaemonRun () from /lib64/libvirt.so.0
> #23 0x000056058a1c5cc1 in main ()

Hmm, this is a bit strange - why are we trying to reset migration parameters
in a connection close callback in the first place ?

Libvirt allows multiple connections concurrently and changes made by one
connection are supposed to apply globally. No per-VM state should be tied
to a particular connection.

IOW, I don't think we need a thread. We should just stop calling
qemuMigrationSrcCleanup from the connection close callback entirely


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] create a thread to handle MigrationParamReset to avoid deadlock
Posted by Michal Prívozník 4 years, 3 months ago
On 12/23/19 11:12 AM, Daniel P. Berrangé wrote:
> On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
>> From: Li XueLei <li.xuelei1@zte.com.cn>
>>
>> Libvirtd no longer receives external requests, after I do the following.
>>
>> Steps to reproduce:
>> 1.Virsh tool initiates a non-p2p migrations.
>> 2.After the phase of qemuDomainMigratePerform3 and before the phase of
>>   qemuDomainMigrateConfirm3, kill the virsh client which initiated the
>>   migration.
>>
>> What happens:
>>     Libvirtd will be blocked in the next call stack because it can't
>> get the condition variables, and it will not be able to receive new
>> requests.
>>
>> Thread 1 (Thread 0x7f36a7b9f8c0 (LWP 27556)):
>> #0  0x00007f36a45799f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>> #1  0x00007f36a6e7a97e in virCondWait () from /lib64/libvirt.so.0
>> #2  0x00007f3679c6a1b3 in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
>> #3  0x00007f3679c837de in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
>> #4  0x00007f3679c93c5a in qemuMonitorJSONSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
>> #5  0x00007f3679c786ed in qemuMonitorSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
>> #6  0x00007f3679c67857 in qemuMigrationParamsApply () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
>> #7  0x00007f3679c67eff in qemuMigrationParamsReset () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
>> #8  0x00007f3679c5198a in qemuMigrationSrcCleanup () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
>> #9  0x00007f36a6dcd862 in virCloseCallbacksRun () from /lib64/libvirt.so.0
>> #10 0x00007f3679cc28f6 in qemuConnectClose () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
>> #11 0x00007f36a70a0870 in virConnectDispose () from /lib64/libvirt.so.0
>> #12 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
>> #13 0x00007f36a70a55c4 in virConnectClose () from /lib64/libvirt.so.0
>> #14 0x000056058a206f92 in remoteClientFree ()
>> #15 0x00007f36a6fa49a0 in virNetServerClientDispose () from /lib64/libvirt.so.0
>> #16 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
>> #17 0x00007f36a6e3fd21 in virObjectFreeCallback () from /lib64/libvirt.so.0
>> #18 0x00007f36a6f9942e in virNetSocketEventFree () from /lib64/libvirt.so.0
>> #19 0x00007f36a6de8439 in virEventPollCleanupHandles () from /lib64/libvirt.so.0
>> #20 0x00007f36a6de9ab6 in virEventPollRunOnce () from /lib64/libvirt.so.0
>> #21 0x00007f36a6de817a in virEventRunDefaultImpl () from /lib64/libvirt.so.0
>> #22 0x00007f36a6faadb5 in virNetDaemonRun () from /lib64/libvirt.so.0
>> #23 0x000056058a1c5cc1 in main ()
> 
> Hmm, this is a bit strange - why are we trying to reset migration parameters
> in a connection close callback in the first place ?

Because we are cancelling the ongoing migration because the connection
that initiated it was closed.

> 
> Libvirt allows multiple connections concurrently and changes made by one
> connection are supposed to apply globally. No per-VM state should be tied
> to a particular connection.

This is generally very true, except for migration.

> 
> IOW, I don't think we need a thread. We should just stop calling
> qemuMigrationSrcCleanup from the connection close callback entirely

But I agree that something fishy is going on and this doesn't look like
proper solution. Yi, can you please share the backtrace of other threads
too? Why aren't we getting any reply on the monitor?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] create a thread to handle MigrationParamReset to avoid deadlock
Posted by Daniel P. Berrangé 4 years, 3 months ago
On Mon, Dec 23, 2019 at 04:50:00PM +0100, Michal Prívozník wrote:
> On 12/23/19 11:12 AM, Daniel P. Berrangé wrote:
> > On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
> >> From: Li XueLei <li.xuelei1@zte.com.cn>
> >>
> >> Libvirtd no longer receives external requests, after I do the following.
> >>
> >> Steps to reproduce:
> >> 1.Virsh tool initiates a non-p2p migrations.
> >> 2.After the phase of qemuDomainMigratePerform3 and before the phase of
> >>   qemuDomainMigrateConfirm3, kill the virsh client which initiated the
> >>   migration.
> >>
> >> What happens:
> >>     Libvirtd will be blocked in the next call stack because it can't
> >> get the condition variables, and it will not be able to receive new
> >> requests.
> >>
> >> Thread 1 (Thread 0x7f36a7b9f8c0 (LWP 27556)):
> >> #0  0x00007f36a45799f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> #1  0x00007f36a6e7a97e in virCondWait () from /lib64/libvirt.so.0
> >> #2  0x00007f3679c6a1b3 in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #3  0x00007f3679c837de in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #4  0x00007f3679c93c5a in qemuMonitorJSONSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #5  0x00007f3679c786ed in qemuMonitorSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #6  0x00007f3679c67857 in qemuMigrationParamsApply () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #7  0x00007f3679c67eff in qemuMigrationParamsReset () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #8  0x00007f3679c5198a in qemuMigrationSrcCleanup () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #9  0x00007f36a6dcd862 in virCloseCallbacksRun () from /lib64/libvirt.so.0
> >> #10 0x00007f3679cc28f6 in qemuConnectClose () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #11 0x00007f36a70a0870 in virConnectDispose () from /lib64/libvirt.so.0
> >> #12 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
> >> #13 0x00007f36a70a55c4 in virConnectClose () from /lib64/libvirt.so.0
> >> #14 0x000056058a206f92 in remoteClientFree ()
> >> #15 0x00007f36a6fa49a0 in virNetServerClientDispose () from /lib64/libvirt.so.0
> >> #16 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
> >> #17 0x00007f36a6e3fd21 in virObjectFreeCallback () from /lib64/libvirt.so.0
> >> #18 0x00007f36a6f9942e in virNetSocketEventFree () from /lib64/libvirt.so.0
> >> #19 0x00007f36a6de8439 in virEventPollCleanupHandles () from /lib64/libvirt.so.0
> >> #20 0x00007f36a6de9ab6 in virEventPollRunOnce () from /lib64/libvirt.so.0
> >> #21 0x00007f36a6de817a in virEventRunDefaultImpl () from /lib64/libvirt.so.0
> >> #22 0x00007f36a6faadb5 in virNetDaemonRun () from /lib64/libvirt.so.0
> >> #23 0x000056058a1c5cc1 in main ()
> > 
> > Hmm, this is a bit strange - why are we trying to reset migration parameters
> > in a connection close callback in the first place ?
> 
> Because we are cancelling the ongoing migration because the connection
> that initiated it was closed.
> 
> > 
> > Libvirt allows multiple connections concurrently and changes made by one
> > connection are supposed to apply globally. No per-VM state should be tied
> > to a particular connection.
> 
> This is generally very true, except for migration.

Hmm, so in that case we do need to make sure this runs from a non-main
event thread as this patch does. I'm surprised we've not seen this
before though - i'd think it should be a guaranteed deadlock anytime
someone tests this scenario.

> 
> > 
> > IOW, I don't think we need a thread. We should just stop calling
> > qemuMigrationSrcCleanup from the connection close callback entirely
> 
> But I agree that something fishy is going on and this doesn't look like
> proper solution. Yi, can you please share the backtrace of other threads
> too? Why aren't we getting any reply on the monitor?

qemuMonitorSend() just puts date onto the TX queue & waits for the
main event loop to send it.  We're invoking qemuMonitorSend from
the main event loop in this backtrace, hence it'll wait forever
for the thread to send it.

qemuMonitorSend must never be invoked from the main event thread.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt][PATCH] create a thread to handle MigrationParamResetto avoid deadlock
Posted by wang.yi59@zte.com.cn 4 years, 3 months ago
Hi Daniel,

Thanks a lot for your review and reply!

> On Mon, Dec 23, 2019 at 04:50:00PM +0100, Michal Prívozník wrote:
> > On 12/23/19 11:12 AM, Daniel P. Berrangé wrote:
> > > On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
> > >> From: Li XueLei <li.xuelei1@zte.com.cn>
> > >>
> > >> Libvirtd no longer receives external requests, after I do the following.
> > >>

.....

> > >
> > > Libvirt allows multiple connections concurrently and changes made by one
> > > connection are supposed to apply globally. No per-VM state should be tied
> > > to a particular connection.
> >
> > This is generally very true, except for migration.
>
> Hmm, so in that case we do need to make sure this runs from a non-main
> event thread as this patch does. I'm surprised we've not seen this
> before though - i'd think it should be a guaranteed deadlock anytime
> someone tests this scenario.
>
> >
> > >
> > > IOW, I don't think we need a thread. We should just stop calling
> > > qemuMigrationSrcCleanup from the connection close callback entirely
> >
> > But I agree that something fishy is going on and this doesn't look like
> > proper solution. Yi, can you please share the backtrace of other threads
> > too? Why aren't we getting any reply on the monitor?
>
> qemuMonitorSend() just puts date onto the TX queue & waits for the
> main event loop to send it.  We're invoking qemuMonitorSend from
> the main event loop in this backtrace, hence it'll wait forever
> for the thread to send it.
>
> qemuMonitorSend must never be invoked from the main event thread.

So do you think how to imporve this patch? Any sugesstion is appreciated :-)

>
> Regards,
> Daniel--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] create a thread to handle MigrationParamResetto avoid deadlock
Posted by Daniel P. Berrangé 4 years, 2 months ago
On Fri, Dec 27, 2019 at 01:59:51PM +0800, wang.yi59@zte.com.cn wrote:
> Hi Daniel,
> 
> Thanks a lot for your review and reply!
> 
> > On Mon, Dec 23, 2019 at 04:50:00PM +0100, Michal Prívozník wrote:
> > > On 12/23/19 11:12 AM, Daniel P. Berrangé wrote:
> > > > On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
> > > >> From: Li XueLei <li.xuelei1@zte.com.cn>
> > > >>
> > > >> Libvirtd no longer receives external requests, after I do the following.
> > > >>
> 
> .....
> 
> > > >
> > > > Libvirt allows multiple connections concurrently and changes made by one
> > > > connection are supposed to apply globally. No per-VM state should be tied
> > > > to a particular connection.
> > >
> > > This is generally very true, except for migration.
> >
> > Hmm, so in that case we do need to make sure this runs from a non-main
> > event thread as this patch does. I'm surprised we've not seen this
> > before though - i'd think it should be a guaranteed deadlock anytime
> > someone tests this scenario.
> >
> > >
> > > >
> > > > IOW, I don't think we need a thread. We should just stop calling
> > > > qemuMigrationSrcCleanup from the connection close callback entirely
> > >
> > > But I agree that something fishy is going on and this doesn't look like
> > > proper solution. Yi, can you please share the backtrace of other threads
> > > too? Why aren't we getting any reply on the monitor?
> >
> > qemuMonitorSend() just puts date onto the TX queue & waits for the
> > main event loop to send it.  We're invoking qemuMonitorSend from
> > the main event loop in this backtrace, hence it'll wait forever
> > for the thread to send it.
> >
> > qemuMonitorSend must never be invoked from the main event thread.
> 
> So do you think how to imporve this patch? Any sugesstion is appreciated :-)

I'm facing a related problem in my patches for providing an "embedded"
QEMU driver - any libvirt API calls that the application makes from
its main event loop will deadlock if they result in a QEMU monitor
command.

We've got another long standing scalability bug too when dealing
with shutdown takes too long, the main event loop is blocked for an
unreasonably long time.

I think the solution to all of these problems is to stop using the main
event loop for the QEMU monitor and agent.

Instead, we should create a new thread for each QEMU VM, and that thread
should run an event loop, handling just the monitor and agent work for
that one VM.

We can do this quite easily now if we use GLib's  GMainContext as the
event loop.

We'll create a GMainContext and store it in qemuDomainObjPrivatePtr.
In qemuProcessLaunch and qemuProcessReconnect we'll need to spawn a
thread running this GMainContext as an event loop.

At the end of qemuProcess we'll need to tell this event loop to quit
and stop the thread.

Then the qemu_agent.c and qemu_monitor.c code need changing to use the
g_source_add_unix_fd () / g_source_remove_unix_fd () APIs instead of
virEventAddHandle/virEventRemoveHandle.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] create a thread to handle MigrationParamResetto avoid deadlock
Posted by Daniel P. Berrangé 4 years, 2 months ago
On Fri, Jan 03, 2020 at 10:11:22AM +0000, Daniel P. Berrangé wrote:
> On Fri, Dec 27, 2019 at 01:59:51PM +0800, wang.yi59@zte.com.cn wrote:
> > Hi Daniel,
> > 
> > Thanks a lot for your review and reply!
> > 
> > > On Mon, Dec 23, 2019 at 04:50:00PM +0100, Michal Prívozník wrote:
> > > > On 12/23/19 11:12 AM, Daniel P. Berrangé wrote:
> > > > > On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
> > > > >> From: Li XueLei <li.xuelei1@zte.com.cn>
> > > > >>
> > > > >> Libvirtd no longer receives external requests, after I do the following.
> > > > >>
> > 
> > .....
> > 
> > > > >
> > > > > Libvirt allows multiple connections concurrently and changes made by one
> > > > > connection are supposed to apply globally. No per-VM state should be tied
> > > > > to a particular connection.
> > > >
> > > > This is generally very true, except for migration.
> > >
> > > Hmm, so in that case we do need to make sure this runs from a non-main
> > > event thread as this patch does. I'm surprised we've not seen this
> > > before though - i'd think it should be a guaranteed deadlock anytime
> > > someone tests this scenario.
> > >
> > > >
> > > > >
> > > > > IOW, I don't think we need a thread. We should just stop calling
> > > > > qemuMigrationSrcCleanup from the connection close callback entirely
> > > >
> > > > But I agree that something fishy is going on and this doesn't look like
> > > > proper solution. Yi, can you please share the backtrace of other threads
> > > > too? Why aren't we getting any reply on the monitor?
> > >
> > > qemuMonitorSend() just puts date onto the TX queue & waits for the
> > > main event loop to send it.  We're invoking qemuMonitorSend from
> > > the main event loop in this backtrace, hence it'll wait forever
> > > for the thread to send it.
> > >
> > > qemuMonitorSend must never be invoked from the main event thread.
> > 
> > So do you think how to imporve this patch? Any sugesstion is appreciated :-)
> 
> I'm facing a related problem in my patches for providing an "embedded"
> QEMU driver - any libvirt API calls that the application makes from
> its main event loop will deadlock if they result in a QEMU monitor
> command.
> 
> We've got another long standing scalability bug too when dealing
> with shutdown takes too long, the main event loop is blocked for an
> unreasonably long time.
> 
> I think the solution to all of these problems is to stop using the main
> event loop for the QEMU monitor and agent.
> 
> Instead, we should create a new thread for each QEMU VM, and that thread
> should run an event loop, handling just the monitor and agent work for
> that one VM.
> 
> We can do this quite easily now if we use GLib's  GMainContext as the
> event loop.
> 
> We'll create a GMainContext and store it in qemuDomainObjPrivatePtr.
> In qemuProcessLaunch and qemuProcessReconnect we'll need to spawn a
> thread running this GMainContext as an event loop.
> 
> At the end of qemuProcess we'll need to tell this event loop to quit
> and stop the thread.
> 
> Then the qemu_agent.c and qemu_monitor.c code need changing to use the
> g_source_add_unix_fd () / g_source_remove_unix_fd () APIs instead of
> virEventAddHandle/virEventRemoveHandle.

Is this approach something you would like to work on implementing,
or have already started ?  If not, I'll attempt to implement it
myself in the next week or so.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|