[libvirt] [PATCH v2] qemu: fix deadlock if create qemuProcessReconnect thread failed

Wang Yechao posted 1 patch 5 years, 7 months ago
Failed in applying to current master (apply log)
There is a newer version of this series
src/conf/domain_conf.h      | 1 +
src/conf/virdomainobjlist.c | 6 ++++--
src/qemu/qemu_process.c     | 1 +
3 files changed, 6 insertions(+), 2 deletions(-)
[libvirt] [PATCH v2] qemu: fix deadlock if create qemuProcessReconnect thread failed
Posted by Wang Yechao 5 years, 7 months ago
qemuProcessReconnectHelper has hold the doms lock, if create
qemuProcessReconnect thread failed, it will get the doms lock
again to remove the dom from doms list.

add obj->inReconnetCtx flag to avoid deadlock.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
---
 src/conf/domain_conf.h      | 1 +
 src/conf/virdomainobjlist.c | 6 ++++--
 src/qemu/qemu_process.c     | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index e30a4b2..5bc5771 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2608,6 +2608,7 @@ struct _virDomainObj {
     virDomainSnapshotObjPtr current_snapshot;
 
     bool hasManagedSave;
+    bool inReconnectCtx;
 
     void *privateData;
     void (*privateDataFreeFunc)(void *);
diff --git a/src/conf/virdomainobjlist.c b/src/conf/virdomainobjlist.c
index 805fe94..30300b4 100644
--- a/src/conf/virdomainobjlist.c
+++ b/src/conf/virdomainobjlist.c
@@ -397,11 +397,13 @@ virDomainObjListRemove(virDomainObjListPtr doms,
     dom->removing = true;
     virObjectRef(dom);
     virObjectUnlock(dom);
-    virObjectRWLockWrite(doms);
+    if (!dom->inReconnectCtx)
+        virObjectRWLockWrite(doms);
     virObjectLock(dom);
     virDomainObjListRemoveLocked(doms, dom);
     virObjectUnref(dom);
-    virObjectRWUnlock(doms);
+    if (!dom->inReconnectCtx)
+        virObjectRWUnlock(doms);
 }
 
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index eb9904b..8c30850 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -8029,6 +8029,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj,
          */
         qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED,
                         QEMU_ASYNC_JOB_NONE, 0);
+        obj->inReconnectCtx = true;
         qemuDomainRemoveInactiveJob(src->driver, obj);
 
         virDomainObjEndAPI(&obj);
-- 
1.8.3.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2] qemu: fix deadlock if create qemuProcessReconnect thread failed
Posted by Peter Krempa 5 years, 7 months ago
On Thu, Sep 13, 2018 at 19:28:12 +0800, Wang Yechao wrote:
> qemuProcessReconnectHelper has hold the doms lock, if create
> qemuProcessReconnect thread failed, it will get the doms lock
> again to remove the dom from doms list.
> 
> add obj->inReconnetCtx flag to avoid deadlock.

Please describe the situation more or provide a reproducer.

> 
> Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
> ---
>  src/conf/domain_conf.h      | 1 +
>  src/conf/virdomainobjlist.c | 6 ++++--
>  src/qemu/qemu_process.c     | 1 +
>  3 files changed, 6 insertions(+), 2 deletions(-)
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] 答复: Re: [PATCH v2] qemu: fix deadlock if createqemuProcessReconnect thread failed
Posted by wang.yechao255@zte.com.cn 5 years, 7 months ago
I just code review, found there may be problem.


The follow statement in founction qemuProcessReconnectHelper:


"if (virThreadCreate(&thread, false, qemuProcessReconnect, data) < 0) "


may be failed (no one can guarantee 'virThreadCreate' always success).


if ‘virThreadCreate’ failed, the follow backstrace we will get:





#0  0x00007fa89921203e in pthread_rwlock_wrlock () from /lib64/libpthread.so.0

#1  0x00007fa89ba218e5 in virRWLockWrite (m=<optimized out>) at util/virthread.c:122

#2  0x00007fa89b9f9ebb in virObjectRWLockWrite (anyobj=<optimized out>) at util/virobject.c:487

#3  0x00007fa89ba82a68 in virDomainObjListRemove (doms=0x7fa87411fde0, dom=0x7fa8740f94f0) at conf/virdomainobjlist.c:400

#4  0x00007fa87e1b9ace in qemuDomainRemoveInactive (driver=driver@entry=0x7fa87411aa20, vm=vm@entry=0x7fa8740f94f0) at qemu/qemu_domain.c:8309

#5  0x00007fa87e1b9c02 in qemuDomainRemoveInactiveJob (driver=0x7fa87411aa20, vm=0x7fa8740f94f0) at qemu/qemu_domain.c:8331

#6  0x00007fa87e1ef36d in qemuProcessReconnectHelper (obj=0x7fa8740f94f0, opaque=0x7fa87b4b3c30) at qemu/qemu_process.c:8035

#7  0x00007fa89ba81e9a in virDomainObjListHelper (payload=<optimized out>, name=<optimized out>, opaque=0x7fa87b4b3c00) at conf/virdomainobjlist.c:804

#8  0x00007fa89b9ccaa0 in virHashForEach (table=0x7fa87410e520, iter=iter@entry=0x7fa89ba81e90 <virDomainObjListHelper>, data=data@entry=0x7fa87b4b3c00)

    at util/virhash.c:580

#9  0x00007fa89ba83391 in virDomainObjListForEach (doms=0x7fa87411fde0, callback=callback@entry=0x7fa87e1ef220 <qemuProcessReconnectHelper>,

    opaque=opaque@entry=0x7fa87b4b3c30) at conf/virdomainobjlist.c:819

#10 0x00007fa87e1f1564 in qemuProcessReconnectAll (driver=<optimized out>) at qemu/qemu_process.c:8056

#11 0x00007fa87e227928 in qemuStateInitialize (privileged=true, callback=<optimized out>, opaque=<optimized out>) at qemu/qemu_driver.c:919

#12 0x00007fa89bb9f91f in virStateInitialize (privileged=true, callback=callback@entry=0x7fa89c547cd0 <daemonInhibitCallback>, opaque=opaque@entry=0x7fa89d875c00)

    at libvirt.c:662

#13 0x00007fa89c547d2b in daemonRunStateInit (opaque=0x7fa89d875c00) at remote/remote_daemon.c:803

#14 0x00007fa89ba21712 in virThreadHelper (data=<optimized out>) at util/virthread.c:206

#15 0x00007fa89920edc5 in start_thread () from /lib64/libpthread.so.0

#16 0x00007fa898b3673d in clone () from /lib64/libc.so.6






frame 8, virHashForEach has called virObjectLock(doms)


frame 3, virDomainObjListRemove calls virObjectRWLockWrite(doms) again.


thus deadlock occurs.











原始邮件



发件人:PeterKrempa <pkrempa@redhat.com>
收件人:王业超10154425;
抄送人:libvir-list@redhat.com <libvir-list@redhat.com>
日 期 :2018年09月13日 19:31
主 题 :Re: [libvirt] [PATCH v2] qemu: fix deadlock if createqemuProcessReconnect thread failed


On Thu, Sep 13, 2018 at 19:28:12 +0800, Wang Yechao wrote:
> qemuProcessReconnectHelper has hold the doms lock, if create
> qemuProcessReconnect thread failed, it will get the doms lock
> again to remove the dom from doms list.
> 
> add obj->inReconnetCtx flag to avoid deadlock.

Please describe the situation more or provide a reproducer.

> 
> Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
> ---
>  src/conf/domain_conf.h      | 1 +
>  src/conf/virdomainobjlist.c | 6 ++++--
>  src/qemu/qemu_process.c     | 1 +
>  3 files changed, 6 insertions(+), 2 deletions(-)--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt]答复: Re: [PATCH v2] qemu: fix deadlock if createqemuProcessReconnect thread failed
Posted by John Ferlan 5 years, 7 months ago

On 09/13/2018 10:11 PM, wang.yechao255@zte.com.cn wrote:
> I just code review, found there may be problem.
> 
> The follow statement in founction qemuProcessReconnectHelper:
> 
> "if (virThreadCreate(&thread, false, qemuProcessReconnect, data) < 0) "
> 
> may be failed (no one can guarantee 'virThreadCreate' always success).

Not sure how, but your response isn't threaded with your original
posting. Makes it difficult to follow.

Posting a few bugs related to qemuProcessReconnectHelper - makes me
wonder "how" you get yourself into this situation. Is it like the
libnuma bug and you're creating scripts that have frequent and continual
libvirtd restarts, which I noted there isn't not really a normal situation.

In any case, I think in this situation what we want is to call
virDomainObjListRemoveLocked instead of virDomainObjListRemove from
qemuDomainRemoveInactive because all the prerequisites for calling the
Locked function are true (the obj is locked/reffed and the list lock is
held).

Now this is *not* to say qemuDomainRemoveInactive should always call the
*Locked variant. It means adding a "bool useLocked" to the API and then
passing true *only* from qemuProcessReconnectHelper and then of course
calling the virDomainObjListRemoveLocked variant when the bool is true.
Requires changing all the other callers to pass false.

Another preferred option:

Patch 1. Split up qemuDomainRemoveInactive, such that code from "cfg =
virQEMUDriverGetConfig(driver);" through/including
"qemuExtDevicesCleanupHost(driver, vm->def);" plus the
"virObjectUnref(cfg);" is in a static helper
"qemuDomainRemoveInactiveCommon".

Patch 2. Create a qemuDomainRemoveInactiveLocked which is a copy of
qemuDomainRemoveInactive except that instead of calling
virDomainObjListRemove it calls virDomainObjListRemoveLocked.

Patch 3. Create a qemuDomainRemoveInactiveJobLocked which copies
qemuDomainRemoveInactiveJob except of course calling
virDomainObjListRemoveLocked.

Patch 4. Modify qemuProcessReconnectHelper to call
qemuDomainRemoveInactiveJobLocked.

John

> 
> if ‘virThreadCreate’ failed, the follow backstrace we will get:
> 
> 
> #0  0x00007fa89921203e in pthread_rwlock_wrlock () from
> /lib64/libpthread.so.0
> 
> #1  0x00007fa89ba218e5 in virRWLockWrite (m=<optimized out>) at
> util/virthread.c:122
> 
> #2  0x00007fa89b9f9ebb in virObjectRWLockWrite (anyobj=<optimized out>)
> at util/virobject.c:487
> 
> #3  0x00007fa89ba82a68 in virDomainObjListRemove (doms=0x7fa87411fde0,
> dom=0x7fa8740f94f0) at conf/virdomainobjlist.c:400
> 
> #4  0x00007fa87e1b9ace in qemuDomainRemoveInactive
> (driver=driver@entry=0x7fa87411aa20, vm=vm@entry=0x7fa8740f94f0) at
> qemu/qemu_domain.c:8309
> 
> #5  0x00007fa87e1b9c02 in qemuDomainRemoveInactiveJob
> (driver=0x7fa87411aa20, vm=0x7fa8740f94f0) at qemu/qemu_domain.c:8331
> 
> #6  0x00007fa87e1ef36d in qemuProcessReconnectHelper
> (obj=0x7fa8740f94f0, opaque=0x7fa87b4b3c30) at qemu/qemu_process.c:8035
> 
> #7  0x00007fa89ba81e9a in virDomainObjListHelper (payload=<optimized
> out>, name=<optimized out>, opaque=0x7fa87b4b3c00) at
> conf/virdomainobjlist.c:804
> 
> #8  0x00007fa89b9ccaa0 in virHashForEach (table=0x7fa87410e520,
> iter=iter@entry=0x7fa89ba81e90 <virDomainObjListHelper>,
> data=data@entry=0x7fa87b4b3c00)
> 
>     at util/virhash.c:580
> 
> #9  0x00007fa89ba83391 in virDomainObjListForEach (doms=0x7fa87411fde0,
> callback=callback@entry=0x7fa87e1ef220 <qemuProcessReconnectHelper>,
> 
>     opaque=opaque@entry=0x7fa87b4b3c30) at conf/virdomainobjlist.c:819
> 
> #10 0x00007fa87e1f1564 in qemuProcessReconnectAll (driver=<optimized
> out>) at qemu/qemu_process.c:8056
> 
> #11 0x00007fa87e227928 in qemuStateInitialize (privileged=true,
> callback=<optimized out>, opaque=<optimized out>) at qemu/qemu_driver.c:919
> 
> #12 0x00007fa89bb9f91f in virStateInitialize (privileged=true,
> callback=callback@entry=0x7fa89c547cd0 <daemonInhibitCallback>,
> opaque=opaque@entry=0x7fa89d875c00)
> 
>     at libvirt.c:662
> 
> #13 0x00007fa89c547d2b in daemonRunStateInit (opaque=0x7fa89d875c00) at
> remote/remote_daemon.c:803
> 
> #14 0x00007fa89ba21712 in virThreadHelper (data=<optimized out>) at
> util/virthread.c:206
> 
> #15 0x00007fa89920edc5 in start_thread () from /lib64/libpthread.so.0
> 
> #16 0x00007fa898b3673d in clone () from /lib64/libc.so.6
> 
> 
> frame 8, virHashForEach has called virObjectLock(doms)
> 
> frame 3, virDomainObjListRemove calls virObjectRWLockWrite(doms) again.
> 
> thus deadlock occurs.
> 
> 
> 
> 原始邮件
> *发件人:*PeterKrempa <pkrempa@redhat.com>
> *收件人:*王业超10154425;
> *抄送人:*libvir-list@redhat.com <libvir-list@redhat.com>
> *日 期 :*2018年09月13日 19:31
> *主 题 :**Re: [libvirt] [PATCH v2] qemu: fix deadlock if
> createqemuProcessReconnect thread failed*
> On Thu, Sep 13, 2018 at 19:28:12 +0800, Wang Yechao wrote:
>> qemuProcessReconnectHelper has hold the doms lock, if create
>> qemuProcessReconnect thread failed, it will get the doms lock
>> again to remove the dom from doms list.
>> 
>> add obj->inReconnetCtx flag to avoid deadlock.
> 
> Please describe the situation more or provide a reproducer.
> 
>> 
>> Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
>> ---
>>  src/conf/domain_conf.h      | 1 +
>>  src/conf/virdomainobjlist.c | 6 ++++--
>>  src/qemu/qemu_process.c     | 1 +
>>  3 files changed, 6 insertions(+), 2 deletions(-)
> 
> 
> 
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt]: Re: [PATCH v2] qemu: fix deadlock if createqemuProcessReconnect thread failed
Posted by wang.yechao255@zte.com.cn 5 years, 7 months ago
> On 09/13/2018 10:11 PM, wang.yechao255@zte.com.cn wrote:
> > I just code review, found there may be problem.
> > 
> > The follow statement in founction qemuProcessReconnectHelper:
> > 
> > "if (virThreadCreate(&thread, false, qemuProcessReconnect, data) < 0) "
> > 
> > may be failed (no one can guarantee 'virThreadCreate' always success).

> Not sure how, but your response isn't threaded with your original
> posting. Makes it difficult to follow.

I'll fix later.

> Posting a few bugs related to qemuProcessReconnectHelper - makes me
> wonder "how" you get yourself into this situation. Is it like the
> libnuma bug and you're creating scripts that have frequent and continual
> libvirtd restarts, which I noted there isn't not really a normal situation.

libnuma bug is reproduced by scripts. But this bug is not, I did not reproduce it.
Only read the code, found may be enter this situation.

> In any case, I think in this situation what we want is to call
> virDomainObjListRemoveLocked instead of virDomainObjListRemove from
> qemuDomainRemoveInactive because all the prerequisites for calling the
> Locked function are true (the obj is locked/reffed and the list lock is
> held).

> Now this is *not* to say qemuDomainRemoveInactive should always call the
> *Locked variant. It means adding a "bool useLocked" to the API and then
> passing true *only* from qemuProcessReconnectHelper and then of course
> calling the virDomainObjListRemoveLocked variant when the bool is true.
> Requires changing all the other callers to pass false.

> Another preferred option:

> Patch 1. Split up qemuDomainRemoveInactive, such that code from "cfg =
> virQEMUDriverGetConfig(driver);" through/including
> "qemuExtDevicesCleanupHost(driver, vm->def);" plus the
> "virObjectUnref(cfg);" is in a static helper
> "qemuDomainRemoveInactiveCommon".

> Patch 2. Create a qemuDomainRemoveInactiveLocked which is a copy of
> qemuDomainRemoveInactive except that instead of calling
> virDomainObjListRemove it calls virDomainObjListRemoveLocked.

> Patch 3. Create a qemuDomainRemoveInactiveJobLocked which copies
> qemuDomainRemoveInactiveJob except of course calling
> virDomainObjListRemoveLocked.

> Patch 4. Modify qemuProcessReconnectHelper to call
> qemuDomainRemoveInactiveJobLocked.

> John

Yes, your suggested pathes are preferred. Thanks, I'll give v3 patch.

---
Best wishes
Wang Yechao--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list