[libvirt] [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert

Lin Ma posted 1 patch 4 years, 4 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20191213074736.21641-1-lma@suse.com
src/qemu/qemu_monitor.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[libvirt] [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert
Posted by Lin Ma 4 years, 4 months ago
When reverting a running domain to a snapshot(active state), We need to
use the FORCE flag for snapshot-revert if current domain configuration
is different from the target domain configuration, and this will start a
new qemu instance for the target domain.

In this situation, if there is existing connection to the domain, say
Spice or VNC through virt-manager, Then the libvirtd would crash during
snapshot revert because: Both of snapshot revert worker and new worker
job 'remoteDispatchDomainOpenGraphicsFd' are waiting for mon->msg->finished
in qemuMonitorSend(), We know if IO process resulted in an error with a
message, Libvirtd main thread calls qemuMonitorIO() to wakeup the waiter.
Then mon->msg will be set to NULL in qemuMonitorSend() once the worker
GraphicsFD is woken up, which causes snapshot revert worker dereferences
this null pointer.

Not sure whether this scenario makes sense, But at least the libvirtd
should not crash, So fix it.

 Thread 6 "libvirtd" hit Breakpoint 1, qemuMonitorSend
 987         if (mon->lastError.code != VIR_ERR_OK) {
 (gdb) bt
 #0  in qemuMonitorSend
 #1  in qemuMonitorJSONCommandWithFd
 #2  in qemuMonitorJSONSendFileHandle
 #3  in qemuMonitorSendFileHandle
 #4  in qemuMonitorOpenGraphics
 #5  in qemuDomainOpenGraphicsFD
 #6  in virDomainOpenGraphicsFD
 #7  in remoteDispatchDomainOpenGraphicsFd
 #8  in remoteDispatchDomainOpenGraphicsFdHelper
 #9  in virNetServerProgramDispatchCall
 #10 in virNetServerProgramDispatch
 #11 in virNetServerProcessMsg
 #12 in virNetServerHandleJob
 #13 in virThreadPoolWorker
 #14 in virThreadHelper
 #15 in start_thread
 #16 in clone

 (gdb) c
 Continuing.

 Thread 2 "libvirtd" received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x7fbd20fe2700 (LWP 19014)]
 0x00007fbd1c3be838 in qemuMonitorSend (mon=mon@entry=0x7fbd1804a940,
                                        msg=msg@entry=0x7fbd20fe15b0)
 at ../../src/qemu/qemu_monitor.c:1001

 1001        while (!mon->msg->finished) {
 (gdb) bt
 #0  in qemuMonitorSend
 #1  in qemuMonitorJSONCommandWithFd
 #2  in qemuMonitorJSONCommand
 #3  in qemuMonitorJSONGetChardevInfo
 #4  in qemuMonitorGetChardevInfo
 #5  in qemuProcessWaitForMonitor
 #6  in qemuProcessLaunch
 #7  in qemuProcessStart
 #8  in qemuDomainRevertToSnapshot
 #9  in virDomainRevertToSnapshot
 #10 in remoteDispatchDomainRevertToSnapshot
 #11 in remoteDispatchDomainRevertToSnapshotHelper
 #12 in virNetServerProgramDispatchCall
 #13 in virNetServerProgramDispatch
 #14 in virNetServerProcessMsg
 #15 in virNetServerHandleJob
 #16 in virThreadPoolWorker
 #17 in virThreadHelper
 #18 in start_thread
 #19 in clone

Signed-off-by: Lin Ma <lma@suse.com>
---
 src/qemu/qemu_monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index ea3e62dc8e..a8344e698b 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -994,7 +994,7 @@ qemuMonitorSend(qemuMonitorPtr mon,
           "mon=%p msg=%s fd=%d",
           mon, mon->msg->txBuffer, mon->msg->txFD);
 
-    while (!mon->msg->finished) {
+    while (mon->msg && !mon->msg->finished) {
         if (virCondWait(&mon->notify, &mon->parent.lock) < 0) {
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                            _("Unable to wait on monitor condition"));
-- 
2.23.0


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert
Posted by Peter Krempa 4 years, 4 months ago
On Fri, Dec 13, 2019 at 15:47:36 +0800, Lin Ma wrote:
> When reverting a running domain to a snapshot(active state), We need to
> use the FORCE flag for snapshot-revert if current domain configuration
> is different from the target domain configuration, and this will start a
> new qemu instance for the target domain.
> 
> In this situation, if there is existing connection to the domain, say
> Spice or VNC through virt-manager, Then the libvirtd would crash during
> snapshot revert because: Both of snapshot revert worker and new worker
> job 'remoteDispatchDomainOpenGraphicsFd' are waiting for mon->msg->finished
> in qemuMonitorSend(), We know if IO process resulted in an error with a
> message, Libvirtd main thread calls qemuMonitorIO() to wakeup the waiter.
> Then mon->msg will be set to NULL in qemuMonitorSend() once the worker
> GraphicsFD is woken up, which causes snapshot revert worker dereferences
> this null pointer.

[....]

> Signed-off-by: Lin Ma <lma@suse.com>
> ---
>  src/qemu/qemu_monitor.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> index ea3e62dc8e..a8344e698b 100644
> --- a/src/qemu/qemu_monitor.c
> +++ b/src/qemu/qemu_monitor.c
> @@ -994,7 +994,7 @@ qemuMonitorSend(qemuMonitorPtr mon,
>            "mon=%p msg=%s fd=%d",
>            mon, mon->msg->txBuffer, mon->msg->txFD);
>  
> -    while (!mon->msg->finished) {
> +    while (mon->msg && !mon->msg->finished) {

This fixes only the symptom. The actual problem is in handling of our
job state when restarting the qemu process:

Please see the following patches which aim to fix the same problem:

https://www.redhat.com/archives/libvir-list/2019-December/msg00663.html

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] 回复: [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert
Posted by Lin Ma 4 years, 4 months ago

> -----邮件原件-----
> 发件人: Peter Krempa <pkrempa@redhat.com>
> 发送时间: 2019年12月13日 15:56
> 收件人: Lin Ma <LMa@suse.com>
> 抄送: libvir-list@redhat.com; berrange@redhat.com
> 主题: Re: [libvirt] [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert
> 
> On Fri, Dec 13, 2019 at 15:47:36 +0800, Lin Ma wrote:
> > When reverting a running domain to a snapshot(active state), We need
> > to use the FORCE flag for snapshot-revert if current domain
> > configuration is different from the target domain configuration, and
> > this will start a new qemu instance for the target domain.
> >
> > In this situation, if there is existing connection to the domain, say
> > Spice or VNC through virt-manager, Then the libvirtd would crash
> > during snapshot revert because: Both of snapshot revert worker and new
> > worker job 'remoteDispatchDomainOpenGraphicsFd' are waiting for
> > mon->msg->finished in qemuMonitorSend(), We know if IO process
> > resulted in an error with a message, Libvirtd main thread calls
> qemuMonitorIO() to wakeup the waiter.
> > Then mon->msg will be set to NULL in qemuMonitorSend() once the worker
> > GraphicsFD is woken up, which causes snapshot revert worker
> > dereferences this null pointer.
> 
> [....]
> 
> > Signed-off-by: Lin Ma <lma@suse.com>
> > ---
> >  src/qemu/qemu_monitor.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index
> > ea3e62dc8e..a8344e698b 100644
> > --- a/src/qemu/qemu_monitor.c
> > +++ b/src/qemu/qemu_monitor.c
> > @@ -994,7 +994,7 @@ qemuMonitorSend(qemuMonitorPtr mon,
> >            "mon=%p msg=%s fd=%d",
> >            mon, mon->msg->txBuffer, mon->msg->txFD);
> >
> > -    while (!mon->msg->finished) {
> > +    while (mon->msg && !mon->msg->finished) {
> 
> This fixes only the symptom. The actual problem is in handling of our job state
> when restarting the qemu process:
> 
> Please see the following patches which aim to fix the same problem:
> 
> https://www.redhat.com/archives/libvir-list/2019-December/msg00663.html

Indeed, It fixed the root cause, Thanks for your information.

Lin

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert
Posted by Michal Privoznik 4 years, 4 months ago
On 12/13/19 8:55 AM, Peter Krempa wrote:
> On Fri, Dec 13, 2019 at 15:47:36 +0800, Lin Ma wrote:
>> When reverting a running domain to a snapshot(active state), We need to
>> use the FORCE flag for snapshot-revert if current domain configuration
>> is different from the target domain configuration, and this will start a
>> new qemu instance for the target domain.
>>
>> In this situation, if there is existing connection to the domain, say
>> Spice or VNC through virt-manager, Then the libvirtd would crash during
>> snapshot revert because: Both of snapshot revert worker and new worker
>> job 'remoteDispatchDomainOpenGraphicsFd' are waiting for mon->msg->finished
>> in qemuMonitorSend(), We know if IO process resulted in an error with a
>> message, Libvirtd main thread calls qemuMonitorIO() to wakeup the waiter.
>> Then mon->msg will be set to NULL in qemuMonitorSend() once the worker
>> GraphicsFD is woken up, which causes snapshot revert worker dereferences
>> this null pointer.
> 
> [....]
> 
>> Signed-off-by: Lin Ma <lma@suse.com>
>> ---
>>   src/qemu/qemu_monitor.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
>> index ea3e62dc8e..a8344e698b 100644
>> --- a/src/qemu/qemu_monitor.c
>> +++ b/src/qemu/qemu_monitor.c
>> @@ -994,7 +994,7 @@ qemuMonitorSend(qemuMonitorPtr mon,
>>             "mon=%p msg=%s fd=%d",
>>             mon, mon->msg->txBuffer, mon->msg->txFD);
>>   
>> -    while (!mon->msg->finished) {
>> +    while (mon->msg && !mon->msg->finished) {
> 
> This fixes only the symptom. The actual problem is in handling of our
> job state when restarting the qemu process:
> 
> Please see the following patches which aim to fix the same problem:
> 
> https://www.redhat.com/archives/libvir-list/2019-December/msg00663.html

In fact, I've pushed the patch yesterday:

   d75f865fb9 qemu: fix concurrency crash bug in snapshot revert

Does it fix the problem you're seeing?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] 回复: [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert
Posted by Lin Ma 4 years, 4 months ago

> -----邮件原件-----
> 发件人: Michal Privoznik <mprivozn@redhat.com>
> 发送时间: 2019年12月13日 17:30
> 收件人: Peter Krempa <pkrempa@redhat.com>; Lin Ma <LMa@suse.com>
> 抄送: libvir-list@redhat.com
> 主题: Re: [libvirt] [PATCH] qemu: snapshot: Fix libvirtd crash in snapshot-revert
> 
> On 12/13/19 8:55 AM, Peter Krempa wrote:
> > On Fri, Dec 13, 2019 at 15:47:36 +0800, Lin Ma wrote:
> >> When reverting a running domain to a snapshot(active state), We need
> >> to use the FORCE flag for snapshot-revert if current domain
> >> configuration is different from the target domain configuration, and
> >> this will start a new qemu instance for the target domain.
> >>
> >> In this situation, if there is existing connection to the domain, say
> >> Spice or VNC through virt-manager, Then the libvirtd would crash
> >> during snapshot revert because: Both of snapshot revert worker and
> >> new worker job 'remoteDispatchDomainOpenGraphicsFd' are waiting for
> >> mon->msg->finished in qemuMonitorSend(), We know if IO process
> >> resulted in an error with a message, Libvirtd main thread calls
> qemuMonitorIO() to wakeup the waiter.
> >> Then mon->msg will be set to NULL in qemuMonitorSend() once the
> >> worker GraphicsFD is woken up, which causes snapshot revert worker
> >> dereferences this null pointer.
> >
> > [....]
> >
> >> Signed-off-by: Lin Ma <lma@suse.com>
> >> ---
> >>   src/qemu/qemu_monitor.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index
> >> ea3e62dc8e..a8344e698b 100644
> >> --- a/src/qemu/qemu_monitor.c
> >> +++ b/src/qemu/qemu_monitor.c
> >> @@ -994,7 +994,7 @@ qemuMonitorSend(qemuMonitorPtr mon,
> >>             "mon=%p msg=%s fd=%d",
> >>             mon, mon->msg->txBuffer, mon->msg->txFD);
> >>
> >> -    while (!mon->msg->finished) {
> >> +    while (mon->msg && !mon->msg->finished) {
> >
> > This fixes only the symptom. The actual problem is in handling of our
> > job state when restarting the qemu process:
> >
> > Please see the following patches which aim to fix the same problem:
> >
> > https://www.redhat.com/archives/libvir-list/2019-December/msg00663.htm
> > l
> 
> In fact, I've pushed the patch yesterday:
> 
>    d75f865fb9 qemu: fix concurrency crash bug in snapshot revert
> 
> Does it fix the problem you're seeing?

Yes, It does fix the problem.

Thanks,
Lin

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list