[libvirt] [PATCH v2]daemon: Fix a crash during virNetlinkEventServiceStopAll

Liu Haitao posted 1 patch 4 years, 10 months ago
Test syntax-check passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/1560323932-91229-1-git-send-email-haitao.liu@windriver.com
src/remote/remote_daemon.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
[libvirt] [PATCH v2]daemon: Fix a crash during virNetlinkEventServiceStopAll
Posted by Liu Haitao 4 years, 10 months ago
When reboot the host, a core dump file would be generated.

The call traces are:

Note.In this case, the  main thread is thread 5.
                
(gdb) thread 5
[Switching to thread 5 (LWP 4142)]
(gdb) bt
0  0x00007f00a6838273 in futex_wait_cancelable (private=<optimized out>, 
    expected=0, futex_word=0x7f004c0125c0)
    at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/futex-internal.h:88
1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f004c012540, 
    cond=0x7f004c012598)
    at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:502
2  __pthread_cond_wait (cond=0x7f004c012598, mutex=0x7f004c012540)
    at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:655
3  0x00007f00aa467246 in virCondWait (c=<optimized out>, m=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:154
4  0x00007f00aa467eb0 in virThreadPoolFree (pool=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:286
5  0x00007f0074349f9d in qemuStateCleanup ()
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:1036
6  0x00007f00aa5e9486 in virStateCleanup ()
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/libvirt.c:682
7  0x000055a687ab86a4 in main (argc=<optimized out>, argv=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/remote/remote_daemon.c:1473

(gdb) thread 1
[Switching to thread 1 (LWP 4403)]
(gdb) bt
0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x0)
    at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_mutex_lock.c:67
1  0x00007f00aa467165 in virMutexLock (m=m@entry=0x0)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:89
2  0x00007f00aa43c0f9 in virNetlinkEventServerLock (driver=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:799
3  virNetlinkEventRemoveClient (watch=watch@entry=0, 
    macaddr=macaddr@entry=0x7f0088014944, protocol=protocol@entry=0)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:1197
4  0x00007f00aa4341df in virNetDevMacVLanDeleteWithVPortProfile (
    ifname=<optimized out>, macaddr=macaddr@entry=0x7f0088014944, 
    linkdev=0x7f0088014920 "eth1", mode=mode@entry=1, 
    virtPortProfile=virtPortProfile@entry=0x0, 
    stateDir=stateDir@entry=0x7f004c12fa90 "/var/run/libvirt/qemu")
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetdevmacvlan.c:1112
5  0x00007f0074312251 in qemuProcessStop (driver=driver@entry=0x7f004c0ecef0, 
    vm=vm@entry=0x7f0088000b00, 
    reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN, 
    asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_process.c:7291
6  0x00007f007437a5ea in processMonitorEOFEvent (vm=0x7f0088000b00, driver=0x7f004c0ecef0)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4756
7  qemuProcessEventHandler (data=0x55a687d6df10, opaque=0x7f004c0ecef0)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4859
8  0x00007f00aa467c5b in virThreadPoolWorker (
    opaque=opaque@entry=0x55a687d6c110)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:163
9  0x00007f00aa466fe8 in virThreadHelper (data=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:206
10 0x00007f00a68323f4 in start_thread (arg=0x7f00699df700)
    at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_create.c:456
11 0x00007f00a616e10f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105


1. The execution flow of main thread (Thread 5 LWP 4142):
main()
  -->virNetDaemonRun()
  -->virNetDaemonClose(dmn)  //cleanup 
  -->virNetlinkEventServiceStopAll() 
  -->virStateCleanup() 
	 -->qemuStateCleanup() 
	   -->virThreadPoolFree()
	     -->__pthread_cond_wait()

virNetDaemonRun()
    -->virEventRunDefaultImpl
      -->virEventPollRunOnce
       -->virEventPollDispatchHandles
        -->qemuMonitorIO
          -->qemuProcessHandleMonitorEOF
            -->processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF
             -->virThreadPoolSendJob()

After typing reboot command on the host, the main thread would send an event message to another thread. 
Here it would let thread 1 to handle the shutdown of qemu process. But it could
not be executed immediately.

virNetlinkEventServiceStopAll() 
	--> virNetlinkEventServiceStop()
	  --> server[protocol] = NULL;   // set server to null 

IN virNetlinkEventServiceStopAll(), some variables related to network are freed,
like (static virNetlinkEventSrvPrivatePtr server).

virStateCleanup() 
	-->qemuStateCleanup() 
	   -->virThreadPoolFree()
	     -->__pthread_cond_wait()

In virThreadPoolFree() it will wait other thread to end up. 

2. The execution flow of thread 5 (LWP 4403):
qemuProcessStop()
   -->virNetDevMacVLanDeleteWithVPortProfile()
	  -->virNetlinkEventRemoveClient()
	     --> srv = server[protocol]


Although the main thread had sent the message to thread 1(4403), it could not be
run instantly. It means that the  virNetlinkEventServiceStopAll() might be
executed earlier than virNetlinkEventRemoveClient(). We could get it from the log file.

""
2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStopAll:941 : stopping all netlink event services
2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStop:904 : stopping netlink event service
2019-06-12 00:10:21.165+0000: 4403: debug : virNetlinkEventRemoveClient:1190 : removing client watch=0, mac=0x7f0088014944.
"

In virNetlinkEventRemoveClient() the variable server is used again, but now it
is null that is freed by virNetlinkEventServiceStopAll().So it would case a crash .

The virNetlinkEventServiceStopAll() should be executed behind virStateCleanup(),

Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
---
 src/remote/remote_daemon.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c
index c3782971f1..7da20a6644 100644
--- a/src/remote/remote_daemon.c
+++ b/src/remote/remote_daemon.c
@@ -1464,8 +1464,6 @@ int main(int argc, char **argv) {
     /* Keep cleanup order in inverse order of startup */
     virNetDaemonClose(dmn);
 
-    virNetlinkEventServiceStopAll();
-
     if (driversInitialized) {
         /* NB: Possible issue with timing window between driversInitialized
          * setting if virNetlinkEventServerStart fails */
@@ -1473,6 +1471,8 @@ int main(int argc, char **argv) {
         virStateCleanup();
     }
 
+    virNetlinkEventServiceStopAll();
+
     virObjectUnref(adminProgram);
     virObjectUnref(srvAdm);
     virObjectUnref(qemuProgram);
-- 
2.21.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2]daemon: Fix a crash during virNetlinkEventServiceStopAll
Posted by Haitaoliu 4 years, 10 months ago
The steps to reproduce:

1. create a xml file including the following configuration.

  <interface type='direct'>
       <mac address='46:Ab:5f:98:8E:cB'/>
       <source dev='eth0' mode='private'/>
       <model type='virtio'/>
       <driver name='vhost'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' 
function='0x0' />
     </interface>
     <interface type='bridge'>
       <source bridge='ctrlbr0'/>
       <target dev='vnet1'/>
       <model type='virtio'/>
       <driver name='vhost'/>
       <link state='up'/>
     <address type='pci' domain='0x0000' bus='0x00' slot='0x05' 
function='0x0'/>
     </interface>

2.  ensure you systemd  has supported core dump function

3. create and start vm with virsh command

sysctl -w kernel.core_pattern=/core.%u.%e.%p

brctl addbr ctrlbr0
virsh create vm.xml


4. when the vm has started up completely, reboot the host with the 
following command

reboot -d


Note,  you must set one real  Ethernet port(eth0,or eth1) to a direct 
mode in xml file.


On 2019/6/12 15:18, Liu Haitao wrote:
> When reboot the host, a core dump file would be generated.
>
> The call traces are:
>
> Note.In this case, the  main thread is thread 5.
>                  
> (gdb) thread 5
> [Switching to thread 5 (LWP 4142)]
> (gdb) bt
> 0  0x00007f00a6838273 in futex_wait_cancelable (private=<optimized out>,
>      expected=0, futex_word=0x7f004c0125c0)
>      at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/futex-internal.h:88
> 1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f004c012540,
>      cond=0x7f004c012598)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:502
> 2  __pthread_cond_wait (cond=0x7f004c012598, mutex=0x7f004c012540)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:655
> 3  0x00007f00aa467246 in virCondWait (c=<optimized out>, m=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:154
> 4  0x00007f00aa467eb0 in virThreadPoolFree (pool=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:286
> 5  0x00007f0074349f9d in qemuStateCleanup ()
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:1036
> 6  0x00007f00aa5e9486 in virStateCleanup ()
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/libvirt.c:682
> 7  0x000055a687ab86a4 in main (argc=<optimized out>, argv=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/remote/remote_daemon.c:1473
>
> (gdb) thread 1
> [Switching to thread 1 (LWP 4403)]
> (gdb) bt
> 0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x0)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_mutex_lock.c:67
> 1  0x00007f00aa467165 in virMutexLock (m=m@entry=0x0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:89
> 2  0x00007f00aa43c0f9 in virNetlinkEventServerLock (driver=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:799
> 3  virNetlinkEventRemoveClient (watch=watch@entry=0,
>      macaddr=macaddr@entry=0x7f0088014944, protocol=protocol@entry=0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:1197
> 4  0x00007f00aa4341df in virNetDevMacVLanDeleteWithVPortProfile (
>      ifname=<optimized out>, macaddr=macaddr@entry=0x7f0088014944,
>      linkdev=0x7f0088014920 "eth1", mode=mode@entry=1,
>      virtPortProfile=virtPortProfile@entry=0x0,
>      stateDir=stateDir@entry=0x7f004c12fa90 "/var/run/libvirt/qemu")
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetdevmacvlan.c:1112
> 5  0x00007f0074312251 in qemuProcessStop (driver=driver@entry=0x7f004c0ecef0,
>      vm=vm@entry=0x7f0088000b00,
>      reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN,
>      asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_process.c:7291
> 6  0x00007f007437a5ea in processMonitorEOFEvent (vm=0x7f0088000b00, driver=0x7f004c0ecef0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4756
> 7  qemuProcessEventHandler (data=0x55a687d6df10, opaque=0x7f004c0ecef0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4859
> 8  0x00007f00aa467c5b in virThreadPoolWorker (
>      opaque=opaque@entry=0x55a687d6c110)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:163
> 9  0x00007f00aa466fe8 in virThreadHelper (data=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:206
> 10 0x00007f00a68323f4 in start_thread (arg=0x7f00699df700)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_create.c:456
> 11 0x00007f00a616e10f in clone ()
>      at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>
>
> 1. The execution flow of main thread (Thread 5 LWP 4142):
> main()
>    -->virNetDaemonRun()
>    -->virNetDaemonClose(dmn)  //cleanup
>    -->virNetlinkEventServiceStopAll()
>    -->virStateCleanup()
> 	 -->qemuStateCleanup()
> 	   -->virThreadPoolFree()
> 	     -->__pthread_cond_wait()
>
> virNetDaemonRun()
>      -->virEventRunDefaultImpl
>        -->virEventPollRunOnce
>         -->virEventPollDispatchHandles
>          -->qemuMonitorIO
>            -->qemuProcessHandleMonitorEOF
>              -->processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF
>               -->virThreadPoolSendJob()
>
> After typing reboot command on the host, the main thread would send an event message to another thread.
> Here it would let thread 1 to handle the shutdown of qemu process. But it could
> not be executed immediately.
>
> virNetlinkEventServiceStopAll()
> 	--> virNetlinkEventServiceStop()
> 	  --> server[protocol] = NULL;   // set server to null
>
> IN virNetlinkEventServiceStopAll(), some variables related to network are freed,
> like (static virNetlinkEventSrvPrivatePtr server).
>
> virStateCleanup()
> 	-->qemuStateCleanup()
> 	   -->virThreadPoolFree()
> 	     -->__pthread_cond_wait()
>
> In virThreadPoolFree() it will wait other thread to end up.
>
> 2. The execution flow of thread 5 (LWP 4403):
> qemuProcessStop()
>     -->virNetDevMacVLanDeleteWithVPortProfile()
> 	  -->virNetlinkEventRemoveClient()
> 	     --> srv = server[protocol]
>
>
> Although the main thread had sent the message to thread 1(4403), it could not be
> run instantly. It means that the  virNetlinkEventServiceStopAll() might be
> executed earlier than virNetlinkEventRemoveClient(). We could get it from the log file.
>
> ""
> 2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStopAll:941 : stopping all netlink event services
> 2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStop:904 : stopping netlink event service
> 2019-06-12 00:10:21.165+0000: 4403: debug : virNetlinkEventRemoveClient:1190 : removing client watch=0, mac=0x7f0088014944.
> "
>
> In virNetlinkEventRemoveClient() the variable server is used again, but now it
> is null that is freed by virNetlinkEventServiceStopAll().So it would case a crash .
>
> The virNetlinkEventServiceStopAll() should be executed behind virStateCleanup(),
>
> Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
> ---
>   src/remote/remote_daemon.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c
> index c3782971f1..7da20a6644 100644
> --- a/src/remote/remote_daemon.c
> +++ b/src/remote/remote_daemon.c
> @@ -1464,8 +1464,6 @@ int main(int argc, char **argv) {
>       /* Keep cleanup order in inverse order of startup */
>       virNetDaemonClose(dmn);
>   
> -    virNetlinkEventServiceStopAll();
> -
>       if (driversInitialized) {
>           /* NB: Possible issue with timing window between driversInitialized
>            * setting if virNetlinkEventServerStart fails */
> @@ -1473,6 +1471,8 @@ int main(int argc, char **argv) {
>           virStateCleanup();
>       }
>   
> +    virNetlinkEventServiceStopAll();
> +
>       virObjectUnref(adminProgram);
>       virObjectUnref(srvAdm);
>       virObjectUnref(qemuProgram);

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2]daemon: Fix a crash during virNetlinkEventServiceStopAll
Posted by Haitaoliu 4 years, 10 months ago
Hi guys,

Could you help me to review it ?

thanks,

haitao

On 2019/6/12 15:18, Liu Haitao wrote:
> When reboot the host, a core dump file would be generated.
>
> The call traces are:
>
> Note.In this case, the  main thread is thread 5.
>                  
> (gdb) thread 5
> [Switching to thread 5 (LWP 4142)]
> (gdb) bt
> 0  0x00007f00a6838273 in futex_wait_cancelable (private=<optimized out>,
>      expected=0, futex_word=0x7f004c0125c0)
>      at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/futex-internal.h:88
> 1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f004c012540,
>      cond=0x7f004c012598)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:502
> 2  __pthread_cond_wait (cond=0x7f004c012598, mutex=0x7f004c012540)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:655
> 3  0x00007f00aa467246 in virCondWait (c=<optimized out>, m=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:154
> 4  0x00007f00aa467eb0 in virThreadPoolFree (pool=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:286
> 5  0x00007f0074349f9d in qemuStateCleanup ()
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:1036
> 6  0x00007f00aa5e9486 in virStateCleanup ()
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/libvirt.c:682
> 7  0x000055a687ab86a4 in main (argc=<optimized out>, argv=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/remote/remote_daemon.c:1473
>
> (gdb) thread 1
> [Switching to thread 1 (LWP 4403)]
> (gdb) bt
> 0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x0)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_mutex_lock.c:67
> 1  0x00007f00aa467165 in virMutexLock (m=m@entry=0x0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:89
> 2  0x00007f00aa43c0f9 in virNetlinkEventServerLock (driver=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:799
> 3  virNetlinkEventRemoveClient (watch=watch@entry=0,
>      macaddr=macaddr@entry=0x7f0088014944, protocol=protocol@entry=0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:1197
> 4  0x00007f00aa4341df in virNetDevMacVLanDeleteWithVPortProfile (
>      ifname=<optimized out>, macaddr=macaddr@entry=0x7f0088014944,
>      linkdev=0x7f0088014920 "eth1", mode=mode@entry=1,
>      virtPortProfile=virtPortProfile@entry=0x0,
>      stateDir=stateDir@entry=0x7f004c12fa90 "/var/run/libvirt/qemu")
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetdevmacvlan.c:1112
> 5  0x00007f0074312251 in qemuProcessStop (driver=driver@entry=0x7f004c0ecef0,
>      vm=vm@entry=0x7f0088000b00,
>      reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN,
>      asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_process.c:7291
> 6  0x00007f007437a5ea in processMonitorEOFEvent (vm=0x7f0088000b00, driver=0x7f004c0ecef0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4756
> 7  qemuProcessEventHandler (data=0x55a687d6df10, opaque=0x7f004c0ecef0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4859
> 8  0x00007f00aa467c5b in virThreadPoolWorker (
>      opaque=opaque@entry=0x55a687d6c110)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:163
> 9  0x00007f00aa466fe8 in virThreadHelper (data=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:206
> 10 0x00007f00a68323f4 in start_thread (arg=0x7f00699df700)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_create.c:456
> 11 0x00007f00a616e10f in clone ()
>      at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>
>
> 1. The execution flow of main thread (Thread 5 LWP 4142):
> main()
>    -->virNetDaemonRun()
>    -->virNetDaemonClose(dmn)  //cleanup
>    -->virNetlinkEventServiceStopAll()
>    -->virStateCleanup()
> 	 -->qemuStateCleanup()
> 	   -->virThreadPoolFree()
> 	     -->__pthread_cond_wait()
>
> virNetDaemonRun()
>      -->virEventRunDefaultImpl
>        -->virEventPollRunOnce
>         -->virEventPollDispatchHandles
>          -->qemuMonitorIO
>            -->qemuProcessHandleMonitorEOF
>              -->processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF
>               -->virThreadPoolSendJob()
>
> After typing reboot command on the host, the main thread would send an event message to another thread.
> Here it would let thread 1 to handle the shutdown of qemu process. But it could
> not be executed immediately.
>
> virNetlinkEventServiceStopAll()
> 	--> virNetlinkEventServiceStop()
> 	  --> server[protocol] = NULL;   // set server to null
>
> IN virNetlinkEventServiceStopAll(), some variables related to network are freed,
> like (static virNetlinkEventSrvPrivatePtr server).
>
> virStateCleanup()
> 	-->qemuStateCleanup()
> 	   -->virThreadPoolFree()
> 	     -->__pthread_cond_wait()
>
> In virThreadPoolFree() it will wait other thread to end up.
>
> 2. The execution flow of thread 5 (LWP 4403):
> qemuProcessStop()
>     -->virNetDevMacVLanDeleteWithVPortProfile()
> 	  -->virNetlinkEventRemoveClient()
> 	     --> srv = server[protocol]
>
>
> Although the main thread had sent the message to thread 1(4403), it could not be
> run instantly. It means that the  virNetlinkEventServiceStopAll() might be
> executed earlier than virNetlinkEventRemoveClient(). We could get it from the log file.
>
> ""
> 2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStopAll:941 : stopping all netlink event services
> 2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStop:904 : stopping netlink event service
> 2019-06-12 00:10:21.165+0000: 4403: debug : virNetlinkEventRemoveClient:1190 : removing client watch=0, mac=0x7f0088014944.
> "
>
> In virNetlinkEventRemoveClient() the variable server is used again, but now it
> is null that is freed by virNetlinkEventServiceStopAll().So it would case a crash .
>
> The virNetlinkEventServiceStopAll() should be executed behind virStateCleanup(),
>
> Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
> ---
>   src/remote/remote_daemon.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c
> index c3782971f1..7da20a6644 100644
> --- a/src/remote/remote_daemon.c
> +++ b/src/remote/remote_daemon.c
> @@ -1464,8 +1464,6 @@ int main(int argc, char **argv) {
>       /* Keep cleanup order in inverse order of startup */
>       virNetDaemonClose(dmn);
>   
> -    virNetlinkEventServiceStopAll();
> -
>       if (driversInitialized) {
>           /* NB: Possible issue with timing window between driversInitialized
>            * setting if virNetlinkEventServerStart fails */
> @@ -1473,6 +1471,8 @@ int main(int argc, char **argv) {
>           virStateCleanup();
>       }
>   
> +    virNetlinkEventServiceStopAll();
> +
>       virObjectUnref(adminProgram);
>       virObjectUnref(srvAdm);
>       virObjectUnref(qemuProgram);

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2]daemon: Fix a crash during virNetlinkEventServiceStopAll
Posted by Haitaoliu 4 years, 8 months ago
HI peter,

Could you help me to review it ?  I  sent it about two months ago.

thanks,

haitao

On 6/12/19 3:18 PM, Liu Haitao wrote:
> When reboot the host, a core dump file would be generated.
>
> The call traces are:
>
> Note.In this case, the  main thread is thread 5.
>                  
> (gdb) thread 5
> [Switching to thread 5 (LWP 4142)]
> (gdb) bt
> 0  0x00007f00a6838273 in futex_wait_cancelable (private=<optimized out>,
>      expected=0, futex_word=0x7f004c0125c0)
>      at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/futex-internal.h:88
> 1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f004c012540,
>      cond=0x7f004c012598)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:502
> 2  __pthread_cond_wait (cond=0x7f004c012598, mutex=0x7f004c012540)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:655
> 3  0x00007f00aa467246 in virCondWait (c=<optimized out>, m=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:154
> 4  0x00007f00aa467eb0 in virThreadPoolFree (pool=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:286
> 5  0x00007f0074349f9d in qemuStateCleanup ()
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:1036
> 6  0x00007f00aa5e9486 in virStateCleanup ()
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/libvirt.c:682
> 7  0x000055a687ab86a4 in main (argc=<optimized out>, argv=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/remote/remote_daemon.c:1473
>
> (gdb) thread 1
> [Switching to thread 1 (LWP 4403)]
> (gdb) bt
> 0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x0)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_mutex_lock.c:67
> 1  0x00007f00aa467165 in virMutexLock (m=m@entry=0x0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:89
> 2  0x00007f00aa43c0f9 in virNetlinkEventServerLock (driver=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:799
> 3  virNetlinkEventRemoveClient (watch=watch@entry=0,
>      macaddr=macaddr@entry=0x7f0088014944, protocol=protocol@entry=0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:1197
> 4  0x00007f00aa4341df in virNetDevMacVLanDeleteWithVPortProfile (
>      ifname=<optimized out>, macaddr=macaddr@entry=0x7f0088014944,
>      linkdev=0x7f0088014920 "eth1", mode=mode@entry=1,
>      virtPortProfile=virtPortProfile@entry=0x0,
>      stateDir=stateDir@entry=0x7f004c12fa90 "/var/run/libvirt/qemu")
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetdevmacvlan.c:1112
> 5  0x00007f0074312251 in qemuProcessStop (driver=driver@entry=0x7f004c0ecef0,
>      vm=vm@entry=0x7f0088000b00,
>      reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN,
>      asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_process.c:7291
> 6  0x00007f007437a5ea in processMonitorEOFEvent (vm=0x7f0088000b00, driver=0x7f004c0ecef0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4756
> 7  qemuProcessEventHandler (data=0x55a687d6df10, opaque=0x7f004c0ecef0)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4859
> 8  0x00007f00aa467c5b in virThreadPoolWorker (
>      opaque=opaque@entry=0x55a687d6c110)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:163
> 9  0x00007f00aa466fe8 in virThreadHelper (data=<optimized out>)
>      at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:206
> 10 0x00007f00a68323f4 in start_thread (arg=0x7f00699df700)
>      at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_create.c:456
> 11 0x00007f00a616e10f in clone ()
>      at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
>
>
> 1. The execution flow of main thread (Thread 5 LWP 4142):
> main()
>    -->virNetDaemonRun()
>    -->virNetDaemonClose(dmn)  //cleanup
>    -->virNetlinkEventServiceStopAll()
>    -->virStateCleanup()
> 	 -->qemuStateCleanup()
> 	   -->virThreadPoolFree()
> 	     -->__pthread_cond_wait()
>
> virNetDaemonRun()
>      -->virEventRunDefaultImpl
>        -->virEventPollRunOnce
>         -->virEventPollDispatchHandles
>          -->qemuMonitorIO
>            -->qemuProcessHandleMonitorEOF
>              -->processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF
>               -->virThreadPoolSendJob()
>
> After typing reboot command on the host, the main thread would send an event message to another thread.
> Here it would let thread 1 to handle the shutdown of qemu process. But it could
> not be executed immediately.
>
> virNetlinkEventServiceStopAll()
> 	--> virNetlinkEventServiceStop()
> 	  --> server[protocol] = NULL;   // set server to null
>
> IN virNetlinkEventServiceStopAll(), some variables related to network are freed,
> like (static virNetlinkEventSrvPrivatePtr server).
>
> virStateCleanup()
> 	-->qemuStateCleanup()
> 	   -->virThreadPoolFree()
> 	     -->__pthread_cond_wait()
>
> In virThreadPoolFree() it will wait other thread to end up.
>
> 2. The execution flow of thread 5 (LWP 4403):
> qemuProcessStop()
>     -->virNetDevMacVLanDeleteWithVPortProfile()
> 	  -->virNetlinkEventRemoveClient()
> 	     --> srv = server[protocol]
>
>
> Although the main thread had sent the message to thread 1(4403), it could not be
> run instantly. It means that the  virNetlinkEventServiceStopAll() might be
> executed earlier than virNetlinkEventRemoveClient(). We could get it from the log file.
>
> ""
> 2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStopAll:941 : stopping all netlink event services
> 2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStop:904 : stopping netlink event service
> 2019-06-12 00:10:21.165+0000: 4403: debug : virNetlinkEventRemoveClient:1190 : removing client watch=0, mac=0x7f0088014944.
> "
>
> In virNetlinkEventRemoveClient() the variable server is used again, but now it
> is null that is freed by virNetlinkEventServiceStopAll().So it would case a crash .
>
> The virNetlinkEventServiceStopAll() should be executed behind virStateCleanup(),
>
> Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
> ---
>   src/remote/remote_daemon.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c
> index c3782971f1..7da20a6644 100644
> --- a/src/remote/remote_daemon.c
> +++ b/src/remote/remote_daemon.c
> @@ -1464,8 +1464,6 @@ int main(int argc, char **argv) {
>       /* Keep cleanup order in inverse order of startup */
>       virNetDaemonClose(dmn);
>   
> -    virNetlinkEventServiceStopAll();
> -
>       if (driversInitialized) {
>           /* NB: Possible issue with timing window between driversInitialized
>            * setting if virNetlinkEventServerStart fails */
> @@ -1473,6 +1471,8 @@ int main(int argc, char **argv) {
>           virStateCleanup();
>       }
>   
> +    virNetlinkEventServiceStopAll();
> +
>       virObjectUnref(adminProgram);
>       virObjectUnref(srvAdm);
>       virObjectUnref(qemuProgram);

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list