From: Liu Haitao
Date: Wed, 12 Jun 2019 15:18:52 +0800
Message-ID: <1560323932-91229-1-git-send-email-haitao.liu@windriver.com>
Cc: laine@laine.org
Subject: [libvirt] [PATCH v2] daemon: Fix a crash during virNetlinkEventServiceStopAll
List-Id: Development discussions about the libvirt library & tools

When the host is rebooted, a core dump file is generated. The call traces
are shown below.

Note: in this case, the main thread is thread 5.

(gdb) thread 5
[Switching to thread 5 (LWP 4142)]
(gdb) bt
0  0x00007f00a6838273 in futex_wait_cancelable (private=<optimized out>,
    expected=0, futex_word=0x7f004c0125c0)
    at /usr/src/debug/glibc/2.24-r0/git/sysdeps/unix/sysv/linux/futex-internal.h:88
1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7f004c012540,
    cond=0x7f004c012598)
    at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:502
2  __pthread_cond_wait (cond=0x7f004c012598, mutex=0x7f004c012540)
    at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_cond_wait.c:655
3  0x00007f00aa467246 in virCondWait (c=<optimized out>, m=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:154
4  0x00007f00aa467eb0 in virThreadPoolFree (pool=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:286
5  0x00007f0074349f9d in qemuStateCleanup ()
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:1036
6  0x00007f00aa5e9486 in virStateCleanup ()
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/libvirt.c:682
7  0x000055a687ab86a4 in main (argc=<optimized out>, argv=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/remote/remote_daemon.c:1473

(gdb) thread 1
[Switching to thread 1 (LWP 4403)]
(gdb) bt
0  __GI___pthread_mutex_lock (mutex=mutex@entry=0x0)
    at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_mutex_lock.c:67
1  0x00007f00aa467165 in virMutexLock (m=m@entry=0x0)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:89
2  0x00007f00aa43c0f9 in virNetlinkEventServerLock (driver=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:799
3  virNetlinkEventRemoveClient (watch=watch@entry=0,
    macaddr=macaddr@entry=0x7f0088014944, protocol=protocol@entry=0)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetlink.c:1197
4  0x00007f00aa4341df in virNetDevMacVLanDeleteWithVPortProfile (
    ifname=<optimized out>, macaddr=macaddr@entry=0x7f0088014944,
    linkdev=0x7f0088014920 "eth1", mode=mode@entry=1,
    virtPortProfile=virtPortProfile@entry=0x0,
    stateDir=stateDir@entry=0x7f004c12fa90 "/var/run/libvirt/qemu")
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virnetdevmacvlan.c:1112
5  0x00007f0074312251 in qemuProcessStop (driver=driver@entry=0x7f004c0ecef0,
    vm=vm@entry=0x7f0088000b00,
    reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN,
    asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_process.c:7291
6  0x00007f007437a5ea in processMonitorEOFEvent (vm=0x7f0088000b00, driver=0x7f004c0ecef0)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4756
7  qemuProcessEventHandler (data=0x55a687d6df10, opaque=0x7f004c0ecef0)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/qemu/qemu_driver.c:4859
8  0x00007f00aa467c5b in virThreadPoolWorker (
    opaque=opaque@entry=0x55a687d6c110)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthreadpool.c:163
9  0x00007f00aa466fe8 in virThreadHelper (data=<optimized out>)
    at /usr/src/debug/libvirt/5.3.0-r0/libvirt-5.3.0/src/util/virthread.c:206
10 0x00007f00a68323f4 in start_thread (arg=0x7f00699df700)
    at /usr/src/debug/glibc/2.24-r0/git/nptl/pthread_create.c:456
11 0x00007f00a616e10f in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

1. The execution flow of the main thread (thread 5, LWP 4142):

main()
  -->virNetDaemonRun()
  -->virNetDaemonClose(dmn) //cleanup
  -->virNetlinkEventServiceStopAll()
  -->virStateCleanup()
     -->qemuStateCleanup()
        -->virThreadPoolFree()
           -->__pthread_cond_wait()

virNetDaemonRun()
  -->virEventRunDefaultImpl
     -->virEventPollRunOnce
        -->virEventPollDispatchHandles
           -->qemuMonitorIO
              -->qemuProcessHandleMonitorEOF
                 -->processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF
                 -->virThreadPoolSendJob()

After the reboot command is issued on the host, the main thread sends an
event message to a worker thread. Here it asks thread 1 to handle the
shutdown of the qemu process, but that job is not necessarily executed
immediately.

virNetlinkEventServiceStopAll()
  --> virNetlinkEventServiceStop()
     --> server[protocol] = NULL; // set server to null

In virNetlinkEventServiceStopAll(), some network-related variables are
freed, such as the static virNetlinkEventSrvPrivatePtr server.

virStateCleanup()
  -->qemuStateCleanup()
     -->virThreadPoolFree()
        -->__pthread_cond_wait()

virThreadPoolFree() waits for the other threads to finish.

2. The execution flow of thread 1 (LWP 4403):

qemuProcessStop()
  -->virNetDevMacVLanDeleteWithVPortProfile()
     -->virNetlinkEventRemoveClient()
        --> srv = server[protocol]

Although the main thread had sent the message to thread 1 (LWP 4403), the
job could not run instantly. This means that
virNetlinkEventServiceStopAll() may be executed earlier than
virNetlinkEventRemoveClient(). The log file confirms this ordering:

"
2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStopAll:941 : stopping all netlink event services
2019-06-12 00:10:09.230+0000: 4142: info : virNetlinkEventServiceStop:904 : stopping netlink event service
2019-06-12 00:10:21.165+0000: 4403: debug : virNetlinkEventRemoveClient:1190 : removing client watch=0, mac=0x7f0088014944.
"

virNetlinkEventRemoveClient() uses the variable server again, but by then
virNetlinkEventServiceStopAll() has already freed it and set it to NULL,
which causes the crash.

virNetlinkEventServiceStopAll() should therefore be executed after
virStateCleanup().

Signed-off-by: Liu Haitao
---
 src/remote/remote_daemon.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c
index c3782971f1..7da20a6644 100644
--- a/src/remote/remote_daemon.c
+++ b/src/remote/remote_daemon.c
@@ -1464,8 +1464,6 @@ int main(int argc, char **argv) {
     /* Keep cleanup order in inverse order of startup */
     virNetDaemonClose(dmn);
 
-    virNetlinkEventServiceStopAll();
-
     if (driversInitialized) {
         /* NB: Possible issue with timing window between driversInitialized
          * setting if virNetlinkEventServerStart fails */
@@ -1473,6 +1471,8 @@ int main(int argc, char **argv) {
         virStateCleanup();
     }
 
+    virNetlinkEventServiceStopAll();
+
     virObjectUnref(adminProgram);
     virObjectUnref(srvAdm);
     virObjectUnref(qemuProgram);
-- 
2.21.0