From: Mark Asselstine <mark.asselstine@windriver.com>
To: libvir-list@redhat.com
Date: Mon, 24 Sep 2018 11:11:35 -0400
Message-ID: <1537801895-405-1-git-send-email-mark.asselstine@windriver.com>
Subject: [libvirt] [PATCH] lxc_monitor: Avoid AB / BA lock race
A deadlock can occur when autostarting an LXC domain 'guest' because two
threads attempt to take opposing locks while holding opposing locks (an
AB BA problem). Thread A takes and holds the 'vm' lock while attempting
to take the 'client' lock; meanwhile, thread B takes and holds the
'client' lock while attempting to take the 'vm' lock. The potential for
this can be seen as follows:

Thread A:
 virLXCProcessAutostartDomain (takes vm lock)
  --> virLXCProcessStart
   --> virLXCProcessConnectMonitor
    --> virLXCMonitorNew
     --> virNetClientSetCloseCallback (wants client lock)

Thread B:
 virNetClientIncomingEvent (takes client lock)
  --> virNetClientIOHandleInput
   --> virNetClientCallDispatch
    --> virNetClientCallDispatchMessage
     --> virNetClientProgramDispatch
      --> virLXCMonitorHandleEventInit
       --> virLXCProcessMonitorInitNotify (wants vm lock)

Since these threads are scheduled independently and are preemptible, it
is possible for the deadlock scenario to occur: each thread takes its
first lock, but both fail to get their second lock and just spin
forever. You get something like:

 virLXCProcessAutostartDomain (takes vm lock)
  --> virLXCProcessStart
   --> virLXCProcessConnectMonitor
    --> virLXCMonitorNew
<...>
 virNetClientIncomingEvent (takes client lock)
  --> virNetClientIOHandleInput
   --> virNetClientCallDispatch
    --> virNetClientCallDispatchMessage
     --> virNetClientProgramDispatch
      --> virLXCMonitorHandleEventInit
       --> virLXCProcessMonitorInitNotify (wants vm lock but spins)
<...>
     --> virNetClientSetCloseCallback (wants client lock but spins)

Neither thread ever gets the lock it needs to continue, while each holds
the lock that the other thread needs.

The actual window for preemption which can cause this deadlock is rather
small: between the call to virNetClientProgramNew() and the execution of
virNetClientSetCloseCallback(), both in virLXCMonitorNew(). But
real-world use shows that this small window is enough.

By moving the call to virNetClientSetCloseCallback() ahead of
virNetClientProgramNew() we close any possible chance of the deadlock
taking place. The move should have no other implications, since the
close callback (in the unlikely event it were called) would simply spin
on the vm lock. The remaining work that takes place between the old call
location of virNetClientSetCloseCallback() and the new location is
unaffected by the move.

Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
---
NOTE: I am not intimately familiar with the libvirt code, so although I
am confident of the cause of the issue, I am not sure this is the best
solution. I am really curious what people who do know this code think,
and I am definitely open to discussion and suggestions on how to rework
things to properly address the issue.
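For illustration only, here is a minimal standalone sketch of the same
AB/BA pattern using plain pthreads (hypothetical lock and thread names,
not libvirt code): each thread takes one of the two locks and then wants
the other, so if both get their first lock before either releases it,
neither can ever proceed.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t vm_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t client_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for the autostart path: vm lock first, then client lock. */
static void *thread_a(void *arg)
{
    pthread_mutex_lock(&vm_lock);        /* takes vm lock */
    pthread_mutex_lock(&client_lock);    /* wants client lock */
    pthread_mutex_unlock(&client_lock);
    pthread_mutex_unlock(&vm_lock);
    return arg;
}

/* Stands in for the incoming-event path: client lock first, then vm lock. */
static void *thread_b(void *arg)
{
    pthread_mutex_lock(&client_lock);    /* takes client lock */
    pthread_mutex_lock(&vm_lock);        /* wants vm lock */
    pthread_mutex_unlock(&vm_lock);
    pthread_mutex_unlock(&client_lock);
    return arg;
}

int main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);

    /* If both threads get their first lock before either gets its
     * second, these joins never return. */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("no deadlock on this run\n");
    return 0;
}

Built with 'gcc -pthread' and run in a loop, most runs finish, but an
unlucky interleaving hangs in the joins, which mirrors the small
preemption window described above.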
 src/lxc/lxc_monitor.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/lxc/lxc_monitor.c b/src/lxc/lxc_monitor.c
index e765c16..218f426 100644
--- a/src/lxc/lxc_monitor.c
+++ b/src/lxc/lxc_monitor.c
@@ -161,6 +161,10 @@ virLXCMonitorPtr virLXCMonitorNew(virDomainObjPtr vm,
     if (virNetClientRegisterAsyncIO(mon->client) < 0)
         goto error;
 
+    /* avoid deadlock by making this call before assigning virLXCMonitorEvents */
+    virNetClientSetCloseCallback(mon->client, virLXCMonitorEOFNotify, mon,
+                                 virLXCMonitorCloseFreeCallback);
+
     if (!(mon->program = virNetClientProgramNew(VIR_LXC_MONITOR_PROGRAM,
                                                 VIR_LXC_MONITOR_PROGRAM_VERSION,
                                                 virLXCMonitorEvents,
@@ -176,8 +180,6 @@ virLXCMonitorPtr virLXCMonitorNew(virDomainObjPtr vm,
     memcpy(&mon->cb, cb, sizeof(mon->cb));
 
     virObjectRef(mon);
-    virNetClientSetCloseCallback(mon->client, virLXCMonitorEOFNotify, mon,
-                                 virLXCMonitorCloseFreeCallback);
 
 cleanup:
     VIR_FREE(sockpath);
-- 
2.7.4