[libvirt] [PATCH v2] nwfilter: fix deadlock when nwfilter reload

Wang Yechao posted 1 patch 5 years, 7 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/1536837307-2875-1-git-send-email-wang.yechao255@zte.com.cn
Test syntax-check passed
src/nwfilter/nwfilter_driver.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
[libvirt] [PATCH v2] nwfilter: fix deadlock when nwfilter reload
Posted by Wang Yechao 5 years, 7 months ago
user run "firewalld-cmd --reload"
nwfilterStateReload called in main thread
step 1. virRWLockWrite(&updateLock)
step 2. virNWFilterLoadAllConfigs
step 3. virRWLockUnlock(&updateLock);

lauch a vm: qemuDomainCreateXML runs in other thread
step 1. virRWLockRead(&updateLock);
step 2. qemuProcessStart
step 3. qemuProcessWaitForMonitor
step 4. ...
step 5  virRWLockUnlock(&updateLock);

if nwfilterStateReload called in the middle of step 1 and step 5 of
qemuDomainCreateXML, it can't get the updateLock and then block the event_loop,
so event_loop can't handle the qemu-monitor messages, cause deadlock

move nwfilterStateReload into thread to fix this problem.

Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Wang Yi <wang.yi59@zte.com.cn>
---
 src/nwfilter/nwfilter_driver.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/nwfilter/nwfilter_driver.c b/src/nwfilter/nwfilter_driver.c
index 1ee5162..ab85072 100644
--- a/src/nwfilter/nwfilter_driver.c
+++ b/src/nwfilter/nwfilter_driver.c
@@ -80,18 +80,26 @@ static void nwfilterDriverUnlock(void)
 }
 
 #if HAVE_FIREWALLD
+static void nwfilterReloadThread(void *opaque ATTRIBUTE_UNUSED)
+{
+    nwfilterStateReload();
+}
 
 static DBusHandlerResult
 nwfilterFirewalldDBusFilter(DBusConnection *connection ATTRIBUTE_UNUSED,
                             DBusMessage *message,
                             void *user_data ATTRIBUTE_UNUSED)
 {
+    virThread thread;
+
     if (dbus_message_is_signal(message, DBUS_INTERFACE_DBUS,
                                "NameOwnerChanged") ||
         dbus_message_is_signal(message, "org.fedoraproject.FirewallD1",
                                "Reloaded")) {
         VIR_DEBUG("Reload in nwfilter_driver because of firewalld.");
-        nwfilterStateReload();
+
+        if (virThreadCreate(&thread, false, nwfilterReloadThread, NULL) < 0)
+            VIR_WARN("create nwfilterReloadThread failed.");
     }
 
     return DBUS_HANDLER_RESULT_NOT_YET_HANDLED;
-- 
1.8.3.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2] nwfilter: fix deadlock when nwfilter reload
Posted by John Ferlan 5 years, 7 months ago

On 09/13/2018 07:15 AM, Wang Yechao wrote:
> user run "firewalld-cmd --reload"
> nwfilterStateReload called in main thread
> step 1. virRWLockWrite(&updateLock)
> step 2. virNWFilterLoadAllConfigs
> step 3. virRWLockUnlock(&updateLock);
> 
> lauch a vm: qemuDomainCreateXML runs in other thread
> step 1. virRWLockRead(&updateLock);
> step 2. qemuProcessStart
> step 3. qemuProcessWaitForMonitor
> step 4. ...
> step 5  virRWLockUnlock(&updateLock);
> 
> if nwfilterStateReload called in the middle of step 1 and step 5 of
> qemuDomainCreateXML, it can't get the updateLock and then block the event_loop,
> so event_loop can't handle the qemu-monitor messages, cause deadlock
> 
> move nwfilterStateReload into thread to fix this problem.
> 
> Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
> Reviewed-by: Wang Yi <wang.yi59@zte.com.cn>
> ---
>  src/nwfilter/nwfilter_driver.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 

Couple of "administrative" comments - since you were a new contributor
to libvir-list - you just needed to be a bit more patient w/r/t getting
your patches on this list, as described here:

https://libvirt.org/hacking.html

"If everything went well, your patch should show up on the libvir-list
archives in a matter of minutes; if you still can't find it on there
after an hour or so, you should double-check your setup. Note that your
very first post to the mailing list will be subject to moderation, and
it's not uncommon for that to take around a day.
"

OK so that hopefully helps explain why there were many v1's posted

Then when you repost a few days later with something that's changed, you
need to change to using v2.  In the postings on Sept 11 it seems you've
change the R-By email address. That's important enough to note...

Then when you posted as v2, it probably should have been v3 and a note
added why you changed from VIR_ERROR to VIR_WARN

For each of these addition versions after the "---" (above):

   1. Add a pointer to the previous posting from the archives
   2. Briefly describe the differences in the changes

So, after all that...  This seemed really familiar... and it was:

See:

https://www.redhat.com/archives/libvir-list/2017-October/msg01351.html

and follow-ups:

https://www.redhat.com/archives/libvir-list/2017-December/msg00198.html
https://www.redhat.com/archives/libvir-list/2017-December/msg00227.html

However, that followup was essentially dropped. So this code once again
resurrects the same questions that I had back then.

Not that I don't think this should be done in some form, the question
becomes how much "care" needs to be taken to ensure we don't have
multiple threads running.

John

BTW: Whether you were aware of Nikolay's patch or not, I think some
attribution could be made even though it was the same solution.


> diff --git a/src/nwfilter/nwfilter_driver.c b/src/nwfilter/nwfilter_driver.c
> index 1ee5162..ab85072 100644
> --- a/src/nwfilter/nwfilter_driver.c
> +++ b/src/nwfilter/nwfilter_driver.c
> @@ -80,18 +80,26 @@ static void nwfilterDriverUnlock(void)
>  }
>  
>  #if HAVE_FIREWALLD
> +static void nwfilterReloadThread(void *opaque ATTRIBUTE_UNUSED)
> +{
> +    nwfilterStateReload();
> +}
>  
>  static DBusHandlerResult
>  nwfilterFirewalldDBusFilter(DBusConnection *connection ATTRIBUTE_UNUSED,
>                              DBusMessage *message,
>                              void *user_data ATTRIBUTE_UNUSED)
>  {
> +    virThread thread;
> +
>      if (dbus_message_is_signal(message, DBUS_INTERFACE_DBUS,
>                                 "NameOwnerChanged") ||
>          dbus_message_is_signal(message, "org.fedoraproject.FirewallD1",
>                                 "Reloaded")) {
>          VIR_DEBUG("Reload in nwfilter_driver because of firewalld.");
> -        nwfilterStateReload();
> +
> +        if (virThreadCreate(&thread, false, nwfilterReloadThread, NULL) < 0)
> +            VIR_WARN("create nwfilterReloadThread failed.");
>      }
>  
>      return DBUS_HANDLER_RESULT_NOT_YET_HANDLED;
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt][PATCH v2] nwfilter: fix deadlock when nwfilter reload
Posted by wang.yechao255@zte.com.cn 5 years, 7 months ago
I'm sorry about many v1 patches posted. I fix some syntax errors in all v2 patches, 


and should note the changes in these patches. I will learn more about posting patch


correctly. 


Thanks John.






---



Best wishes,


Wang Yechao











原始邮件



发件人:JohnFerlan <jferlan@redhat.com>
收件人:王业超10154425;libvir-list@redhat.com <libvir-list@redhat.com>
日 期 :2018年09月14日 23:49
主 题 :Re: [libvirt] [PATCH v2] nwfilter: fix deadlock when nwfilter reload




On 09/13/2018 07:15 AM, Wang Yechao wrote:
> user run "firewalld-cmd --reload"
> nwfilterStateReload called in main thread
> step 1. virRWLockWrite(&updateLock)
> step 2. virNWFilterLoadAllConfigs
> step 3. virRWLockUnlock(&updateLock);
> 
> lauch a vm: qemuDomainCreateXML runs in other thread
> step 1. virRWLockRead(&updateLock);
> step 2. qemuProcessStart
> step 3. qemuProcessWaitForMonitor
> step 4. ...
> step 5  virRWLockUnlock(&updateLock);
> 
> if nwfilterStateReload called in the middle of step 1 and step 5 of
> qemuDomainCreateXML, it can't get the updateLock and then block the event_loop,
> so event_loop can't handle the qemu-monitor messages, cause deadlock
> 
> move nwfilterStateReload into thread to fix this problem.
> 
> Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
> Reviewed-by: Wang Yi <wang.yi59@zte.com.cn>
> ---
>  src/nwfilter/nwfilter_driver.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 

Couple of "administrative" comments - since you were a new contributor
to libvir-list - you just needed to be a bit more patient w/r/t getting
your patches on this list, as described here:

https://libvirt.org/hacking.html

"If everything went well, your patch should show up on the libvir-list
archives in a matter of minutes; if you still can't find it on there
after an hour or so, you should double-check your setup. Note that your
very first post to the mailing list will be subject to moderation, and
it's not uncommon for that to take around a day.
"

OK so that hopefully helps explain why there were many v1's posted

Then when you repost a few days later with something that's changed, you
need to change to using v2.  In the postings on Sept 11 it seems you've
change the R-By email address. That's important enough to note...

Then when you posted as v2, it probably should have been v3 and a note
added why you changed from VIR_ERROR to VIR_WARN

For each of these addition versions after the "---" (above):

   1. Add a pointer to the previous posting from the archives
   2. Briefly describe the differences in the changes

So, after all that...  This seemed really familiar... and it was:

See:

https://www.redhat.com/archives/libvir-list/2017-October/msg01351.html

and follow-ups:

https://www.redhat.com/archives/libvir-list/2017-December/msg00198.html
https://www.redhat.com/archives/libvir-list/2017-December/msg00227.html

However, that followup was essentially dropped. So this code once again
resurrects the same questions that I had back then.

Not that I don't think this should be done in some form, the question
becomes how much "care" needs to be taken to ensure we don't have
multiple threads running.

John

BTW: Whether you were aware of Nikolay's patch or not, I think some
attribution could be made even though it was the same solution.


> diff --git a/src/nwfilter/nwfilter_driver.c b/src/nwfilter/nwfilter_driver.c
> index 1ee5162..ab85072 100644
> --- a/src/nwfilter/nwfilter_driver.c
> +++ b/src/nwfilter/nwfilter_driver.c
> @@ -80,18 +80,26 @@ static void nwfilterDriverUnlock(void)
>  }
>  
>  #if HAVE_FIREWALLD
> +static void nwfilterReloadThread(void *opaque ATTRIBUTE_UNUSED)
> +{
> +    nwfilterStateReload();
> +}
>  
>  static DBusHandlerResult
>  nwfilterFirewalldDBusFilter(DBusConnection *connection ATTRIBUTE_UNUSED,
>                              DBusMessage *message,
>                              void *user_data ATTRIBUTE_UNUSED)
>  {
> +    virThread thread;
> +
>      if (dbus_message_is_signal(message, DBUS_INTERFACE_DBUS,
>                                 "NameOwnerChanged") ||
>          dbus_message_is_signal(message, "org.fedoraproject.FirewallD1",
>                                 "Reloaded")) {
>          VIR_DEBUG("Reload in nwfilter_driver because of firewalld.");
> -        nwfilterStateReload();
> +
> +        if (virThreadCreate(&thread, false, nwfilterReloadThread, NULL) < 0)
> +            VIR_WARN("create nwfilterReloadThread failed.");
>      }
>  
>      return DBUS_HANDLER_RESULT_NOT_YET_HANDLED;
>--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list