[PATCH] virnwfilterobj: Unlock virNWFilterObj before triggering rebuild

Michal Privoznik posted 1 patch 2 years, 3 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/47f6f74bc626950d4816a04e5a88fba03f44d5a6.1643719090.git.mprivozn@redhat.com
src/conf/virnwfilterobj.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
[PATCH] virnwfilterobj: Unlock virNWFilterObj before triggering rebuild
Posted by Michal Privoznik 2 years, 3 months ago
When updating am NWFilter, rebuilding of all NWFilters is
triggered. This happens in virNWFilterObjListAssignDef() by
calling virNWFilterTriggerRebuild(). Now consider another thread,
that's currently executing virNWFilterBindingCreateXML() over the
same NWFilter.

What happens is that this second thread gets all the way into
virNWFilterInstantiateFilterInternal(), acquires @updateMutex and
transitively calls virNWFilterObjListFindByName() which iterates
over list of NWFilters, locking one by one, comparing names
trying to find the desired one. Sooner or later it will try to
lock the object that the other, original thread is redefining and
which it holds locked. Now that thread can't continue either
because it's waiting for the @updateMutex lock.

So we end up in a typical deadlock situation, one thread holding
lock A trying to acquire lock B, the other thread holding B and
trying to acquire A.

The solution is to unlock the virNWFilterObj in the first thread,
just before triggering rebuild.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2044379
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---

I'm not sure this is 100% correct, but hey - it make that particular bug
go away.

 src/conf/virnwfilterobj.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/conf/virnwfilterobj.c b/src/conf/virnwfilterobj.c
index 6bbdf6e6fa..d6c2e59ce8 100644
--- a/src/conf/virnwfilterobj.c
+++ b/src/conf/virnwfilterobj.c
@@ -352,12 +352,17 @@ virNWFilterObjListAssignDef(virNWFilterObjList *nwfilters,
         }
 
         obj->newDef = def;
-        /* trigger the update on VMs referencing the filter */
+
+        /* Trigger the update on VMs referencing the filter, but
+         * unlock the filter because rebuild will lock it again. */
+        virNWFilterObjUnlock(obj);
         if (virNWFilterTriggerRebuild() < 0) {
+            virNWFilterObjLock(obj);
             obj->newDef = NULL;
             virNWFilterObjUnlock(obj);
             return NULL;
         }
+        virNWFilterObjLock(obj);
 
         virNWFilterDefFree(objdef);
         obj->def = def;
-- 
2.34.1

Re: [PATCH] virnwfilterobj: Unlock virNWFilterObj before triggering rebuild
Posted by Daniel P. Berrangé 2 years, 3 months ago
On Tue, Feb 01, 2022 at 01:38:53PM +0100, Michal Privoznik wrote:
> When updating am NWFilter, rebuilding of all NWFilters is
> triggered. This happens in virNWFilterObjListAssignDef() by
> calling virNWFilterTriggerRebuild(). Now consider another thread,
> that's currently executing virNWFilterBindingCreateXML() over the
> same NWFilter.
> 
> What happens is that this second thread gets all the way into
> virNWFilterInstantiateFilterInternal(), acquires @updateMutex and
> transitively calls virNWFilterObjListFindByName() which iterates
> over list of NWFilters, locking one by one, comparing names
> trying to find the desired one. Sooner or later it will try to
> lock the object that the other, original thread is redefining and
> which it holds locked. Now that thread can't continue either
> because it's waiting for the @updateMutex lock.
> 
> So we end up in a typical deadlock situation, one thread holding
> lock A trying to acquire lock B, the other thread holding B and
> trying to acquire A.
> 
> The solution is to unlock the virNWFilterObj in the first thread,
> just before triggering rebuild.
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2044379
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
> 
> I'm not sure this is 100% correct, but hey - it make that particular bug
> go away.

I've not looked though the code flow again to remind myself of the
horrible locking rules, but as a general point, any correct solution
to this bug will involve eliminating all calls to

  virNWFilterReadLockFilterUpdates()

from outside the nwfilter driver itself.  There can be no scenario
whuer the QEMU driver is acquiring mutexes in the nwfilter, because
that can nmever work in a modular daemon world as they'll have their
own distinct copies of the mutex. Essentially that mutex has to be
private to the nwfilter driver.

As a general rule this goes for any mutex outside the virt drivers.
None are permitted to be used for synchronization between a virt
driver and secondary driver.

>  src/conf/virnwfilterobj.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/src/conf/virnwfilterobj.c b/src/conf/virnwfilterobj.c
> index 6bbdf6e6fa..d6c2e59ce8 100644
> --- a/src/conf/virnwfilterobj.c
> +++ b/src/conf/virnwfilterobj.c
> @@ -352,12 +352,17 @@ virNWFilterObjListAssignDef(virNWFilterObjList *nwfilters,
>          }
>  
>          obj->newDef = def;
> -        /* trigger the update on VMs referencing the filter */
> +
> +        /* Trigger the update on VMs referencing the filter, but
> +         * unlock the filter because rebuild will lock it again. */
> +        virNWFilterObjUnlock(obj);
>          if (virNWFilterTriggerRebuild() < 0) {
> +            virNWFilterObjLock(obj);
>              obj->newDef = NULL;
>              virNWFilterObjUnlock(obj);
>              return NULL;
>          }
> +        virNWFilterObjLock(obj);
>  
>          virNWFilterDefFree(objdef);
>          obj->def = def;
> -- 
> 2.34.1
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|