[PATCH] util: workaround libxml2 lack of thread safe initialization

Daniel P. Berrangé via Devel posted 1 patch 5 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20250623110231.3262608-1-berrange@redhat.com
There is a newer version of this series
src/util/virxml.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
[PATCH] util: workaround libxml2 lack of thread safe initialization
Posted by Daniel P. Berrangé via Devel 5 months, 3 weeks ago
From: Daniel P. Berrangé <berrange@redhat.com>

The main XML parser code global initializer historically had a mutex
protecting it, and more recently uses a pthread_once. The RelaxNG
code, however, relies on three other global initializers that are
not thread safe, just relying on setting an integer "initialized"
flag.

Calling the relevant initializers from libvirt in a protected global
initializer will protect libvirt's own concurrent usage, however, it
cannot protect against other libraries loaded in process that might
be using libxml2's schema code. Fortunately:

 * The chances of other loaded non-libvirt code using libxml is
   relatively low
 * The chances of other loaded non-libvirt code using the schema
   validation / catalog functionality inside libxml is even
   lower
 * The chances of both libvirt and the non-libvirt usage having
   their *1st* usage of libxml2 be concurrent is tiny

IOW, in practice, although our solution doesn't fully fix the thread
safety, it is good enough.

libxml2 should none the less still be fixed to make its global
initializers be thread safe without special actions by its API
consumers.

Resolves: https://gitlab.com/libvirt/libvirt/-/issues/788
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 src/util/virxml.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/util/virxml.c b/src/util/virxml.c
index 9d46e5f32f..c2d5a109dc 100644
--- a/src/util/virxml.c
+++ b/src/util/virxml.c
@@ -26,6 +26,8 @@
 
 #include <libxml/xmlsave.h>
 #include <libxml/xpathInternals.h>
+#include <libxml/xmlschemastypes.h>
+#include <libxml/catalog.h>
 
 #include "virerror.h"
 #include "virxml.h"
@@ -35,6 +37,7 @@
 #include "virstring.h"
 #include "virutil.h"
 #include "viruuid.h"
+#include "virthread.h"
 #include "configmake.h"
 
 #define VIR_FROM_THIS VIR_FROM_XML
@@ -50,6 +53,25 @@ struct virParserData {
 };
 
 
+static int virXMLSchemaOnceInit(void)
+{
+    if (xmlSchemaInitTypes() < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Unable to initialize libxml2 schema types"));
+        return -1;
+    }
+    if (xmlRelaxNGInitTypes() < 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Unable to initialize libxml2 RelaxNG data"));
+        return -1;
+    }
+    /* No return value, it just ignores errors :-( */
+    xmlInitializeCatalog();
+    return 0;
+}
+
+VIR_ONCE_GLOBAL_INIT(virXMLSchema);
+
 static xmlXPathContextPtr
 virXMLXPathContextNew(xmlDocPtr xml)
 {
@@ -1603,6 +1625,9 @@ virXMLValidatorInit(const char *schemafile)
 {
     g_autoptr(virXMLValidator) validator = NULL;
 
+    if (virXMLSchemaInitialize() < 0)
+        return NULL;
+
     validator = g_new0(virXMLValidator, 1);
 
     validator->schemafile = g_strdup(schemafile);
-- 
2.49.0
Re: [PATCH] util: workaround libxml2 lack of thread safe initialization
Posted by Daniel P. Berrangé via Devel 5 months, 3 weeks ago
On Mon, Jun 23, 2025 at 12:02:31PM +0100, Daniel P. Berrangé wrote:
> From: Daniel P. Berrangé <berrange@redhat.com>
> 
> The main XML parser code global initializer historically had a mutex
> protecting it, and more recently uses a pthread_once. The RelaxNG
> code, however, relies on three other global initializers that are
> not thread safe, just relying on setting an integer "initialized"
> flag.
> 
> Calling the relevant initializers from libvirt in a protected global
> initializer will protect libvirt's own concurrent usage, however, it
> cannot protect against other libraries loaded in process that might
> be using libxml2's schema code. Fortunately:
> 
>  * The chances of other loaded non-libvirt code using libxml is
>    relatively low
>  * The chances of other loaded non-libvirt code using the schema
>    validation / catalog functionality inside libxml is even
>    lower
>  * The chances of both libvirt and the non-libvirt usage having
>    their *1st* usage of libxml2 be concurrent is tiny
> 
> IOW, in practice, although our solution doesn't fully fix the thread
> safety, it is good enough.
> 
> libxml2 should none the less still be fixed to make its global
> initializers be thread safe without special actions by its API
> consumers.
> 
> Resolves: https://gitlab.com/libvirt/libvirt/-/issues/788
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  src/util/virxml.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/src/util/virxml.c b/src/util/virxml.c
> index 9d46e5f32f..c2d5a109dc 100644
> --- a/src/util/virxml.c
> +++ b/src/util/virxml.c
> @@ -26,6 +26,8 @@
>  
>  #include <libxml/xmlsave.h>
>  #include <libxml/xpathInternals.h>
> +#include <libxml/xmlschemastypes.h>
> +#include <libxml/catalog.h>
>  
>  #include "virerror.h"
>  #include "virxml.h"
> @@ -35,6 +37,7 @@
>  #include "virstring.h"
>  #include "virutil.h"
>  #include "viruuid.h"
> +#include "virthread.h"
>  #include "configmake.h"
>  
>  #define VIR_FROM_THIS VIR_FROM_XML
> @@ -50,6 +53,25 @@ struct virParserData {
>  };
>  
>  
> +static int virXMLSchemaOnceInit(void)
> +{
> +    if (xmlSchemaInitTypes() < 0) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR,
> +                       _("Unable to initialize libxml2 schema types"));
> +        return -1;
> +    }

Sigh, this doesn't compile on c9s because this function was
'void' in 2.10.0 and earlier, where as now it is 'int' return
type. I'll post a v2...


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH] util: workaround libxml2 lack of thread safe initialization
Posted by Peter Krempa via Devel 5 months, 3 weeks ago
On Mon, Jun 23, 2025 at 12:02:31 +0100, Daniel P. Berrangé via Devel wrote:
> From: Daniel P. Berrangé <berrange@redhat.com>
> 
> The main XML parser code global initializer historically had a mutex
> protecting it, and more recently uses a pthread_once. The RelaxNG
> code, however, relies on three other global initializers that are
> not thread safe, just relying on setting an integer "initialized"
> flag.
> 
> Calling the relevant initializers from libvirt in a protected global
> initializer will protect libvirt's own concurrent usage, however, it
> cannot protect against other libraries loaded in process that might
> be using libxml2's schema code. Fortunately:
> 
>  * The chances of other loaded non-libvirt code using libxml is
>    relatively low
>  * The chances of other loaded non-libvirt code using the schema
>    validation / catalog functionality inside libxml is even
>    lower
>  * The chances of both libvirt and the non-libvirt usage having
>    their *1st* usage of libxml2 be concurrent is tiny

Additionaly IIUC this could be problem only when using the embedded
driver mode as we don't use libxml2 in the exported API

> IOW, in practice, although our solution doesn't fully fix the thread
> safety, it is good enough.
> 
> libxml2 should none the less still be fixed to make its global
> initializers be thread safe without special actions by its API
> consumers.
> 
> Resolves: https://gitlab.com/libvirt/libvirt/-/issues/788
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  src/util/virxml.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/src/util/virxml.c b/src/util/virxml.c
> index 9d46e5f32f..c2d5a109dc 100644
> --- a/src/util/virxml.c
> +++ b/src/util/virxml.c
> @@ -26,6 +26,8 @@
>  
>  #include <libxml/xmlsave.h>
>  #include <libxml/xpathInternals.h>
> +#include <libxml/xmlschemastypes.h>
> +#include <libxml/catalog.h>
>  
>  #include "virerror.h"
>  #include "virxml.h"
> @@ -35,6 +37,7 @@
>  #include "virstring.h"
>  #include "virutil.h"
>  #include "viruuid.h"
> +#include "virthread.h"
>  #include "configmake.h"
>  
>  #define VIR_FROM_THIS VIR_FROM_XML
> @@ -50,6 +53,25 @@ struct virParserData {
>  };
>  
>  
> +static int virXMLSchemaOnceInit(void)

Please use the preferred style of return type on it's own line.

> +{
> +    if (xmlSchemaInitTypes() < 0) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR,
> +                       _("Unable to initialize libxml2 schema types"));
> +        return -1;
> +    }
> +    if (xmlRelaxNGInitTypes() < 0) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR,
> +                       _("Unable to initialize libxml2 RelaxNG data"));
> +        return -1;
> +    }
> +    /* No return value, it just ignores errors :-( */
> +    xmlInitializeCatalog();
> +    return 0;
> +}
> +
> +VIR_ONCE_GLOBAL_INIT(virXMLSchema);
> +
>  static xmlXPathContextPtr
>  virXMLXPathContextNew(xmlDocPtr xml)
>  {
> @@ -1603,6 +1625,9 @@ virXMLValidatorInit(const char *schemafile)
>  {
>      g_autoptr(virXMLValidator) validator = NULL;
>  
> +    if (virXMLSchemaInitialize() < 0)
> +        return NULL;
> +
>      validator = g_new0(virXMLValidator, 1);
>  
>      validator->schemafile = g_strdup(schemafile);
> -- 
> 2.49.0
> 

Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Re: [PATCH] util: workaround libxml2 lack of thread safe initialization
Posted by Daniel P. Berrangé via Devel 5 months, 3 weeks ago
On Mon, Jun 23, 2025 at 01:16:01PM +0200, Peter Krempa wrote:
> On Mon, Jun 23, 2025 at 12:02:31 +0100, Daniel P. Berrangé via Devel wrote:
> > From: Daniel P. Berrangé <berrange@redhat.com>
> > 
> > The main XML parser code global initializer historically had a mutex
> > protecting it, and more recently uses a pthread_once. The RelaxNG
> > code, however, relies on three other global initializers that are
> > not thread safe, just relying on setting an integer "initialized"
> > flag.
> > 
> > Calling the relevant initializers from libvirt in a protected global
> > initializer will protect libvirt's own concurrent usage, however, it
> > cannot protect against other libraries loaded in process that might
> > be using libxml2's schema code. Fortunately:
> > 
> >  * The chances of other loaded non-libvirt code using libxml is
> >    relatively low
> >  * The chances of other loaded non-libvirt code using the schema
> >    validation / catalog functionality inside libxml is even
> >    lower
> >  * The chances of both libvirt and the non-libvirt usage having
> >    their *1st* usage of libxml2 be concurrent is tiny
> 
> Additionaly IIUC this could be problem only when using the embedded
> driver mode as we don't use libxml2 in the exported API

Actually we can trigger the problem easily via the public libvirt API
both locally (using test:///default) and remotely (using qemu:///system),
just by using VIR_DOMAIN_DEFINE_VALIDATE concurrently. See the test
program in the bug report:

  https://gitlab.com/-/project/192693/uploads/72900e26116909000d926de4db7e6b27/badschema.c

It is unlikely something else loaded by libvirtd uses libxml2, but never
say never, as we link to loads of libraries indirectly.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|