[PATCH 2/2] daemon: Introduce the possibility for users to register custom XML validator

Peter Krempa posted 2 patches 3 years, 4 months ago
[PATCH 2/2] daemon: Introduce the possibility for users to register custom XML validator
Posted by Peter Krempa 3 years, 4 months ago
Introduce a new config option 'xml_validator' into the daemon config
file which will allow users to make libvirt daemons use a custom XML
validator.

The rationale is that validators such as 'jing'[1] provide drastically
better error specification when compared to the native libxml2 validator
we use. A drawback though is that jing is written in Java and thus
unusable by libvirt directly and also not a popular package present in
distros.

For power users and developers it still is worth having this feature to
provide better errors in a native way.

An example showing the quality of the errors:

XML used:

  <vcpu placement='static' current='1'>asdf8</vcpu>

native validator:
  error: XML document failed to validate against schema: Unable to validate doc against /home/pipo/libvirt/src/conf/schemas/domain.rng
  Extra element vcpu in interleave
  Invalid sequence in interleave
  Element domain failed to validate content

jing:
  error: XML document failed to validate against schema: Unable to validate doc against /home/pipo/libvirt/src/conf/schemas/domain.rng
  /dev/stdin:6:52: error: character content of element "vcpu" invalid; must be an integer

Example script to make this feature work with jing:

  #!/bin/bash
  java -jar /home/pipo/git/jing-trang/build/jing.jar $1 /dev/stdin 2>&1 || exit 1

[1] https://github.com/relaxng/jing-trang

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/remote/libvirtd.aug.in        |  1 +
 src/remote/libvirtd.conf.in       | 21 +++++++++++++++++++++
 src/remote/remote_daemon.c        |  3 +++
 src/remote/remote_daemon_config.c |  4 ++++
 src/remote/remote_daemon_config.h |  2 ++
 src/remote/test_libvirtd.aug.in   |  1 +
 6 files changed, 32 insertions(+)

diff --git a/src/remote/libvirtd.aug.in b/src/remote/libvirtd.aug.in
index d744548f41..9a15eb14e5 100644
--- a/src/remote/libvirtd.aug.in
+++ b/src/remote/libvirtd.aug.in
@@ -91,6 +91,7 @@ module @DAEMON_NAME_UC@ =
    let misc_entry = str_entry "host_uuid"
                   | str_entry "host_uuid_source"
                   | int_entry "ovs_timeout"
+                  | str_entry "xml_validator"

    (* Each entry in the config is one of the following three ... *)
    let entry = sock_acl_entry
diff --git a/src/remote/libvirtd.conf.in b/src/remote/libvirtd.conf.in
index 80a98b1529..5cfe0c0918 100644
--- a/src/remote/libvirtd.conf.in
+++ b/src/remote/libvirtd.conf.in
@@ -541,3 +541,24 @@
 # potential infinite waits blocking libvirt.
 #
 #ovs_timeout = 5
+
+###################################################################
+# Custom XML validator
+#
+# The following option instructs @DAEMON_NAME@ to use a custom
+# XML validator binary or script before invoking the default
+# validator from libxml2.
+#
+# The custom validator is called using following arguments.
+#
+#   /path/to/validator /path/to/schema.rng
+#
+# The actual XML file is provided on the standard input of the process.
+# If the validation fails the process is expected to return a non-zero exit
+# code. The standard output and standard error output are used as error message
+# provided back to the user.
+#
+# Note that the XMLs sent for validation may contain security sensitive
+# information.
+#
+#xml_validator = "/path/to/validator"
diff --git a/src/remote/remote_daemon.c b/src/remote/remote_daemon.c
index f369d09d35..cbbf81e141 100644
--- a/src/remote/remote_daemon.c
+++ b/src/remote/remote_daemon.c
@@ -936,6 +936,9 @@ int main(int argc, char **argv) {
         exit(EXIT_FAILURE);
     }

+    if (config->xml_validator)
+        virXMLParseSetCustomValidator(config->xml_validator);
+
     /* Let's try to initialize global variable that holds the host's boot time. */
     if (virHostBootTimeInit() < 0) {
         /* This is acceptable failure. Maybe we won't need the boot time
diff --git a/src/remote/remote_daemon_config.c b/src/remote/remote_daemon_config.c
index 3567e337c4..affafa2a86 100644
--- a/src/remote/remote_daemon_config.c
+++ b/src/remote/remote_daemon_config.c
@@ -219,6 +219,7 @@ daemonConfigFree(struct daemonConfig *data)
     g_free(data->host_uuid_source);
     g_free(data->log_filters);
     g_free(data->log_outputs);
+    g_free(data->xml_validator);

     g_free(data);
 }
@@ -380,6 +381,9 @@ daemonConfigLoadOptions(struct daemonConfig *data,
     if (virConfGetValueUInt(conf, "ovs_timeout", &data->ovs_timeout) < 0)
         return -1;

+    if (virConfGetValueString(conf, "xml_validator", &data->xml_validator) < 0)
+        return -1;
+
     return 0;
 }

diff --git a/src/remote/remote_daemon_config.h b/src/remote/remote_daemon_config.h
index 9f9e54e838..4bc8bce90f 100644
--- a/src/remote/remote_daemon_config.h
+++ b/src/remote/remote_daemon_config.h
@@ -96,6 +96,8 @@ struct daemonConfig {
     unsigned int admin_keepalive_count;

     unsigned int ovs_timeout;
+
+    char *xml_validator;
 };


diff --git a/src/remote/test_libvirtd.aug.in b/src/remote/test_libvirtd.aug.in
index c27680e130..eedc3af24d 100644
--- a/src/remote/test_libvirtd.aug.in
+++ b/src/remote/test_libvirtd.aug.in
@@ -67,3 +67,4 @@ module Test_@DAEMON_NAME@ =
         { "admin_keepalive_interval" = "5" }
         { "admin_keepalive_count" = "5" }
         { "ovs_timeout" = "5" }
+        { "xml_validator" = "/path/to/validator" }
-- 
2.37.1
Re: [PATCH 2/2] daemon: Introduce the possibility for users to register custom XML validator
Posted by Daniel P. Berrangé 3 years, 4 months ago
On Fri, Sep 23, 2022 at 05:42:13PM +0200, Peter Krempa wrote:
> Introduce a new config option 'xml_validator' into the daemon config
> file which will allow users to make libvirt daemons use a custom XML
> validator.
> 
> The rationale is that validators such as 'jing'[1] provide drastically
> better error specification when compared to the native libxml2 validator
> we use. A drawback though is that jing is written in Java and thus
> unusable by libvirt directly and also not a popular package present in
> distros.
> 
> For power users and developers it still is worth having this feature to
> provide better errors in a native way.

I kinda wonder if those users need it integrated in libvirt
though, as opposed to just calling jing themselves when
needed. It feels like a fairly narrow set of users benefitting
here from this change, made even smaller by the fact that those
users have to remember to reconfigure libvirt to enable this.



> An example showing the quality of the errors:
> 
> XML used:
> 
>   <vcpu placement='static' current='1'>asdf8</vcpu>
> 
> native validator:
>   error: XML document failed to validate against schema: Unable to validate doc against /home/pipo/libvirt/src/conf/schemas/domain.rng
>   Extra element vcpu in interleave
>   Invalid sequence in interleave
>   Element domain failed to validate content
> 
> jing:
>   error: XML document failed to validate against schema: Unable to validate doc against /home/pipo/libvirt/src/conf/schemas/domain.rng
>   /dev/stdin:6:52: error: character content of element "vcpu" invalid; must be an integer

No doubt the errors are way better, but a power user
can just invoke this script below themselves if they
ever want to understand a XML problem better.

> 
> Example script to make this feature work with jing:
> 
>   #!/bin/bash
>   java -jar /home/pipo/git/jing-trang/build/jing.jar $1 /dev/stdin 2>&1 || exit 1
> 
> [1] https://github.com/relaxng/jing-trang

To me this is only compelling if there's an option we can reasonably
wire up out of the box.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 2/2] daemon: Introduce the possibility for users to register custom XML validator
Posted by Peter Krempa 3 years, 4 months ago
On Mon, Sep 26, 2022 at 12:56:44 +0100, Daniel P. Berrangé wrote:
> On Fri, Sep 23, 2022 at 05:42:13PM +0200, Peter Krempa wrote:
> > Introduce a new config option 'xml_validator' into the daemon config
> > file which will allow users to make libvirt daemons use a custom XML
> > validator.
> > 
> > The rationale is that validators such as 'jing'[1] provide drastically
> > better error specification when compared to the native libxml2 validator
> > we use. A drawback though is that jing is written in Java and thus
> > unusable by libvirt directly and also not a popular package present in
> > distros.
> > 
> > For power users and developers it still is worth having this feature to
> > provide better errors in a native way.
> 
> I kinda wonder if those users need it integrated in libvirt
> though, as opposed to just calling jing themselves when
> needed. It feels like a fairly narrow set of users benefitting
> here from this change, made even smaller by the fact that those
> users have to remember to reconfigure libvirt to enable this.

Well, it makes it much more convenient when dealing with XML which is
not on disk, such as when doing a 'virsh edit' or editing the XML in
virt-manager's XML edit pane.

That way you don't have to go back to the edit window and copy-out the
XML and then run a validator.

[...]

> > 
> > Example script to make this feature work with jing:
> > 
> >   #!/bin/bash
> >   java -jar /home/pipo/git/jing-trang/build/jing.jar $1 /dev/stdin 2>&1 || exit 1
> > 
> > [1] https://github.com/relaxng/jing-trang
> 
> To me this is only compelling if there's an option we can reasonably
> wire up out of the box.

Well, I unfortunately didn't yet find any validator that I would
consider plugging in by default. Specifically in this case the overhead
of running a JVM is definitely not something I'd want do even if jing
were a widely distributed validator.
Re: [PATCH 2/2] daemon: Introduce the possibility for users to register custom XML validator
Posted by Ján Tomko 3 years, 4 months ago
On a Monday in 2022, Daniel P. Berrangé wrote:
>On Fri, Sep 23, 2022 at 05:42:13PM +0200, Peter Krempa wrote:
>> Introduce a new config option 'xml_validator' into the daemon config
>> file which will allow users to make libvirt daemons use a custom XML
>> validator.
>>
>> The rationale is that validators such as 'jing'[1] provide drastically
>> better error specification when compared to the native libxml2 validator
>> we use. A drawback though is that jing is written in Java and thus
>> unusable by libvirt directly and also not a popular package present in
>> distros.
>>
>> For power users and developers it still is worth having this feature to
>> provide better errors in a native way.
>
>I kinda wonder if those users need it integrated in libvirt
>though, as opposed to just calling jing themselves when
>needed. It feels like a fairly narrow set of users benefitting
>here from this change, made even smaller by the fact that those
>users have to remember to reconfigure libvirt to enable this.
>
>

I would be more convinced if jing was still packaged for Fedora.

>
>> An example showing the quality of the errors:
>>
>> XML used:
>>
>>   <vcpu placement='static' current='1'>asdf8</vcpu>
>>
>> native validator:
>>   error: XML document failed to validate against schema: Unable to validate doc against /home/pipo/libvirt/src/conf/schemas/domain.rng
>>   Extra element vcpu in interleave
>>   Invalid sequence in interleave
>>   Element domain failed to validate content
>>
>> jing:
>>   error: XML document failed to validate against schema: Unable to validate doc against /home/pipo/libvirt/src/conf/schemas/domain.rng
>>   /dev/stdin:6:52: error: character content of element "vcpu" invalid; must be an integer
>
>No doubt the errors are way better, but a power user
>can just invoke this script below themselves if they
>ever want to understand a XML problem better.
>
>>
>> Example script to make this feature work with jing:
>>
>>   #!/bin/bash
>>   java -jar /home/pipo/git/jing-trang/build/jing.jar $1 /dev/stdin 2>&1 || exit 1
>>
>> [1] https://github.com/relaxng/jing-trang
>
>To me this is only compelling if there's an option we can reasonably
>wire up out of the box.

Would switching to a different schema format make finding a validator
with frendlier messages easier?

Jano

>
>With regards,
>Daniel
>-- 
>|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
Re: [PATCH 2/2] daemon: Introduce the possibility for users to register custom XML validator
Posted by Daniel P. Berrangé 3 years, 4 months ago
On Mon, Sep 26, 2022 at 02:09:23PM +0200, Ján Tomko wrote:
> On a Monday in 2022, Daniel P. Berrangé wrote:
> > On Fri, Sep 23, 2022 at 05:42:13PM +0200, Peter Krempa wrote:
> > > Introduce a new config option 'xml_validator' into the daemon config
> > > file which will allow users to make libvirt daemons use a custom XML
> > > validator.
> > > 
> > > The rationale is that validators such as 'jing'[1] provide drastically
> > > better error specification when compared to the native libxml2 validator
> > > we use. A drawback though is that jing is written in Java and thus
> > > unusable by libvirt directly and also not a popular package present in
> > > distros.
> > > 
> > > For power users and developers it still is worth having this feature to
> > > provide better errors in a native way.
> > 
> > I kinda wonder if those users need it integrated in libvirt
> > though, as opposed to just calling jing themselves when
> > needed. It feels like a fairly narrow set of users benefitting
> > here from this change, made even smaller by the fact that those
> > users have to remember to reconfigure libvirt to enable this.
> > 
> > 
> 
> I would be more convinced if jing was still packaged for Fedora.
> 
> > 
> > > An example showing the quality of the errors:
> > > 
> > > XML used:
> > > 
> > >   <vcpu placement='static' current='1'>asdf8</vcpu>
> > > 
> > > native validator:
> > >   error: XML document failed to validate against schema: Unable to validate doc against /home/pipo/libvirt/src/conf/schemas/domain.rng
> > >   Extra element vcpu in interleave
> > >   Invalid sequence in interleave
> > >   Element domain failed to validate content
> > > 
> > > jing:
> > >   error: XML document failed to validate against schema: Unable to validate doc against /home/pipo/libvirt/src/conf/schemas/domain.rng
> > >   /dev/stdin:6:52: error: character content of element "vcpu" invalid; must be an integer
> > 
> > No doubt the errors are way better, but a power user
> > can just invoke this script below themselves if they
> > ever want to understand a XML problem better.
> > 
> > > 
> > > Example script to make this feature work with jing:
> > > 
> > >   #!/bin/bash
> > >   java -jar /home/pipo/git/jing-trang/build/jing.jar $1 /dev/stdin 2>&1 || exit 1
> > > 
> > > [1] https://github.com/relaxng/jing-trang
> > 
> > To me this is only compelling if there's an option we can reasonably
> > wire up out of the box.
> 
> Would switching to a different schema format make finding a validator
> with frendlier messages easier?

XSD (XML Schema) is the main alternative to RNG, and I'm not seeing
obviously great choices for validation.  Xerces-C seems to be the
closest option, but that's C++ not C so would involve wrapped access
somehow. It'd also mean either we're parsing twice, or we replace
libxml with xerces for parsing too. Neither is hugely appealing.


I do wonder if it is practical to "merely" improve libxml's RNG
error messages. It feels like libxml knows the the 'vcpu' element
content was invalid, not matching integer, but it has thrown
away that information, and reported a more generic error at the
level above, that no elements in the RNG <interleave> matched.

I can understand why libxml might have some this - the <interleave>
could contain many branches all defining a 'vcpu' element, so when
finding no match it would have to decide which interleave choice
was the cloest match and report the error message from that context.
To add to complexity, the difference might be far away in nesting
RNG defintion, from where the interleave occurs.

None the less, ting does look to be making such an intelligent
analysis to decide which is the best error to report.  I wonder if
they've done something very clever with global scoring a "distance"
value, or if they've just done some limited tweaks in certain areas.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|