RFC: Generate object-model code based on relax-ng files

[RFC 00/29] RFC: Generate object-model code based on relax-ng files

Posted by Shi Lei 4 years, 1 month ago

 Outline
=========

In libvirt world, objects(like domain, network, etc.) are described with two
representations: structures in c-language and XML specified by relax-ng.
Since c-language-implementation and xml restricted by relax-ng are merely
the different expressions of the same object-model, we can make a tool to
generate those C-language codes based on relax-ng files.


 RNG2C tool
===========

RNG2C is a python tool which can translate hierarchy of notations
in relax-ng into c-language structures. In general, <element> is converted to
struct in C, struct's members derive from its <attribute>s and its sub <element>s;
<choice> with multiple <value>s is converted to enum in C; <data> are converted
into various builtin-type in C, such as int, char *.

Also, RNG2C can generate relative conversion functions: vir[XXX]ParseXML,
vir[XXX]FormatBuf and vir[XXX]Clear.

These structures and functions can be used to replace hardcoded codes in current
libvirt source.

The tool RNG2C has three subcommands: 'list', 'show' and 'generate'. The 'list' and
'show' are used for previewing generated code, and the 'generate' is prepared to be
invoked by Makefile.

1) list - list all types generated by this tool.

Example:

# ./rng2c/go list

SHORT_ID  META              LOCATION
--------  ----             ----------
3df63b7a  String            String
9b81a5f5  UInt              UInt
9d10c738  Enum              /basictypes.rng/virOnOff.define/choice
5a024c44  Struct            /network.rng/network.define/network.element/dns.element
f6f0c927  Struct            /network.rng/network.define/network.element
... ...


2) show - show details of the target type. The target is specified with
   SHORT_ID or LOCATION. Option '-k' specifies kinds of output information,
   'scpf' stands for [s]tructure | [c]learfunc | [p]arsefunc | [f]ormatfunc.

Example:

# ./rng2c/go show 5a024c44 -kscpf

Or

# ./rng2c/go show /network.rng/network.define/network.element/dns.element -kscpf

It will show four aspects of this Type: the directive of it; the declaration
and implementation of its structure; code of its clearfunc; code of its parsefunc;
code of its formatfunc.


3) generate - just like show, but it dosen't print information on screen. It is
   only for Makefile. It will generate c-language-code, such as struct/enum/
   parsefunc/etc., to take the place of the relative hardcoded stuff.


 Problems
=========

If we replace hardcoded code with generated code directly, there're still
several problems.

Problem1 - name convension
  Generated code follows specific name convension, now there're some differences
  between generated code and current libvirt codes.
  For example, generated struct's name will be like virXXXDef, parsefunc's name
  will be like virXXXDefParseXML; the 'XXX' is a kind of Camel-Case name which
  implies the struct's place in the hierarchy.
  But in current code, there're many variations of name.
  So if we replace current codes with generated codes directly, it requires
  a lot of modifications in the caller codes.

Problem2 - error-checking codes in the parsefunc
  Most of current parse functions have error-checking codes after parsing XML
  code. These code lines validate existence and parsing results.
  Now RNG2C can't generate error-checking codes in the parsefunc. So we
  should find a way to handle it.

Problem3 - hierarchy consistence
  The RNG2C generate a 'struct'(C code) for each 'element'(relax-ng). Most
  of current code follows this convention, but there're still some exceptions.
  For example, the element 'bridge'(in network.rng) should be translated into a
  struct called 'virNetworkBridgeDef'; but in fact, no this struct in the current
  codes, and its members (stp/delay/...) are exposed to the struct 'virNetworkDef'.

Problem4 - target output files for generated code
  The current codes to be replaced exist in various source files. They
  include other source files, and are included by other source files. So we
  should specify proper target files for generated codes to take the place of
  old codes and inherit those original relationships.


 Directive
===========

To solve these problems, the directive mechanism is introduced.

           stage1                      stage2
relax-ng ---------> directives(json) ---------> c-language-code
  ^                     |
  |                     |
  |---------------------|
    feedback-injection


The basic process of generating codes includes two internal stages:
Stage1: RNG2C parses the content of relax-ng files and generates directives.
Stage2: RNG2C generates c-language code based on directives. So the code's form
depends on directives directly.

Injecting directives with new values into relax-ng files can override the
default directives of stage1. Then in the stage2, the final c-code will be different.

Directives are in the form of json and then wrapped as an special XML comment.
So when common relax-ng parsers parse these relax-ng files, these directives will be
ignored. Only RNG2C can recognize and handle them.

For example:

We want to replace the hardcoded virNetworkDNSHostDef with the generated structure.
The original hardcoded structure is as below:

    typedef struct _virNetworkDNSHostDef virNetworkDNSHostDef;
    typedef virNetworkDNSHostDef *virNetworkDNSHostDefPtr;
    struct _virNetworkDNSHostDef {
        virSocketAddr ip;
        size_t nnames;
        char **names;
    };

Take several steps as below:

 1) Execute subcommand list to find it

    # ./rng2c/go list | grep dns | grep host

        SHORT_ID  META    LOCATION
        7892979d  Struct  /network.rng/network.define/network.element/dns.element/host.element

 2) Execute subcommand show with SHORT-ID to check its generated structure

    # ./rng2c/go show 7892979d -ks


        ###### Directive ######

        <!-- VIRT:DIRECTIVE {
          "id": "7892979d7e7a89882e9a324d95198d05d851d02b",
          "location": "/network.rng/network.define/network.element/dns.element/host.element",
          "name": "virNetworkDNSHostDef",
          "meta": "Struct",
          "members": [
            {"id": "ip", "type": "String:b48b4186"},
            {"id": "hostname", "more": true, "type": "String:7d3736b9"}
          ]
        } -->

        ###### Structure (Disabled: NO OUTPUT for "structure".) ######

        [.h]

        typedef struct _virNetworkDNSHostDef virNetworkDNSHostDef;
        typedef virNetworkDNSHostDef *virNetworkDNSHostDefPtr;
        struct _virNetworkDNSHostDef {
            virSocketAddr ip;
            size_t nhostnames;
            char **hostnames;
        };


    We can see, the name of the generated struct depends on root property "name"
    of the directive. Fortunately, it's the same and we don't need override it.

    There're two differences between generated structure and hardcoded stuff:
    In original struct, names of the second and third member are 'nnames' and
    'names'; but in generated code, they are 'nhostnames' and 'hostnames'.

    These names are controlled by this part:

      "members": [{"id": "hostname", ...}].

 3) Override properties of the directive and reinject it into the relax-ng

    Based on "location" of the directive, we find the injection point and only need
    to fill changed parts, just as below:

    <!-- VIRT:DIRECTIVE {
      "members": [{"id": "hostname", "name": "name"}]
    } -->
    <element name="host">
      ... ...

    We need to specify which member will be altered by property "id".

 4) Re-execute subcommand 'show' to confirm the sameness

    After we bridge the gap between the generated structure and the hardcoded
    stuff, we can perform the substitution.


 Directive properties
=====================

The schema of directive are defined in rng2c/schema.json.
Directive consists of properties. Properties retain or derive from the notations
in the relax-ng.

1) root properties

id - string, READ-ONLY.
  Unique identity of the type.
  The id is generated by SHA-1. Its leftmost 8 digits are taken out as SHORT-ID.

location - string, READ-ONLY.
  The location where the type derive from.
  When injecting a directive into relax-ng file, the injection point is based on
  this location.

tag - string, READ-ONLY.
  The type's source tag. It can be 'element', 'data' or 'choice'.

name - string.
  The type's name.
  For struct and enum, name is generated automatically. For struct, its default
  form is vir[XXX]Def; for enum, its default form is vir[XXX]Type. The main
  part 'XXX' represents its position in the hierarchy.
  If name is specified, it will override the default name.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {"name": "virNetDevIPRoute"} -->
  <element name="route">
  ... ...

  Then the name of the generated struct will be 'virNetDevIPRoute'.

  This root property 'name' can be used to solve Problem1.

meta - string.
  The type's meta.
  For struct, it is 'Struct'; for enum, it is 'Enum'; for builtins,
  it is the builtin's meta.

  Meta can be overrided. For example, inject directive as below:

  <define name="UUID">
    <!-- VIRT:DIRECTIVE {
      "meta": "UChars",
      "structure": {"array.size": "VIR_UUID_BUFLEN"}
    } -->
    <data type="string">
      ... ...

  Then this builtin will change into UChars from String(i.e. char *).
  Note: RNG2C don't generate implementation for builtin, they need to be
  hardcoded in libvirt source.

unpack - boolean. Only for struct.
  Only valid for struct. On setting it true, this type will expose its members
  to its upper-level struct rather than act as a standalone struct.
  For example, inject directive on element 'bridge' as below:

  <!-- VIRT:DIRECTIVE {
    "unpack": true,
    "members": [ ... ... ]
  } -->
  <element name="bridge">

  So struct virNetworkBridgeDef will not appear. All of its members, such as
  bridgeName/bridgeZone/stp/delay, are exposed to its parent struct virNetworkDef.

  The 'unpack' and 'pack' can be used to solve Problem3.

pack - boolean.
  It can only work on <define> in the relax-ng.
  Setting it true, children of <define> are converted into members; and then
  these members are packed into a struct for reuse.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "name": "virPCIDeviceAddress",
    "pack": true,
    ... ...
  } -->
  <define name="pciaddress">
    ... ...

  The <define name="pciaddress"> will be packed as a struct virPCIDeviceAddress
  which name is specified by the property "name".

union - string.
  It can only work on <choice> in the relax-ng.
  By default, all children of <choice> will act as members and be exposed to
  the upper-level struct. When setting 'union' a non-null string which is called
  union-id, those members will be packed into a struct; then, a new member
  defined by this struct will be exposed to the upper-level struct.
  The new member's id and name are specified by the union-id.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "union": "if",
    "name": "virNetworkForwardIfDef"
  } -->
  <choice>
    <group>
      <zeroOrMore>
        <!-- VIRT:DIRECTIVE {"unpack": true} -->
        <element name='interface'>
        ...
    </group>
    <group>
      <zeroOrMore>
      <!-- VIRT:DIRECTIVE {"unpack": true} -->
        <element name='address'>
        ...
    </group>
  </choice>

  A new struct virNetworkForwardIfDef will be created; then a new member named 'if'
  is exposed to the parent struct.

namespace - boolean. Only for struct.
  By default, there's no stuff about namespace in the struct. When setting
  it true, additional namespace members will be inserted into the struct.
  And the struct's clearfunc, parsefunc and formatfunc will include relative codes
  to handle these namespace stuff.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "namespace": true,
    ... ...
  } -->
  <element name="network">

  Then in the struct virNetworkDef, there're two additional members:

  typedef virNetworkDef *virNetworkDefPtr;
  struct _virNetworkDef {
      ... ...
      void *namespaceData;
      virXMLNamespace ns;
  }

  In the virNetworkDefClear, there're the codes to invoke ns.free;
  in the virNetworkDefParseXML, there're the codes to invoke ns.parse;
  in the virNetworkDefFormatBuf, there're the codes to invoke ns.format.

values - array. Only for enum.
  All values of an enum except 'none' and 'last'.
  It can be overrided.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "name": "virNetworkForwardType",
    "values": ["nat", "route", "open", "bridge",
               "private", "vepa", "passthrough", "hostdev"]
  } -->
  <choice>
    ... ...
    <value>passthrough</value>
    <value>private</value>
    <value>vepa</value>
    <value>hostdev</value>
  </choice>

  Generally the property 'values' doesn't need to be overrided,
  only if the order of values in the enum need to be adjusted.

members - array. Only for struct.
  Property 'members' is used to display all the members of the struct.
  But when altering a member, only fill in this member's 'id' and
  properties to be changed.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "localPtr", "name": "localPTR"},
      {"id": "tftp", "name": "tftproot"}
    ]
  } -->
  <element name="ip">

  Specify the member "localPtr" and "tftp" by their id, then only
  these two members will be altered. In this case, their names are changed.
  For more member properties, refer to 6)

PRESERVE and APPLY - string.
  By default, directives take effect on the injection point.
  But sometimes, we need preserve a directive at first and then
  apply it somewhere else.
  APPLY corresponds to PRESERVE through their value called preserve-id.
  This preserve-id can be any form of string.

  These two properties can be used on two notations, <choice> and <ref>.
  Here's an example for using them on <choice>, inject directive as below:

  <define name="virtualPortProfile">
    <!-- VIRT:DIRECTIVE {
      "PRESERVE": "virtualport.element",
      "name": "virNetDevVPortProfile",
      ... ...
    } -->
    <choice>
      <group>
        <!-- VIRT:DIRECTIVE {"APPLY": "virtualport.element"} -->
        <element name="virtualport">
      </group>
      <group>
        <!-- VIRT:DIRECTIVE {"APPLY": "virtualport.element"} -->
        <element name="virtualport">
      </group>
      ... ...

  In this case, there're many <element name="virtualport"> notation
  under <choice>. We can define a preserve directive and apply it to all
  the <element name="virtualport">.

  Here's an example for using them on <ref>, inject PRESERVE as below:

  <!-- VIRT:DIRECTIVE {
    "PRESERVE": "port.element",
    "unpack": true,
    ... ...
    "members": [{"id": "isolated", "name": "isolatedPort"}]
  } -->
  <ref name="portOptions"/>
    ... ...

  And in <define>, inject APPLY as below:

  <define name="portOptions">
    <!-- VIRT:DIRECTIVE {"APPLY": "port.element"} -->
    <element name="port">
    ... ...

  It is useful in a case: a <define> is referenced by several <ref>s.
  We can define different directives on different <ref>s and expect different
  effects when they are applied in the <define>.

TOUCH - boolean.
  Request to handle the target <define> explicitly.
  By default, the process of RNG2C starts at <start>, and handles a <define>
  only when a <ref> references to it. If a <define> isn't touched, RNG2C cannot
  find it.
  Inject a directive with TOUCH on a <define> to touch it.

  For example, the <define> 'zpciaddress' hasn't been referenced in the process
  of parsing network.rng (also networkcommon.rng and basictypes.rng), but we hope
  that this notation will be handle.
  Just inject the directive as below:

  <!-- VIRT:DIRECTIVE {"TOUCH": true} -->
  <define name="zpciaddress">
    <element name="zpci">
    ... ...

  Then it can be discovered when executing subcommand 'list'.


2) structure properties

output - string. Only for struct and enum.
  Control whether to output the structure code and where to output those code.
  Refer to 7)

enum.default - string. Only for enum.
  By default, for enum, first item's name is 'none'.
  Override it with other name.
  For example, inject directive as below:

  <define name="macTableManager">
    <!-- VIRT:DIRECTIVE {
      "name": "virNetworkBridgeMACTableManagerType",
      "structure": {"enum.default": "default"}
    } -->
    <choice>
      <value>kernel</value>
      ... ...

  Then the declaration of enum virNetworkBridgeMACTableManagerType will be like:

  typedef enum {
      VIR_NETWORK_BRIDGE_MAC_TABLE_MANAGER_DEFAULT = 0,
      VIR_NETWORK_BRIDGE_MAC_TABLE_MANAGER_KERNEL,
      ... ...
  } virNetworkBridgeMACTableManagerType;

  The first item is suffixed with '_DEFAULT' rather than '_NONE'.

array.size - string. Only for char array or unsigned char array.
  It specifies length of array. The value can be a macro or a literal number.
  For example, inject directive as below:

  <define name="UUID">
    <choice>
    <!-- VIRT:DIRECTIVE {
      "meta": "UChars",
      "structure": {"array.size": "VIR_UUID_BUFLEN"}
    } -->
    <data type="string">
    ... ...

  Then it will generate an array type:
  unsigned char XXX[VIR_UUID_BUFLEN];

3) clearfunc properties

output - string. Only for struct.
  Control whether to output the clearfunc code and where to output those code.
  Refer to 7)

name - string.
  The clearfunc's name.
  By default, its name is based type's name and suffixed 'Clear'.
  It can be overrided with any other name.

args - array.
  Extra formal args for clearfunc. But now it hasn't been used.

4) parsefunc properties
  The parsefunc's declaration and implementation are controlled by properties.
  The basic form of parsefunc's declaration is as below:

  int
  ${parsefunc.name}(xmlNodePtr curnode,
                    ${type.name}Ptr def,
                    xmlXPathContextPtr ctxt,  /* omit if args.noctxt is true */
                    const char *instanceName, /* exist if args.instname is true */
                    ${parent.name}Ptr parent, /* exist if args.parent is true */
                    ${args});                 /* exist if args is NOT none */

output - string. Only for struct.
  Control whether to output the parsefunc code and where to output those code.
  Refer to 7)

name - string.
  The parsefunc's name.
  By default, its name is based type's name and suffixed 'ParseXML'.
  It can be overrided with any other name.

args - array.
  Extra formal args for parsefunc.
  Each arg can have four properties: name, type, pointer and ctype.
  The first three are according to the usage of namesake properties in member.
  And ctype is an alternate to type when type can't express the argument's type.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "parsefunc": {
      "args": [{"name": "partialOkay", "type": "Bool"}]
    }
  } -->
  <element name="txt">
  ... ...

  Then the args of parsefunc will be appended by a arg, as below:

  int
  virNetworkDNSTxtDefParseXML(xmlNodePtr curnode,
                              virNetworkDNSTxtDefPtr def,
                              xmlXPathContextPtr ctxt,
                              bool partialOkay);

args.noctxt - boolean.
  By default, there's a arg 'ctxt' in the function arguments.
  When 'args.noctxt' is set to true, omit ctxt from the arguments.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "parsefunc": {
      "args.noctxt": true,
    }
  } -->
  <element name="txt">
  ... ...

  Then the arg 'ctxt' will be removed from the default args.

  int
  virNetworkDNSTxtDefParseXML(xmlNodePtr curnode,
                              virNetworkDNSTxtDefPtr def);

  At the same time, the generated implementation of
  virNetworkDNSTxtDefParseXML will be different according to
  this property.

args.instname - boolean.
  By default, there's no instance-name in the function arguments.
  When it is set to true, add instname into the arguments.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "parsefunc": {
      "args.instname": true,
    }
  } -->
  <element name="txt">
  ... ...

  Then the arg 'instname' will be added into the args.

  int
  virNetworkDNSTxtDefParseXML(xmlNodePtr curnode,
                              virNetworkDNSTxtDefPtr def,
                              xmlXPathContextPtr ctxt,
                              const char *instanceName);

args.parent - boolean.
  By default, there's no parent struct pointer name in the function arguments.
  When it is set to true, add parent struct pointer into the arguments.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "parsefunc": {
      "args.parent": true
    }
  } -->
  <element name="dhcp">

  Then the arg 'parent' will be added into the args.

  int
  virNetworkIPDHCPDefParseXML(xmlNodePtr curnode,
                              virNetworkIPDHCPDefPtr def,
                              xmlXPathContextPtr ctxt,
                              virNetworkIPDefPtr parentdef);

post - boolean.
  By default, there's no post hook in the parse function.
  This property enables a post hook for parsefunc. The hook is invoked after
  all members have been parsed. The post hook function's name is based on
  parsefunc's name, and suffxied with 'Post'.
  To solve the foregoing Problem2, extract error-checking code from the current
  parsefunc and move them to the post hook function, then current parsefunc
  can be replaced with generated parsefunc.

  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "parsefunc": {"post": true}
  } -->
  <element name="dns">
  ... ...

  The declaration of the post hook will be generated, as below:

  int
  virNetworkDNSDefParseXMLPost(xmlNodePtr curnode,
                               virNetworkDNSDefPtr def,
                               xmlXPathContextPtr ctxt,
                               const char *enableStr,
                               const char *forwardPlainNamesStr,
                               int nForwarderNodes,
                               int nTxtNodes,
                               int nSrvNodes,
                               int nHostNodes);

  Implement it. It will be invoked by generated parsefunc automatically.

post.notmpvars - boolean. It is valid only when post is set.
  By default, all the temporary variables in the parsefunc will be passed to
  the post hook function. When it is set to true, there's no those temporary
  variables as arguments in the post hook functions.

  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "parsefunc": {"post": true, "post.notmpvars": true}
  } -->
  <element name="dns">
  ... ...

  The declaration of the post hook will be generated, as below:

  int
  virNetworkDNSDefParseXMLPost(xmlNodePtr curnode,
                               virNetworkDNSDefPtr def,
                               xmlXPathContextPtr ctxt);

  Now all tmpvars are removed from the args of parse hook function.

5) formatfunc properties

output - string. Only for struct.
  Control whether to output the formatfunc code and where to output those code.
  Refer to 7)

name - string.
  The formatfunc's name.
  By default, its name is based type's name and suffixed 'FormatBuf'.
  It can be overrided with any other name.

args - array.
  Extra formal args for formatfunc. Just like the args of parsefunc.

shorthand.ignore - boolean.
  By default, formatfunc adopts shorthand closing tag notation when there're no
  elements to output. If shorthand.ignore is set to true, this default
  optimization will be prevented.

  For example, virNetworkIPDefFormat does output <ip ...></ip> rather than
  <ip .../> when there're no child element 'tftp' and 'dhcp'.
  So this property is used to meet the need.

order - array.
  It is a array which item is member's id.
  In formatfunc, the relative order of formatting members is sensitive.
  If the order is not correct, setting this property to re-arrange the order.
  And when the array includes only part of ids, other members will follow and
  keep their original relative order.

  For example, virPortGroupDefFormat has a special order to format its members.

  <!-- VIRT:DIRECTIVE {
    "formatfunc": {
      "order": ["name", "default", "trustGuestRxFilters", "vlan"]
    }
  } -->

  Then format order will be altered according to this.

6) member properties
  Member properties can only appear in the members' array. Use these properties
  to override the struct members' default settings.

id - string, READ-ONLY.
  When id is set, this property specify the member to be changed.
  When id is ignored, it means this member is a new member and needed to append
  to the struct's members.

  For example, inject a directive as below:

  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "dev", "name": "devname"},
      {"name": "connections", "type": "Int"}
    ]
  } -->
  <element name='pf'>
  ... ...

  The the member 'dev' will be altered; and then, a new member 'connections' will
  be added. For a new member, its type must be specified.

name - string.
  Member's name.
  By default, name is the same with id.
  It can be overrided with any name.

opt - boolean.
  Whether optional or not. In the parsefunc, non-existence for optional member
  is legal; otherwise, there's a error-checking to check its existence and
  report error.
  For example, inject a directive as below:

  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "value", "opt": true}
    ]
  } -->
  <element name="txt">
    ... ...

  Then in the virNetworkDNSTxtDefParseXML, it's ok when the notation 'value'
  is missing.

more - boolean.
  When it is set, the member is a list.
  For example, inject a directive as below:

  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "pf", "more": true}
    ]
  } -->
  <element name="forward">
    ... ...

  Then the member 'pf' will be declared as below:

  size_t npfs;
  virNetworkForwardPfDefPtr pfs;

tag - string.
  Display the source's tag.
  When a new member is to add, it can be specified.

type - string.
  Member's type. Its form is "meta[:short_id]", meta is the type's meta, short_id
  is the leftmost 8 digits of type's id. For builtins, ignore colon and short_id.
  This property can be overrided with other type.

pointer - boolean.
  By default, member is an instance defined by its type.
  When this property is set to true, member is a pointer of its type.
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "bandwidth", "pointer": true}
    ]
  } -->

  Then type of the member 'bandwidth' will be changed to pointer, as below:
  virBandwidthDefPtr bandwidth;

specified - boolean.
  When this property is set to true, member will be attached with a new member
  which specifies the existence of its master. The new additional member's form
  is "bool XXX_specified;".
  For example, inject directive as below:

  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "managerid", "name": "managerID", "specified": true},
      ... ...
    ]
  } -->
  <element name="parameters">
  ... ...

  An addional member 'managerID_specified' will follow the 'managerID'.

  uint8_t       managerID;
  bool          managerID_specified;

declare.comment - string.
  Specify a comment behind member's declaration in struct.
  For example, inject a directive like below:

  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "uid", "declare.comment": "exempt from syntax-check"}
    ]
  } -->
  <element name="zpci">
  ... ...

  Then the declaration of the member 'uid' is like below:
  int uid;  /* exempt from syntax-check */

parse.disable - boolean.
  When this property is set to true, member will not appear in parsefunc.

parse.args - array. Only for struct-type member.
  Specify the extra actual arguments when member is parsed by calling parsefunc.
  If member's type is struct and the struct's parsefunc has extra formal
  argument, passing actual arguments by this property.

  For example, type of the member 'host' is virNetworkDHCPHostDef, and the parsefunc
  related to the type has an extra arg partialOkey(specified by "args" in "parsefunc").
  Here this extra arg needs to be specified a value.
  Directive is like below:

  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "host", "parse.args": [{"name": "partialOkay", "value": "false"}]}
    ]
  } -->
  <element name="dhcp">
  ... ...

  Then the parsing code for 'host' in the parsefunc is like below:

  if (virNetworkIPDHCPHostDefParseXML(node, &def->hosts[i], ctxt, false) < 0)
    goto error;

  A specific value is passed into the function as an actual arg.

parse.instname - string. Only for struct-type member.
  When member's type is struct and the struct's parsefunc needs a instance-name
  argument, passing instance name through this property.

  For example:

  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "ip", "parse.instname": "def->name"}
    ]
  } -->
  <element name="network">
  ... ...

  Then parsing code for 'ip' in the parsefunc will be:

  if (virNetworkIPDefParseXML(node, &def->ips[i], ctxt, def->name) < 0)
    goto error;

  The 'def->name' is passed in as instance-name.

parse.default - string.
  Specify a value for member when it doesn't get value from XML.

  For example:
  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "mode", "parse.default": "VIR_NETWORK_FORWARD_NAT"}
    ]
  } -->

  Then when the member 'mode' dosen't get value in parsefunc, it will be
  assigned the default value 'VIR_NETWORK_FORWARD_NAT'.

format.disable - boolean.
  When this property is set to true, member will not be skipped in formatfunc.

format.nocheck - boolean.
  Format and dump member to XML despite of the conditions.
  By default, a member needs to be check whether its value is empty before
  handling it in formatfunc. When this property is set to true, ignore these
  checks and always output.

  For example:
  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "domain", "format.nocheck": true}
    ]
  } -->
  <define name="pciaddress">
  ... ...

  Then whether or not the member 'domain' is zero, it will be formatted and output.

format.precheck - string. Only valid when format.nocheck isn't set.
  In formatfunc, invoke a function as condition-check rather than default checks.
  The function should be hardcoded in advance and its name is specified by this
  property.

  For example, in virNetworkDefFormatBuf, the member 'connections' will be formatted
  and output only when some conditions evaluate to true. So set this property to
  enable a check function named 'checkXMLInactive', and put logic of checking
  conditions into the function.

  For example:
  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "connections", "format.precheck": "checkXMLInactive"}
    ]
  } -->
  <define name="network">
  ... ...

  Then RNG2C generates the declaration of checkXMLInactive.

  bool
  checkXMLInactive(const unsigned int def G_GNUC_UNUSED,
                   const virNetworkDef *parent G_GNUC_UNUSED,
                   unsigned int flags);

  Implement this function in libvirt source.

format.fmt - string.
  For every builtin, there's a default format when they're formatting.
  This property can be overrided to change the default format.

  For example:
  <!-- VIRT:DIRECTIVE {
    "members": [
      {"id": "slot", "format.fmt": "0x%02x"}
    ]
  } -->
  <define name="pciaddress">
  ... ...

  Then the member 'slot' will be formatting based on format "0x%02x".

format.args - array. Only for struct-type member.
  This property is just like parse.args, except that it is for formatfunc.
  Now it hasn't been used.

hint - string. READ-ONLY.
  Display special cases for member.
  The 'unpack' indicates member's type is unpack; 'pack' indicates member's type
  is pack; 'union' indicates member's type is union; 'new' means this member
  is added.

7) output property
  This property is for structure, clearfunc, parsefunc and formatfunc.
  It has two effects:

  First, it controls whether or not to output the code to the files.
  Second, it controls what file to output.

  Value of the property indicates a base path, and it is suffixed with
  ".generated.[c|h]" to form two generated files.
  The base path should be according to the file in which the code is about to
  be replaced with generated code.
  So the relationship between original files and generated files is as below:

  original.c ---> original.h ---------
                     ^               |
                     |               v
            original.generated.c  original.generated.h

  The original.generated.c includes original.h automatically, and the
  original.h needs to be inserted with a line '#include "original.generated.h"'
  manually.

  This method can solve the Problem4.


 Steps of each patch
=====================

Replacing hardcoded code with generated stuff, we need a series of patches.
Each patch just replaces a target, i.e. a structure or a function.
The steps are as below:

 1) Execute subcommand 'list' to query the target's id and location.

 2) Execute subcommand 'show' to preview the target's directive and generated code.

 3) Modify the directive and re-inject it into the relax-ng file to bridge
   the gap between current code and generated code.

 4) Execute subcommand 'show' again to check the new directive and new generated code,
   until generated code can take the place of hardcoded stuff.

 5) Remove the hardcoded code.

 6) Compile & Test the libvirt source.


 About the series of patches
============================

Now I select network as a test target and make a try to generate
virNetworkDef, virNetworkDefClear, virNetworkDefParseXML and
virNetworkDefFormatBuf and their subordinate stuff automatically.

But this needs around 130 patches. That's too many!

So this series just include about 30 patches for evaluation.


Thanks!


Shi Lei (29):
  rng2c: Add tool RNG2C to translate relax-ng into c-language-code.
  maint: Call RNG2C automatically when relax-ng files change
  util: Add two helper functions virXMLChildNode and virXMLChildNodeSet
  util: Move VIR_ENUM_IMPL and VIR_ENUM_DECL from virenum.h to
    internal.h
  util: Replace virTristateBool|virTristateSwitch(hardcoded) with
    namesakes(generated)
  util: Add a helper function virStringParseOnOff
  util: Add some helper functions for virSocketAddr
  util: Add a helper function virStrToLong_u8p
  conf: Replace virNetworkDNSTxtDef(hardcoded) with namesake(generated)
  conf: Replace virNetworkDNSSrvDef(hardcoded) with namesake(generated)
  conf: Replace virNetworkDNSHostDef(hardcoded) with namesake(generated)
  conf: Replace virNetworkDNSForwarder(hardcoded) with
    namesake(generated)
  conf: Replace virNetworkDNSDef(hardcoded) with namesake(generated)
  conf: Replace virNetworkDNSDefClear(hardcoded) with
    namesake(generated)
  conf: Extract error-checking code from virNetworkDNSHostDefParseXML
  conf: Replace virNetworkDNSHostDefParseXML(hardcoded) with
    namesake(generated)
  conf: Extract error-checking code from virNetworkDNSSrvDefParseXML
  conf: Replace virNetworkDNSSrvDefParseXML(hardcoded) with
    namesake(generated)
  conf: Extract error-checking code from virNetworkDNSTxtDefParseXML
  conf: Replace virNetworkDNSTxtDefParseXML(hardcoded) with
    namesake(generated)
  conf: Add virNetworkDNSForwarderParseXML and
    virNetworkDNSForwarderParseXMLPost
  conf: Replace virNetworkDNSForwarderParseXML(hardcoded) with
    namesake(generated)
  conf: Extract error-checking code from virNetworkDNSDefParseXML
  conf: Replace virNetworkDNSDefParseXML(hardcoded) with
    namesake(generated)
  conf: Apply virNetworkDNSForwarderFormatBuf(generated) in
    virNetworkDNSDefFormat
  conf: Apply virNetworkDNSTxtDefFormatBuf(generated) in
    virNetworkDNSDefFormat
  conf: Apply virNetworkDNSSrvDefFormatBuf(generated) in
    virNetworkDNSDefFormat
  conf: Apply virNetworkDNSHostDefFormatBuf(generated) in
    virNetworkDNSDefFormat
  conf: Replace virNetworkDNSDefFormat(hardcoded) with
    virNetworkDNSDefFormatBuf(generated)

 docs/schemas/basictypes.rng     |   15 +
 docs/schemas/network.rng        |   76 ++
 docs/schemas/networkcommon.rng  |    2 +-
 po/POTFILES.in                  |    2 +
 rng2c/directive.py              | 1693 +++++++++++++++++++++++++++++++
 rng2c/generator.py              |  504 +++++++++
 rng2c/go                        |    8 +
 rng2c/schema.json               |  113 +++
 rng2c/utils.py                  |  163 +++
 src/Makefile.am                 |   16 +-
 src/access/Makefile.inc.am      |    2 +-
 src/conf/Makefile.inc.am        |   12 +-
 src/conf/network_conf.c         |  739 ++++----------
 src/conf/network_conf.h         |   51 +-
 src/esx/Makefile.inc.am         |    2 +-
 src/interface/Makefile.inc.am   |    2 +-
 src/internal.h                  |   19 +
 src/lxc/Makefile.inc.am         |    1 +
 src/network/Makefile.inc.am     |    2 +-
 src/network/bridge_driver.c     |    2 +-
 src/node_device/Makefile.inc.am |    2 +-
 src/nwfilter/Makefile.inc.am    |    2 +-
 src/qemu/Makefile.inc.am        |    1 +
 src/remote/Makefile.inc.am      |    2 +-
 src/secret/Makefile.inc.am      |    2 +-
 src/storage/Makefile.inc.am     |    2 +-
 src/test/Makefile.inc.am        |    2 +-
 src/util/Makefile.inc.am        |   13 +-
 src/util/virenum.c              |   15 -
 src/util/virenum.h              |   39 +-
 src/util/virsocketaddr.c        |   27 +
 src/util/virsocketaddr.h        |   10 +
 src/util/virstring.c            |   37 +
 src/util/virstring.h            |   10 +
 src/util/virxml.c               |   57 ++
 src/util/virxml.h               |    3 +
 src/vbox/Makefile.inc.am        |    2 +-
 tests/Makefile.am               |    2 +
 tools/Makefile.am               |    2 +
 39 files changed, 3000 insertions(+), 654 deletions(-)
 create mode 100644 rng2c/directive.py
 create mode 100755 rng2c/generator.py
 create mode 100755 rng2c/go
 create mode 100644 rng2c/schema.json
 create mode 100644 rng2c/utils.py

-- 
2.17.1

Re: [RFC 00/29] RFC: Generate object-model code based on relax-ng files

Posted by Daniel P. Berrangé 4 years, 1 month ago

On Wed, Mar 25, 2020 at 03:11:40PM +0800, Shi Lei wrote:
>  Outline
> =========
> 
> In libvirt world, objects(like domain, network, etc.) are described with two
> representations: structures in c-language and XML specified by relax-ng.
> Since c-language-implementation and xml restricted by relax-ng are merely
> the different expressions of the same object-model, we can make a tool to
> generate those C-language codes based on relax-ng files.
> 
> 
>  RNG2C tool
> ===========
> 
> RNG2C is a python tool which can translate hierarchy of notations
> in relax-ng into c-language structures. In general, <element> is converted to
> struct in C, struct's members derive from its <attribute>s and its sub <element>s;
> <choice> with multiple <value>s is converted to enum in C; <data> are converted
> into various builtin-type in C, such as int, char *.
> 
> Also, RNG2C can generate relative conversion functions: vir[XXX]ParseXML,
> vir[XXX]FormatBuf and vir[XXX]Clear.
> 
> These structures and functions can be used to replace hardcoded codes in current
> libvirt source.
> 
> The tool RNG2C has three subcommands: 'list', 'show' and 'generate'. The 'list' and
> 'show' are used for previewing generated code, and the 'generate' is prepared to be
> invoked by Makefile.

[snip]

Wow, this is a very impressive and ambitious bit of work, so
thanks for proposing this idea !

The idea of generating the XML parsers is pretty appealing and the
kind of thing I wish we had done right back at the very start of
libvirt development. I really came to like the idea of a metadata
driven XML parser when I created the libvirt-go-xml project structs.
As you already no doubt found, doing the auto-generation for integrating
into our existing code is even more tricky.

I expect the hardest parts are probably going to be bits of the schema
which are discriminated enums. These were what caused me most pain in
the Go XML bindings.

eg with disks where the "type=xxx" attribute on <disk> in turn controls
which child elements we accept in the parser. In the RNG schema we have
often been lazy and not represented the restrictions which are imposed
by the actual parser. Similarly in the actual structs, we often don;t
represent the discriminated enums conceptually.

IOW if we go down the route of generating everything from the RNG schema,
we're quite likely to need to do work to make the RNG schema more accurate.
We'll probably aso need to change a fair number of the struts we use. The
domain XML schema is the one most affected by this. All the other XML
schemas are fairly simple in general. None of this is a bad thing, but it
is potentially a lot of work. 

This is a big & complex series / topic, so it will take a little time
for me to properly review this idea and give some more detailed
feedback. So it will probably be next week before i reply again.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [RFC 00/29] RFC: Generate object-model code based on relax-ng files

Posted by Peter Krempa 4 years, 1 month ago

On Thu, Mar 26, 2020 at 18:19:20 +0000, Daniel Berrange wrote:
> On Wed, Mar 25, 2020 at 03:11:40PM +0800, Shi Lei wrote:

[...]

> IOW if we go down the route of generating everything from the RNG schema,
> we're quite likely to need to do work to make the RNG schema more accurate.
> We'll probably aso need to change a fair number of the struts we use. The
> domain XML schema is the one most affected by this. All the other XML
> schemas are fairly simple in general. None of this is a bad thing, but it
> is potentially a lot of work. 

What I'm afraid of is making the internal structs closer to the RNG
schema in many cases. I spent some time sanitizing the storage of data
in the internal structs which don't necessarily match the RNG schema.

Making them look more like the schema would be a pretty big regression
in understandibility of the code and I'd like to avoid that at all
costs. Specifically since we can't change the schema.

Most notable is the ultra-messy NUMA topology definition, where some
things are split accross multiple not-very appropriate elements and
fixing that will be unpleasant.

Specifically NUMA topology is a child of <cpu>, properties of numa nodes
are under <numatune> etc ... These must not be changed otherwise it will
be a mess.

One other thing to worry about is, that schema validation is not
mandatory. Coupled with guarantee of backwards compatibility, we must
preserve the quirks of the parser, including parsing of stuff which is
not in the schema and this will be extremely hard to preserve.

The other extreme is when the schema is too generic and there's specific
checking in the parser. We've got plenty of that too.

As you've pointed out in terms of the disk schema, there's plenty of the
above going on, including under (we were accepting seclabels even for
disks which don't use them, it came handy once) and overspecified
schema and also plenty of non-conformat storage. Specifically the <disk>
element contains the type, which actually belongs to the storage source.

The storage source also has 'backingStore' but in the XML it's not a
child but a sibling. Changing this to the format in the schema would
make the code, messy and more unmaintainable than it is now.

In this regards, the best thing that we could do, is to generate the
parser and then hand-write transformation from the XML schema into what
we use internally, but that is not really better than the current state.

Re: [RFC 00/29] RFC: Generate object-model code based on relax-ng files

Posted by Daniel P. Berrangé 4 years ago

On Wed, Mar 25, 2020 at 03:11:40PM +0800, Shi Lei wrote:
>  Outline
> =========
> 
> In libvirt world, objects(like domain, network, etc.) are described with two
> representations: structures in c-language and XML specified by relax-ng.
> Since c-language-implementation and xml restricted by relax-ng are merely
> the different expressions of the same object-model, we can make a tool to
> generate those C-language codes based on relax-ng files.

>  Problems
> =========
> 
> If we replace hardcoded code with generated code directly, there're still
> several problems.
> 
> Problem1 - name convension
>   Generated code follows specific name convension, now there're some differences
>   between generated code and current libvirt codes.
>   For example, generated struct's name will be like virXXXDef, parsefunc's name
>   will be like virXXXDefParseXML; the 'XXX' is a kind of Camel-Case name which
>   implies the struct's place in the hierarchy.
>   But in current code, there're many variations of name.
>   So if we replace current codes with generated codes directly, it requires
>   a lot of modifications in the caller codes.

Regardless of whether or how we auto-generate code, cleaning up current
naming conventions to be more consistent is a benefit. So as a general
rule, instead of having a code generator which knows how to generate
every wierd name, we should just fix the wierd names to follow a standard
convention.

> Problem2 - error-checking codes in the parsefunc
>   Most of current parse functions have error-checking codes after parsing XML
>   code. These code lines validate existence and parsing results.
>   Now RNG2C can't generate error-checking codes in the parsefunc. So we
>   should find a way to handle it.

In the virDomainDef  XML code, we've been slowly splitting out the code
for doing error validation into separate methods.

These two bugs illustrate the core ideas:

  https://gitlab.com/libvirt/libvirt/-/issues/7
  https://gitlab.com/libvirt/libvirt/-/issues/8

the same concept would apply to other XML parsers such as the network
parser, but it hasn't been a high priority for non-domain XML schemas.

> Problem3 - hierarchy consistence
>   The RNG2C generate a 'struct'(C code) for each 'element'(relax-ng). Most
>   of current code follows this convention, but there're still some exceptions.
>   For example, the element 'bridge'(in network.rng) should be translated into a
>   struct called 'virNetworkBridgeDef'; but in fact, no this struct in the current
>   codes, and its members (stp/delay/...) are exposed to the struct 'virNetworkDef'.

Yep, this is the really hard problem I think. The idea of generating
the structs from the RNG has a fundamental assumption that the way
we express & modularize the RNG matches  the way we want to express
and modularize the C structs.

To "fix" this, either we have to change the C structs in to something
that may be worse for the code, or we have to change the RNG schema
into something that may be worse for the schema validation.

There's a case in the domain XML which illustrates how painfull his
can be:

  <define name="os">
    <choice>
      <ref name="osxen"/>
      <ref name="oshvm"/>
      <ref name="osexe"/>
    </choice>
  </define>

here, the osxen/oshvm share many attributes, and thus at the C level
we want them all in the same struct, but at the RNG level we wanted
them separate to more strictly validate the XML documents.

...cut all the technical details about the rng2c DIRECTIVE lang....

What I'm really interested in above all is what the end result looks like
for both maintainers, and casual contributors. As a general goal we want
to simplify libvirt to make it easier to contribute to.

Eliminating the need to write custom XML parsing code certainly fits with
that goal at a conceptual level.

The code generator itself though needs some input so that it knows what to
generate, and we have to be mindful of what this input data looks like and
how easy it will be for maintainers/contributors to understand.

In last 6 months, we've had a general aim to reduce the number of languages
we expose contributors to. We've eliminated alot of Perl usage. We've started
to replace HTML with RST, and also want to replace our XML/XSL based website
generator with something simpler and non-XML based.

There are some places where we can't avoid another language, as XDR for the
RPC protocol.  We'll never eliminate XML from libvirt entirely, since it is
a fundamental part of our public API, but I am interested in how we can
minimize the visibility of XML to our contributors.

Being able to auto-generate XML parsers is thus interesting, but if the
source input for the auto-generator is another XML document, that is kind
of failing. This is where I'm finding the rng2c tool quite unappealing.

The libvirt code is primarily C based, with a little bit of magic in places
where we auto-generate code. For example the RPC code generator has magic
comments in the XDR protocol definition, that is used to generate client
and server dispatch code. I think of the XDR language as "psuedo-C" - it
is conceptually close enough to C that C programmers can understand the
XDR definitions fairly easily.

The RNG schemas are a very different beast. Since they're XML based they
are obviously completely unlike any of the C code, and I find that people
have quite a strong dislike of use of XML in general.  With this in mind
I'm not enthusiastic about the idea of auto-generating the XML parsers
from the RNG schemas, as it means everyone needs to know more about both
the RNG schema language, and also learn this custom DIRECTIVE language
used by the rng2c tool.

If we consider this series, we've deleted 530 lines from network_conf.c
and added 200 new lines for post-parse validation logic. This is is good
overall improvement.  No matter what approach or tool we use for XML
parser auto-generation we'll get the similar result.

Now consider the network_conf.h file, we have deleted

    typedef struct _virNetworkDNSTxtDef virNetworkDNSTxtDef;
    typedef virNetworkDNSTxtDef *virNetworkDNSTxtDefPtr;
    struct _virNetworkDNSTxtDef {
        char *name;
        char *value;
    };

    typedef struct _virNetworkDNSSrvDef virNetworkDNSSrvDef;
    typedef virNetworkDNSSrvDef *virNetworkDNSSrvDefPtr;
    struct _virNetworkDNSSrvDef {
        char *domain;
        char *service;
        char *protocol;
        char *target;
        unsigned int port;
        unsigned int priority;
        unsigned int weight;
    };

    typedef struct _virNetworkDNSHostDef virNetworkDNSHostDef;
    typedef virNetworkDNSHostDef *virNetworkDNSHostDefPtr;
    struct _virNetworkDNSHostDef {
        virSocketAddr ip;
        size_t nnames;
        char **names;
    };

    typedef struct _virNetworkDNSForwarder virNetworkDNSForwarder;
    typedef virNetworkDNSForwarder *virNetworkDNSForwarderPtr;
    struct _virNetworkDNSForwarder {
        virSocketAddr addr;
        char *domain;
    };

    typedef struct _virNetworkDNSDef virNetworkDNSDef;
    typedef virNetworkDNSDef *virNetworkDNSDefPtr;
    struct _virNetworkDNSDef {
        int enable;            /* enum virTristateBool */
        int forwardPlainNames; /* enum virTristateBool */
        size_t ntxts;
        virNetworkDNSTxtDefPtr txts;
        size_t nhosts;
        virNetworkDNSHostDefPtr hosts;
        size_t nsrvs;
        virNetworkDNSSrvDefPtr srvs;
        size_t nfwds;
        virNetworkDNSForwarderPtr forwarders;
    };

and instead of this, we now have have these extra rules in the
RNG schemas:

      <!-- VIRT:DIRECTIVE {
        "structure": {"output": "src/conf/network_conf"},
        "clearfunc": {"output": "src/conf/network_conf"},
        "parsefunc": {
          "output": "src/conf/network_conf",
          "post": true,
          "args.instname": true
        },
        "formatfunc": {"output": "src/conf/network_conf"}
      } -->

      <!-- VIRT:DIRECTIVE {
        "name": "virNetworkDNSForwarder",
        "structure": {"output": "src/conf/network_conf"},
        "clearfunc": {"output": "src/conf/network_conf"},
        "parsefunc": {
          "output": "src/conf/network_conf",
          "post": true,
          "args.noctxt": true,
          "args.instname": true
        },
        "formatfunc": {
          "output": "src/conf/network_conf",
          "order": ["domain", "addr"]
        }
      } -->

      <!-- VIRT:DIRECTIVE {
        "structure": {"output": "src/conf/network_conf"},
        "clearfunc": {"output": "src/conf/network_conf"},
        "parsefunc": {
          "output": "src/conf/network_conf",
          "post": true,
          "args.noctxt": true,
          "args.instname": true,
          "args": [
            {"name": "partialOkay", "type": "Bool"}
          ]
        },
        "formatfunc": {"output": "src/conf/network_conf"},
        "members": [
          {"id": "value", "opt": true}
        ]
      } -->

      <!-- VIRT:DIRECTIVE {
        "structure": {"output": "src/conf/network_conf"},
        "clearfunc": {"output": "src/conf/network_conf"},
        "parsefunc": {
          "output": "src/conf/network_conf",
          "post": true,
          "args.instname": true,
          "args": [
            {"name": "partialOkay", "type": "Bool"}
          ]
        },
        "formatfunc": {"output": "src/conf/network_conf"},
        "members": [
          {"id": "service", "opt": true},
          {"id": "protocol", "opt": true}
        ]
      } -->

      <!-- VIRT:DIRECTIVE {
        "structure": {"output": "src/conf/network_conf"},
        "clearfunc": {"output": "src/conf/network_conf"},
        "parsefunc": {
          "output": "src/conf/network_conf",
          "args.instname": true,
          "post": true,
          "args": [
            {"name": "partialOkay", "type": "Bool"}
          ]
        },
        "formatfunc": {"output": "src/conf/network_conf"},
        "members": [
          {"id": "ip", "opt": true},
          {"id": "hostname", "name": "name", "opt": true}
        ]
      } -->

If I come to libvirt as a contributor with C language skils, but little
experiance of RNG, I think this is a significant step backwards in the
ability to understand libvirt code.  It is way easier to understand
what's going on from the C structs, than from the RNG schema and the
VIRT:DIRECTIVE IMHO.  Even as a maintainer, and having read the cover
letter here,  I find the VIR:DIRECTIVE metadata to be way too verbose
and hard to understand compared to the structs.

So I think that although the RNG based code generator elimintes alot
of C code, it has the cost of forcing people to know more about the
RNG code. Overall I don't think that's a clear win.

This doesn't mean auto-generating code is a bad idea. I think it just
means that the RNG schema is not the right place to drive auto-generation
from.

I'm wondering if you've ever done any programming with Golang and used
its XML parsing capabilities ?

Golang has a very clever approach to XML/JSON/YAML parsing which is
based on directives recorded against the native Go structs. In the
libvirt-go-xml.git repository, we've used this to map all the libvirt
XML schemas into Go structs. IME, this has been the most pleasant
way I've come across for parsing XML.

If we consider just the DNS structs that you used to illustrate
rng2c in this patch series. To add support for Go XML parsing
and formatting, required the following comments against the
Golang structs:

    type NetworkDNSTXT struct {
            XMLName xml.Name `xml:"txt"`
            Name    string   `xml:"name,attr"`
            Value   string   `xml:"value,attr"`
    }
    type NetworkDNSSRV struct {
            XMLName  xml.Name `xml:"srv"`
            Service  string   `xml:"service,attr,omitempty"`
            Protocol string   `xml:"protocol,attr,omitempty"`
            Target   string   `xml:"target,attr,omitempty"`
            Port     uint     `xml:"port,attr,omitempty"`
            Priority uint     `xml:"priority,attr,omitempty"`
            Weight   uint     `xml:"weight,attr,omitempty"`
            Domain   string   `xml:"domain,attr,omitempty"`
    }
    type NetworkDNSHostHostname struct {
            Hostname string `xml:",chardata"`
    }

    type NetworkDNSHost struct {
            XMLName   xml.Name                 `xml:"host"`
            IP        string                   `xml:"ip,attr"`
            Hostnames []NetworkDNSHostHostname `xml:"hostname"`
    }
    type NetworkDNSForwarder struct {
            Domain string `xml:"domain,attr,omitempty"`
            Addr   string `xml:"addr,attr,omitempty"`
    }
    type NetworkDNS struct {
            Enable            string                `xml:"enable,attr,omitempty"`
            ForwardPlainNames string                `xml:"forwardPlainNames,attr,omitempty"`
            Forwarders        []NetworkDNSForwarder `xml:"forwarder"`
            TXTs              []NetworkDNSTXT       `xml:"txt"`
            Host              []NetworkDNSHost      `xml:"host"`
            SRVs              []NetworkDNSSRV       `xml:"srv"`
    }

The key important thing here is that the programmer is still fundamentally
working with their normal Golang struct types / fields. All that is required
to handle XML parsing & formatting is to add some magic comments which serve
as directives to the XML parser, telling it attribute/element names, whether
things are optional or not, etc.  The attribute/element names are only needed
if the struct field name is different from the XML name.

My gut feeling is that if we want to go ahead with auto-generating C code
for XML parsing/formatting, we should ignore the RNG schemas, and instead
try to do something similar to the Golang approach to XML.

The hard thing is that this would require us to write something which can
parse the C header files. Generally in our XML parser header files we don't
try to do anything too fancy - it is quite boring C code, so we would not
have to parse the full C language. We can cope with a fairly simplified
parser, that assumes the C header is following certain conventions.

So we would have to add some magic comment directives to each struct
we have

    typedef struct _virNetworkDNSTxtDef virNetworkDNSTxtDef;
    typedef virNetworkDNSTxtDef *virNetworkDNSTxtDefPtr;
    struct _virNetworkDNSTxtDef {
        char *name; /* xmlattr */
        char *value; /* xmlattr */
    };

    typedef struct _virNetworkDNSSrvDef virNetworkDNSSrvDef;
    typedef virNetworkDNSSrvDef *virNetworkDNSSrvDefPtr;
    struct _virNetworkDNSSrvDef {
        char *domain; /* xmlattr,omitempty */
        char *service; /* xmlattr,omitempty */
        char *protocol; /* xmlattr,omitempty */
        char *target; /* xmlattr,omitempty */
        unsigned int port; /* xmlattr,omitempty */
        unsigned int priority; /* xmlattr,omitempty */
        unsigned int weight; /* xmlattr,omitempty */
    };

    typedef struct _virNetworkDNSHostDef virNetworkDNSHostDef;
    typedef virNetworkDNSHostDef *virNetworkDNSHostDefPtr;
    struct _virNetworkDNSHostDef {
        virSocketAddr ip; /* xmlcallback */
        size_t nnames;
        char **names; /* xmlchardata:hostname,array */
    };

    typedef struct _virNetworkDNSForwarder virNetworkDNSForwarder;
    typedef virNetworkDNSForwarder *virNetworkDNSForwarderPtr;
    struct _virNetworkDNSForwarder {
        virSocketAddr addr; /* xmlcallback */
        char *domain; /* xmlattr */
    };

    typedef struct _virNetworkDNSDef virNetworkDNSDef;
    typedef virNetworkDNSDef *virNetworkDNSDefPtr;
    struct _virNetworkDNSDef {
        int enable;            /* xmlattr */
        int forwardPlainNames; /* xmlattr */
        size_t ntxts;
        virNetworkDNSTxtDefPtr txts; /* xmlelement:txt,array */
        size_t nhosts;
        virNetworkDNSHostDefPtr hosts; /* xmlelement:host,array */
        size_t nsrvs;
        virNetworkDNSSrvDefPtr srvs; /* xmlelement:srv,array */
        size_t nfwds;
        virNetworkDNSForwarderPtr forwarders; /* xmlelement:forwarder,array */
    };

Some explanation:

  - xmlattr

    Parse the field as an XML attribute with the same name
    as the struct field

  - xmlattr:thename

    Parse the field as an XML attribute called "thenanme"

  - xmlelement

    Parse the field as a child struct, populating from the
    XML element with same name.

  - xmlelement:thename

    Parse the field as a child struct, populating from the
    XML element called 'thename'

  - xmlcallback

    Call out to custom written XML handler methods to handle
    converting the data to/from the custom data type. eg for
    a field virSocketAddr, we'd call virSocketAddrXMLParse
    and virSocketAddrXMLFormat.

  - ,omitempty

    Don't format the attribute/element/chardata is the struct field
    is a NULL pointer, or an integer with value 0.

BTW, when looking at libvirt-go-xml.git, you'll see there are still a bunch
of places where we have to hand-write parsing/formatting code. There are
essentially two reasons for this

 - Golang has no support for unions.  As a result you have to fake
   unions by having a struct, with a bunch of optional fields each
   corresponding to a new struct. Rely on convention that only one
   field can be non-NULL. This requires custom parse functions since
   the Golang XML parser has no support for this code pattern.

 - The XML parser has no built-in directive to control whether it
   parses/formats in decimal vs hex.

Both of these are things we won't suffer from if we did this in C, so
potentially the XML parsing annotations needed for our C struct would
result in something even simpler than the libvirt-go-xml.git code.

The key question is just how difficult will it be to write a tool that
can parse the C header files, and magic comments, to output suitable
XML parser/formatter functions ? There's no easy way to answer that
without someone trying it.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|