[PATCH 5/7] conf: parse/format <teaming> element in plain <hostdev>

Laine Stump posted 7 patches 4 years, 12 months ago
[PATCH 5/7] conf: parse/format <teaming> element in plain <hostdev>
Posted by Laine Stump 4 years, 12 months ago
The <teaming> element in <interface> allows pairing two interfaces
together as a simple "failover bond" network device in a guest. One of
the devices will be the "transient" interface - it will be preferred
for all network traffic when it is present, but may be removed when
necessary, in particular during migration, when traffic will instead
go through the other interface of the pair - the "persistent"
interface. As it happens, in the QEMU implementation of this teaming
pair (called "virtio failover" in QEMU) the transient interface is
always a host network device assigned to the guest using VFIO (aka
"hostdev"); the persistent interface is always an emulated virtio NIC.

When support was initially added for <teaming>, it was written to
require that the transient/hostdev device be defined using <interface
type='hostdev'>; this was done because the virtio failover
implementation in QEMU and the virtio guest driver demands that the
two interfaces in the pair have matching MAC addresses, and the only
way libvirt can guarantee the MAC address of a hostdev network device
is to use <interface type='hostdev'>, whose main purpose is to
configure the device's MAC address. (note that <interface
type='hostdev'> in turn requires that the network device be an SRIOV
VF (Virtual Function), as that is the only type of network device
whose MAC address we can set in a way that will survive the device's
driver init in the guest).

It has recently come up that some users are unable to use <teaming>
because they are running in a container environment where libvirt
doesn't have the necessary privileges or resources to set the VF's MAC
address (because setting the VF MAC is done via the same device's PF
(Physical Function), and the PF is not exposed to libvirt's container.

At the same time, they *are* able to set the VF's MAC address in
advance of staring up libvirt in the container. So they could
theoretically use the <teaming> feature if libvirt just skipped the
"setting the MAC address" part.

Fortunately, that is *exactly* the difference between <interface
type='hostdev'> (a "hostdev VF") and <hostdev> (a "plain hostdev" - it
could be *any PCI device; libvirt doesn't know what type of PCI device
it is, and doesn't care).

But what *is* still needed is for libvirt to provide a small bit of
information on the commandline argument for the hostdev, telling QEMU
that this device will be part of a team ("failover pair"), and the id
of the other device in the pair.

So, what we need to do is add support for the <teaming> element to
plain <hostdev>, and that is what this patch does.

(actually, this patch adds parsing/formatting of the <teaming> element
in <hostdev>. The next patch will actually wire that into the qemu
driver.)

Signed-off-by: Laine Stump <laine@redhat.com>
---
 docs/formatdomain.rst                         | 51 +++++++++++++++
 docs/schemas/domaincommon.rng                 |  3 +
 src/conf/domain_conf.c                        |  5 ++
 src/conf/domain_conf.h                        |  1 +
 src/conf/domain_validate.c                    | 19 ++++++
 .../net-virtio-teaming-hostdev.xml            | 48 ++++++++++++++
 .../net-virtio-teaming-hostdev.xml            | 64 +++++++++++++++++++
 tests/qemuxml2xmltest.c                       |  3 +
 8 files changed, 194 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/net-virtio-teaming-hostdev.xml
 create mode 100644 tests/qemuxml2xmloutdata/net-virtio-teaming-hostdev.xml

diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
index 2493be595f..eafd6b3396 100644
--- a/docs/formatdomain.rst
+++ b/docs/formatdomain.rst
@@ -4837,6 +4837,22 @@ support in the hypervisor and the guest network driver).
    </devices>
    ...
 
+The second interface in this example is referencing a network that is
+a pool of SRIOV VFs (i.e. a "hostdev network"). You could instead
+directly reference an SRIOV VF device:
+
+::
+
+   ...
+     <interface type='hostdev'>
+       <source>
+         <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
+       </source>
+       <mac address='00:11:22:33:44:55:66'/>
+       <teaming type='transient' persistent='ua-backup0'/>
+     </interface>
+   ...
+
 The ``<teaming>`` element required attribute ``type`` will be set to either
 ``"persistent"`` to indicate a device that should always be present in the
 domain, or ``"transient"`` to indicate a device that may periodically be
@@ -4858,6 +4874,41 @@ once migration is completed; while migration is taking place, network traffic
 will use the virtio NIC. (Of course the emulated virtio NIC and the hostdev NIC
 must be connected to the same subnet for bonding to work properly).
 
+:since:`Since 7.1.0` The ``<teaming>`` element can also be added to a
+plain ``<hostdev>`` device.
+
+::
+
+   ...
+     <hostdev mode='subsystem' type='pci' managed='no'>
+       <source>
+         <address domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
+       </source>
+       <mac address='00:11:22:33:44:55:66'/>
+       <teaming type='transient' persistent='ua-backup0'/>
+     </interface>
+   ...
+
+This device must be a network device, but not necessarily an SRIOV
+VF. Using plain ``<hostdev>`` rather than ``<interface
+type='hostdev'>`` or ``<interface type='network'>`` is useful if the
+device that will be assigned with VFIO is a standard NIC (not a VF) or
+if libvirt doesn't have the necessary resources and privileges to set
+the VF's MAC address (e.g. if libvirt is running unprivileged, or in a
+container). This of course means that the user (or another
+application) is responsible for setting the MAC address of the device
+in a way such that it will survive guest driver initialization. For
+standard NICs (i.e. not an SRIOV VF) this probably means that the
+NIC's factory-programmed MAC address will need to be used for the
+teaming pair (since any driver init in the guest will reset the MAC
+back to factory). If it is an SRIOV VF, then its MAC address will need
+to be set via the VF's PF, e.g. if you are going to use VF 2 of the PF
+enp2s0f1, you would use something like this command:
+
+::
+
+  ip link set enp2s0f1 vf 2 mac 52:54:00:11:22:33
+
 NB1: Since you must know the alias name of the virtio NIC when configuring the
 hostdev NIC, it will need to be manually set in the virtio NIC's configuration
 (as with all other manually set alias names, this means it must start with
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 31960fb7cf..e6de934456 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -5156,6 +5156,9 @@
           <empty/>
         </element>
       </optional>
+      <optional>
+        <ref name="teaming"/>
+      </optional>
       <element name="source">
         <optional>
           <ref name="startupPolicy"/>
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 3fe8517f39..8701136aa9 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -15015,6 +15015,9 @@ virDomainHostdevDefParseXML(virDomainXMLOptionPtr xmlopt,
         }
     }
 
+    if (virDomainNetTeamingInfoParseXML(ctxt, &def->teaming) < 0)
+        goto error;
+
     return def;
 
  error:
@@ -27433,6 +27436,8 @@ virDomainHostdevDefFormat(virBufferPtr buf,
         break;
     }
 
+    virDomainNetTeamingInfoFormat(def->teaming, buf);
+
     if (def->readonly)
         virBufferAddLit(buf, "<readonly/>\n");
     if (def->shareable)
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index fb695a212b..e263e6098b 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -354,6 +354,7 @@ struct _virDomainHostdevDef {
         virDomainHostdevCaps caps;
     } source;
     virDomainHostdevOrigStates origstates;
+    virDomainNetTeamingInfoPtr teaming;
     virDomainDeviceInfoPtr info; /* Guest address */
 };
 
diff --git a/src/conf/domain_validate.c b/src/conf/domain_validate.c
index 703946b3e5..b47ecba86b 100644
--- a/src/conf/domain_validate.c
+++ b/src/conf/domain_validate.c
@@ -1585,6 +1585,25 @@ virDomainHostdevDefValidate(const virDomainHostdevDef *hostdev)
             break;
         }
     }
+
+    if (hostdev->teaming) {
+        if (hostdev->teaming->type != VIR_DOMAIN_NET_TEAMING_TYPE_TRANSIENT) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("teaming hostdev devices must have type='transient'"));
+            return -1;
+        }
+        if (!hostdev->teaming->persistent) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("missing required persistent attribute in hostdev teaming element"));
+            return -1;
+        }
+        if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS ||
+            hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("teaming is only supported for pci hostdev devices"));
+            return -1;
+        }
+    }
     return 0;
 }
 
diff --git a/tests/qemuxml2argvdata/net-virtio-teaming-hostdev.xml b/tests/qemuxml2argvdata/net-virtio-teaming-hostdev.xml
new file mode 100644
index 0000000000..b176e66a00
--- /dev/null
+++ b/tests/qemuxml2argvdata/net-virtio-teaming-hostdev.xml
@@ -0,0 +1,48 @@
+<domain type='qemu'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>219100</memory>
+  <currentMemory unit='KiB'>219100</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='i686' machine='pc'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-i386</emulator>
+    <disk type='block' device='disk'>
+      <source dev='/dev/HostVG/QEMUGuest1'/>
+      <target dev='hda' bus='ide'/>
+    </disk>
+    <controller type='usb' index='0'/>
+    <interface type='user'>
+      <mac address='00:11:22:33:44:55'/>
+      <model type='virtio'/>
+      <teaming type='persistent'/>
+      <alias name='ua-backup0'/>
+    </interface>
+    <interface type='user'>
+      <mac address='66:44:33:22:11:00'/>
+      <model type='virtio'/>
+      <teaming type='persistent'/>
+      <alias name='ua-backup1'/>
+    </interface>
+    <hostdev mode='subsystem' type='pci' managed='yes'>
+      <source>
+        <address domain='0x0000' bus='0x03' slot='0x07' function='0x1'/>
+      </source>
+      <teaming type='transient' persistent='ua-backup0'/>
+    </hostdev>
+    <hostdev mode='subsystem' type='pci' managed='yes'>
+      <source>
+        <address domain='0x0000' bus='0x03' slot='0x07' function='0x2'/>
+      </source>
+      <teaming type='transient' persistent='ua-backup1'/>
+    </hostdev>
+    <memballoon model='virtio'/>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2xmloutdata/net-virtio-teaming-hostdev.xml b/tests/qemuxml2xmloutdata/net-virtio-teaming-hostdev.xml
new file mode 100644
index 0000000000..a7edab63bd
--- /dev/null
+++ b/tests/qemuxml2xmloutdata/net-virtio-teaming-hostdev.xml
@@ -0,0 +1,64 @@
+<domain type='qemu'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>219100</memory>
+  <currentMemory unit='KiB'>219100</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='i686' machine='pc'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-i386</emulator>
+    <disk type='block' device='disk'>
+      <driver name='qemu' type='raw'/>
+      <source dev='/dev/HostVG/QEMUGuest1'/>
+      <target dev='hda' bus='ide'/>
+      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
+    </disk>
+    <controller type='usb' index='0'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
+    </controller>
+    <controller type='pci' index='0' model='pci-root'/>
+    <controller type='ide' index='0'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
+    </controller>
+    <interface type='user'>
+      <mac address='00:11:22:33:44:55'/>
+      <model type='virtio'/>
+      <teaming type='persistent'/>
+      <alias name='ua-backup0'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
+    </interface>
+    <interface type='user'>
+      <mac address='66:44:33:22:11:00'/>
+      <model type='virtio'/>
+      <teaming type='persistent'/>
+      <alias name='ua-backup1'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
+    </interface>
+    <input type='mouse' bus='ps2'/>
+    <input type='keyboard' bus='ps2'/>
+    <hostdev mode='subsystem' type='pci' managed='yes'>
+      <source>
+        <address domain='0x0000' bus='0x03' slot='0x07' function='0x1'/>
+      </source>
+      <teaming type='transient' persistent='ua-backup0'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
+    </hostdev>
+    <hostdev mode='subsystem' type='pci' managed='yes'>
+      <source>
+        <address domain='0x0000' bus='0x03' slot='0x07' function='0x2'/>
+      </source>
+      <teaming type='transient' persistent='ua-backup1'/>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
+    </hostdev>
+    <memballoon model='virtio'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
+    </memballoon>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
index a00ebd7d76..5cd945f28f 100644
--- a/tests/qemuxml2xmltest.c
+++ b/tests/qemuxml2xmltest.c
@@ -438,6 +438,9 @@ mymain(void)
     DO_TEST("net-virtio-teaming-network",
             QEMU_CAPS_VIRTIO_NET_FAILOVER,
             QEMU_CAPS_DEVICE_VFIO_PCI);
+    DO_TEST("net-virtio-teaming-hostdev",
+            QEMU_CAPS_VIRTIO_NET_FAILOVER,
+            QEMU_CAPS_DEVICE_VFIO_PCI);
     DO_TEST_CAPS_LATEST("net-isolated-port");
     DO_TEST("net-hostdev", NONE);
     DO_TEST("net-hostdev-bootorder", NONE);
-- 
2.29.2

Re: [PATCH 5/7] conf: parse/format <teaming> element in plain <hostdev>
Posted by Michal Privoznik 4 years, 12 months ago
On 2/11/21 8:57 AM, Laine Stump wrote:
> The <teaming> element in <interface> allows pairing two interfaces
> together as a simple "failover bond" network device in a guest. One of
> the devices will be the "transient" interface - it will be preferred
> for all network traffic when it is present, but may be removed when
> necessary, in particular during migration, when traffic will instead
> go through the other interface of the pair - the "persistent"
> interface. As it happens, in the QEMU implementation of this teaming
> pair (called "virtio failover" in QEMU) the transient interface is
> always a host network device assigned to the guest using VFIO (aka
> "hostdev"); the persistent interface is always an emulated virtio NIC.
> 
> When support was initially added for <teaming>, it was written to
> require that the transient/hostdev device be defined using <interface
> type='hostdev'>; this was done because the virtio failover
> implementation in QEMU and the virtio guest driver demands that the
> two interfaces in the pair have matching MAC addresses, and the only
> way libvirt can guarantee the MAC address of a hostdev network device
> is to use <interface type='hostdev'>, whose main purpose is to
> configure the device's MAC address. (note that <interface
> type='hostdev'> in turn requires that the network device be an SRIOV
> VF (Virtual Function), as that is the only type of network device
> whose MAC address we can set in a way that will survive the device's
> driver init in the guest).
> 
> It has recently come up that some users are unable to use <teaming>
> because they are running in a container environment where libvirt
> doesn't have the necessary privileges or resources to set the VF's MAC
> address (because setting the VF MAC is done via the same device's PF
> (Physical Function), and the PF is not exposed to libvirt's container.
> 
> At the same time, they *are* able to set the VF's MAC address in
> advance of staring up libvirt in the container. So they could
> theoretically use the <teaming> feature if libvirt just skipped the
> "setting the MAC address" part.
> 
> Fortunately, that is *exactly* the difference between <interface
> type='hostdev'> (a "hostdev VF") and <hostdev> (a "plain hostdev" - it
> could be *any PCI device; libvirt doesn't know what type of PCI device
> it is, and doesn't care).
> 
> But what *is* still needed is for libvirt to provide a small bit of
> information on the commandline argument for the hostdev, telling QEMU
> that this device will be part of a team ("failover pair"), and the id
> of the other device in the pair.
> 
> So, what we need to do is add support for the <teaming> element to
> plain <hostdev>, and that is what this patch does.
> 
> (actually, this patch adds parsing/formatting of the <teaming> element
> in <hostdev>. The next patch will actually wire that into the qemu
> driver.)
> 
> Signed-off-by: Laine Stump <laine@redhat.com>
> ---
>   docs/formatdomain.rst                         | 51 +++++++++++++++
>   docs/schemas/domaincommon.rng                 |  3 +
>   src/conf/domain_conf.c                        |  5 ++
>   src/conf/domain_conf.h                        |  1 +
>   src/conf/domain_validate.c                    | 19 ++++++
>   .../net-virtio-teaming-hostdev.xml            | 48 ++++++++++++++
>   .../net-virtio-teaming-hostdev.xml            | 64 +++++++++++++++++++
>   tests/qemuxml2xmltest.c                       |  3 +
>   8 files changed, 194 insertions(+)
>   create mode 100644 tests/qemuxml2argvdata/net-virtio-teaming-hostdev.xml
>   create mode 100644 tests/qemuxml2xmloutdata/net-virtio-teaming-hostdev.xml

There're only very few of differences between these two files (mostly 
PCI addresses in the out xml) and neither of them is related to this 
feature. Perhaps make one symlink of the other and add those diffs into 
the input xml?

> 
> diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
> index 2493be595f..eafd6b3396 100644
> --- a/docs/formatdomain.rst
> +++ b/docs/formatdomain.rst
> @@ -4837,6 +4837,22 @@ support in the hypervisor and the guest network driver).
>      </devices>
>      ...
>   
> +The second interface in this example is referencing a network that is
> +a pool of SRIOV VFs (i.e. a "hostdev network"). You could instead
> +directly reference an SRIOV VF device:
> +
> +::
> +
> +   ...
> +     <interface type='hostdev'>
> +       <source>
> +         <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
> +       </source>
> +       <mac address='00:11:22:33:44:55:66'/>
> +       <teaming type='transient' persistent='ua-backup0'/>
> +     </interface>
> +   ...
> +
>   The ``<teaming>`` element required attribute ``type`` will be set to either
>   ``"persistent"`` to indicate a device that should always be present in the
>   domain, or ``"transient"`` to indicate a device that may periodically be
> @@ -4858,6 +4874,41 @@ once migration is completed; while migration is taking place, network traffic
>   will use the virtio NIC. (Of course the emulated virtio NIC and the hostdev NIC
>   must be connected to the same subnet for bonding to work properly).
>   
> +:since:`Since 7.1.0` The ``<teaming>`` element can also be added to a
> +plain ``<hostdev>`` device.
> +
> +::
> +
> +   ...
> +     <hostdev mode='subsystem' type='pci' managed='no'>
> +       <source>
> +         <address domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
> +       </source>
> +       <mac address='00:11:22:33:44:55:66'/>
> +       <teaming type='transient' persistent='ua-backup0'/>
> +     </interface>
> +   ...
> +
> +This device must be a network device, but not necessarily an SRIOV
> +VF. Using plain ``<hostdev>`` rather than ``<interface
> +type='hostdev'>`` or ``<interface type='network'>`` is useful if the
> +device that will be assigned with VFIO is a standard NIC (not a VF) or
> +if libvirt doesn't have the necessary resources and privileges to set
> +the VF's MAC address (e.g. if libvirt is running unprivileged, or in a
> +container). This of course means that the user (or another
> +application) is responsible for setting the MAC address of the device
> +in a way such that it will survive guest driver initialization. For
> +standard NICs (i.e. not an SRIOV VF) this probably means that the
> +NIC's factory-programmed MAC address will need to be used for the
> +teaming pair (since any driver init in the guest will reset the MAC
> +back to factory). If it is an SRIOV VF, then its MAC address will need
> +to be set via the VF's PF, e.g. if you are going to use VF 2 of the PF
> +enp2s0f1, you would use something like this command:
> +
> +::
> +
> +  ip link set enp2s0f1 vf 2 mac 52:54:00:11:22:33
> +
>   NB1: Since you must know the alias name of the virtio NIC when configuring the
>   hostdev NIC, it will need to be manually set in the virtio NIC's configuration
>   (as with all other manually set alias names, this means it must start with
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index 31960fb7cf..e6de934456 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
> @@ -5156,6 +5156,9 @@
>             <empty/>
>           </element>
>         </optional>
> +      <optional>
> +        <ref name="teaming"/>
> +      </optional>
>         <element name="source">
>           <optional>
>             <ref name="startupPolicy"/>
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 3fe8517f39..8701136aa9 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -15015,6 +15015,9 @@ virDomainHostdevDefParseXML(virDomainXMLOptionPtr xmlopt,
>           }
>       }
>   
> +    if (virDomainNetTeamingInfoParseXML(ctxt, &def->teaming) < 0)
> +        goto error;
> +
>       return def;
>   
>    error:

This should be paired with freeing the teaming info:

diff --git i/src/conf/domain_conf.c w/src/conf/domain_conf.c
index 3208e89c65..1d2a507fa1 100644
--- i/src/conf/domain_conf.c
+++ w/src/conf/domain_conf.c
@@ -3024,6 +3024,8 @@ void 
virDomainHostdevDefClear(virDomainHostdevDefPtr def)
      if (!def->parentnet)
          virDomainDeviceInfoFree(def->info);

+    virDomainNetTeamingInfoFree(def->teaming);
+
      switch (def->mode) {
      case VIR_DOMAIN_HOSTDEV_MODE_CAPABILITIES:
          switch ((virDomainHostdevCapsType) def->source.caps.type) {



Michal