[libvirt PATCH] conf: properly clear out autogenerated macvtap names when formatting/parsing

Laine Stump posted 1 patch 3 years, 8 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20200823045218.588366-1-laine@redhat.com
src/conf/domain_conf.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
[libvirt PATCH] conf: properly clear out autogenerated macvtap names when formatting/parsing
Posted by Laine Stump 3 years, 8 months ago
Back when macvtap support was added in commit 315baab9443 in Feb. 2010
(libvirt-0.7.7), it was setup to autogenerate a name for the device if
one wasn't supplied, in the pattern "macvtap%d" (or "macvlan%d"),
similar to the way an unspecified standard tap device name will lead
to an autogenerated "vnet%d".

As a matter of fact, in commit ca1b7cc8e45 added in May 2010, the code
was changed to *always* ignore a supplied device name for macvtap
interfaces by deleting any name immediately during the <interface>
parsing (this was intended to prevent one domain which had failed to
completely start from deleting the macvtap device of another domain
which had subsequently been provided the same device name (this will
seem mildly ironic later). This was later fixed to only clear the
device name when inactive XML was being parsed. HOWEVER - this was
only done if the xml was <interface type='direct'> - autogenerated
names were not cleared for <interface type='network'> (which could
also result in a macvtap device).

Although the names of "vnetX" tap devices had always been
automatically cleared when parsing <interface> (see commit d1304583d
from July 2008 (!)), at the time macvtap support was added, both vnetX
and macvtapX device names were always included when formatting the
XML.

Then in commit a8be259d0cc (July 2011, libvirt-0.9.4), <interface>
formatting was changed to also clear out "vnetX" device names during
XML formatting as well. However the same treatment wasn't given to
"macvtapX".

The result of all this (among other things) was that when a running guest

Now in 2020, there has been a report that a failed migration leads to
the macvtap device of some other unrelated guest on the destination
host losing its network connectivity. It was determined that this was
due to the domain XML in the migration containing a macvtap device
name, e.g. "macvtap0", that was already in use on the
destination. Normally this wouldn't be a problem, because libvirt
would see that the device was already in use, and then find a
different unused name. But in this case, other external problems were
causing the migration to fail prior to selecting a macvtap device and
successfully opening it, and during error recovery, qemuProcessStop()
was called, which went through all def->nets objects and (if they were
macvtap) deleted the device specified in net->ifname; since libvirt
hadn't gotten to the point of replacing the incoming "macvtap0" with
the name of a device it actually created for this guest, that meant
that "macvtap0" was deleted, *even though it was currently in use by a
different guest*!

Whew!

So, it turns out that when formatting "migratable" XML, "vnetX"
devices are omitted, just as when formatting "inactive" XML (I'm sure
there's a bit of code somewhere that says "if (migratable) then set
inactive too", but found it was easier to just try it out with "virsh
dumpxml blah --migratable"). By making the code in both interface
parsing and formatting consistent for "vnetX", "macvtapX", and
"macvlanX", we can thus make sure that the autogenerated (and unneeded
/ completely *not* wanted) macvtap device name will not be sent with
the migration XML. This way when a migration fails, net->ifname will
be NULL, and libvirt won't have any device to try and delete.

Signed-off-by: Laine Stump <laine@redhat.com>
---
 src/conf/domain_conf.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 8e7981bf25..6064d31b99 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -12428,14 +12428,6 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt,
         }
 
         def->data.direct.linkdev = g_steal_pointer(&dev);
-
-        if (ifname &&
-            flags & VIR_DOMAIN_DEF_PARSE_INACTIVE &&
-            (STRPREFIX(ifname, VIR_NET_GENERATED_MACVTAP_PREFIX) ||
-             STRPREFIX(ifname, VIR_NET_GENERATED_MACVLAN_PREFIX))) {
-            VIR_FREE(ifname);
-        }
-
         break;
 
     case VIR_DOMAIN_NET_TYPE_HOSTDEV:
@@ -12481,6 +12473,8 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt,
     if (def->managed_tap != VIR_TRISTATE_BOOL_NO && ifname &&
         (flags & VIR_DOMAIN_DEF_PARSE_INACTIVE) &&
         (STRPREFIX(ifname, VIR_NET_GENERATED_TAP_PREFIX) ||
+         STRPREFIX(ifname, VIR_NET_GENERATED_MACVTAP_PREFIX) ||
+         STRPREFIX(ifname, VIR_NET_GENERATED_MACVLAN_PREFIX) ||
          (prefix && STRPREFIX(ifname, prefix)))) {
         /* An auto-generated target name, blank it out */
         VIR_FREE(ifname);
@@ -26796,6 +26790,8 @@ virDomainNetDefFormat(virBufferPtr buf,
         (def->managed_tap == VIR_TRISTATE_BOOL_NO ||
          !((flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE) &&
            (STRPREFIX(def->ifname, VIR_NET_GENERATED_TAP_PREFIX) ||
+            STRPREFIX(def->ifname, VIR_NET_GENERATED_MACVTAP_PREFIX) ||
+            STRPREFIX(def->ifname, VIR_NET_GENERATED_MACVLAN_PREFIX) ||
             (prefix && STRPREFIX(def->ifname, prefix)))))) {
         /* Skip auto-generated target names for inactive config. */
         virBufferEscapeString(&attrBuf, " dev='%s'", def->ifname);
-- 
2.26.2

Re: [libvirt PATCH] conf: properly clear out autogenerated macvtap names when formatting/parsing
Posted by Daniel Henrique Barboza 3 years, 7 months ago

On 8/23/20 1:52 AM, Laine Stump wrote:
> Back when macvtap support was added in commit 315baab9443 in Feb. 2010
> (libvirt-0.7.7), it was setup to autogenerate a name for the device if
> one wasn't supplied, in the pattern "macvtap%d" (or "macvlan%d"),
> similar to the way an unspecified standard tap device name will lead
> to an autogenerated "vnet%d".
> 
> As a matter of fact, in commit ca1b7cc8e45 added in May 2010, the code
> was changed to *always* ignore a supplied device name for macvtap
> interfaces by deleting any name immediately during the <interface>
> parsing (this was intended to prevent one domain which had failed to
> completely start from deleting the macvtap device of another domain
> which had subsequently been provided the same device name (this will
> seem mildly ironic later). This was later fixed to only clear the
> device name when inactive XML was being parsed. HOWEVER - this was
> only done if the xml was <interface type='direct'> - autogenerated
> names were not cleared for <interface type='network'> (which could
> also result in a macvtap device).
> 
> Although the names of "vnetX" tap devices had always been
> automatically cleared when parsing <interface> (see commit d1304583d
> from July 2008 (!)), at the time macvtap support was added, both vnetX
> and macvtapX device names were always included when formatting the
> XML.
> 
> Then in commit a8be259d0cc (July 2011, libvirt-0.9.4), <interface>
> formatting was changed to also clear out "vnetX" device names during
> XML formatting as well. However the same treatment wasn't given to
> "macvtapX".
> 
> The result of all this (among other things) was that when a running guest

I guess this is a stray sentence ^

> 
> Now in 2020, there has been a report that a failed migration leads to
> the macvtap device of some other unrelated guest on the destination
> host losing its network connectivity. It was determined that this was
> due to the domain XML in the migration containing a macvtap device
> name, e.g. "macvtap0", that was already in use on the
> destination. Normally this wouldn't be a problem, because libvirt
> would see that the device was already in use, and then find a
> different unused name. But in this case, other external problems were
> causing the migration to fail prior to selecting a macvtap device and
> successfully opening it, and during error recovery, qemuProcessStop()
> was called, which went through all def->nets objects and (if they were
> macvtap) deleted the device specified in net->ifname; since libvirt
> hadn't gotten to the point of replacing the incoming "macvtap0" with
> the name of a device it actually created for this guest, that meant
> that "macvtap0" was deleted, *even though it was currently in use by a
> different guest*!
> 
> Whew!
> 
> So, it turns out that when formatting "migratable" XML, "vnetX"
> devices are omitted, just as when formatting "inactive" XML (I'm sure
> there's a bit of code somewhere that says "if (migratable) then set
> inactive too", but found it was easier to just try it out with "virsh
> dumpxml blah --migratable"). By making the code in both interface
> parsing and formatting consistent for "vnetX", "macvtapX", and
> "macvlanX", we can thus make sure that the autogenerated (and unneeded
> / completely *not* wanted) macvtap device name will not be sent with
> the migration XML. This way when a migration fails, net->ifname will
> be NULL, and libvirt won't have any device to try and delete.
> 
> Signed-off-by: Laine Stump <laine@redhat.com>
> ---

Cool story and good patch.


Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   src/conf/domain_conf.c | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 8e7981bf25..6064d31b99 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -12428,14 +12428,6 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt,
>           }
>   
>           def->data.direct.linkdev = g_steal_pointer(&dev);
> -
> -        if (ifname &&
> -            flags & VIR_DOMAIN_DEF_PARSE_INACTIVE &&
> -            (STRPREFIX(ifname, VIR_NET_GENERATED_MACVTAP_PREFIX) ||
> -             STRPREFIX(ifname, VIR_NET_GENERATED_MACVLAN_PREFIX))) {
> -            VIR_FREE(ifname);
> -        }
> -
>           break;
>   
>       case VIR_DOMAIN_NET_TYPE_HOSTDEV:
> @@ -12481,6 +12473,8 @@ virDomainNetDefParseXML(virDomainXMLOptionPtr xmlopt,
>       if (def->managed_tap != VIR_TRISTATE_BOOL_NO && ifname &&
>           (flags & VIR_DOMAIN_DEF_PARSE_INACTIVE) &&
>           (STRPREFIX(ifname, VIR_NET_GENERATED_TAP_PREFIX) ||
> +         STRPREFIX(ifname, VIR_NET_GENERATED_MACVTAP_PREFIX) ||
> +         STRPREFIX(ifname, VIR_NET_GENERATED_MACVLAN_PREFIX) ||
>            (prefix && STRPREFIX(ifname, prefix)))) {
>           /* An auto-generated target name, blank it out */
>           VIR_FREE(ifname);
> @@ -26796,6 +26790,8 @@ virDomainNetDefFormat(virBufferPtr buf,
>           (def->managed_tap == VIR_TRISTATE_BOOL_NO ||
>            !((flags & VIR_DOMAIN_DEF_FORMAT_INACTIVE) &&
>              (STRPREFIX(def->ifname, VIR_NET_GENERATED_TAP_PREFIX) ||
> +            STRPREFIX(def->ifname, VIR_NET_GENERATED_MACVTAP_PREFIX) ||
> +            STRPREFIX(def->ifname, VIR_NET_GENERATED_MACVLAN_PREFIX) ||
>               (prefix && STRPREFIX(def->ifname, prefix)))))) {
>           /* Skip auto-generated target names for inactive config. */
>           virBufferEscapeString(&attrBuf, " dev='%s'", def->ifname);
> 

Re: [libvirt PATCH] conf: properly clear out autogenerated macvtap names when formatting/parsing
Posted by Laine Stump 3 years, 7 months ago
On 8/24/20 6:28 PM, Daniel Henrique Barboza wrote:
> 
> 
> On 8/23/20 1:52 AM, Laine Stump wrote:
>> Back when macvtap support was added in commit 315baab9443 in Feb. 2010
>> (libvirt-0.7.7), it was setup to autogenerate a name for the device if
>> one wasn't supplied, in the pattern "macvtap%d" (or "macvlan%d"),
>> similar to the way an unspecified standard tap device name will lead
>> to an autogenerated "vnet%d".
>>
>> As a matter of fact, in commit ca1b7cc8e45 added in May 2010, the code
>> was changed to *always* ignore a supplied device name for macvtap
>> interfaces by deleting any name immediately during the <interface>
>> parsing (this was intended to prevent one domain which had failed to
>> completely start from deleting the macvtap device of another domain
>> which had subsequently been provided the same device name (this will
>> seem mildly ironic later). This was later fixed to only clear the
>> device name when inactive XML was being parsed. HOWEVER - this was
>> only done if the xml was <interface type='direct'> - autogenerated
>> names were not cleared for <interface type='network'> (which could
>> also result in a macvtap device).
>>
>> Although the names of "vnetX" tap devices had always been
>> automatically cleared when parsing <interface> (see commit d1304583d
>> from July 2008 (!)), at the time macvtap support was added, both vnetX
>> and macvtapX device names were always included when formatting the
>> XML.
>>
>> Then in commit a8be259d0cc (July 2011, libvirt-0.9.4), <interface>
>> formatting was changed to also clear out "vnetX" device names during
>> XML formatting as well. However the same treatment wasn't given to
>> "macvtapX".
>>
>> The result of all this (among other things) was that when a running guest
> 
> I guess this is a stray sentence ^

You are guessing correctly.

My ferret-like brain is easily distracted in the middle of a thought, 
and sometimes when I pop my stack back to where I was before, I forget 
to clean up some loose ends. In this case, the above fragment just needs 
to be removed.

> 
>>
>> Now in 2020, there has been a report that a failed migration leads to
>> the macvtap device of some other unrelated guest on the destination
>> host losing its network connectivity. It was determined that this was
>> due to the domain XML in the migration containing a macvtap device
>> name, e.g. "macvtap0", that was already in use on the
>> destination. Normally this wouldn't be a problem, because libvirt
>> would see that the device was already in use, and then find a
>> different unused name. But in this case, other external problems were
>> causing the migration to fail prior to selecting a macvtap device and
>> successfully opening it, and during error recovery, qemuProcessStop()
>> was called, which went through all def->nets objects and (if they were
>> macvtap) deleted the device specified in net->ifname; since libvirt
>> hadn't gotten to the point of replacing the incoming "macvtap0" with
>> the name of a device it actually created for this guest, that meant
>> that "macvtap0" was deleted, *even though it was currently in use by a
>> different guest*!
>>
>> Whew!
>>
>> So, it turns out that when formatting "migratable" XML, "vnetX"
>> devices are omitted, just as when formatting "inactive" XML (I'm sure
>> there's a bit of code somewhere that says "if (migratable) then set
>> inactive too", but found it was easier to just try it out with "virsh
>> dumpxml blah --migratable"). By making the code in both interface
>> parsing and formatting consistent for "vnetX", "macvtapX", and
>> "macvlanX", we can thus make sure that the autogenerated (and unneeded
>> / completely *not* wanted) macvtap device name will not be sent with
>> the migration XML. This way when a migration fails, net->ifname will
>> be NULL, and libvirt won't have any device to try and delete.
>>
>> Signed-off-by: Laine Stump <laine@redhat.com>
>> ---
> 
> Cool story and good patch.

Well, I live for cool stories, after all :-) (seriously, I don't know if 
anyone else so frequently has commit log messages longer than the actual 
patch. My wife would say I'm confusing the situation by giving too much 
information; I say I'm being thorough :-))

Re: [libvirt PATCH] conf: properly clear out autogenerated macvtap names when formatting/parsing
Posted by Daniel Henrique Barboza 3 years, 7 months ago

On 8/24/20 8:01 PM, Laine Stump wrote:
> On 8/24/20 6:28 PM, Daniel Henrique Barboza wrote:
>>
>>
>> On 8/23/20 1:52 AM, Laine Stump wrote:
>>> Back when macvtap support was added in commit 315baab9443 in Feb. 2010
>>> (libvirt-0.7.7), it was setup to autogenerate a name for the device if
>>> one wasn't supplied, in the pattern "macvtap%d" (or "macvlan%d"),
>>> similar to the way an unspecified standard tap device name will lead
>>> to an autogenerated "vnet%d".
>>>
>>> As a matter of fact, in commit ca1b7cc8e45 added in May 2010, the code
>>> was changed to *always* ignore a supplied device name for macvtap
>>> interfaces by deleting any name immediately during the <interface>
>>> parsing (this was intended to prevent one domain which had failed to
>>> completely start from deleting the macvtap device of another domain
>>> which had subsequently been provided the same device name (this will
>>> seem mildly ironic later). This was later fixed to only clear the
>>> device name when inactive XML was being parsed. HOWEVER - this was
>>> only done if the xml was <interface type='direct'> - autogenerated
>>> names were not cleared for <interface type='network'> (which could
>>> also result in a macvtap device).
>>>
>>> Although the names of "vnetX" tap devices had always been
>>> automatically cleared when parsing <interface> (see commit d1304583d
>>> from July 2008 (!)), at the time macvtap support was added, both vnetX
>>> and macvtapX device names were always included when formatting the
>>> XML.
>>>
>>> Then in commit a8be259d0cc (July 2011, libvirt-0.9.4), <interface>
>>> formatting was changed to also clear out "vnetX" device names during
>>> XML formatting as well. However the same treatment wasn't given to
>>> "macvtapX".
>>>
>>> The result of all this (among other things) was that when a running guest
>>
>> I guess this is a stray sentence ^
> 
> You are guessing correctly.
> 
> My ferret-like brain is easily distracted in the middle of a thought, and sometimes when I pop my stack back to where I was before, I forget to clean up some loose ends. In this case, the above fragment just needs to be removed.
> 
>>
>>>
>>> Now in 2020, there has been a report that a failed migration leads to
>>> the macvtap device of some other unrelated guest on the destination
>>> host losing its network connectivity. It was determined that this was
>>> due to the domain XML in the migration containing a macvtap device
>>> name, e.g. "macvtap0", that was already in use on the
>>> destination. Normally this wouldn't be a problem, because libvirt
>>> would see that the device was already in use, and then find a
>>> different unused name. But in this case, other external problems were
>>> causing the migration to fail prior to selecting a macvtap device and
>>> successfully opening it, and during error recovery, qemuProcessStop()
>>> was called, which went through all def->nets objects and (if they were
>>> macvtap) deleted the device specified in net->ifname; since libvirt
>>> hadn't gotten to the point of replacing the incoming "macvtap0" with
>>> the name of a device it actually created for this guest, that meant
>>> that "macvtap0" was deleted, *even though it was currently in use by a
>>> different guest*!
>>>
>>> Whew!
>>>
>>> So, it turns out that when formatting "migratable" XML, "vnetX"
>>> devices are omitted, just as when formatting "inactive" XML (I'm sure
>>> there's a bit of code somewhere that says "if (migratable) then set
>>> inactive too", but found it was easier to just try it out with "virsh
>>> dumpxml blah --migratable"). By making the code in both interface
>>> parsing and formatting consistent for "vnetX", "macvtapX", and
>>> "macvlanX", we can thus make sure that the autogenerated (and unneeded
>>> / completely *not* wanted) macvtap device name will not be sent with
>>> the migration XML. This way when a migration fails, net->ifname will
>>> be NULL, and libvirt won't have any device to try and delete.
>>>
>>> Signed-off-by: Laine Stump <laine@redhat.com>
>>> ---
>>
>> Cool story and good patch.
> 
> Well, I live for cool stories, after all :-) (seriously, I don't know if anyone else so frequently has commit log messages longer than the actual patch. My wife would say I'm confusing the situation by giving too much information; I say I'm being thorough :-))


I don't shy away from long commit messages as well:


https://github.com/qemu/qemu/commit/6ca080453ea403959ccde661030ca16264acc181


I'm keeping it under control nowadays, but sometimes a relapse hits and when
I notice it's a 100 line commit msg to explain why I'm removing an "IF"
clause :)




DHB


>