[PATCH] Fixed missing VM vport when batch start or migration partially failed

gongwei@smartx.com posted 1 patch 3 years, 10 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20200612061809.163313-1-gongwei@smartx.com
src/qemu/qemu_process.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
[PATCH] Fixed missing VM vport when batch start or migration partially failed
Posted by gongwei@smartx.com 3 years, 10 months ago
From: gongwei <gongwei@smartx.com>

start to failed will not remove the openvswitch port,
the port recycling in this case lets openvswitch handle it by itself

Signed-off-by: gongwei <gongwei@smartx.com>
---
 src/qemu/qemu_process.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index d36088ba98..439bd5b396 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7482,7 +7482,8 @@ void qemuProcessStop(virQEMUDriverPtr driver,
         if (vport) {
             if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) {
                 ignore_value(virNetDevMidonetUnbindPort(vport));
-            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {
+            } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH &&
+                       reason != VIR_DOMAIN_SHUTOFF_FAILED) {
                 ignore_value(virNetDevOpenvswitchRemovePort(
                                  virDomainNetGetActualBridgeName(net),
                                  net->ifname));
-- 
2.18.2

Re: [PATCH] Fixed missing VM vport when batch start or migration partially failed
Posted by Daniel Henrique Barboza 3 years, 10 months ago

On 6/12/20 3:18 AM, gongwei@smartx.com wrote:
> From: gongwei <gongwei@smartx.com>
> 
> start to failed will not remove the openvswitch port,
> the port recycling in this case lets openvswitch handle it by itself
> 
> Signed-off-by: gongwei <gongwei@smartx.com>
> ---

Can you please elaborate on the commit message? By the commit title and
the code, I'm assuming that you're saying that we shouldn't remove the
openvswitch port if the QEMU process failed to start (reason is
SHUTOFF_FAILED), and should still remove it for any other reason.


The code itself looks ok.




Re: [PATCH] Fixed missing VM vport when batch start or migration partially failed
Posted by Laine Stump 3 years, 10 months ago
On 6/15/20 2:04 PM, Daniel Henrique Barboza wrote:
>
>
> On 6/12/20 3:18 AM, gongwei@smartx.com wrote:
>> From: gongwei <gongwei@smartx.com>
>>
>> start to failed will not remove the openvswitch port,
>> the port recycling in this case lets openvswitch handle it by itself
>>
>> Signed-off-by: gongwei <gongwei@smartx.com>
>> ---
>
> Can you please elaborate on the commit message? By the commit title and
> the code, I'm assuming that you're saying that we shouldn't remove the
> openvswitch port if the QEMU process failed to start (reason is
> SHUTOFF_FAILED), and should still remove it for any other reason.


More importantly, what "port recycling" will take effect depending on
how the qemu process is stopped (which I would think wouldn't make any
difference to OVS), and why is it necessary for libvirt to not do it?


Up until now, what I have known is that ports will not be removed from 
an OVS switch unless they are explicitly removed with ovs-vsctl, and 
this attachment will persist across reboots of the host system. As a 
matter of fact I've had cases during development where libvirt didn't 
remove the OVS port for a tap device when a guest was terminated, and 
then many *days* (and several reboots) later the same tap device name 
was used for a different guest that was using a Linux host bridge, and 
the tap device failed to attach to the Linux host bridge because it had 
already been auto-attached back to the OVS switch as soon as it was created.


Can you describe how to reproduce the situation where libvirt removes
the OVS port when it shouldn't, and what is the bad outcome of that
happening?




Re: [PATCH] Fixed missing VM vport when batch start or migration partially failed
Posted by Wei Gong 3 years, 10 months ago
Environment: libvirt-4.3.0, qemu-kvm-ev-2.10.0, kernel-3.10.0-1062, CentOS 7,
openvswitch-2.3.1

VM network XML:
<interface type='bridge'>
  <mac address='52:54:00:46:45:95'/>
  <source bridge='ovsbr-mgt'/>
  <vlan>
    <tag id='0'/>
  </vlan>
  <virtualport type='openvswitch'>
    <parameters interfaceid='596c6ab7-4557-4935-af97-62a35d933f8d'/>
  </virtualport>
  <target dev='vnet0'/>
  <model type='virtio'/>
  <link state='up'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04'
function='0x0'/>
</interface>

qemuProcessStart() in qemu_process.c fails. Cleanup then happens in two
steps: first the qemu process is stopped (at which point the kernel
deletes the tap device, and its name can be claimed by another virtual
machine), and only afterwards is the OVS port removed. qemuProcessStart()
and qemuProcessStop() can run concurrently for different domains, so the
port removal in qemuProcessStop() may remove a port that now belongs to
another virtual machine using an openvswitch virtualport.

For example:
vm1 fails to start, and its tap device vnet0 is released first. Before
libvirt removes the OVS port for vm1, vm2 starts, is given a tap device
with the same name vnet0, and adds port vnet0 to OVS. The port removal
for vm1 then deletes port vnet0, which at this point belongs to vm2.
vm2 can no longer reach the network through vnet0.

To reproduce:
Batch start or migrate 10 virtual machines to the same node such that
one of them fails to start. The failure can be a storage connection
failure or anything else (when we reproduced it internally, we forced
the failure by attaching one of the virtual machines to invalid
storage).

Impact:
After a batch migration, the network of one virtual machine is
unreachable and the service running in it is interrupted.

Logs of the OVS calls made by libvirt:
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 "external-ids:attached-mac=\"52:54:00:92:7e:7f\"" -- set Interface vnet4 "external-ids:iface-id=\"afb3a67a-5e5d-4ca6-b625-ebce6a9c8d03\"" -- set Interface vnet4 "external-ids:vm-id=\"7b9e4d5a-e8e9-4527-9b89-dd1f74d02526\"" -- set Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 left promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4 -- add-port ovsbr-mgt vnet4 tag=0 -- set Interface vnet4 "external-ids:attached-mac=\"52:54:00:b7:f4:07\"" -- set Interface vnet4 "external-ids:iface-id=\"c837d02d-4a4e-4f9c-9bee-7e5efce01a8e\"" -- set Interface vnet4 "external-ids:vm-id=\"83035f1e-faed-43d6-951e-08c90c9006a9\"" -- set Interface vnet4 external-ids:iface-status=active
Jun 10 19:11:32 zbs-sh-elf-11 kernel: device vnet4 entered promiscuous mode
Jun 10 19:11:32 zbs-sh-elf-11 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=5 -- --if-exists del-port vnet4


Thanks

Re: [PATCH] Fixed missing VM vport when batch start or migration partially failed
Posted by Laine Stump 3 years, 10 months ago
On 6/15/20 11:10 PM, Wei Gong wrote:
> Environment: libvirt-4.3.0, qemu-kvm-ev-2.10.0, kernel-3.10.0-1062,
> CentOS 7, openvswitch-2.3.1
> VM network XML:
> <interface type='bridge'>
>   <mac address='52:54:00:46:45:95'/>
>   <source bridge='ovsbr-mgt'/>
>   <vlan>
>     <tag id='0'/>
>   </vlan>
>   <virtualport type='openvswitch'>
>     <parameters interfaceid='596c6ab7-4557-4935-af97-62a35d933f8d'/>
>   </virtualport>
>   <target dev='vnet0'/>
>   <model type='virtio'/>
>   <link state='up'/>
>   <alias name='net0'/>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' 
> function='0x0'/>
> </interface>
>
> qemuProcessStart() in qemu_process.c fails. Cleanup then happens in two
> steps: first the qemu process is stopped (at which point the kernel
> deletes the tap device, and its name can be claimed by another virtual
> machine), and only afterwards is the OVS port removed. qemuProcessStart()
> and qemuProcessStop() can run concurrently for different domains, so the
> port removal in qemuProcessStop() may remove a port that now belongs to
> another virtual machine using an openvswitch virtualport.
>
> For example:
> vm1 fails to start, and its tap device vnet0 is released first. Before
> libvirt removes the OVS port for vm1, vm2 starts, is given a tap device
> with the same name vnet0, and adds port vnet0 to OVS. The port removal
> for vm1 then deletes port vnet0, which at this point belongs to vm2.
> vm2 can no longer reach the network through vnet0.
>
> To reproduce:
> Batch start or migrate 10 virtual machines to the same node such that
> one of them fails to start. The failure can be a storage connection
> failure or anything else (when we reproduced it internally, we forced
> the failure by attaching one of the virtual machines to invalid
> storage).
>
> Impact:
> After a batch migration, the network of one virtual machine is
> unreachable and the service running in it is interrupted.


Okay, I understand the problem now, but your patch doesn't fix it.


The problem is (as also described in
https://www.redhat.com/archives/libvir-list/2020-June/msg00481.html ) a
race condition created when a qemu process is shut down just as a new
qemu process is started. The old tap device is deleted (and its name
made available for re-use) implicitly as part of the old qemu process
being terminated, and the old qemu process has terminated before we
remove the port from OVS. So a new tap (with the old name, as the
kernel thinks it is now available) may already have been created by
the time qemuProcessStop() gets around to removing the port associated
(by name) with the old tap from the OVS switch.
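
The interleaving can be replayed with a minimal sketch. It assumes a toy
in-memory model in which ports are keyed by interface name only, as in the
ovs-vsctl calls in the log; the struct and the ovs_* and replay_race names
are hypothetical and are not libvirt or OVS APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 8
#define NAME_LEN 16

/* Toy stand-in for the OVS port table: a port is identified by name only. */
struct ovs_port {
    bool used;
    char ifname[NAME_LEN];
    int owner_vm;               /* guest currently served by this port */
};

static struct ovs_port ports[MAX_PORTS];

void ovs_reset(void)
{
    memset(ports, 0, sizeof(ports));
}

/* Models "ovs-vsctl --if-exists del-port NAME": removal is by name. */
void ovs_del_port(const char *ifname)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (ports[i].used && strcmp(ports[i].ifname, ifname) == 0)
            ports[i].used = false;
    }
}

/* Models "del-port NAME -- add-port BRIDGE NAME" as seen in the log. */
void ovs_add_port(const char *ifname, int vm)
{
    assert(strlen(ifname) < NAME_LEN);
    ovs_del_port(ifname);
    for (int i = 0; i < MAX_PORTS; i++) {
        if (!ports[i].used) {
            ports[i].used = true;
            snprintf(ports[i].ifname, NAME_LEN, "%s", ifname);
            ports[i].owner_vm = vm;
            return;
        }
    }
}

/* Returns the owning guest, or -1 if no port has that name. */
int ovs_port_owner(const char *ifname)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (ports[i].used && strcmp(ports[i].ifname, ifname) == 0)
            return ports[i].owner_vm;
    }
    return -1;
}

/* Replays the racy ordering and reports who owns vnet0 afterwards. */
int replay_race(void)
{
    ovs_reset();
    ovs_add_port("vnet0", 1);   /* vm1 starts, tap vnet0 is attached */
    /* vm1's qemu exits: the kernel deletes the tap, the name is free */
    ovs_add_port("vnet0", 2);   /* vm2 gets a new tap with the same name */
    ovs_del_port("vnet0");      /* vm1's delayed cleanup, by name */
    return ovs_port_owner("vnet0");
}
```

In this model replay_race() returns -1: after vm1's late cleanup no port
named vnet0 remains, although vm2 is still running and expects one.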


And we can't eliminate the race by simply moving the call to 
virNetDevOpenvswitchRemovePort() up before the call to qemuProcessKill() 
- it is also possible that qemu could have exited by itself, or that 
some outside force other than libvirt killed it - in this case the tap 
has already been deleted by the time qemuProcessStop() is reached.


As for your method of eliminating the race, there are two problems:


1) if virNetDevOpenvswitchRemovePort() isn't called, then OVS will 
automatically grab the new tap device as soon as it is created and 
re-attach it to the old switch. As long as the new qemu process asks to 
attach it to that same switch, then there is no problem. But if the new 
process tries to attach the device to a *different* switch (for example, 
a Linux host bridge) then the attach will fail.


2) your method of deciding whether or not 
virNetDevOpenvswitchRemovePort() should be called by libvirt is invalid 
- the reason isn't always set to VIR_DOMAIN_SHUTOFF_FAILED when the qemu 
process has been terminated external to libvirt. But beyond that, the 
code shows that the qemu process is *always* terminated prior to the 
call to virNetDevOpenvswitchRemovePort(). So at most, your patch might 
be making the race window smaller in some cases, but it isn't 
eliminating it.
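
Point 2 can be made concrete with a minimal sketch of the patched
condition. The enums below are simplified stand-ins invented for
illustration, not libvirt's real VIR_DOMAIN_SHUTOFF_* or
VIR_NETDEV_VPORT_PROFILE_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of stop reasons and port types. */
enum stop_reason { STOP_SHUTDOWN, STOP_DESTROYED, STOP_CRASHED, STOP_FAILED };
enum port_type { PORT_MIDONET, PORT_OPENVSWITCH, PORT_OTHER };

/* Mirrors the shape of the patched "else if": remove the OVS port
 * unless the domain is being stopped because its start failed. */
bool patched_removes_ovs_port(enum port_type type, enum stop_reason reason)
{
    return type == PORT_OPENVSWITCH && reason != STOP_FAILED;
}
```

Only the failed-start case skips the removal. For every other reason the
port is still removed by name, and that removal still runs after the qemu
process is gone, so the window described above stays open.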


Fixing this race condition requires something more than just adding an
extra clause to a conditional. It may be possible to tell OVS to
automatically delete the port as the tap is deleted (which would be
nice, but I'm actually not expecting to find a way to do that), or it
may require libvirt to name and track tap devices itself (as it already
does for macvtap devices), which *also* has problems - in particular
whether or not we need to account for the possibility of multiple
simultaneous libvirtd processes.
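
One way to picture "name and track tap devices itself" is a minimal
sketch of an allocator that never hands out the same name twice, so a late
cleanup by name cannot hit a newer guest's port. This is an assumption made
for illustration, not libvirt's implementation: alloc_tap_name is a
hypothetical helper, counter wraparound is ignored, and a per-process
counter does nothing for the multiple-libvirtd question raised above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-process counter; it only ever grows. */
static unsigned int next_tap_id;

/* Writes "vnetN" into buf and returns N. Because N is never reused,
 * a name released by a dying qemu is not given to the next guest. */
unsigned int alloc_tap_name(char *buf, size_t buflen)
{
    unsigned int id = next_tap_id++;

    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "vnet%u", id);
    return id;
}
```

With such a scheme the guest that starts during vm1's cleanup would get a
different name than vm1 had, and a stale del-port for the old name would
have nothing to remove.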



Re: [PATCH] Fixed missing VM vport when batch start or migration partially failed
Posted by Laine Stump 3 years, 10 months ago
To complete the circle, here is my response to a *different* patch 
trying to fix this same problem. I did a bit more investigating during 
my reply, so there is better / more complete information:

https://www.redhat.com/archives/libvir-list/2020-June/msg00681.html
