src/qemu/qemu_process.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
when shutdown vm, the qemuProcessStop cleanup virtual interface in two steps:
1. qemuProcessKill kill qemu process, and vif disappeared
2. ovs-vsctl del-port from the brige
if start a vm in the middle of the two steps, the new vm will reused the vif,
but removed from bridge by step 2
Signed-off-by: Bingsong Si <owen.si@ucloud.cn>
---
src/qemu/qemu_process.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index d36088ba98..706248815a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7483,9 +7483,11 @@ void qemuProcessStop(virQEMUDriverPtr driver,
if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) {
ignore_value(virNetDevMidonetUnbindPort(vport));
} else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) {
- ignore_value(virNetDevOpenvswitchRemovePort(
- virDomainNetGetActualBridgeName(net),
- net->ifname));
+ virMacAddr mac;
+ if (virNetDevGetMAC(net->ifname, &mac) < 0 || !virMacAddrCmp(&mac, &net->mac))
+ ignore_value(virNetDevOpenvswitchRemovePort(
+ virDomainNetGetActualBridgeName(net),
+ net->ifname));
}
}
--
2.18.2
On 6/11/20 6:58 AM, Bingsong Si wrote: > when shutdown vm, the qemuProcessStop cleanup virtual interface in two steps: s/when/When > 1. qemuProcessKill kill qemu process, and vif disappeared > 2. ovs-vsctl del-port from the brige > > if start a vm in the middle of the two steps, the new vm will reused the vif, s/if/If > but removed from bridge by step 2 > > Signed-off-by: Bingsong Si <owen.si@ucloud.cn> > --- > src/qemu/qemu_process.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > index d36088ba98..706248815a 100644 > --- a/src/qemu/qemu_process.c > +++ b/src/qemu/qemu_process.c > @@ -7483,9 +7483,11 @@ void qemuProcessStop(virQEMUDriverPtr driver, > if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) { > ignore_value(virNetDevMidonetUnbindPort(vport)); > } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) { > - ignore_value(virNetDevOpenvswitchRemovePort( > - virDomainNetGetActualBridgeName(net), > - net->ifname)); > + virMacAddr mac; > + if (virNetDevGetMAC(net->ifname, &mac) < 0 || !virMacAddrCmp(&mac, &net->mac)) Extra space between "||" and "!virMacAddrCmp(.." With these nits fixed: Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> > + ignore_value(virNetDevOpenvswitchRemovePort( > + virDomainNetGetActualBridgeName(net), > + net->ifname)); > } > } > >
Thanks for your review, I will fix in version 2. 发件人: Daniel Henrique Barboza 发送时间: 2020年6月16日星期二 上午2:36 收件人: Bingsong Si; libvir-list@redhat.com 主题: Re: [PATCH] network: Fix a race condition when shutdown & start vm at the same time On 6/11/20 6:58 AM, Bingsong Si wrote: > when shutdown vm, the qemuProcessStop cleanup virtual interface in two steps: s/when/When > 1. qemuProcessKill kill qemu process, and vif disappeared > 2. ovs-vsctl del-port from the brige > > if start a vm in the middle of the two steps, the new vm will reused the vif, s/if/If > but removed from bridge by step 2 > > Signed-off-by: Bingsong Si <owen.si@ucloud.cn> > --- > src/qemu/qemu_process.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > index d36088ba98..706248815a 100644 > --- a/src/qemu/qemu_process.c > +++ b/src/qemu/qemu_process.c > @@ -7483,9 +7483,11 @@ void qemuProcessStop(virQEMUDriverPtr driver, > if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_MIDONET) { > ignore_value(virNetDevMidonetUnbindPort(vport)); > } else if (vport->virtPortType == VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) { > - ignore_value(virNetDevOpenvswitchRemovePort( > - virDomainNetGetActualBridgeName(net), > - net->ifname)); > + virMacAddr mac; > + if (virNetDevGetMAC(net->ifname, &mac) < 0 || !virMacAddrCmp(&mac, &net->mac)) Extra space between "||" and "!virMacAddrCmp(.." With these nits fixed: Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> > + ignore_value(virNetDevOpenvswitchRemovePort( > + virDomainNetGetActualBridgeName(net), > + net->ifname)); > } > } > >
(BTW, this other patch is also trying to solve the same problem: https://www.redhat.com/archives/libvir-list/2020-June/msg00525.html I made comments there earlier, and have learned a bit more since then: https://www.redhat.com/archives/libvir-list/2020-June/msg00634.html On 6/15/20 2:36 PM, Daniel Henrique Barboza wrote: > > > On 6/11/20 6:58 AM, Bingsong Si wrote: >> when shutdown vm, the qemuProcessStop cleanup virtual interface in >> two steps: > > s/when/When > >> 1. qemuProcessKill kill qemu process, and vif disappeared >> 2. ovs-vsctl del-port from the brige >> >> if start a vm in the middle of the two steps, the new vm will reused >> the vif, > > s/if/If > >> but removed from bridge by step 2 >> >> Signed-off-by: Bingsong Si <owen.si@ucloud.cn> >> --- >> src/qemu/qemu_process.c | 8 +++++--- >> 1 file changed, 5 insertions(+), 3 deletions(-) >> >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >> index d36088ba98..706248815a 100644 >> --- a/src/qemu/qemu_process.c >> +++ b/src/qemu/qemu_process.c >> @@ -7483,9 +7483,11 @@ void qemuProcessStop(virQEMUDriverPtr driver, >> if (vport->virtPortType == >> VIR_NETDEV_VPORT_PROFILE_MIDONET) { >> ignore_value(virNetDevMidonetUnbindPort(vport)); >> } else if (vport->virtPortType == >> VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) { >> - ignore_value(virNetDevOpenvswitchRemovePort( >> - virDomainNetGetActualBridgeName(net), >> - net->ifname)); >> + virMacAddr mac; >> + if (virNetDevGetMAC(net->ifname, &mac) < 0 || >> !virMacAddrCmp(&mac, &net->mac)) (Before anything else - virNetDevGetMAC() will actually *log* an error in libvirt's logs if the device isn't found (which will be in nearly 100% of all cases). That will lead to people reporting it as a bug, which gets very time consuming and expensive for anyone providing commercial support for a product that uses libvirt. If it is really necessary to check the MAC address of a device that legitimately may or may not exist, then there will need to be a "Quiet" version of the function that doesn't log any errors.)(Update after thinking about it - I don't think we should be checking the MAC address anyway, as it doesn't reliably differentiate "new" tap from "old" tap). > > Extra space between "||" and "!virMacAddrCmp(.." > > > With these nits fixed: > > > Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> This patch is attempting to solve a race condition between one guest starting at the same time another is stopping, and the mess that results if the new guest uses the same name for its tap device as the old guest used. For example, lets say libvirt thread A / guest A is doing this: A1) the QEMU process is terminated (and tap device, e.g. "vnet0", is implicitly deleted) either by libvirtd or by some external force (including possibly the QEMU process itself A2) the tap's associated port (also "vnet0") is removed from the OVS switch by libvirt. While libvirt thread B is doing this: B1) a new tap device is created for a new QEMU process. If A1 has already happened, then the kernel will likely give the new tap the same name - "vnet0". B2) the new tap device is attached to an OVS switch (or possibly a Linux host bridge). The problem occurs when if B2 happens before A2, which could result in B2 attaching the new tap to the OVS switch, and then A2 disconnecting it from the switch. So libvirt thinks the new QEMU guest tap is attached to the switch, but it isn't. This patch attempts to eliminate the race by checking, prior to removing "old tap"s port on the switch, that 1) the tap device doesn't exist, or that if it does 2) that the MAC address of the tap device is unchanged. Assuming that the two guests would not use the same MAC address for their tap devices (which is probably the case, but isn't *required* to be true), this does significantly narrow the potential time for a race condition, and in particular makes sure that we never remove a port that hasn't just been "re-added" by the new QEMU. However, this just creates a smaller window for the race, and different problem for the remainder of the time. 1) smaller window - it would still be possible for the following to happen: a) old qemu terminates, tap device "vnet0" is deleted b) libvirt checks MAC address and learns that there is no device "vnet0", so it calls virNetDevOpenvswitchRemovePort(), but before ovs-vsctl can be called... c) libvirt creates a new tap device for new QEMU, kernel names it "vnet0" d) libvirt calls virNetDevOpenvswitchAddPort() and the new tap device "vnet0" to the switch e) the ovs-vsctl from (b) is finally able to run, and removes "vnet0" from the switch. Granted, this is *highly* unlikely, since there is nothing extra between checking MAC address and removing the port, but there is nothing enforcing it. 2) New Problem - I think the testing here was done with two guests who both attached their tap to the same (or another) OVS switch. It's relying on the ability to attach the tap device to a new switch/bridge even if it is already attached to some other switch bridge. Normally ovs-vsctl would refuse to add a port to a switch if a port by that same name was already on any OVS switch. I just looked it up though, and in this case libvirt is able to make this work by including "--if-exists del-port $ifname" in the ovs-vsctl command that *adds* the new port. However, if you have the same situation but the new switch device is instead a Linux bridge, the attempt to attach to the bridge fails. So, if thread "B" has already created the new tap device by the time thread "A" is deciding whether or not to remove the old port, the port won't be removed, and if the new guest "B" is using a Linux host bridge, libvirt will fail to attach the new tap to the bridge. So, a summary of the problems with this patch: 1) The race window is reduced (and may be gone in practical terms), but not guaranteed to be eliminated. 2) For the time during the "previous race window start" and "new race window start" a new problem has been created - if the new guest uses a Linux host bridge, the connection will fail 3) virNetDevGetMAC() will put an error in the system logs if the device doesn't exist (and it almost always will *not* exist, so this will be significant 4) Although it is almost always the case that two guests will not use the same MAC address for their network interfaces, there is nothing preventing it - we shouldn't assume that MAC addresses are unique. I think that check is actually superfluous, since the qemu process has always been terminated by the time we get to that place in the code, so the tap device should have been auto-deleted. What do I think should be done? Good question. Possibly we could: A) Call virNetDevTapReattachBridge() rather than virNetDevTapAttachBridge() in virNetDevTapCreateInBridgePort(). This would eliminate problem (2). B) Instead of checking if the tap device MAC address matches, just call virNetDevExists() - if it exists, then skip the RemovePort() - this eliminates problems (3) and (4). (NB - this would fail if it turns out that tap device deletion isn't completed synchronously with qemu process termination!) C) If we want to make it 100% sound, we need to make "check for interface existence + removeport" an atomic operation, and mutually exclusive with virNetDevTapCreate(). This would eliminate problem (1) > >> + ignore_value(virNetDevOpenvswitchRemovePort( >> + virDomainNetGetActualBridgeName(net), >> + net->ifname)); >> } >> } >> >
Thanks for your detailed analysis, I will remake a patch 发件人: Laine Stump 发送时间: Wednesday, June 17, 2020 7:46 AM 收件人: libvir-list@redhat.com 抄送: Daniel Henrique Barboza; Bingsong Si; Wei Gong 主题: Re: [PATCH] network: Fix a race condition when shutdown & start vm at the same time (BTW, this other patch is also trying to solve the same problem: https://www.redhat.com/archives/libvir-list/2020-June/msg00525.html I made comments there earlier, and have learned a bit more since then: https://www.redhat.com/archives/libvir-list/2020-June/msg00634.html On 6/15/20 2:36 PM, Daniel Henrique Barboza wrote: > > > On 6/11/20 6:58 AM, Bingsong Si wrote: >> when shutdown vm, the qemuProcessStop cleanup virtual interface in >> two steps: > > s/when/When > >> 1. qemuProcessKill kill qemu process, and vif disappeared >> 2. ovs-vsctl del-port from the brige >> >> if start a vm in the middle of the two steps, the new vm will reused >> the vif, > > s/if/If > >> but removed from bridge by step 2 >> >> Signed-off-by: Bingsong Si <owen.si@ucloud.cn> >> --- >> src/qemu/qemu_process.c | 8 +++++--- >> 1 file changed, 5 insertions(+), 3 deletions(-) >> >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >> index d36088ba98..706248815a 100644 >> --- a/src/qemu/qemu_process.c >> +++ b/src/qemu/qemu_process.c >> @@ -7483,9 +7483,11 @@ void qemuProcessStop(virQEMUDriverPtr driver, >> if (vport->virtPortType == >> VIR_NETDEV_VPORT_PROFILE_MIDONET) { >> ignore_value(virNetDevMidonetUnbindPort(vport)); >> } else if (vport->virtPortType == >> VIR_NETDEV_VPORT_PROFILE_OPENVSWITCH) { >> - ignore_value(virNetDevOpenvswitchRemovePort( >> - virDomainNetGetActualBridgeName(net), >> - net->ifname)); >> + virMacAddr mac; >> + if (virNetDevGetMAC(net->ifname, &mac) < 0 || >> !virMacAddrCmp(&mac, &net->mac)) (Before anything else - virNetDevGetMAC() will actually *log* an error in libvirt's logs if the device isn't found (which will be in nearly 100% of all cases). That will lead to people reporting it as a bug, which gets very time consuming and expensive for anyone providing commercial support for a product that uses libvirt. If it is really necessary to check the MAC address of a device that legitimately may or may not exist, then there will need to be a "Quiet" version of the function that doesn't log any errors.)(Update after thinking about it - I don't think we should be checking the MAC address anyway, as it doesn't reliably differentiate "new" tap from "old" tap). > > Extra space between "||" and "!virMacAddrCmp(.." > > > With these nits fixed: > > > Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> This patch is attempting to solve a race condition between one guest starting at the same time another is stopping, and the mess that results if the new guest uses the same name for its tap device as the old guest used. For example, lets say libvirt thread A / guest A is doing this: A1) the QEMU process is terminated (and tap device, e.g. "vnet0", is implicitly deleted) either by libvirtd or by some external force (including possibly the QEMU process itself A2) the tap's associated port (also "vnet0") is removed from the OVS switch by libvirt. While libvirt thread B is doing this: B1) a new tap device is created for a new QEMU process. If A1 has already happened, then the kernel will likely give the new tap the same name - "vnet0". B2) the new tap device is attached to an OVS switch (or possibly a Linux host bridge). The problem occurs when if B2 happens before A2, which could result in B2 attaching the new tap to the OVS switch, and then A2 disconnecting it from the switch. So libvirt thinks the new QEMU guest tap is attached to the switch, but it isn't. This patch attempts to eliminate the race by checking, prior to removing "old tap"s port on the switch, that 1) the tap device doesn't exist, or that if it does 2) that the MAC address of the tap device is unchanged. Assuming that the two guests would not use the same MAC address for their tap devices (which is probably the case, but isn't *required* to be true), this does significantly narrow the potential time for a race condition, and in particular makes sure that we never remove a port that hasn't just been "re-added" by the new QEMU. However, this just creates a smaller window for the race, and different problem for the remainder of the time. 1) smaller window - it would still be possible for the following to happen: a) old qemu terminates, tap device "vnet0" is deleted b) libvirt checks MAC address and learns that there is no device "vnet0", so it calls virNetDevOpenvswitchRemovePort(), but before ovs-vsctl can be called... c) libvirt creates a new tap device for new QEMU, kernel names it "vnet0" d) libvirt calls virNetDevOpenvswitchAddPort() and the new tap device "vnet0" to the switch e) the ovs-vsctl from (b) is finally able to run, and removes "vnet0" from the switch. Granted, this is *highly* unlikely, since there is nothing extra between checking MAC address and removing the port, but there is nothing enforcing it. 2) New Problem - I think the testing here was done with two guests who both attached their tap to the same (or another) OVS switch. It's relying on the ability to attach the tap device to a new switch/bridge even if it is already attached to some other switch bridge. Normally ovs-vsctl would refuse to add a port to a switch if a port by that same name was already on any OVS switch. I just looked it up though, and in this case libvirt is able to make this work by including "--if-exists del-port $ifname" in the ovs-vsctl command that *adds* the new port. However, if you have the same situation but the new switch device is instead a Linux bridge, the attempt to attach to the bridge fails. So, if thread "B" has already created the new tap device by the time thread "A" is deciding whether or not to remove the old port, the port won't be removed, and if the new guest "B" is using a Linux host bridge, libvirt will fail to attach the new tap to the bridge. So, a summary of the problems with this patch: 1) The race window is reduced (and may be gone in practical terms), but not guaranteed to be eliminated. 2) For the time during the "previous race window start" and "new race window start" a new problem has been created - if the new guest uses a Linux host bridge, the connection will fail 3) virNetDevGetMAC() will put an error in the system logs if the device doesn't exist (and it almost always will *not* exist, so this will be significant 4) Although it is almost always the case that two guests will not use the same MAC address for their network interfaces, there is nothing preventing it - we shouldn't assume that MAC addresses are unique. I think that check is actually superfluous, since the qemu process has always been terminated by the time we get to that place in the code, so the tap device should have been auto-deleted. What do I think should be done? Good question. Possibly we could: A) Call virNetDevTapReattachBridge() rather than virNetDevTapAttachBridge() in virNetDevTapCreateInBridgePort(). This would eliminate problem (2). B) Instead of checking if the tap device MAC address matches, just call virNetDevExists() - if it exists, then skip the RemovePort() - this eliminates problems (3) and (4). (NB - this would fail if it turns out that tap device deletion isn't completed synchronously with qemu process termination!) C) If we want to make it 100% sound, we need to make "check for interface existence + removeport" an atomic operation, and mutually exclusive with virNetDevTapCreate(). This would eliminate problem (1) > >> + ignore_value(virNetDevOpenvswitchRemovePort( >> + virDomainNetGetActualBridgeName(net), >> + net->ifname)); >> } >> } >> >
On 6/17/20 1:46 AM, Laine Stump wrote: > > What do I think should be done? Good question. Possibly we could: > > > A) Call virNetDevTapReattachBridge() rather than > virNetDevTapAttachBridge() in virNetDevTapCreateInBridgePort(). This > would eliminate problem (2). > > > B) Instead of checking if the tap device MAC address matches, just call > virNetDevExists() - if it exists, then skip the RemovePort() - this > eliminates problems (3) and (4). (NB - this would fail if it turns out > that tap device deletion isn't completed synchronously with qemu process > termination!) > > > C) If we want to make it 100% sound, we need to make "check for > interface existence + removeport" an atomic operation, and mutually > exclusive with virNetDevTapCreate(). This would eliminate problem (1) > I still don't quite understand how there can be a race. I mean, from system POV, libvirt creates a TAP, plugs it into a bridge (when starting the first domain). And when shutting it down and starting the second domain in parallel a new TAP device (with different index and MAC) is created independent of the first TAP, no? Michal
On 6/19/20 12:00 PM, Michal Privoznik wrote: > On 6/17/20 1:46 AM, Laine Stump wrote: > >> >> What do I think should be done? Good question. Possibly we could: >> >> >> A) Call virNetDevTapReattachBridge() rather than >> virNetDevTapAttachBridge() in virNetDevTapCreateInBridgePort(). This >> would eliminate problem (2). >> >> >> B) Instead of checking if the tap device MAC address matches, just >> call virNetDevExists() - if it exists, then skip the RemovePort() - >> this eliminates problems (3) and (4). (NB - this would fail if it >> turns out that tap device deletion isn't completed synchronously with >> qemu process termination!) >> >> >> C) If we want to make it 100% sound, we need to make "check for >> interface existence + removeport" an atomic operation, and mutually >> exclusive with virNetDevTapCreate(). This would eliminate problem (1) >> > > I still don't quite understand how there can be a race. I mean, from > system POV, libvirt creates a TAP, plugs it into a bridge (when > starting the first domain). And when shutting it down and starting the > second domain in parallel a new TAP device (with different index and > MAC) is created independent of the first TAP, no? Yes. There's no problem with the tap device creation/deletion by itself. The problem is when that tap device is attached to an OVS switch. A "port" of the same name as the tap device is created in OVS, and that port continues to exist after the tap device itself is deleted. If a new tap device of the same name is created without first deleting the port from OVS, then the new tap device is automatically connected to the existing port of the same name (I don't know the exact details of how this is done, but OVS must have it's claws somewhere in the kernel or udev to make this happen, as I've seen it (and can reproduce it) myself). Even that, by itself, isn't a problem. But if you have a tap device attached to an OVS port, and you delete the tap, that frees up the name so that the next time someone asks to create a tap device they get that same name. But if that happens before the old port has been removed from OVS (which is a separate operation), then the new tap device is attached to the old switch. In that case, simple requests to attach the new tap to a different switch (or a Linux bridge) will fail. We solve that problem when the new tap is being attached to an OVS switch by prepending "--if-exists del-port blah" to the ovs-vsctl command that attaches to the new switch, but that doesn't help in the case that the new tap is being attached to a Linux host bridge. Performing items (1) and (2) in my list will probably get rid of all bad behavior in practice, but there is still technically the possibility that the thread (B) for the new tap device will create that tap device after the thread (A) for the old tap device checks that the device doesn't exist (i.e. that a new tap with the same name hasn't yet been created) but before removing the port (of the same name) from the OVS switch. This would all be much simpler if the kernel would put a "FIN WAIT" state on all tap device names so that they couldn't be re-used for a few seconds after deletion, but it doesn't, so we have to work with what we've got.
On Fri, Jun 19, 2020 at 12:19:05PM -0400, Laine Stump wrote: > > This would all be much simpler if the kernel would put a "FIN WAIT" state on > all tap device names so that they couldn't be re-used for a few seconds > after deletion, but it doesn't, so we have to work with what we've got. The problem is that the kernel reuses tap device names. Can we just take the kernel out of the equation and do auto-assignment of names ourselves. Maintain a global "int nextTapID" counter, and just iterate on this. NIC names can be upto 16 bytes, so we'll create billions of devices before we have any chance of wrapping around and risking a collision with a concurrently shutting down guest. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 6/19/20 12:26 PM, Daniel P. Berrangé wrote: > On Fri, Jun 19, 2020 at 12:19:05PM -0400, Laine Stump wrote: >> This would all be much simpler if the kernel would put a "FIN WAIT" state on >> all tap device names so that they couldn't be re-used for a few seconds >> after deletion, but it doesn't, so we have to work with what we've got. > The problem is that the kernel reuses tap device names. > > Can we just take the kernel out of the equation and do auto-assignment > of names ourselves. Yeah, that's what we do for macvtap. It is extra code, non-trivial, and therefore prone to errors (as in, "anything over 1 line long is prone to errors"). I *thought* I had remembered that couldn't work properly for standard tap devices couldn't work due to the way that tap devices are "created" (you first open /dev/net/tun, then issue a ioctl(TUNSETIFF, "desiredname") on the handle returned by open. I had been concerned that if you did that with an existing name it would just connect you to the existing tap, but I tried it just now and got an error the 2nd time I tried to use the same tap device name, so perhaps/probably I'm again misremembering. > > Maintain a global "int nextTapID" counter, and just iterate on this. > NIC names can be upto 16 bytes, so we'll create billions of devices > before we have any chance of wrapping around and risking a collision > with a concurrently shutting down guest. I remember I tried doing a monotonically increasing "next" counter for macvtap (never actually pushed, but shown to [some users, I forget who, maybe ovirt devs?], and they didn't like it because the devices ended up with long difficult-for-a-human-to-remember/pick-out/recite names like macvtap42301 and macvtap43021. So instead we keep track of the names that are in use, and always look for the lowest available number when creating a new one. (of course doing that would greatly increase the likelyhood of exposing any race conditions, so...) Definitely if we change the behavior in this way we're going to hear about it, though :-) Another issue I remember from macvtap is the possibility of a different process (not libvirt) having already created a macvtap device with the name that we are wanting to use. That is easily solved in the case of macvtap by just iterating through until we find one that we can create without failure. So basically you have the list/bitmap/whatever of devicenames currently in use, use that as a starting point to pick a name for the new device, but then also allow for that name to already be used, and retry. Note that since macvtap and standard tap devices (and, actually *all* network devices) in the same namespace need to not have conflicting names, we could keep a single list of in-use names. Since we would never auto-generate a tap device name that conflicts with an auto-generated macvtap name, that is probably unnecessary, but just in case it makes things easier... Also, in the name of backward compatibility, we will need to support tap device names from the config that have %d in the name, e.g. the default name we send is "vnet%d", but we allow for someone to specify "mytap%d". In the end, if we can do something simple that preserves current behavior while covering the hole, that would probably be more expedient, but I have to say the thought of naming the devices ourselves has crossed my mind.
On Fri, Jun 19, 2020 at 01:06:33PM -0400, Laine Stump wrote: > On 6/19/20 12:26 PM, Daniel P. Berrangé wrote: > > Maintain a global "int nextTapID" counter, and just iterate on this. > > NIC names can be upto 16 bytes, so we'll create billions of devices > > before we have any chance of wrapping around and risking a collision > > with a concurrently shutting down guest. > > > I remember I tried doing a monotonically increasing "next" counter for > macvtap (never actually pushed, but shown to [some users, I forget who, > maybe ovirt devs?], and they didn't like it because the devices ended up > with long difficult-for-a-human-to-remember/pick-out/recite names like > macvtap42301 and macvtap43021. So instead we keep track of the names that > are in use, and always look for the lowest available number when creating a > new one. (of course doing that would greatly increase the likelyhood of > exposing any race conditions, so...) Definitely if we change the behavior in > this way we're going to hear about it, though :-) People might complain, but I'm not convinced it really matters. Pid numbers used to be nice & short until someone raised pid max and now my processes have 7-digit long pids. It is surprising at first, but it didn't really cause me any functional problems. And there's still the option of just providing a fixed name if you really need something predictable for a guest. > Another issue I remember from macvtap is the possibility of a different > process (not libvirt) having already created a macvtap device with the name > that we are wanting to use. That is easily solved in the case of macvtap by > just iterating through until we find one that we can create without failure. > So basically you have the list/bitmap/whatever of devicenames currently in > use, use that as a starting point to pick a name for the new device, but > then also allow for that name to already be used, and retry. I wasn't thinking we need a bitmap, literally just a single counter. We start where we left off, trying until we succeeed, which is what the kernel does anyway IIRC. Most other competing processes will rely on the kernel auto-allocation so won't clash with us very frequently if at all. > Also, in the name of backward compatibility, we will need to support tap > device names from the config that have %d in the name, e.g. the default name > we send is "vnet%d", but we allow for someone to specify "mytap%d". We could keep a counter per prefix, or just have a single global counter. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 6/19/20 1:16 PM, Daniel P. Berrangé wrote: > On Fri, Jun 19, 2020 at 01:06:33PM -0400, Laine Stump wrote: >> On 6/19/20 12:26 PM, Daniel P. Berrangé wrote: >>> Maintain a global "int nextTapID" counter, and just iterate on this. >>> NIC names can be upto 16 bytes, so we'll create billions of devices >>> before we have any chance of wrapping around and risking a collision >>> with a concurrently shutting down guest. >> >> I remember I tried doing a monotonically increasing "next" counter for >> macvtap (never actually pushed, but shown to [some users, I forget who, >> maybe ovirt devs?], and they didn't like it because the devices ended up >> with long difficult-for-a-human-to-remember/pick-out/recite names like >> macvtap42301 and macvtap43021. So instead we keep track of the names that >> are in use, and always look for the lowest available number when creating a >> new one. (of course doing that would greatly increase the likelyhood of >> exposing any race conditions, so...) Definitely if we change the behavior in >> this way we're going to hear about it, though :-) > People might complain, but I'm not convinced it really matters. Pid > numbers used to be nice & short until someone raised pid max and now > my processes have 7-digit long pids. It is surprising at first, but > it didn't really cause me any functional problems. This is a good point. But even more convincing than that is the revelation that this isn't an issue just with OVS switch ports - the same race is also causing problems with libvirt nwfilter driver bindings, as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1837395 After seeing that report and realizing that this same race is the cause, I've decided to change libvirt's tap device creation to provide names based on a monotonically increasing counter as you suggest, rather than relying on the kernel. I'm hoping to have something to send to the list in a day or two. > > And there's still the option of just providing a fixed name if you > really need something predictable for a guest. > >> Another issue I remember from macvtap is the possibility of a different >> process (not libvirt) having already created a macvtap device with the name >> that we are wanting to use. That is easily solved in the case of macvtap by >> just iterating through until we find one that we can create without failure. >> So basically you have the list/bitmap/whatever of devicenames currently in >> use, use that as a starting point to pick a name for the new device, but >> then also allow for that name to already be used, and retry. > I wasn't thinking we need a bitmap, literally just a single counter. > We start where we left off, trying until we succeeed, which is what > the kernel does anyway IIRC. Most other competing processes will > rely on the kernel auto-allocation so won't clash with us very > frequently if at all. > >> Also, in the name of backward compatibility, we will need to support tap >> device names from the config that have %d in the name, e.g. the default name >> we send is "vnet%d", but we allow for someone to specify "mytap%d". > We could keep a counter per prefix, or just have a single global > counter. > > Regards, > Daniel
© 2016 - 2024 Red Hat, Inc.