Two patches - the fix for the actual bug and the selftest that reproduces it.
I missed the self-deadlock in the original patch that introduced the issue,
because testing required modifying ovs-vswitchd to force it to use legacy
tunnel ports. I thought I had made the change correctly, but apparently
something went wrong and the tests were run with the standard LWT infra instead.
The selftest added in this patch set should at least prevent this kind of
mistake in the future.
I should mention, however, that these tunnel vports are legacy and not actually
used by ovs-vswitchd. Since 2017, it has instead used RTM_NEWLINK +
COLLECT_METADATA in conjunction with the standard OVS_VPORT_TYPE_NETDEV. The
code to use the legacy tunnels still exists in ovs-vswitchd, but only as a
fallback for older kernels, and we're planning to remove it in the next release.
I'll be sending an RFC to remove support for these legacy tunnel types from the
kernel, as they serve no real purpose today and only increase the uAPI surface
for CVEs, but we still need to fix the known bugs in stable versions.
Version 2:
- Added Ack from Eelco to the first patch (not to the second as it
changed a little).
- Removed the now-unused 'import socket' from dpctl.py [pylint/ruff].
- Regarding comments from both Sashiko instances on the selftest patch:
* The background process is not waited for / not killed.
If it hangs, it will not be killable anyway, so it's not a problem.
* The 'gre' choice for dpctl.py --ptype is not fully handled for --lwt.
While this is not needed for this patch, I agree that it's not
fully consistent. Added the proper handling in the TUNNEL_DEFAULTS
loop in this version.
* Python version concern for argparse.BooleanOptionalAction.
Python 3.9 is the oldest supported version and it already has it, so
it's not an issue. Adding extra detection would only complicate the
script with no real benefit.
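For readers unfamiliar with argparse.BooleanOptionalAction, here is a minimal sketch of how it behaves for a flag such as --lwt. This is not the actual ovs-dpctl.py code. The choice list for --ptype and the default value of --lwt are assumptions made only for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: option names echo the cover letter, values are assumed."""
    parser = argparse.ArgumentParser(prog="ovs-dpctl.py")
    parser.add_argument(
        "--ptype",
        choices=["netdev", "internal", "geneve", "vxlan", "gre"],  # assumed list
        default="netdev",
        help="vport type",
    )
    # BooleanOptionalAction (added in Python 3.9) registers both --lwt and
    # --no-lwt from a single declaration.
    parser.add_argument(
        "--lwt",
        action=argparse.BooleanOptionalAction,
        default=True,  # assumed default, for illustration only
        help="create tunnels via the LWT infrastructure",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--ptype", "gre", "--no-lwt"])
    print(args.ptype, args.lwt)  # gre False
```

Compare this with type=bool, which converts any non-empty string (including "False") to True. The action instead gives the option an explicit --no-lwt form, and Python 3.9 is enough to use it.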
Version 1:
https://lore.kernel.org/netdev/20260429151756.4157670-1-i.maximets@ovn.org/
Ilya Maximets (2):
openvswitch: vport: fix self-deadlock on release of tunnel ports
selftests: openvswitch: add tests for tunnel vport refcounting
net/openvswitch/vport-netdev.c | 6 ++-
.../selftests/net/openvswitch/openvswitch.sh | 37 +++++++++++++++++++
.../selftests/net/openvswitch/ovs-dpctl.py | 19 +++++++---
3 files changed, 55 insertions(+), 7 deletions(-)
--
2.53.0
On 5/1/26 1:38 AM, Ilya Maximets wrote:
> [...]
> * The background process is not waited for / not killed.
> If it hangs it will not be killable anyway, so it's not a problem.

Both sashiko instances still flag this. Looks like the cover letter is not
included in the prompt.

If someone thinks I should add the suggested kill on exit, I can, but it will
not be effective in case the process hangs.

Best regards, Ilya Maximets.
Ilya Maximets <i.maximets@ovn.org> writes:

> On 5/1/26 1:38 AM, Ilya Maximets wrote:
>> [...]
>> * The background process is not waited for / not killed.
>> If it hangs it will not be killable anyway, so it's not a problem.
>
> Both sashiko instances still flag this. Looks like the cover letter is not
> included in the prompt.
>
> If someone thinks I should add the suggested kill on exit, I can, but it will
> not be effective in case the process hangs.

One option is to put a comment in the test itself documenting this behavior.
At least then the model might not flag it.

I don't feel strongly about that, however.
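For illustration, the "kill on exit" cleanup discussed in this thread could look roughly like the sketch below. It includes the kind of documenting comment proposed in the last reply. It is written in Python only as an example; the real selftest is the shell script openvswitch.sh. The helper name run_with_cleanup() is hypothetical. The key point is in the comment: a task blocked in an uninterruptible sleep inside the kernel does not act on SIGKILL. The cleanup can therefore only stop the test from waiting forever; it cannot reap a deadlocked process.

```python
import subprocess
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def run_with_cleanup(bg_cmd: Sequence[str], test_body: Callable[[], T],
                     timeout: float = 5.0) -> T:
    """Run test_body() while bg_cmd runs in the background (hypothetical helper)."""
    proc = subprocess.Popen(list(bg_cmd))
    try:
        return test_body()
    finally:
        # Best-effort cleanup of the background process.
        #
        # NOTE: this is not effective if the process hangs inside the kernel.
        # A task blocked in an uninterruptible sleep (for example, a
        # self-deadlock on a mutex) does not act on SIGKILL, so kill() cannot
        # reap it. The bounded wait() only keeps the test from blocking forever.
        if proc.poll() is None:
            proc.kill()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            pass  # unkillable from user space: leave it behind
```

For example, run_with_cleanup(["sleep", "30"], lambda: 42) returns 42 and reaps the sleep process right away, because that process is not stuck in the kernel.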