The iavf VLAN filter state machine has several design issues that lead
to race conditions between userspace add/del calls and the watchdog
task's virtchnl processing. Filters can get lost or leak HW resources,
especially during interface down/up cycles and namespace moves.

The root problems:

1) On interface down, all VLAN filters are sent as DEL to the PF and
   re-added on interface up. This is unnecessary and creates multiple
   race windows (details below).

2) The DELETE path immediately frees the filter struct after sending
   the DEL message, without waiting for PF confirmation. If the PF
   rejects the DEL, the filter remains in HW but the driver has lost
   its tracking structure. Race conditions between a pending DEL and
   add/reset operations cannot be resolved because the struct is gone.

3) VIRTCHNL_OP_ADD_VLAN (V1) had no success completion handler, so
   filters stayed in the IS_NEW state permanently.

Why removing VLAN filters on down/up is unnecessary:

Unlike MAC filters, which need to be re-evaluated on up because the PF
can administratively change the MAC address while the VF is down, VLAN
filters are purely user-controlled. The PF cannot change them while
the VF is down. When the VF goes down, VIRTCHNL_OP_DISABLE_QUEUES
stops all traffic, so VLAN filters sitting in PF HW are harmless: no
packets flow through the disabled queues.

Compare with the other filter types in iavf_down():
- MAC filters: only the current MAC is removed (it gets re-read from
  the PF on up in case it was administratively changed)
- Cloud filters: left as-is across down/up
- FDIR filters: left as-is across down/up

VLAN filters were the only type going through a full DEL+ADD cycle,
and this caused real problems:
- With spoofcheck enabled, the PF activates TX VLAN anti-spoof on the
  first non-zero VLAN ADD. During the re-add phase after up, the
  filter list is transiently incomplete, and traffic for VLANs not
  yet re-added gets dropped by anti-spoof.
- Rapid down/up can overlap with pending DEL messages.
  The old code used DISABLE/INACTIVE states to track this, but the
  DISABLE state could overwrite a concurrent REMOVE from userspace,
  causing the filter to be restored instead of deleted.
- Namespace moves trigger implicit ndo_vlan_rx_kill_vid() calls
  concurrent with the down/up sequence. The DEL from the namespace
  teardown races with the DISABLE from iavf_down(), and the filter
  can end up leaked in num_vlan_filters with no associated netdev.

After reset, VF-configured VLAN filters are properly re-added via the
VIRTCHNL_OP_GET_VF_RESOURCES / GET_OFFLOAD_VLAN_V2_CAPS response
handlers, which unconditionally set all filters to the ADD state.
That path is unaffected by these changes.

This series addresses all three issues:

Patch 1 renames IS_NEW to ADDING for clarity.

Patch 2 removes the DISABLE/INACTIVE state machinery so VLAN filters
stay ACTIVE across down/up cycles. This is the core behavioral
change: VLAN filters are no longer sent as DEL to the PF on interface
down, and iavf_restore_filters() is removed since there is nothing to
restore.

Patch 3 adds a REMOVING state to make the DELETE path symmetric with
ADD: filters are only freed after the PF confirms the deletion. If
the PF rejects the DEL, the filter reverts to ACTIVE instead of being
lost.

Patch 4 hardens the remaining race windows: it adds a V1 ADD success
handler and prevents a redundant DEL on filters already in the
REMOVING state.

Petr Oros (4):
  iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
  iavf: stop removing VLAN filters from PF on interface down
  iavf: wait for PF confirmation before removing VLAN filters
  iavf: harden VLAN filter state machine race handling

 drivers/net/ethernet/intel/iavf/iavf.h        |  9 +--
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 53 ++++---------
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 76 +++++++++----------
 3 files changed, 54 insertions(+), 84 deletions(-)

-- 
2.52.0
I leveraged Claude Opus 4.6 to develop a stress-test suite with a
primary 'break-it' objective targeting VF stability. The suite
focuses on aggressive edge cases, specifically cyclic VF migration
between network namespaces while VLAN filtering is active, a sequence
known to trigger state machine regressions.

The following output demonstrates the failure state on an unpatched
iavf driver (prior to the 'fix VLAN filter state machine races'
patches):

# echo 8 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
# ./tools/testing/selftests/drivers/net/iavf_vlan_state.sh
================================================
iavf VLAN state machine test suite
================================================
VF1: enp65s0f0v0 (0000:41:01.0) -> iavf-t1-6502
VF2: enp65s0f0v1 (0000:41:01.1) -> iavf-t2-6502
PF:  enp65s0f0np0 (0000:41:00.0)
MAX: 8 user VLANs per VF
================================================
PASS state: basic add/remove
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
FAIL state: 8 VLANs add/remove (only 7 created)
PASS state: VLAN persists across down/up
PASS state: 5 VLANs persist across down/up
PASS state: rapid add/del same VLAN x100
PASS state: add during remove (REMOVING race)
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
PASS state: bulk 8 add then remove
PASS state: 20x rapid down/up with VLAN
PASS state: add VLAN while down
PASS state: remove VLAN while down
PASS state: down -> remove -> up
PASS state: add VLANs while down, verify all after up
PASS state: double add same VLAN (idempotent)
PASS state: double remove same VLAN
PASS state: interleaved add/remove different VIDs
PASS state: remove+re-add loop x50
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
FAIL state: stress 8 VLANs (fill to max) (expected 8, got 7)
PASS state: VLAN VID 1 (common edge case)
PASS state: VLAN VID 4094 (max)
PASS state: concurrent VLAN adds (4 parallel)
PASS state: concurrent VLAN deletes (4 parallel)
PASS state: add/del storm (200 ops, 5 VIDs)
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
FAIL state: over-limit VLAN rejected, existing survive (fill: expected 8, got 7)
PASS reset: VLANs recover after VF PCI FLR
PASS reset: 5 VLANs recover after VF PCI FLR
PASS reset: rapid VF resets x5 with VLANs
PASS reset: VLANs survive PF link flap
PASS reset: 5 VLANs survive PF link flap
PASS reset: VLANs survive 3x PF link flap
PASS reset: VLANs survive PF PCI FLR
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
FAIL reset: all 8 VLANs recover after VF FLR (VLAN 107 gone)
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
FAIL reset: all 8 VLANs survive PF link flap (VLAN 107 gone)
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
FAIL reset: all 8 VLANs survive PF PCI FLR (VLAN 107 gone)
PASS reset: FLR during VLAN add/del (race)
PASS reset: VF driver unbind/bind cycle
PASS ping: basic VLAN traffic
PASS ping: 5 VLANs simultaneously
PASS ping: survives VF down/up
PASS ping: survives 10x rapid VF flap
PASS ping: survives VF PCI FLR
PASS ping: survives PF link flap
PASS ping: survives PF PCI FLR
PASS ping: stable while adding/removing other VLANs
PASS ping: all 3 VLANs work after down/up
PASS ping: parallel VLAN churn from both VFs
PASS ping: VLANs work after rapid add/del churn
PASS ping: VLANs survive repeated NS move cycle
PASS ping: all VLANs survive PF link flap
PASS ping: VLAN isolation (no cross-VLAN leakage)
PASS ping: traffic works with spoofchk enabled
PASS ping: port VLAN (PF-assigned pvid)
PASS dmesg: no call traces / BUGs / stalls
================================================
PASS 46 | FAIL 6 | SKIP 0 | TOTAL 52
================================================
RESULT: FAIL -- check dmesg

The failures all stem from the races described above: the VF and PF
lose state synchronization during rapid configuration cycles, the
driver can no longer keep its filter list consistent with HW, and
VLAN 107 ends up unusable (I/O error on add), leaving the VF stuck at
7 of 8 VLANs.

...................

Patched kernel:

# echo 8 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
# ./tools/testing/selftests/drivers/net/iavf_vlan_state.sh
================================================
iavf VLAN state machine test suite
================================================
VF1: enp65s0f0v0 (0000:41:01.0) -> iavf-t1-6573
VF2: enp65s0f0v1 (0000:41:01.1) -> iavf-t2-6573
PF:  enp65s0f0np0 (0000:41:00.0)
MAX: 8 user VLANs per VF
================================================
PASS state: basic add/remove
PASS state: 8 VLANs add/remove
PASS state: VLAN persists across down/up
PASS state: 5 VLANs persist across down/up
PASS state: rapid add/del same VLAN x100
PASS state: add during remove (REMOVING race)
PASS state: bulk 8 add then remove
PASS state: 20x rapid down/up with VLAN
PASS state: add VLAN while down
PASS state: remove VLAN while down
PASS state: down -> remove -> up
PASS state: add VLANs while down, verify all after up
PASS state: double add same VLAN (idempotent)
PASS state: double remove same VLAN
PASS state: interleaved add/remove different VIDs
PASS state: remove+re-add loop x50
PASS state: stress 8 VLANs (fill to max)
PASS state: VLAN VID 1 (common edge case)
PASS state: VLAN VID 4094 (max)
PASS state: concurrent VLAN adds (4 parallel)
PASS state: concurrent VLAN deletes (4 parallel)
PASS state: add/del storm (200 ops, 5 VIDs)
PASS state: over-limit VLAN rejected, existing survive
PASS reset: VLANs recover after VF PCI FLR
PASS reset: 5 VLANs recover after VF PCI FLR
PASS reset: rapid VF resets x5 with VLANs
PASS reset: VLANs survive PF link flap
PASS reset: 5 VLANs survive PF link flap
PASS reset: VLANs survive 3x PF link flap
PASS reset: VLANs survive PF PCI FLR
PASS reset: all 8 VLANs recover after VF FLR
PASS reset: all 8 VLANs survive PF link flap
PASS reset: all 8 VLANs survive PF PCI FLR
PASS reset: FLR during VLAN add/del (race)
PASS reset: VF driver unbind/bind cycle
PASS ping: basic VLAN traffic
PASS ping: 5 VLANs simultaneously
PASS ping: survives VF down/up
PASS ping: survives 10x rapid VF flap
PASS ping: survives VF PCI FLR
PASS ping: survives PF link flap
PASS ping: survives PF PCI FLR
PASS ping: stable while adding/removing other VLANs
PASS ping: all 3 VLANs work after down/up
PASS ping: parallel VLAN churn from both VFs
PASS ping: VLANs work after rapid add/del churn
PASS ping: VLANs survive repeated NS move cycle
PASS ping: all VLANs survive PF link flap
PASS ping: VLAN isolation (no cross-VLAN leakage)
PASS ping: traffic works with spoofchk enabled
PASS ping: port VLAN (PF-assigned pvid)
PASS dmesg: no call traces / BUGs / stalls
================================================
PASS 52 | FAIL 0 | SKIP 0 | TOTAL 52
================================================
RESULT: OK

Additionally, interface up/down latency with active VLAN filtering
improves significantly: the previous synchronous per-VLAN filter
cycle (VF -> PF -> HW -> PF -> VF) over the AdminQ introduced
substantial latency on every down/up, and this series eliminates it.

Test suite:
https://github.com/torvalds/linux/commit/5c60850c33da80a1c2497fb6bc31f956316197a9

Regards,
Petr