[PATCH RFC iwl-next 0/4] iavf: fix VLAN filter state machine races

Petr Oros posted 4 patches 1 month, 1 week ago
There is a newer version of this series
The iavf VLAN filter state machine has several design issues that lead
to race conditions between userspace add/del calls and the watchdog
task's virtchnl processing.  Filters can get lost or leak HW resources,
especially during interface down/up cycles and namespace moves.

The root problems:

1) On interface down, all VLAN filters are sent as DEL to PF and
   re-added on interface up.  This is unnecessary and creates multiple
   race windows (details below).

2) The DELETE path immediately frees the filter struct after sending
   the DEL message, without waiting for PF confirmation.  If the PF
   rejects the DEL, the filter remains in HW but the driver lost its
   tracking structure.  Race conditions between a pending DEL and
   add/reset operations cannot be resolved because the struct is gone.

3) VIRTCHNL_OP_ADD_VLAN (V1) had no success completion handler, so
   filters stayed in IS_NEW state permanently.


Why removing VLAN filters on down/up is unnecessary:

Unlike MAC filters, which need to be re-evaluated on up because the
PF can administratively change the MAC address during down, VLAN
filters are purely user-controlled.  The PF cannot change them while
the VF is down.  When the VF goes down, VIRTCHNL_OP_DISABLE_QUEUES
stops all traffic -- VLAN filters sitting in PF HW are harmless
because no packets flow through the disabled queues.

Compare with other filter types in iavf_down():
- MAC filters: only the current MAC is removed (it gets re-read from
  PF on up in case it was administratively changed)
- Cloud filters: left as-is across down/up
- FDIR filters: left as-is across down/up

VLAN filters were the only type going through a full DEL+ADD cycle,
and this caused real problems:

- With spoofcheck enabled, the PF activates TX VLAN anti-spoof on
  the first non-zero VLAN ADD.  During the re-add phase after up,
  the filter list is transiently incomplete -- traffic for VLANs not
  yet re-added gets dropped by anti-spoof.

- Rapid down/up can overlap with pending DEL messages.  The old code
  used DISABLE/INACTIVE states to track this, but the DISABLE state
  could overwrite a concurrent REMOVE from userspace, causing the
  filter to be restored instead of deleted.

- Namespace moves trigger implicit ndo_vlan_rx_kill_vid() calls
  concurrent with the down/up sequence.  The DEL from the namespace
  teardown races with the DISABLE from iavf_down(), and the filter
  can end up leaked: num_vlan_filters still counts it even though it
  no longer has an associated netdev.

After reset, VF-configured VLAN filters are properly re-added via
the VIRTCHNL_OP_GET_VF_RESOURCES / GET_OFFLOAD_VLAN_V2_CAPS response
handlers, which unconditionally set all filters to ADD state.  This
path is unaffected by these changes.


This series addresses all three issues:

Patch 1 renames IS_NEW to ADDING for clarity.

Patch 2 removes the DISABLE/INACTIVE state machinery so VLAN filters
stay ACTIVE across down/up cycles.  This is the core behavioral
change -- VLAN filters are no longer sent as DEL to PF on interface
down, and iavf_restore_filters() is removed since there is nothing
to restore.

Patch 3 adds a REMOVING state to make the DELETE path symmetric with
ADD -- filters are only freed after PF confirms the deletion.  If the
PF rejects the DEL, the filter reverts to ACTIVE instead of being
lost.

Patch 4 hardens the remaining race windows: it adds the missing V1
ADD success completion handler and prevents sending a redundant DEL
for filters already in REMOVING state.

Petr Oros (4):
  iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
  iavf: stop removing VLAN filters from PF on interface down
  iavf: wait for PF confirmation before removing VLAN filters
  iavf: harden VLAN filter state machine race handling

 drivers/net/ethernet/intel/iavf/iavf.h        |  9 +--
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 53 ++++---------
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 76 +++++++++----------
 3 files changed, 54 insertions(+), 84 deletions(-)

-- 
2.52.0
Re: [PATCH RFC iwl-next 0/4] iavf: fix VLAN filter state machine races
Posted by Petr Oros 1 month ago
I leveraged Claude Opus 4.6 to develop a stress-test suite with a
primary 'break-it' objective targeting VF stability.  The suite
focuses on aggressive edge cases, in particular cyclic VF migration
between network namespaces while VLAN filtering is active, a sequence
known to trigger state machine regressions.  The following output
demonstrates the failure state on an unpatched iavf driver (prior to
the 'fix VLAN filter state machine races' series):

# echo 8 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
# ./tools/testing/selftests/drivers/net/iavf_vlan_state.sh
================================================
   iavf VLAN state machine test suite
================================================
   VF1:  enp65s0f0v0 (0000:41:01.0) -> iavf-t1-6502
   VF2:  enp65s0f0v1 (0000:41:01.1) -> iavf-t2-6502
   PF:   enp65s0f0np0 (0000:41:00.0)
   MAX:  8 user VLANs per VF
================================================
   PASS  state: basic add/remove
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
   FAIL  state: 8 VLANs add/remove  (only 7 created)
   PASS  state: VLAN persists across down/up
   PASS  state: 5 VLANs persist across down/up
   PASS  state: rapid add/del same VLAN x100
   PASS  state: add during remove (REMOVING race)
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
   PASS  state: bulk 8 add then remove
   PASS  state: 20x rapid down/up with VLAN
   PASS  state: add VLAN while down
   PASS  state: remove VLAN while down
   PASS  state: down -> remove -> up
   PASS  state: add VLANs while down, verify all after up
   PASS  state: double add same VLAN (idempotent)
   PASS  state: double remove same VLAN
   PASS  state: interleaved add/remove different VIDs
   PASS  state: remove+re-add loop x50
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
   FAIL  state: stress 8 VLANs (fill to max)  (expected 8, got 7)
   PASS  state: VLAN VID 1 (common edge case)
   PASS  state: VLAN VID 4094 (max)
   PASS  state: concurrent VLAN adds (4 parallel)
   PASS  state: concurrent VLAN deletes (4 parallel)
   PASS  state: add/del storm (200 ops, 5 VIDs)
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
   FAIL  state: over-limit VLAN rejected, existing survive  (fill: expected 8, got 7)
   PASS  reset: VLANs recover after VF PCI FLR
   PASS  reset: 5 VLANs recover after VF PCI FLR
   PASS  reset: rapid VF resets x5 with VLANs
   PASS  reset: VLANs survive PF link flap
   PASS  reset: 5 VLANs survive PF link flap
   PASS  reset: VLANs survive 3x PF link flap
   PASS  reset: VLANs survive PF PCI FLR
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
   FAIL  reset: all 8 VLANs recover after VF FLR  (VLAN 107 gone)
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
   FAIL  reset: all 8 VLANs survive PF link flap  (VLAN 107 gone)
RTNETLINK answers: Input/output error
Cannot find device "enp65s0f0v0.107"
Cannot find device "enp65s0f0v0.107"
   FAIL  reset: all 8 VLANs survive PF PCI FLR  (VLAN 107 gone)
   PASS  reset: FLR during VLAN add/del (race)
   PASS  reset: VF driver unbind/bind cycle
   PASS  ping: basic VLAN traffic
   PASS  ping: 5 VLANs simultaneously
   PASS  ping: survives VF down/up
   PASS  ping: survives 10x rapid VF flap
   PASS  ping: survives VF PCI FLR
   PASS  ping: survives PF link flap
   PASS  ping: survives PF PCI FLR
   PASS  ping: stable while adding/removing other VLANs
   PASS  ping: all 3 VLANs work after down/up
   PASS  ping: parallel VLAN churn from both VFs
   PASS  ping: VLANs work after rapid add/del churn
   PASS  ping: VLANs survive repeated NS move cycle
   PASS  ping: all VLANs survive PF link flap
   PASS  ping: VLAN isolation (no cross-VLAN leakage)
   PASS  ping: traffic works with spoofchk enabled
   PASS  ping: port VLAN (PF-assigned pvid)
   PASS  dmesg: no call traces / BUGs / stalls

================================================
   PASS 46  |  FAIL 6  |  SKIP 0  |  TOTAL 52
================================================
   RESULT: FAIL  -- check dmesg


The underlying failures are the races described in the cover letter:
a pending DEL can be overwritten by the DISABLE state set from
iavf_down(), and the namespace-move teardown can race with the
down/up sequence, so the VF's filter list and the PF's HW state fall
out of sync during rapid configuration cycles.

...................

Patched kernel:

# echo 8 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs
# ./tools/testing/selftests/drivers/net/iavf_vlan_state.sh
================================================
   iavf VLAN state machine test suite
================================================
   VF1:  enp65s0f0v0 (0000:41:01.0) -> iavf-t1-6573
   VF2:  enp65s0f0v1 (0000:41:01.1) -> iavf-t2-6573
   PF:   enp65s0f0np0 (0000:41:00.0)
   MAX:  8 user VLANs per VF
================================================
   PASS  state: basic add/remove
   PASS  state: 8 VLANs add/remove
   PASS  state: VLAN persists across down/up
   PASS  state: 5 VLANs persist across down/up
   PASS  state: rapid add/del same VLAN x100
   PASS  state: add during remove (REMOVING race)
   PASS  state: bulk 8 add then remove
   PASS  state: 20x rapid down/up with VLAN
   PASS  state: add VLAN while down
   PASS  state: remove VLAN while down
   PASS  state: down -> remove -> up
   PASS  state: add VLANs while down, verify all after up
   PASS  state: double add same VLAN (idempotent)
   PASS  state: double remove same VLAN
   PASS  state: interleaved add/remove different VIDs
   PASS  state: remove+re-add loop x50
   PASS  state: stress 8 VLANs (fill to max)
   PASS  state: VLAN VID 1 (common edge case)
   PASS  state: VLAN VID 4094 (max)
   PASS  state: concurrent VLAN adds (4 parallel)
   PASS  state: concurrent VLAN deletes (4 parallel)
   PASS  state: add/del storm (200 ops, 5 VIDs)
   PASS  state: over-limit VLAN rejected, existing survive
   PASS  reset: VLANs recover after VF PCI FLR
   PASS  reset: 5 VLANs recover after VF PCI FLR
   PASS  reset: rapid VF resets x5 with VLANs
   PASS  reset: VLANs survive PF link flap
   PASS  reset: 5 VLANs survive PF link flap
   PASS  reset: VLANs survive 3x PF link flap
   PASS  reset: VLANs survive PF PCI FLR
   PASS  reset: all 8 VLANs recover after VF FLR
   PASS  reset: all 8 VLANs survive PF link flap
   PASS  reset: all 8 VLANs survive PF PCI FLR
   PASS  reset: FLR during VLAN add/del (race)
   PASS  reset: VF driver unbind/bind cycle
   PASS  ping: basic VLAN traffic
   PASS  ping: 5 VLANs simultaneously
   PASS  ping: survives VF down/up
   PASS  ping: survives 10x rapid VF flap
   PASS  ping: survives VF PCI FLR
   PASS  ping: survives PF link flap
   PASS  ping: survives PF PCI FLR
   PASS  ping: stable while adding/removing other VLANs
   PASS  ping: all 3 VLANs work after down/up
   PASS  ping: parallel VLAN churn from both VFs
   PASS  ping: VLANs work after rapid add/del churn
   PASS  ping: VLANs survive repeated NS move cycle
   PASS  ping: all VLANs survive PF link flap
   PASS  ping: VLAN isolation (no cross-VLAN leakage)
   PASS  ping: traffic works with spoofchk enabled
   PASS  ping: port VLAN (PF-assigned pvid)
   PASS  dmesg: no call traces / BUGs / stalls

================================================
   PASS 52  |  FAIL 0  |  SKIP 0  |  TOTAL 52
================================================
   RESULT: OK

Additionally, interface up/down performance with active VLAN
filtering is significantly improved.  The previous bottleneck was
the synchronous per-VLAN filtering cycle (VF -> PF -> HW -> PF -> VF)
over AdminQ, which introduced substantial latency on every down/up
with filters configured.

Test suite:

https://github.com/torvalds/linux/commit/5c60850c33da80a1c2497fb6bc31f956316197a9 


Regards,

Petr