[PATCH iwl-net v2 0/4] iavf: fix VLAN filter state machine races

Petr Oros posted 4 patches 1 month, 3 weeks ago
drivers/net/ethernet/intel/iavf/iavf.h        |  9 +--
drivers/net/ethernet/intel/iavf/iavf_main.c   | 52 +++----------
.../net/ethernet/intel/iavf/iavf_virtchnl.c   | 76 +++++++++----------
3 files changed, 52 insertions(+), 85 deletions(-)
[PATCH iwl-net v2 0/4] iavf: fix VLAN filter state machine races
Posted by Petr Oros 1 month, 3 weeks ago
The iavf VLAN filter state machine has several design issues that lead
to race conditions between userspace add/del calls and the watchdog
task's virtchnl processing.  Filters can get lost or leak HW resources,
especially during interface down/up cycles and namespace moves.

The root problems:

1) On interface down, all VLAN filters are sent as DEL to PF and
   re-added on interface up.  This is unnecessary and creates multiple
   race windows (details below).

2) The DELETE path immediately frees the filter struct after sending
   the DEL message, without waiting for PF confirmation.  If the PF
   rejects the DEL, the filter remains in HW but the driver lost its
   tracking structure.  Race conditions between a pending DEL and
   add/reset operations cannot be resolved because the struct is gone.

3) VIRTCHNL_OP_ADD_VLAN (V1) had no success completion handler, so
   filters stayed in IS_NEW state permanently.


Why removing VLAN filters on down/up is unnecessary:

Unlike MAC filters, which need to be re-evaluated on up because the
PF can administratively change the MAC address during down, VLAN
filters are purely user-controlled.  The PF cannot change them while
the VF is down.  When the VF goes down, VIRTCHNL_OP_DISABLE_QUEUES
stops all traffic; VLAN filters sitting in PF HW are harmless
because no packets flow through the disabled queues.

Compare with other filter types in iavf_down():
- MAC filters: only the current MAC is removed (it gets re-read from
  PF on up in case it was administratively changed)
- Cloud filters: left as-is across down/up
- FDIR filters: left as-is across down/up

VLAN filters were the only type going through a full DEL+ADD cycle,
and this caused real problems:

- With spoofcheck enabled, the PF activates TX VLAN anti-spoof on
  the first non-zero VLAN ADD.  During the re-add phase after up,
  the filter list is transiently incomplete; traffic for VLANs not
  yet re-added gets dropped by anti-spoof.

- Rapid down/up can overlap with pending DEL messages.  The old code
  used DISABLE/INACTIVE states to track this, but the DISABLE state
  could overwrite a concurrent REMOVE from userspace, causing the
  filter to be restored instead of deleted.

- Namespace moves trigger implicit ndo_vlan_rx_kill_vid() calls
  concurrent with the down/up sequence.  The DEL from the namespace
  teardown races with the DISABLE from iavf_down(), and the filter
  can end up leaked in num_vlan_filters with no associated netdev.

After reset, VF-configured VLAN filters are properly re-added via
the VIRTCHNL_OP_GET_VF_RESOURCES / GET_OFFLOAD_VLAN_V2_CAPS response
handlers, which unconditionally set all filters to ADD state.  This
path is unaffected by these changes.


This series addresses all three issues:

Patch 1 renames IS_NEW to ADDING for clarity.

Patch 2 removes the DISABLE/INACTIVE state machinery so VLAN filters
stay ACTIVE across down/up cycles.  This is the core behavioral
change; VLAN filters are no longer sent as DEL to PF on interface
down, and iavf_restore_filters() is removed since there is nothing
to restore.

Patch 3 adds a REMOVING state to make the DELETE path symmetric with
ADD; filters are only freed after PF confirms the deletion.  If the
PF rejects the DEL, the filter reverts to ACTIVE instead of being
lost.

Patch 4 hardens the remaining race windows: adds V1 ADD success
handler and prevents redundant DEL on filters already in REMOVING
state.

v2: Retarget from iwl-next to iwl-net; these are bug fixes.
    Rebase on current net tree (conflict resolved).

Petr Oros (4):
  iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
  iavf: stop removing VLAN filters from PF on interface down
  iavf: wait for PF confirmation before removing VLAN filters
  iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler

 drivers/net/ethernet/intel/iavf/iavf.h        |  9 +--
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 52 +++----------
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 76 +++++++++----------
 3 files changed, 52 insertions(+), 85 deletions(-)

-- 
2.52.0
Re: [PATCH iwl-net v2 0/4] iavf: fix VLAN filter state machine races
Posted by Simon Horman 1 month, 2 weeks ago
On Fri, Apr 17, 2026 at 04:29:41PM +0200, Petr Oros wrote:
> The iavf VLAN filter state machine has several design issues that lead
> to race conditions between userspace add/del calls and the watchdog
> task's virtchnl processing.  Filters can get lost or leak HW resources,
> especially during interface down/up cycles and namespace moves.

...

Hi Petr,

Sashiko has a bit to say about this patch.
I'd appreciate it if you could look over that.

In particular, the feedback on patches 2 and 3 may warrant
some updates to this patchset, while I think 4 is more
in the realm of possible future work.
Re: [PATCH iwl-net v2 0/4] iavf: fix VLAN filter state machine races
Posted by Jacob Keller 1 month, 2 weeks ago
On 4/21/2026 2:02 AM, Simon Horman wrote:
> On Fri, Apr 17, 2026 at 04:29:41PM +0200, Petr Oros wrote:
>> The iavf VLAN filter state machine has several design issues that lead
>> to race conditions between userspace add/del calls and the watchdog
>> task's virtchnl processing.  Filters can get lost or leak HW resources,
>> especially during interface down/up cycles and namespace moves.
> 
> ...
> 
> Hi Petr,
> 
> Sashiko has a bit to say about this patch.
> I'd appreciate it if you could look over that.
> 
> In particular, the feedback on patches 2 and 3 may warrant
> some updates to this patchset, while I think 4 is more
> in the realm of possible future work.

@Petr,

Could you please review the Sashiko reports and clarify whether a new
version will be needed?

The original series posted as a net-next was Tested-by, and it would be
good to get this moving, but I don't want to queue it up for sending
until certain it won't simply get rejected due to these unresolved comments.

Thanks,
Jake
Re: [PATCH iwl-net v2 0/4] iavf: fix VLAN filter state machine races
Posted by Petr Oros 1 month, 1 week ago
On 4/23/26 22:48, Jacob Keller wrote:
> On 4/21/2026 2:02 AM, Simon Horman wrote:
>> On Fri, Apr 17, 2026 at 04:29:41PM +0200, Petr Oros wrote:
>>> The iavf VLAN filter state machine has several design issues that lead
>>> to race conditions between userspace add/del calls and the watchdog
>>> task's virtchnl processing.  Filters can get lost or leak HW resources,
>>> especially during interface down/up cycles and namespace moves.
>> ...
>>
>> Hi Petr,
>>
>> Sashiko has a bit to say about this patch.
>> I'd appreciate it if you could look over that.
>>
>> In particular, the feedback on patches 2 and 3 may warrant
>> some updates to this patchset, while I think 4 is more
>> in the realm of possible future work.
> @Petr,
>
> Could you please review the Sashiko reports and clarify whether a new
> version will be needed?
>
> The original series posted as a net-next was Tested-by, and it would be
> good to get this moving, but I don't want to queue it up for sending
> until certain it won't simply get rejected due to these unresolved comments.
>
> Thanks,
> Jake
>
Hi Jake,

The Sashiko review identified seven concerns across the four patches.

Five of them describe sub millisecond race windows. Rapid del and re add
of a VLAN in IAVF_VLAN_ADDING state. Pending IAVF_VLAN_ADD lost across
down and up before the watchdog ships the request. REMOVING combined with
user re add and user re del state confusion. The reset path resurrecting
filters that are in REMOVE or REMOVING state. Phantom ACTIVE after the PF
rejects an ADD whose user side del raced through.

The remaining two are deterministic pre existing V1 bugs unrelated to
this series. The V1 ADD_VLAN error path has never called 
iavf_vlan_add_reject().
The V2 path got it in 968996c070ef ("iavf: Fix VLAN_V2 addition/rejection")
and V1 was missed. These manifest whenever the PF rejects an ADD on i40e
for example a port VLAN conflict or an untrusted cap reached, and they
belong in a separate fix.

The five race window findings require tight syscall sequencing via ip batch
or sysfs FLR concurrent with del to reach. These patterns do not match how
NetworkManager, systemd-networkd, libvirt or cloud-init configure VLANs.
Those tools add VLANs once on VF setup and do not issue rapid del and
re add or trigger FLR mid operation. The current version keeps the state
machine minimal. Closing these windows requires per filter flag tracking
that adds complexity disproportionate to the user visible benefit on real
workloads.

Two larger problems are worth addressing in follow up work.
The first is num_vlan_filters accounting on V2 under high churn.
Post series, filters in REMOVING state count against
iavf_get_max_vlans_allowed until the PF confirms the deletion.
This can cause a transient EIO on rapid del then add when at the cap.
Pre series this was avoided by immediate kfree. The trade off here is
correctness (no HW resource leak on PF reject) at the cost of a transient
userspace error.

The second is the i40e silent ADD reject. The i40e PF rejects over cap or
untrusted VF VLAN ADDs by returning VIRTCHNL_STATUS_SUCCESS, so iavf cannot
surface the failure to userspace. ip link add ... type vlan reports success
while no filter exists in HW. V2 on ice avoids this via the client side cap.
Closing this gap requires PF and driver ABI coordination.

The series has been tested across documented user workflows on both
ice and i40e PFs in trusted and untrusted modes. The tested scenarios
include interface up and down cycles, namespace migration, VF reset,
VLAN add and remove sequences, parallel VLAN operations across two VFs,
traffic verification via ping under spoofcheck, port VLAN, and multi
VLAN configurations. The workflow scenarios pass on the patched kernel.

The small number of test failures observed were test framework artifacts
(missing IP configuration on probe interfaces, settle time too short for
PF round trip drainage, V1 PF reject classification) and not kernel 
regressions.

Regards,

Petr
Re: [PATCH iwl-net v2 0/4] iavf: fix VLAN filter state machine races
Posted by Przemek Kitszel 1 month, 3 weeks ago
On 4/17/26 16:29, Petr Oros wrote:
> The iavf VLAN filter state machine has several design issues that lead
> to race conditions between userspace add/del calls and the watchdog
> task's virtchnl processing.  Filters can get lost or leak HW resources,
> especially during interface down/up cycles and namespace moves.
> 

[...]

> 
> v2: Retarget from iwl-next to iwl-net; these are bug fixes.
>      Rebase on current net tree (conflict resolved).
> 
> Petr Oros (4):
>    iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING
>    iavf: stop removing VLAN filters from PF on interface down
>    iavf: wait for PF confirmation before removing VLAN filters
>    iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler
> 
>   drivers/net/ethernet/intel/iavf/iavf.h        |  9 +--
>   drivers/net/ethernet/intel/iavf/iavf_main.c   | 52 +++----------
>   .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 76 +++++++++----------
>   3 files changed, 52 insertions(+), 85 deletions(-)
> 

Thank you for the series, it looks good.
Also thanks for the not obvious details, like changing
list_for_each_entry_safe() -> list_for_each_entry() in places that
no longer alter the list

for the series:
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>