[v1] iavf: fix deadlock in reset handling

[PATCH net] iavf: fix deadlock in reset handling

Posted by Petr Oros 5 days, 6 hours ago

Three driver callbacks schedule a reset and wait for its completion:
ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().

Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
commit c2ed2403f12c74 ("iavf: Wait for reset in callbacks which trigger
it") to fix a race condition where adding an interface to bonding
immediately after MTU or ring parameter change failed because the
interface was still in __RESETTING state. The same commit also added
waiting in iavf_set_priv_flags(), which was later removed by commit
53844673d55529 ("iavf: kill "legacy-rx" for good").

Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
("iavf: Fix return of set the new channel count") to ensure the PF has
enough time to complete the VF reset when changing channel count, and to
return correct error codes to userspace.

Commit ef490bbb226702 ("iavf: Add net_shaper_ops support") added
net_shaper_ops to iavf, which required reset_task to use _locked NAPI
variants (napi_enable_locked, napi_disable_locked) that need the netdev
instance lock.

Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
rtnetlink operations") and commit 2bcf4772e45adb ("net: ethtool: try to
protect all callback with netdev instance lock") started holding the
netdev instance lock during ndo and ethtool callbacks for drivers with
net_shaper_ops.

The combination of waiting for reset and the new locking requirements
creates a deadlock: the callback holds the lock and waits for reset_task,
but reset_task is blocked waiting for the same lock:

  Thread 1 (callback)               Thread 2 (reset_task)
  -------------------               ---------------------
  netdev_lock()
  ndo_change_mtu() or ethtool op
    iavf_schedule_reset()
    iavf_wait_for_reset()           iavf_reset_task()
      waiting...                      netdev_lock() <- DEADLOCK

Reproducer:

  # echo 16 > /sys/class/net/$PF/device/sriov_numvfs
  # ip link set $VF up
  # ip link set $VF mtu 5000
  RTNETLINK answers: Device or resource busy

  # dmesg | tail -1
  iavf: MTU change interrupted waiting for reset

Fix this by temporarily releasing the lock while waiting for reset to
complete. This follows the pattern used elsewhere in the kernel (e.g.,
do_set_master() releases rtnl_lock before calling ndo_add_slave()).

Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
Signed-off-by: Petr Oros <poros@redhat.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 8aa6e92c16431f..d7738fb8fa60bc 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -189,13 +189,22 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter)
  * iavf_wait_for_reset - Wait for reset to finish.
  * @adapter: board private structure
  *
+ * The iavf driver selects NET_SHAPER, so callbacks that trigger reset are
+ * always called with netdev instance lock held, while reset_task also needs
+ * this lock. Release the lock while waiting to avoid deadlock.
+ *
  * Returns 0 if reset finished successfully, negative on timeout or interrupt.
  */
 int iavf_wait_for_reset(struct iavf_adapter *adapter)
 {
-	int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
-					!iavf_is_reset_in_progress(adapter),
-					msecs_to_jiffies(5000));
+	struct net_device *netdev = adapter->netdev;
+	int ret;
+
+	netdev_unlock(netdev);
+	ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
+					       !iavf_is_reset_in_progress(adapter),
+					       msecs_to_jiffies(5000));
+	netdev_lock(netdev);
 
 	/* If ret < 0 then it means wait was interrupted.
 	 * If ret == 0 then it means we got a timeout while waiting
-- 
2.52.0

Re: [PATCH net] iavf: fix deadlock in reset handling

Posted by Jakub Kicinski 4 days, 15 hours ago

On Mon,  2 Feb 2026 09:48:20 +0100 Petr Oros wrote:
> +	netdev_unlock(netdev);
> +	ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
> +					       !iavf_is_reset_in_progress(adapter),
> +					       msecs_to_jiffies(5000));
> +	netdev_lock(netdev);

Dropping locks taken by the core around the driver callback
is obviously unacceptable. SMH.

Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Jacob Keller 4 days, 14 hours ago

On 2/2/2026 3:58 PM, Jakub Kicinski wrote:
> On Mon,  2 Feb 2026 09:48:20 +0100 Petr Oros wrote:
>> +	netdev_unlock(netdev);
>> +	ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
>> +					       !iavf_is_reset_in_progress(adapter),
>> +					       msecs_to_jiffies(5000));
>> +	netdev_lock(netdev);
> 
> Dropping locks taken by the core around the driver callback
> is obviously unacceptable. SMH.

Right. It seems like the correct fix is to either a) have reset take and 
hold the netdev lock (now that its distinct from the global RTNL lock) 
or b) refactor reset so that it can defer any of the netdev related 
stuff somehow.

Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Petr Oros 4 days, 6 hours ago

On 2/3/26 02:00, Jacob Keller wrote:
>
>
> On 2/2/2026 3:58 PM, Jakub Kicinski wrote:
>> On Mon,  2 Feb 2026 09:48:20 +0100 Petr Oros wrote:
>>> +    netdev_unlock(netdev);
>>> +    ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
>>> + !iavf_is_reset_in_progress(adapter),
>>> +                           msecs_to_jiffies(5000));
>>> +    netdev_lock(netdev);
>>
>> Dropping locks taken by the core around the driver callback
>> is obviously unacceptable. SMH.
>
> Right. It seems like the correct fix is to either a) have reset take 
> and hold the netdev lock (now that its distinct from the global RTNL 
> lock) or b) refactor reset so that it can defer any of the netdev 
> related stuff somehow.
>
I modeled this after the existing pattern in iavf_close() (ndo_stop), 
which also temporarily releases the netdev instance lock taken by the 
core to wait for an async operation to complete:

static int iavf_close(struct net_device *netdev)
{
         netdev_assert_locked(netdev);
         ...
         iavf_down(adapter);
         iavf_change_state(adapter, __IAVF_DOWN_PENDING);
         iavf_free_traffic_irqs(adapter);

         netdev_unlock(netdev);

         status = wait_event_timeout(adapter->down_waitqueue,
                                     adapter->state == __IAVF_DOWN,
                                     msecs_to_jiffies(500));
         if (!status)
                 netdev_warn(netdev, "Device resources not yet released\n");
         netdev_lock(netdev);
         ...
}

This was introduced by commit 120f28a6f314fe ("iavf: get rid of the crit 
lock"), and ndo_stop is called with netdev instance lock held by the 
core just like ndo_change_mtu is. Could you clarify why the 
unlock-wait-lock pattern is acceptable in ndo_stop but not here?

Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Przemek Kitszel 4 days, 5 hours ago

On 2/3/26 09:44, Petr Oros wrote:
> 
> On 2/3/26 02:00, Jacob Keller wrote:
>>
>>
>> On 2/2/2026 3:58 PM, Jakub Kicinski wrote:
>>> On Mon,  2 Feb 2026 09:48:20 +0100 Petr Oros wrote:
>>>> +    netdev_unlock(netdev);
>>>> +    ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
>>>> + !iavf_is_reset_in_progress(adapter),
>>>> +                           msecs_to_jiffies(5000));
>>>> +    netdev_lock(netdev);
>>>
>>> Dropping locks taken by the core around the driver callback
>>> is obviously unacceptable. SMH.
>>
>> Right. It seems like the correct fix is to either a) have reset take 
>> and hold the netdev lock (now that its distinct from the global RTNL 
>> lock) or b) refactor reset so that it can defer any of the netdev 
>> related stuff somehow.
>>
> I modeled this after the existing pattern in iavf_close() (ndo_stop), 
> which also temporarily releases the netdev instance lock taken by the 
> core to wait for an async operation to complete:

First of all, thank you for working on that, I was hit by the very same
problem (no series yet), but my local fix is the same as of now.

I don't see an easy fix (w/o substantial driver refactor).

> 
> static int iavf_close(struct net_device *netdev)
> {
>          netdev_assert_locked(netdev);
>          ...
>          iavf_down(adapter);
>          iavf_change_state(adapter, __IAVF_DOWN_PENDING);
>          iavf_free_traffic_irqs(adapter);
> 
>          netdev_unlock(netdev);
> 
>          status = wait_event_timeout(adapter->down_waitqueue,
>                                      adapter->state == __IAVF_DOWN,
>                                      msecs_to_jiffies(500));
>          if (!status)
>                  netdev_warn(netdev, "Device resources not yet 
> released\n");
>          netdev_lock(netdev);
>          ...
> }
> 
> This was introduced by commit 120f28a6f314fe ("iavf: get rid of the crit 
> lock"), and ndo_stop is called with netdev instance lock held by the 
> core just like ndo_change_mtu is. 

technically it was introduced by commmit afc664987ab3 ("eth: iavf:
extend the netdev_lock usage")

> Could you clarify why the unlock-wait- 
> lock pattern is acceptable in ndo_stop but not here?
> 

perhaps just closing netdev is a special kind of operation

Other thing is that the lock was added to allow further NAPI
development, and one silly driver should not stop that effort.
Sadly, we have not managed to re-design the driver yet. I would like to
do so personally, but have much work accumulated/pending to free my time

Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Petr Oros 4 days, 4 hours ago

On 2/3/26 11:19, Przemek Kitszel wrote:
> On 2/3/26 09:44, Petr Oros wrote:
>>
>> On 2/3/26 02:00, Jacob Keller wrote:
>>>
>>>
>>> On 2/2/2026 3:58 PM, Jakub Kicinski wrote:
>>>> On Mon,  2 Feb 2026 09:48:20 +0100 Petr Oros wrote:
>>>>> +    netdev_unlock(netdev);
>>>>> +    ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
>>>>> + !iavf_is_reset_in_progress(adapter),
>>>>> +                           msecs_to_jiffies(5000));
>>>>> +    netdev_lock(netdev);
>>>>
>>>> Dropping locks taken by the core around the driver callback
>>>> is obviously unacceptable. SMH.
>>>
>>> Right. It seems like the correct fix is to either a) have reset take 
>>> and hold the netdev lock (now that its distinct from the global RTNL 
>>> lock) or b) refactor reset so that it can defer any of the netdev 
>>> related stuff somehow.
>>>
>> I modeled this after the existing pattern in iavf_close() (ndo_stop), 
>> which also temporarily releases the netdev instance lock taken by the 
>> core to wait for an async operation to complete:
>
> First of all, thank you for working on that, I was hit by the very same
> problem (no series yet), but my local fix is the same as of now.
>
> I don't see an easy fix (w/o substantial driver refactor).
>
>>
>> static int iavf_close(struct net_device *netdev)
>> {
>>          netdev_assert_locked(netdev);
>>          ...
>>          iavf_down(adapter);
>>          iavf_change_state(adapter, __IAVF_DOWN_PENDING);
>>          iavf_free_traffic_irqs(adapter);
>>
>>          netdev_unlock(netdev);
>>
>>          status = wait_event_timeout(adapter->down_waitqueue,
>>                                      adapter->state == __IAVF_DOWN,
>>                                      msecs_to_jiffies(500));
>>          if (!status)
>>                  netdev_warn(netdev, "Device resources not yet 
>> released\n");
>>          netdev_lock(netdev);
>>          ...
>> }
>>
>> This was introduced by commit 120f28a6f314fe ("iavf: get rid of the 
>> crit lock"), and ndo_stop is called with netdev instance lock held by 
>> the core just like ndo_change_mtu is. 
>
> technically it was introduced by commmit afc664987ab3 ("eth: iavf:
> extend the netdev_lock usage")
>
>> Could you clarify why the unlock-wait- lock pattern is acceptable in 
>> ndo_stop but not here?
>>
>
> perhaps just closing netdev is a special kind of operation
>
> Other thing is that the lock was added to allow further NAPI
> development, and one silly driver should not stop that effort.
> Sadly, we have not managed to re-design the driver yet. I would like to
> do so personally, but have much work accumulated/pending to free my time
>
I agree, the unlock-wait-lock pattern is fundamentally flawed (I now 
understand
why it is unacceptable) and should be avoided.

What can we do now?

* Eliminating the wait is not an option: As noted in the description of 
commit
c2ed2403f12c, this wait was originally added to fix a race condition where
adding an interface to bonding failed because the device remained in
__RESETTING state after the callback returned.
* Passing the lock into reset is impractical: The reset path is 
triggered from
numerous contexts, many of which are not under the netdev_lock, making this
even more complex than a full refactor.

If dropping the lock is a no-go, the only viable path forward is to 
split the
reset_task so that the waiting portion is decoupled from the netdev_lock
critical section.

The fact remains that MTU configuration and ring parameter changes are
currently broken in iavf. Changing the MTU on a Virtual Function is a
fundamental configuration not an obscure edge case that can remain 
non-functional.

I would appreciate any further guidance on how you would prefer...

Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Jacob Keller 3 days, 15 hours ago

On 2/3/2026 3:32 AM, Petr Oros wrote:
> 
> On 2/3/26 11:19, Przemek Kitszel wrote:
>> On 2/3/26 09:44, Petr Oros wrote:
>>>
>>> On 2/3/26 02:00, Jacob Keller wrote:
>>>>
>>>>
>>>> On 2/2/2026 3:58 PM, Jakub Kicinski wrote:
>>>>> On Mon,  2 Feb 2026 09:48:20 +0100 Petr Oros wrote:
>>>>>> +    netdev_unlock(netdev);
>>>>>> +    ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
>>>>>> + !iavf_is_reset_in_progress(adapter),
>>>>>> +                           msecs_to_jiffies(5000));
>>>>>> +    netdev_lock(netdev);
>>>>>
>>>>> Dropping locks taken by the core around the driver callback
>>>>> is obviously unacceptable. SMH.
>>>>
>>>> Right. It seems like the correct fix is to either a) have reset take 
>>>> and hold the netdev lock (now that its distinct from the global RTNL 
>>>> lock) or b) refactor reset so that it can defer any of the netdev 
>>>> related stuff somehow.
>>>>
>>> I modeled this after the existing pattern in iavf_close() (ndo_stop), 
>>> which also temporarily releases the netdev instance lock taken by the 
>>> core to wait for an async operation to complete:
>>
>> First of all, thank you for working on that, I was hit by the very same
>> problem (no series yet), but my local fix is the same as of now.
>>
>> I don't see an easy fix (w/o substantial driver refactor).
>>
>>>
>>> static int iavf_close(struct net_device *netdev)
>>> {
>>>          netdev_assert_locked(netdev);
>>>          ...
>>>          iavf_down(adapter);
>>>          iavf_change_state(adapter, __IAVF_DOWN_PENDING);
>>>          iavf_free_traffic_irqs(adapter);
>>>
>>>          netdev_unlock(netdev);
>>>
>>>          status = wait_event_timeout(adapter->down_waitqueue,
>>>                                      adapter->state == __IAVF_DOWN,
>>>                                      msecs_to_jiffies(500));
>>>          if (!status)
>>>                  netdev_warn(netdev, "Device resources not yet 
>>> released\n");
>>>          netdev_lock(netdev);
>>>          ...
>>> }
>>>
>>> This was introduced by commit 120f28a6f314fe ("iavf: get rid of the 
>>> crit lock"), and ndo_stop is called with netdev instance lock held by 
>>> the core just like ndo_change_mtu is. 
>>
>> technically it was introduced by commmit afc664987ab3 ("eth: iavf:
>> extend the netdev_lock usage")
>>
>>> Could you clarify why the unlock-wait- lock pattern is acceptable in 
>>> ndo_stop but not here?
>>>

It may simply be that no one spotted it in ndo_stop.

>>
>> perhaps just closing netdev is a special kind of operation
>>
>> Other thing is that the lock was added to allow further NAPI
>> development, and one silly driver should not stop that effort.
>> Sadly, we have not managed to re-design the driver yet. I would like to
>> do so personally, but have much work accumulated/pending to free my time
>>
> I agree, the unlock-wait-lock pattern is fundamentally flawed (I now 
> understand
> why it is unacceptable) and should be avoided.
> > What can we do now?
> 
> * Eliminating the wait is not an option: As noted in the description of 
> commit
> c2ed2403f12c, this wait was originally added to fix a race condition where
> adding an interface to bonding failed because the device remained in
> __RESETTING state after the callback returned.
> * Passing the lock into reset is impractical: The reset path is 
> triggered from
> numerous contexts, many of which are not under the netdev_lock, making this
> even more complex than a full refactor.
Hm. I was thinking we could just hold the netdev lock for the reset, 
but... we already take the netdev lock there as of commit ef490bbb2267 
("iavf: Add net_shaper_ops support").

That's actually what causes the deadlock: prior to that change reset 
didn't hold the netdev lock for the duration. This is what gets us 
stuck. We are waiting for reset to finish while holding the lock that 
blocks the reset.

But I am not really sure how to fix this: We want change MTU to only 
return once reset is complete.. but reset is dependent on the very lock 
that we're holding. And there's no way to communicate this fact to the 
reset handler...

> 
> If dropping the lock is a no-go, the only viable path forward is to 
> split the
> reset_task so that the waiting portion is decoupled from the netdev_lock
> critical section.
> 
We used to do this back before the netdev shaper ops. We didn't acquire 
either netdev lock or RTNL during reset.

I thought we had some code in the past that would handle netdev stuff 
outside of reset.. but I don't really know and git blame is not making 
it easy to find this information.

Perhaps we don't actually need to hold the netdev lock over the reset 
task.. except Przemek's refactor to remove the critical lock now makes 
us fully dependent on the netdev lock in this case for reset :(

> The fact remains that MTU configuration and ring parameter changes are
> currently broken in iavf. Changing the MTU on a Virtual Function is a
> fundamental configuration not an obscure edge case that can remain non- 
> functional.
> 

Agreed. This needs a resolution. It is just very tricky to figure out 
what the solution should be.

We need to hold the netdev lock during reset, and we need to have our 
handlers wait for reset to complete in order to be certain their task is 
done... but reset is a separate thread so we can't really communicate to 
it that we're holding the lock, and attempts to do that would be a huge 
problem.

We don't want to go back to the critical lock and all of its horrible 
problems either. The commit that removed it is here: 120f28a6f314 
("iavf: get rid of the crit lock")

> I would appreciate any further guidance on how you would prefer...
> 

I wish I had some better ideas..

Bad ideas I've thought about so far:

1) this patch with its drop lock and wait, which we discussed as 
problematic before. It creates a lot of issues since it means the 
operations are no longer atomic and we could potentially get stuck with 
some other operation in the event of another thread starting some core 
netdev task. No good.

2) not holding netdev_lock in reset, which is now a nogo since we 
removed the crit_lock, and apparently we held netdev_lock prior to that 
too anyways...

3) we could maybe do some sort of ref counting dance where we take some 
reference in threads that queue a reset, and the reset task would know 
if that reference was non-zero then another driver thread is holding 
netdev_lock so its safe to go into reset without locking... but this 
feels extremely ugly and error prone to me...

4) convert reset handling to a separate function that depends on the 
netdev_lock, and call that directly from within the threads that 
currently "wait for reset" while holding the netdev lock. Thus, we 
basically move this entire call chain into the thread already holding 
the lock, and call it from the context of the function like the MTU 
change, etc. This feels like its also a huge issue, and could 
potentially lead to some sort of issue where we need to still block the 
reset thread from going if we reset at the end of the netdev_lock thread..

I don't really like any of these solutions, even if (3) and (4) aren't 
fully ruled out as completely broken... they probably have all kinds of 
issues...

Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Przemek Kitszel 3 days, 9 hours ago

[...]

>> If dropping the lock is a no-go, the only viable path forward is to 
>> split the
>> reset_task so that the waiting portion is decoupled from the netdev_lock
>> critical section.
>>
> We used to do this back before the netdev shaper ops. We didn't acquire 
> either netdev lock or RTNL during reset.
> 
> I thought we had some code in the past that would handle netdev stuff 
> outside of reset.. but I don't really know and git blame is not making 
> it easy to find this information.

IIRC it was iavf_config_task that is used to complete stuff requested
under RTNL but later

> 
> Perhaps we don't actually need to hold the netdev lock over the reset 
> task.. except Przemek's refactor to remove the critical lock now makes 
> us fully dependent on the netdev lock in this case for reset :(
> 
>> The fact remains that MTU configuration and ring parameter changes are
>> currently broken in iavf. Changing the MTU on a Virtual Function is a
>> fundamental configuration not an obscure edge case that can remain 
>> non- functional.
>>
> 
> Agreed. This needs a resolution. It is just very tricky to figure out 
> what the solution should be.
> 
> We need to hold the netdev lock during reset, and we need to have our 
> handlers wait for reset to complete in order to be certain their task is 
> done... but reset is a separate thread so we can't really communicate to 
> it that we're holding the lock, and attempts to do that would be a huge 
> problem.
> 
> We don't want to go back to the critical lock and all of its horrible 
> problems either. The commit that removed it is here: 120f28a6f314 
> ("iavf: get rid of the crit lock")
> 
>> I would appreciate any further guidance on how you would prefer...
>>
> 
> I wish I had some better ideas..
> 
> Bad ideas I've thought about so far:
> 
> 1) this patch with its drop lock and wait, which we discussed as 
> problematic before. It creates a lot of issues since it means the 
> operations are no longer atomic and we could potentially get stuck with 
> some other operation in the event of another thread starting some core 
> netdev task. No good.
> 
> 2) not holding netdev_lock in reset, which is now a nogo since we 
> removed the crit_lock, and apparently we held netdev_lock prior to that 
> too anyways...
> 
> 3) we could maybe do some sort of ref counting dance where we take some 
> reference in threads that queue a reset, and the reset task would know 
> if that reference was non-zero then another driver thread is holding 
> netdev_lock so its safe to go into reset without locking... but this 
> feels extremely ugly and error prone to me...
> 
> 4) convert reset handling to a separate function that depends on the 
> netdev_lock, and call that directly from within the threads that 
> currently "wait for reset" while holding the netdev lock. Thus, we 
> basically move this entire call chain into the thread already holding 
> the lock, and call it from the context of the function like the MTU 
> change, etc. 

reset_thread() {
	while (!stopped) {
		netdev_lock();
		reset_step();
		netdev_unlock();
	}
}

looks cool, IIRC I did something similar with the "after crit lock
removal refactor continuation", but I've put it on hold
https://github.com/pkitszel/linux/commits/undeadlock/
The linked code went even further and merged all of our admin-worker 
threads into one and the whole was protected by the netdev_lock :)

This feels like its also a huge issue, and could
> potentially lead to some sort of issue where we need to still block the 
> reset thread from going if we reset at the end of the netdev_lock thread..

that should queue on the "do later stuff queue", which we don't have
right now (and keep some of such stuff in the state machine), but would
be useful for many other actions too (like virtchnl messages, for which
we have queue-of-1 right now

> 
> I don't really like any of these solutions, even if (3) and (4) aren't 
> fully ruled out as completely broken... they probably have all kinds of 
> issues...

Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Jacob Keller 2 days, 20 hours ago


On 2/3/2026 10:12 PM, Przemek Kitszel wrote:
> 
> [...]
> 
>>> If dropping the lock is a no-go, the only viable path forward is to 
>>> split the
>>> reset_task so that the waiting portion is decoupled from the netdev_lock
>>> critical section.
>>>
>> We used to do this back before the netdev shaper ops. We didn't 
>> acquire either netdev lock or RTNL during reset.
>>
>> I thought we had some code in the past that would handle netdev stuff 
>> outside of reset.. but I don't really know and git blame is not making 
>> it easy to find this information.
> 
> IIRC it was iavf_config_task that is used to complete stuff requested
> under RTNL but later
> 
>>
>> Perhaps we don't actually need to hold the netdev lock over the reset 
>> task.. except Przemek's refactor to remove the critical lock now makes 
>> us fully dependent on the netdev lock in this case for reset :(
>>
>>> The fact remains that MTU configuration and ring parameter changes are
>>> currently broken in iavf. Changing the MTU on a Virtual Function is a
>>> fundamental configuration not an obscure edge case that can remain 
>>> non- functional.
>>>
>>
>> Agreed. This needs a resolution. It is just very tricky to figure out 
>> what the solution should be.
>>
>> We need to hold the netdev lock during reset, and we need to have our 
>> handlers wait for reset to complete in order to be certain their task 
>> is done... but reset is a separate thread so we can't really 
>> communicate to it that we're holding the lock, and attempts to do that 
>> would be a huge problem.
>>
>> We don't want to go back to the critical lock and all of its horrible 
>> problems either. The commit that removed it is here: 120f28a6f314 
>> ("iavf: get rid of the crit lock")
>>
>>> I would appreciate any further guidance on how you would prefer...
>>>
>>
>> I wish I had some better ideas..
>>
>> Bad ideas I've thought about so far:
>>
>> 1) this patch with its drop lock and wait, which we discussed as 
>> problematic before. It creates a lot of issues since it means the 
>> operations are no longer atomic and we could potentially get stuck 
>> with some other operation in the event of another thread starting some 
>> core netdev task. No good.
>>
>> 2) not holding netdev_lock in reset, which is now a nogo since we 
>> removed the crit_lock, and apparently we held netdev_lock prior to 
>> that too anyways...
>>
>> 3) we could maybe do some sort of ref counting dance where we take 
>> some reference in threads that queue a reset, and the reset task would 
>> know if that reference was non-zero then another driver thread is 
>> holding netdev_lock so its safe to go into reset without locking... 
>> but this feels extremely ugly and error prone to me...
>>
>> 4) convert reset handling to a separate function that depends on the 
>> netdev_lock, and call that directly from within the threads that 
>> currently "wait for reset" while holding the netdev lock. Thus, we 
>> basically move this entire call chain into the thread already holding 
>> the lock, and call it from the context of the function like the MTU 
>> change, etc. 
> 
> reset_thread() {
>      while (!stopped) {
>          netdev_lock();
>          reset_step();
>          netdev_unlock();
>      }
> }
> 
> looks cool, IIRC I did something similar with the "after crit lock
> removal refactor continuation", but I've put it on hold
> https://github.com/pkitszel/linux/commits/undeadlock/
> The linked code went even further and merged all of our admin-worker 
> threads into one and the whole was protected by the netdev_lock :)
> 

That is a lot of patches and doesn't feel like a minimal fix.

> This feels like its also a huge issue, and could
>> potentially lead to some sort of issue where we need to still block 
>> the reset thread from going if we reset at the end of the netdev_lock 
>> thread..
> 
> that should queue on the "do later stuff queue", which we don't have
> right now (and keep some of such stuff in the state machine), but would
> be useful for many other actions too (like virtchnl messages, for which
> we have queue-of-1 right now
> 

I think we need to lay out what we need and maybe that will help figure 
out a solution.


1) several NDO operations need to perform tasks that require AVF 
hardware to reset, and which need to be certain that happens before they 
return.

2) reset is handled by a separate workqueue task

3) reset also requires the netdev lock


We can't make the ndo operations avoid the netdev lock

We pretty much can't make reset not require the netdev lock, especially 
since we are now using it as effectively the main driver lock now.

I am not certain we can move all of the reset code to work from the ndo 
thread context, as we still also have to wait for hardware notification 
that reset happened, which seems like a pretty massive refactor to get 
right.

We can't drop the waiting in the ndo operation because we need to be 
sure that the operation applied properly, otherwise we get errors elsewhere.

So that leaves me without any solution so far, certainly not one that 
can be completed quickly... which means we're stuck with a guaranteed 
deadlock anytime the MTU is changed?

>>
>> I don't really like any of these solutions, even if (3) and (4) aren't 
>> fully ruled out as completely broken... they probably have all kinds 
>> of issues...

Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Petr Oros 2 days, 3 hours ago

On 2/4/26 20:25, Jacob Keller wrote:
>
>
> On 2/3/2026 10:12 PM, Przemek Kitszel wrote:
>>
>> [...]
>>
>>>> If dropping the lock is a no-go, the only viable path forward is to 
>>>> split the
>>>> reset_task so that the waiting portion is decoupled from the 
>>>> netdev_lock
>>>> critical section.
>>>>
>>> We used to do this back before the netdev shaper ops. We didn't 
>>> acquire either netdev lock or RTNL during reset.
>>>
>>> I thought we had some code in the past that would handle netdev 
>>> stuff outside of reset.. but I don't really know and git blame is 
>>> not making it easy to find this information.
>>
>> IIRC it was iavf_config_task that is used to complete stuff requested
>> under RTNL but later
>>
>>>
>>> Perhaps we don't actually need to hold the netdev lock over the 
>>> reset task.. except Przemek's refactor to remove the critical lock 
>>> now makes us fully dependent on the netdev lock in this case for 
>>> reset :(
>>>
>>>> The fact remains that MTU configuration and ring parameter changes are
>>>> currently broken in iavf. Changing the MTU on a Virtual Function is a
>>>> fundamental configuration not an obscure edge case that can remain 
>>>> non- functional.
>>>>
>>>
>>> Agreed. This needs a resolution. It is just very tricky to figure 
>>> out what the solution should be.
>>>
>>> We need to hold the netdev lock during reset, and we need to have 
>>> our handlers wait for reset to complete in order to be certain their 
>>> task is done... but reset is a separate thread so we can't really 
>>> communicate to it that we're holding the lock, and attempts to do 
>>> that would be a huge problem.
>>>
>>> We don't want to go back to the critical lock and all of its 
>>> horrible problems either. The commit that removed it is here: 
>>> 120f28a6f314 ("iavf: get rid of the crit lock")
>>>
>>>> I would appreciate any further guidance on how you would prefer...
>>>>
>>>
>>> I wish I had some better ideas..
>>>
>>> Bad ideas I've thought about so far:
>>>
>>> 1) this patch with its drop lock and wait, which we discussed as 
>>> problematic before. It creates a lot of issues since it means the 
>>> operations are no longer atomic and we could potentially get stuck 
>>> with some other operation in the event of another thread starting 
>>> some core netdev task. No good.
>>>
>>> 2) not holding netdev_lock in reset, which is now a nogo since we 
>>> removed the crit_lock, and apparently we held netdev_lock prior to 
>>> that too anyways...
>>>
>>> 3) we could maybe do some sort of ref counting dance where we take 
>>> some reference in threads that queue a reset, and the reset task 
>>> would know if that reference was non-zero then another driver thread 
>>> is holding netdev_lock so its safe to go into reset without 
>>> locking... but this feels extremely ugly and error prone to me...
>>>
>>> 4) convert reset handling to a separate function that depends on the 
>>> netdev_lock, and call that directly from within the threads that 
>>> currently "wait for reset" while holding the netdev lock. Thus, we 
>>> basically move this entire call chain into the thread already 
>>> holding the lock, and call it from the context of the function like 
>>> the MTU change, etc. 
>>
>> reset_thread() {
>>      while (!stopped) {
>>          netdev_lock();
>>          reset_step();
>>          netdev_unlock();
>>      }
>> }
>>
>> looks cool, IIRC I did something similar with the "after crit lock
>> removal refactor continuation", but I've put it on hold
>> https://github.com/pkitszel/linux/commits/undeadlock/
>> The linked code went even further and merged all of our admin-worker 
>> threads into one and the whole was protected by the netdev_lock :)
>>
>
> That is a lot of patches and doesn't feel like a minimal fix.
>
>> This feels like its also a huge issue, and could
>>> potentially lead to some sort of issue where we need to still block 
>>> the reset thread from going if we reset at the end of the 
>>> netdev_lock thread..
>>
>> that should queue on the "do later stuff queue", which we don't have
>> right now (and keep some of such stuff in the state machine), but would
>> be useful for many other actions too (like virtchnl messages, for which
>> we have queue-of-1 right now
>>
>
> I think we need to lay out what we need and maybe that will help 
> figure out a solution.
>
>
> 1) several NDO operations need to perform tasks that require AVF 
> hardware to reset, and which need to be certain that happens before 
> they return.
>
> 2) reset is handled by a separate workqueue task
>
> 3) reset also requires the netdev lock
>
>
> We can't make the ndo operations avoid the netdev lock
>
> We pretty much can't make reset not require the netdev lock, 
> especially since we are now using it as effectively the main driver 
> lock now.
>
> I am not certain we can move all of the reset code to work from the 
> ndo thread context, as we still also have to wait for hardware 
> notification that reset happened, which seems like a pretty massive 
> refactor to get right.
>
> We can't drop the waiting in the ndo operation because we need to be 
> sure that the operation applied properly, otherwise we get errors 
> elsewhere.
>
> So that leaves me without any solution so far, certainly not one that 
> can be completed quickly... which means we're stuck with a guaranteed 
> deadlock anytime the MTU is changed?

I’ve been looking at the code on GitHub, and in my opinion, the fastest 
and most straightforward solution is to use a slightly modified approach 
based on patch 8c90ca2 ("iavf: split out body of reset task").

Since that patch refactors iavf_reset_task into iavf_reset_step and 
removes all netdev_lock acquisitions from its internal logic, we can 
resolve the deadlock by calling it directly.

In places where we currently wait (MTU, channels, ringparam), we should 
replace the asynchronous scheduling and waiting with a synchronous call. 
This avoids the deadlock by executing the reset in the thread that 
already holds the netdev_lock:

         if (netif_running(netdev)) {
-               iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
-               ret = iavf_wait_for_reset(adapter);
-               if (ret < 0)
-                       netdev_warn(netdev, "MTU change interrupted 
waiting for reset");
-               else if (ret)
-                       netdev_warn(netdev, "MTU change timed out 
waiting for reset");
+               adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+               iavf_reset_step(adapter);
         }
What do you think about this?


>
>>>
>>> I don't really like any of these solutions, even if (3) and (4) 
>>> aren't fully ruled out as completely broken... they probably have 
>>> all kinds of issues...
>

Re: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Jacob Keller 1 day, 15 hours ago


On 2/5/2026 4:24 AM, Petr Oros wrote:
> 
> On 2/4/26 20:25, Jacob Keller wrote:
>>
>>
>> On 2/3/2026 10:12 PM, Przemek Kitszel wrote:
>>>
>>> [...]
>>>
>>>>> If dropping the lock is a no-go, the only viable path forward is to 
>>>>> split the
>>>>> reset_task so that the waiting portion is decoupled from the 
>>>>> netdev_lock
>>>>> critical section.
>>>>>
>>>> We used to do this back before the netdev shaper ops. We didn't 
>>>> acquire either netdev lock or RTNL during reset.
>>>>
>>>> I thought we had some code in the past that would handle netdev 
>>>> stuff outside of reset.. but I don't really know and git blame is 
>>>> not making it easy to find this information.
>>>
>>> IIRC it was iavf_config_task that is used to complete stuff requested
>>> under RTNL but later
>>>
>>>>
>>>> Perhaps we don't actually need to hold the netdev lock over the 
>>>> reset task.. except Przemek's refactor to remove the critical lock 
>>>> now makes us fully dependent on the netdev lock in this case for 
>>>> reset :(
>>>>
>>>>> The fact remains that MTU configuration and ring parameter changes are
>>>>> currently broken in iavf. Changing the MTU on a Virtual Function is a
>>>>> fundamental configuration not an obscure edge case that can remain 
>>>>> non- functional.
>>>>>
>>>>
>>>> Agreed. This needs a resolution. It is just very tricky to figure 
>>>> out what the solution should be.
>>>>
>>>> We need to hold the netdev lock during reset, and we need to have 
>>>> our handlers wait for reset to complete in order to be certain their 
>>>> task is done... but reset is a separate thread so we can't really 
>>>> communicate to it that we're holding the lock, and attempts to do 
>>>> that would be a huge problem.
>>>>
>>>> We don't want to go back to the critical lock and all of its 
>>>> horrible problems either. The commit that removed it is here: 
>>>> 120f28a6f314 ("iavf: get rid of the crit lock")
>>>>
>>>>> I would appreciate any further guidance on how you would prefer...
>>>>>
>>>>
>>>> I wish I had some better ideas..
>>>>
>>>> Bad ideas I've thought about so far:
>>>>
>>>> 1) this patch with its drop lock and wait, which we discussed as 
>>>> problematic before. It creates a lot of issues since it means the 
>>>> operations are no longer atomic and we could potentially get stuck 
>>>> with some other operation in the event of another thread starting 
>>>> some core netdev task. No good.
>>>>
>>>> 2) not holding netdev_lock in reset, which is now a nogo since we 
>>>> removed the crit_lock, and apparently we held netdev_lock prior to 
>>>> that too anyways...
>>>>
>>>> 3) we could maybe do some sort of ref counting dance where we take 
>>>> some reference in threads that queue a reset, and the reset task 
>>>> would know if that reference was non-zero then another driver thread 
>>>> is holding netdev_lock so its safe to go into reset without 
>>>> locking... but this feels extremely ugly and error prone to me...
>>>>
>>>> 4) convert reset handling to a separate function that depends on the 
>>>> netdev_lock, and call that directly from within the threads that 
>>>> currently "wait for reset" while holding the netdev lock. Thus, we 
>>>> basically move this entire call chain into the thread already 
>>>> holding the lock, and call it from the context of the function like 
>>>> the MTU change, etc. 
>>>
>>> reset_thread() {
>>>      while (!stopped) {
>>>          netdev_lock();
>>>          reset_step();
>>>          netdev_unlock();
>>>      }
>>> }
>>>
>>> looks cool, IIRC I did something similar with the "after crit lock
>>> removal refactor continuation", but I've put it on hold
>>> https://github.com/pkitszel/linux/commits/undeadlock/
>>> The linked code went even further and merged all of our admin-worker 
>>> threads into one and the whole was protected by the netdev_lock :)
>>>
>>
>> That is a lot of patches and doesn't feel like a minimal fix.
>>
>>> This feels like its also a huge issue, and could
>>>> potentially lead to some sort of issue where we need to still block 
>>>> the reset thread from going if we reset at the end of the 
>>>> netdev_lock thread..
>>>
>>> that should queue on the "do later stuff queue", which we don't have
>>> right now (and keep some of such stuff in the state machine), but would
>>> be useful for many other actions too (like virtchnl messages, for which
>>> we have queue-of-1 right now
>>>
>>
>> I think we need to lay out what we need and maybe that will help 
>> figure out a solution.
>>
>>
>> 1) several NDO operations need to perform tasks that require AVF 
>> hardware to reset, and which need to be certain that happens before 
>> they return.
>>
>> 2) reset is handled by a separate workqueue task
>>
>> 3) reset also requires the netdev lock
>>
>>
>> We can't make the ndo operations avoid the netdev lock
>>
>> We pretty much can't make reset not require the netdev lock, 
>> especially since we are now using it as effectively the main driver 
>> lock now.
>>
>> I am not certain we can move all of the reset code to work from the 
>> ndo thread context, as we still also have to wait for hardware 
>> notification that reset happened, which seems like a pretty massive 
>> refactor to get right.
>>
>> We can't drop the waiting in the ndo operation because we need to be 
>> sure that the operation applied properly, otherwise we get errors 
>> elsewhere.
>>
>> So that leaves me without any solution so far, certainly not one that 
>> can be completed quickly... which means we're stuck with a guaranteed 
>> deadlock anytime the MTU is changed?
> 
> I’ve been looking at the code on GitHub, and in my opinion, the fastest 
> and most straightforward solution is to use a slightly modified approach 
> based on patch 8c90ca2 ("iavf: split out body of reset task").
> 
> Since that patch refactors iavf_reset_task into iavf_reset_step and 
> removes all netdev_lock acquisitions from its internal logic, we can 
> resolve the deadlock by calling it directly.
> 
> In places where we currently wait (MTU, channels, ringparam), we should 
> replace the asynchronous scheduling and waiting with a synchronous call. 
> This avoids the deadlock by executing the reset in the thread that 
> already holds the netdev_lock:
> 
>          if (netif_running(netdev)) {
> -               iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
> -               ret = iavf_wait_for_reset(adapter);
> -               if (ret < 0)
> -                       netdev_warn(netdev, "MTU change interrupted 
> waiting for reset");
> -               else if (ret)
> -                       netdev_warn(netdev, "MTU change timed out 
> waiting for reset");
> +               adapter->flags |= IAVF_FLAG_RESET_NEEDED;
> +               iavf_reset_step(adapter);
>          }
> What do you think about this?
> 
I think this works. The netdev lock itself will provide synchronization 
between threads that need to reset, (since we take the netdev lock 
around reset and we hold it here).

This is more or less one of my ideas earlier, but I didn't look closely 
enough to see how easy it is to implement.

We will need to do some double checking to make sure that nothing else 
checks the reset flags without holding the netdev lock, but I think 
since Przemek's cleanups that is true.

Good idea! Hopefully its not too much work to pull out just that part of 
the refactor and get it cleaned up.

I'm currently working on setting my system up to reproduce the deadlock.

[PATCH net] iavf: fix deadlock in reset handling

Posted by Petr Oros 1 day, 5 hours ago

description... 

Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
Signed-off-by: Petr Oros <poros@redhat.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  2 +-
 .../net/ethernet/intel/iavf/iavf_ethtool.c    | 21 +++---
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 72 +++++++------------
 3 files changed, 33 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index d552f912e8a947..0c3844b3ff1c86 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -625,5 +625,5 @@ void iavf_add_adv_rss_cfg(struct iavf_adapter *adapter);
 void iavf_del_adv_rss_cfg(struct iavf_adapter *adapter);
 struct iavf_mac_filter *iavf_add_filter(struct iavf_adapter *adapter,
 					const u8 *macaddr);
-int iavf_wait_for_reset(struct iavf_adapter *adapter);
+void iavf_reset_step(struct iavf_adapter *adapter);
 #endif /* _IAVF_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 2cc21289a70779..9b0f47f9340942 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -492,7 +492,6 @@ static int iavf_set_ringparam(struct net_device *netdev,
 {
 	struct iavf_adapter *adapter = netdev_priv(netdev);
 	u32 new_rx_count, new_tx_count;
-	int ret = 0;
 
 	if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
 		return -EINVAL;
@@ -537,13 +536,11 @@ static int iavf_set_ringparam(struct net_device *netdev,
 	}
 
 	if (netif_running(netdev)) {
-		iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
-		ret = iavf_wait_for_reset(adapter);
-		if (ret)
-			netdev_warn(netdev, "Changing ring parameters timeout or interrupted waiting for reset");
+		adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+		iavf_reset_step(adapter);
 	}
 
-	return ret;
+	return 0;
 }
 
 /**
@@ -1723,7 +1720,6 @@ static int iavf_set_channels(struct net_device *netdev,
 {
 	struct iavf_adapter *adapter = netdev_priv(netdev);
 	u32 num_req = ch->combined_count;
-	int ret = 0;
 
 	if ((adapter->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_ADQ) &&
 	    adapter->num_tc) {
@@ -1745,13 +1741,12 @@ static int iavf_set_channels(struct net_device *netdev,
 
 	adapter->num_req_queues = num_req;
 	adapter->flags |= IAVF_FLAG_REINIT_ITR_NEEDED;
-	iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
-
-	ret = iavf_wait_for_reset(adapter);
-	if (ret)
-		netdev_warn(netdev, "Changing channel count timeout or interrupted waiting for reset");
+	if (netif_running(netdev)) {
+		adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+		iavf_reset_step(adapter);
+	}
 
-	return ret;
+	return 0;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 8aa6e92c16431f..9c8d6125106f5a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -185,31 +185,6 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter)
 	return false;
 }
 
-/**
- * iavf_wait_for_reset - Wait for reset to finish.
- * @adapter: board private structure
- *
- * Returns 0 if reset finished successfully, negative on timeout or interrupt.
- */
-int iavf_wait_for_reset(struct iavf_adapter *adapter)
-{
-	int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
-					!iavf_is_reset_in_progress(adapter),
-					msecs_to_jiffies(5000));
-
-	/* If ret < 0 then it means wait was interrupted.
-	 * If ret == 0 then it means we got a timeout while waiting
-	 * for reset to finish.
-	 * If ret > 0 it means reset has finished.
-	 */
-	if (ret > 0)
-		return 0;
-	else if (ret < 0)
-		return -EINTR;
-	else
-		return -EBUSY;
-}
-
 /**
  * iavf_allocate_dma_mem_d - OS specific memory alloc for shared code
  * @hw:   pointer to the HW structure
@@ -3100,18 +3075,16 @@ static void iavf_reconfig_qs_bw(struct iavf_adapter *adapter)
 }
 
 /**
- * iavf_reset_task - Call-back task to handle hardware reset
- * @work: pointer to work_struct
+ * iavf_reset_step - Perform the VF reset sequence
+ * @adapter: board private structure
  *
- * During reset we need to shut down and reinitialize the admin queue
- * before we can use it to communicate with the PF again. We also clear
- * and reinit the rings because that context is lost as well.
- **/
-static void iavf_reset_task(struct work_struct *work)
+ * Requests a reset from PF, polls for completion, and reconfigures
+ * the driver. Caller must hold the netdev instance lock.
+ *
+ * This can sleep for several seconds while polling HW registers.
+ */
+void iavf_reset_step(struct iavf_adapter *adapter)
 {
-	struct iavf_adapter *adapter = container_of(work,
-						      struct iavf_adapter,
-						      reset_task);
 	struct virtchnl_vf_resource *vfres = adapter->vf_res;
 	struct net_device *netdev = adapter->netdev;
 	struct iavf_hw *hw = &adapter->hw;
@@ -3122,7 +3095,7 @@ static void iavf_reset_task(struct work_struct *work)
 	int i = 0, err;
 	bool running;
 
-	netdev_lock(netdev);
+	netdev_assert_locked(netdev);
 
 	iavf_misc_irq_disable(adapter);
 	if (adapter->flags & IAVF_FLAG_RESET_NEEDED) {
@@ -3167,7 +3140,6 @@ static void iavf_reset_task(struct work_struct *work)
 		dev_err(&adapter->pdev->dev, "Reset never finished (%x)\n",
 			reg_val);
 		iavf_disable_vf(adapter);
-		netdev_unlock(netdev);
 		return; /* Do not attempt to reinit. It's dead, Jim. */
 	}
 
@@ -3179,7 +3151,6 @@ static void iavf_reset_task(struct work_struct *work)
 		iavf_startup(adapter);
 		queue_delayed_work(adapter->wq, &adapter->watchdog_task,
 				   msecs_to_jiffies(30));
-		netdev_unlock(netdev);
 		return;
 	}
 
@@ -3321,7 +3292,6 @@ static void iavf_reset_task(struct work_struct *work)
 	adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED;
 
 	wake_up(&adapter->reset_waitqueue);
-	netdev_unlock(netdev);
 
 	return;
 reset_err:
@@ -3331,10 +3301,21 @@ static void iavf_reset_task(struct work_struct *work)
 	}
 	iavf_disable_vf(adapter);
 
-	netdev_unlock(netdev);
 	dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
 }
 
+static void iavf_reset_task(struct work_struct *work)
+{
+	struct iavf_adapter *adapter = container_of(work,
+						      struct iavf_adapter,
+						      reset_task);
+	struct net_device *netdev = adapter->netdev;
+
+	netdev_lock(netdev);
+	iavf_reset_step(adapter);
+	netdev_unlock(netdev);
+}
+
 /**
  * iavf_adminq_task - worker thread to clean the admin queue
  * @work: pointer to work_struct containing our data
@@ -4600,22 +4581,17 @@ static int iavf_close(struct net_device *netdev)
 static int iavf_change_mtu(struct net_device *netdev, int new_mtu)
 {
 	struct iavf_adapter *adapter = netdev_priv(netdev);
-	int ret = 0;
 
 	netdev_dbg(netdev, "changing MTU from %d to %d\n",
 		   netdev->mtu, new_mtu);
 	WRITE_ONCE(netdev->mtu, new_mtu);
 
 	if (netif_running(netdev)) {
-		iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
-		ret = iavf_wait_for_reset(adapter);
-		if (ret < 0)
-			netdev_warn(netdev, "MTU change interrupted waiting for reset");
-		else if (ret)
-			netdev_warn(netdev, "MTU change timed out waiting for reset");
+		adapter->flags |= IAVF_FLAG_RESET_NEEDED;
+		iavf_reset_step(adapter);
 	}
 
-	return ret;
+	return 0;
 }
 
 /**
-- 
2.52.0

Re: [PATCH net] iavf: fix deadlock in reset handling

Posted by Jakub Kicinski 12 hours ago

On Fri,  6 Feb 2026 11:04:26 +0100 Petr Oros wrote:
> Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")

This is so much better 👍️

Re: [PATCH net] iavf: fix deadlock in reset handling

Posted by Jacob Keller 14 hours ago

On 2/6/2026 2:04 AM, Petr Oros wrote:
> description...
> 
> Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---

Code changes look correct to me. Description obviously needs to get 
written, but...

I loaded both the net as of ee9241524b46 ("amd-xgbe: do not select 
NET_SELFTESTS when INET is disabled") and tested it with and without 
this fix applied.

Before the fix, changing MTU till timeout after a few seconds with the 
complaint, after which the reset will happen and MTU will be applied 
late (the timeout in the wait event prevents a permanent deadlock). With 
the fix the command completes quickly with no errors and the MTU change 
applies immediately.

Tested-by: Jacob Keller <jacob.e.keller@intel.com>

Additionally, for a version with a proper commit message, please feel 
free to add:

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

Thanks for working on this, Petr!

Regards,
Jake

Re: [PATCH net] iavf: fix deadlock in reset handling

Posted by Ivan Vecera 5 days, 2 hours ago

On 2/2/26 9:48 AM, Petr Oros wrote:
> Three driver callbacks schedule a reset and wait for its completion:
> ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
> 
> Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
> commit c2ed2403f12c74 ("iavf: Wait for reset in callbacks which trigger
> it") to fix a race condition where adding an interface to bonding
> immediately after MTU or ring parameter change failed because the
> interface was still in __RESETTING state. The same commit also added
> waiting in iavf_set_priv_flags(), which was later removed by commit
> 53844673d55529 ("iavf: kill "legacy-rx" for good").
> 
> Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
> ("iavf: Fix return of set the new channel count") to ensure the PF has
> enough time to complete the VF reset when changing channel count, and to
> return correct error codes to userspace.
> 
> Commit ef490bbb226702 ("iavf: Add net_shaper_ops support") added
> net_shaper_ops to iavf, which required reset_task to use _locked NAPI
> variants (napi_enable_locked, napi_disable_locked) that need the netdev
> instance lock.
> 
> Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
> rtnetlink operations") and commit 2bcf4772e45adb ("net: ethtool: try to
> protect all callback with netdev instance lock") started holding the
> netdev instance lock during ndo and ethtool callbacks for drivers with
> net_shaper_ops.
> 
> The combination of waiting for reset and the new locking requirements
> creates a deadlock: the callback holds the lock and waits for reset_task,
> but reset_task is blocked waiting for the same lock:
> 
>    Thread 1 (callback)               Thread 2 (reset_task)
>    -------------------               ---------------------
>    netdev_lock()
>    ndo_change_mtu() or ethtool op
>      iavf_schedule_reset()
>      iavf_wait_for_reset()           iavf_reset_task()
>        waiting...                      netdev_lock() <- DEADLOCK
> 
> Reproducer:
> 
>    # echo 16 > /sys/class/net/$PF/device/sriov_numvfs
>    # ip link set $VF up
>    # ip link set $VF mtu 5000
>    RTNETLINK answers: Device or resource busy
> 
>    # dmesg | tail -1
>    iavf: MTU change interrupted waiting for reset
> 
> Fix this by temporarily releasing the lock while waiting for reset to
> complete. This follows the pattern used elsewhere in the kernel (e.g.,
> do_set_master() releases rtnl_lock before calling ndo_add_slave()).
> 
> Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
>   drivers/net/ethernet/intel/iavf/iavf_main.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 8aa6e92c16431f..d7738fb8fa60bc 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -189,13 +189,22 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter)
>    * iavf_wait_for_reset - Wait for reset to finish.
>    * @adapter: board private structure
>    *
> + * The iavf driver selects NET_SHAPER, so callbacks that trigger reset are
> + * always called with netdev instance lock held, while reset_task also needs
> + * this lock. Release the lock while waiting to avoid deadlock.
> + *
>    * Returns 0 if reset finished successfully, negative on timeout or interrupt.
>    */
>   int iavf_wait_for_reset(struct iavf_adapter *adapter)
>   {
> -	int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
> -					!iavf_is_reset_in_progress(adapter),
> -					msecs_to_jiffies(5000));
> +	struct net_device *netdev = adapter->netdev;
> +	int ret;
> +
> +	netdev_unlock(netdev);
> +	ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
> +					       !iavf_is_reset_in_progress(adapter),
> +					       msecs_to_jiffies(5000));
> +	netdev_lock(netdev);
>   
>   	/* If ret < 0 then it means wait was interrupted.
>   	 * If ret == 0 then it means we got a timeout while waiting

Reviewed-by: Ivan Vecera <ivecera@redhat.com>

Re: [PATCH net] iavf: fix deadlock in reset handling

Posted by Jijie Shao 5 days, 4 hours ago

on 2026/2/2 16:48, Petr Oros wrote:
> Three driver callbacks schedule a reset and wait for its completion:
> ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
>
> Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
> commit c2ed2403f12c74 ("iavf: Wait for reset in callbacks which trigger
> it") to fix a race condition where adding an interface to bonding
> immediately after MTU or ring parameter change failed because the
> interface was still in __RESETTING state. The same commit also added
> waiting in iavf_set_priv_flags(), which was later removed by commit
> 53844673d55529 ("iavf: kill "legacy-rx" for good").
>
> Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
> ("iavf: Fix return of set the new channel count") to ensure the PF has
> enough time to complete the VF reset when changing channel count, and to
> return correct error codes to userspace.
>
> Commit ef490bbb226702 ("iavf: Add net_shaper_ops support") added
> net_shaper_ops to iavf, which required reset_task to use _locked NAPI
> variants (napi_enable_locked, napi_disable_locked) that need the netdev
> instance lock.
>
> Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
> rtnetlink operations") and commit 2bcf4772e45adb ("net: ethtool: try to
> protect all callback with netdev instance lock") started holding the
> netdev instance lock during ndo and ethtool callbacks for drivers with
> net_shaper_ops.
>
> The combination of waiting for reset and the new locking requirements
> creates a deadlock: the callback holds the lock and waits for reset_task,
> but reset_task is blocked waiting for the same lock:
>
>    Thread 1 (callback)               Thread 2 (reset_task)
>    -------------------               ---------------------
>    netdev_lock()
>    ndo_change_mtu() or ethtool op
>      iavf_schedule_reset()
>      iavf_wait_for_reset()           iavf_reset_task()
>        waiting...                      netdev_lock() <- DEADLOCK
>
> Reproducer:
>
>    # echo 16 > /sys/class/net/$PF/device/sriov_numvfs
>    # ip link set $VF up
>    # ip link set $VF mtu 5000
>    RTNETLINK answers: Device or resource busy
>
>    # dmesg | tail -1
>    iavf: MTU change interrupted waiting for reset
>
> Fix this by temporarily releasing the lock while waiting for reset to
> complete. This follows the pattern used elsewhere in the kernel (e.g.,
> do_set_master() releases rtnl_lock before calling ndo_add_slave()).
>
> Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations")
> Signed-off-by: Petr Oros <poros@redhat.com>


Reviewed-by: Jijie Shao <shaojijie@huawei.com>

> ---
>   drivers/net/ethernet/intel/iavf/iavf_main.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 8aa6e92c16431f..d7738fb8fa60bc 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -189,13 +189,22 @@ static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter)
>    * iavf_wait_for_reset - Wait for reset to finish.
>    * @adapter: board private structure
>    *
> + * The iavf driver selects NET_SHAPER, so callbacks that trigger reset are
> + * always called with netdev instance lock held, while reset_task also needs
> + * this lock. Release the lock while waiting to avoid deadlock.
> + *
>    * Returns 0 if reset finished successfully, negative on timeout or interrupt.
>    */
>   int iavf_wait_for_reset(struct iavf_adapter *adapter)
>   {
> -	int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
> -					!iavf_is_reset_in_progress(adapter),
> -					msecs_to_jiffies(5000));
> +	struct net_device *netdev = adapter->netdev;
> +	int ret;
> +
> +	netdev_unlock(netdev);
> +	ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
> +					       !iavf_is_reset_in_progress(adapter),
> +					       msecs_to_jiffies(5000));
> +	netdev_lock(netdev);
>   
>   	/* If ret < 0 then it means wait was interrupted.
>   	 * If ret == 0 then it means we got a timeout while waiting

RE: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset handling

Posted by Loktionov, Aleksandr 5 days, 6 hours ago


> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Petr Oros
> Sent: Monday, February 2, 2026 9:48 AM
> To: netdev@vger.kernel.org
> Cc: Vecera, Ivan <ivecera@redhat.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Eric Dumazet <edumazet@google.com>;
> linux-kernel@vger.kernel.org; Andrew Lunn <andrew+netdev@lunn.ch>;
> Stanislav Fomichev <sdf@fomichev.me>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; intel-wired-lan@lists.osuosl.org;
> Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>;
> David S. Miller <davem@davemloft.net>
> Subject: [Intel-wired-lan] [PATCH net] iavf: fix deadlock in reset
> handling
> 
> Three driver callbacks schedule a reset and wait for its completion:
> ndo_change_mtu(), ethtool set_ringparam(), and ethtool
> set_channels().
> 
> Waiting for reset in ndo_change_mtu() and set_ringparam() was added
> by commit c2ed2403f12c74 ("iavf: Wait for reset in callbacks which
> trigger
> it") to fix a race condition where adding an interface to bonding
> immediately after MTU or ring parameter change failed because the
> interface was still in __RESETTING state. The same commit also added
> waiting in iavf_set_priv_flags(), which was later removed by commit
> 53844673d55529 ("iavf: kill "legacy-rx" for good").
> 
> Waiting in set_channels() was introduced earlier by commit
> 4e5e6b5d9d13
> ("iavf: Fix return of set the new channel count") to ensure the PF
> has enough time to complete the VF reset when changing channel
> count, and to return correct error codes to userspace.
> 
> Commit ef490bbb226702 ("iavf: Add net_shaper_ops support") added
> net_shaper_ops to iavf, which required reset_task to use _locked
> NAPI variants (napi_enable_locked, napi_disable_locked) that need
> the netdev instance lock.
> 
> Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
> rtnetlink operations") and commit 2bcf4772e45adb ("net: ethtool: try
> to protect all callback with netdev instance lock") started holding
> the netdev instance lock during ndo and ethtool callbacks for
> drivers with net_shaper_ops.
> 
> The combination of waiting for reset and the new locking
> requirements creates a deadlock: the callback holds the lock and
> waits for reset_task, but reset_task is blocked waiting for the same
> lock:
> 
>   Thread 1 (callback)               Thread 2 (reset_task)
>   -------------------               ---------------------
>   netdev_lock()
>   ndo_change_mtu() or ethtool op
>     iavf_schedule_reset()
>     iavf_wait_for_reset()           iavf_reset_task()
>       waiting...                      netdev_lock() <- DEADLOCK
> 
> Reproducer:
> 
>   # echo 16 > /sys/class/net/$PF/device/sriov_numvfs
>   # ip link set $VF up
>   # ip link set $VF mtu 5000
>   RTNETLINK answers: Device or resource busy
> 
>   # dmesg | tail -1
>   iavf: MTU change interrupted waiting for reset
> 
> Fix this by temporarily releasing the lock while waiting for reset
> to complete. This follows the pattern used elsewhere in the kernel
> (e.g.,
> do_set_master() releases rtnl_lock before calling ndo_add_slave()).
> 
> Fixes: 7e4d784f5810 ("net: hold netdev instance lock during
> rtnetlink operations")
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_main.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c
> b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 8aa6e92c16431f..d7738fb8fa60bc 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -189,13 +189,22 @@ static bool iavf_is_reset_in_progress(struct
> iavf_adapter *adapter)
>   * iavf_wait_for_reset - Wait for reset to finish.
>   * @adapter: board private structure
>   *
> + * The iavf driver selects NET_SHAPER, so callbacks that trigger
> reset
> + are
> + * always called with netdev instance lock held, while reset_task
> also
> + needs
> + * this lock. Release the lock while waiting to avoid deadlock.
> + *
>   * Returns 0 if reset finished successfully, negative on timeout or
> interrupt.
>   */
>  int iavf_wait_for_reset(struct iavf_adapter *adapter)  {
> -	int ret = wait_event_interruptible_timeout(adapter-
> >reset_waitqueue,
> -
> 	!iavf_is_reset_in_progress(adapter),
> -					msecs_to_jiffies(5000));
> +	struct net_device *netdev = adapter->netdev;
> +	int ret;
> +
> +	netdev_unlock(netdev);
> +	ret = wait_event_interruptible_timeout(adapter-
> >reset_waitqueue,
> +
> !iavf_is_reset_in_progress(adapter),
> +					       msecs_to_jiffies(5000));
> +	netdev_lock(netdev);
> 
>  	/* If ret < 0 then it means wait was interrupted.
>  	 * If ret == 0 then it means we got a timeout while waiting
> --
> 2.52.0

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>