drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++++ 1 file changed, 4 insertions(+)
e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
There are a few paths for e1000_down to be called in e1000 where RTNL is
not currently being held:
- e1000_shutdown (pci shutdown)
- e1000_suspend (power management)
- e1000_reinit_locked (via e1000_reset_task delayed work)
Hold RTNL in two places to fix this issue:
- e1000_reset_task
- __e1000_shutdown (which is called from both e1000_shutdown and
e1000_suspend).
The other paths which call e1000_down seemingly hold RTNL and are OK:
- e1000_close (ndo_stop)
- e1000_change_mtu (ndo_change_mtu)
I'm submitting this is as an RFC because:
- the e1000_reinit_locked issue appears very similar to commit
21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
fixes a similar issue in e1000e
however
- adding rtnl to e1000_reinit_locked seemingly conflicts with an
earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
e1000_reset_task").
Hopefully Intel can weigh in and shed some light on the correct way to
go.
Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs")
Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/intel/e1000/e1000_main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 4de9b156b2be..9ed99c75d59e 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3509,7 +3509,9 @@ static void e1000_reset_task(struct work_struct *work)
container_of(work, struct e1000_adapter, reset_task);
e_err(drv, "Reset adapter\n");
+ rtnl_lock();
e1000_reinit_locked(adapter);
+ rtnl_unlock();
}
/**
@@ -5074,7 +5076,9 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool *enable_wake)
usleep_range(10000, 20000);
WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags));
+ rtnl_lock();
e1000_down(adapter);
+ rtnl_unlock();
}
status = er32(STATUS);
base-commit: d811ac148f0afd2f3f7e1cd7f54de8da973ec5e3
--
2.25.1
On Tue, Oct 22, 2024 at 05:21:53PM +0000, Joe Damato wrote:
> e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
>
> There are a few paths for e1000_down to be called in e1000 where RTNL is
> not currently being held:
> - e1000_shutdown (pci shutdown)
> - e1000_suspend (power management)
> - e1000_reinit_locked (via e1000_reset_task delayed work)
>
> Hold RTNL in two places to fix this issue:
> - e1000_reset_task
> - __e1000_shutdown (which is called from both e1000_shutdown and
> e1000_suspend).
It looks like there's one other spot I missed:
e1000_io_error_detected (pci error handler) which should also hold
rtnl_lock:
+ if (netif_running(netdev)) {
+ rtnl_lock();
e1000_down(adapter);
+ rtnl_unlock();
+ }
I can send that update in the v2, but I'll wait to see if Intel has suggestions
on the below.
> The other paths which call e1000_down seemingly hold RTNL and are OK:
> - e1000_close (ndo_stop)
> - e1000_change_mtu (ndo_change_mtu)
>
> I'm submitting this is as an RFC because:
> - the e1000_reinit_locked issue appears very similar to commit
> 21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
> fixes a similar issue in e1000e
>
> however
>
> - adding rtnl to e1000_reinit_locked seemingly conflicts with an
> earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
> e1000_reset_task").
>
> Hopefully Intel can weigh in and shed some light on the correct way to
> go.
>
> Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs")
> Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
> Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
> Signed-off-by: Joe Damato <jdamato@fastly.com>
On 10/22/2024 1:00 PM, Joe Damato wrote:
> On Tue, Oct 22, 2024 at 05:21:53PM +0000, Joe Damato wrote:
>> e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
>>
>> There are a few paths for e1000_down to be called in e1000 where RTNL is
>> not currently being held:
>> - e1000_shutdown (pci shutdown)
>> - e1000_suspend (power management)
>> - e1000_reinit_locked (via e1000_reset_task delayed work)
>>
>> Hold RTNL in two places to fix this issue:
>> - e1000_reset_task
>> - __e1000_shutdown (which is called from both e1000_shutdown and
>> e1000_suspend).
>
> It looks like there's one other spot I missed:
>
> e1000_io_error_detected (pci error handler) which should also hold
> rtnl_lock:
>
> + if (netif_running(netdev)) {
> + rtnl_lock();
> e1000_down(adapter);
> + rtnl_unlock();
> + }
>
> I can send that update in the v2, but I'll wait to see if Intel has suggestions
> on the below.
>
>> The other paths which call e1000_down seemingly hold RTNL and are OK:
>> - e1000_close (ndo_stop)
>> - e1000_change_mtu (ndo_change_mtu)
>>
>> I'm submitting this is as an RFC because:
>> - the e1000_reinit_locked issue appears very similar to commit
>> 21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
>> fixes a similar issue in e1000e
>>
>> however
>>
>> - adding rtnl to e1000_reinit_locked seemingly conflicts with an
>> earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
>> e1000_reset_task").
>>
>> Hopefully Intel can weigh in and shed some light on the correct way to
>> go.
>>
From my review, I think we need the RTNL lock around this function. The
deadlocks mentions in the fix lockdep patch appear to be due to having
an *extra* lock which could then cause issues.
>> Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs")
>> Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
>> Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
>> Signed-off-by: Joe Damato <jdamato@fastly.com>
On Tue, Oct 22, 2024 at 01:00:47PM -0700, Joe Damato wrote:
> On Tue, Oct 22, 2024 at 05:21:53PM +0000, Joe Damato wrote:
> > e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
> >
> > There are a few paths for e1000_down to be called in e1000 where RTNL is
> > not currently being held:
> > - e1000_shutdown (pci shutdown)
> > - e1000_suspend (power management)
> > - e1000_reinit_locked (via e1000_reset_task delayed work)
> >
> > Hold RTNL in two places to fix this issue:
> > - e1000_reset_task
> > - __e1000_shutdown (which is called from both e1000_shutdown and
> > e1000_suspend).
>
> It looks like there's one other spot I missed:
>
> e1000_io_error_detected (pci error handler) which should also hold
> rtnl_lock:
>
> + if (netif_running(netdev)) {
> + rtnl_lock();
> e1000_down(adapter);
> + rtnl_unlock();
> + }
>
> I can send that update in the v2, but I'll wait to see if Intel has suggestions
> on the below.
>
> > The other paths which call e1000_down seemingly hold RTNL and are OK:
> > - e1000_close (ndo_stop)
> > - e1000_change_mtu (ndo_change_mtu)
> >
> > I'm submitting this is as an RFC because:
> > - the e1000_reinit_locked issue appears very similar to commit
> > 21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
> > fixes a similar issue in e1000e
> >
> > however
> >
> > - adding rtnl to e1000_reinit_locked seemingly conflicts with an
> > earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
> > e1000_reset_task").
> >
> > Hopefully Intel can weigh in and shed some light on the correct way to
> > go.
Regarding the above locations where rtnl_lock may need to be held,
comparing to other intel drivers:
- e1000_reset_task: it appears that igc, igb, and e100e all hold
rtnl_lock in their reset_task functions, so I think adding an
rtnl_lock / rtnl_unlock to e1000_reset_task should be OK,
despite the existence of commit b2f963bfaeba ("e1000: fix
lockdep warning in e1000_reset_task").
- e1000_io_error_detected:
- e1000e temporarily obtains and drops rtnl in
e1000e_pm_freeze
- ixgbe holds rtnl in the same path (toward the bottom of
ixgbe_io_error_detected)
- igb does NOT hold rtnl in this path (as far as I can tell)
- it was suggested in another thread to hold rtnl in this path
for igc [1].
Given that it will be added to igc and is held in this same
path in e1000e and ixgbe, I think it is safe to add it for
e1000, as well.
- e1000_shutdown:
- igb holds rtnl in the same path,
- e1000e temporarily holds it in this path (via
e1000e_pm_freeze)
- ixgbe holds rtnl in the same path
So based on the recommendation for igc [1], and the precedent set in
the other Intel drivers in most cases (except igb and the io_error
path), I think adding rtnl to all 3 locations described above is
correct.
Please let me know if you all agree. Thanks for reviewing this.
[1]: https://lore.kernel.org/netdev/40242f59-139a-4b45-8949-1210039f881b@intel.com/
On 10/22/2024 2:12 PM, Joe Damato wrote:
> On Tue, Oct 22, 2024 at 01:00:47PM -0700, Joe Damato wrote:
>> On Tue, Oct 22, 2024 at 05:21:53PM +0000, Joe Damato wrote:
>>> e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
>>>
>>> There are a few paths for e1000_down to be called in e1000 where RTNL is
>>> not currently being held:
>>> - e1000_shutdown (pci shutdown)
>>> - e1000_suspend (power management)
>>> - e1000_reinit_locked (via e1000_reset_task delayed work)
>>>
>>> Hold RTNL in two places to fix this issue:
>>> - e1000_reset_task
>>> - __e1000_shutdown (which is called from both e1000_shutdown and
>>> e1000_suspend).
>>
>> It looks like there's one other spot I missed:
>>
>> e1000_io_error_detected (pci error handler) which should also hold
>> rtnl_lock:
>>
>> + if (netif_running(netdev)) {
>> + rtnl_lock();
>> e1000_down(adapter);
>> + rtnl_unlock();
>> + }
>>
>> I can send that update in the v2, but I'll wait to see if Intel has suggestions
>> on the below.
>>
>>> The other paths which call e1000_down seemingly hold RTNL and are OK:
>>> - e1000_close (ndo_stop)
>>> - e1000_change_mtu (ndo_change_mtu)
>>>
>>> I'm submitting this is as an RFC because:
>>> - the e1000_reinit_locked issue appears very similar to commit
>>> 21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
>>> fixes a similar issue in e1000e
>>>
>>> however
>>>
>>> - adding rtnl to e1000_reinit_locked seemingly conflicts with an
>>> earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
>>> e1000_reset_task").
>>>
>>> Hopefully Intel can weigh in and shed some light on the correct way to
>>> go.
>
> Regarding the above locations where rtnl_lock may need to be held,
> comparing to other intel drivers:
>
> - e1000_reset_task: it appears that igc, igb, and e100e all hold
> rtnl_lock in their reset_task functions, so I think adding an
> rtnl_lock / rtnl_unlock to e1000_reset_task should be OK,
> despite the existence of commit b2f963bfaeba ("e1000: fix
> lockdep warning in e1000_reset_task").
>
> - e1000_io_error_detected:
> - e1000e temporarily obtains and drops rtnl in
> e1000e_pm_freeze
> - ixgbe holds rtnl in the same path (toward the bottom of
> ixgbe_io_error_detected)
> - igb does NOT hold rtnl in this path (as far as I can tell)
> - it was suggested in another thread to hold rtnl in this path
> for igc [1].
>
> Given that it will be added to igc and is held in this same
> path in e1000e and ixgbe, I think it is safe to add it for
> e1000, as well.
>
> - e1000_shutdown:
> - igb holds rtnl in the same path,
> - e1000e temporarily holds it in this path (via
> e1000e_pm_freeze)
> - ixgbe holds rtnl in the same path
>
> So based on the recommendation for igc [1], and the precedent set in
> the other Intel drivers in most cases (except igb and the io_error
> path), I think adding rtnl to all 3 locations described above is
> correct.
>
> Please let me know if you all agree. Thanks for reviewing this.
>
>
[1]:
https://lore.kernel.org/netdev/40242f59-139a-4b45-8949-1210039f881b@intel.com/
I agree with this assessment.
On Tue, Oct 22, 2024 at 02:15:27PM -0700, Jacob Keller wrote:
>
>
> On 10/22/2024 2:12 PM, Joe Damato wrote:
> > On Tue, Oct 22, 2024 at 01:00:47PM -0700, Joe Damato wrote:
> >> On Tue, Oct 22, 2024 at 05:21:53PM +0000, Joe Damato wrote:
> >>> e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
> >>>
> >>> There are a few paths for e1000_down to be called in e1000 where RTNL is
> >>> not currently being held:
> >>> - e1000_shutdown (pci shutdown)
> >>> - e1000_suspend (power management)
> >>> - e1000_reinit_locked (via e1000_reset_task delayed work)
> >>>
> >>> Hold RTNL in two places to fix this issue:
> >>> - e1000_reset_task
> >>> - __e1000_shutdown (which is called from both e1000_shutdown and
> >>> e1000_suspend).
> >>
> >> It looks like there's one other spot I missed:
> >>
> >> e1000_io_error_detected (pci error handler) which should also hold
> >> rtnl_lock:
> >>
> >> + if (netif_running(netdev)) {
> >> + rtnl_lock();
> >> e1000_down(adapter);
> >> + rtnl_unlock();
> >> + }
> >>
> >> I can send that update in the v2, but I'll wait to see if Intel has suggestions
> >> on the below.
> >>
> >>> The other paths which call e1000_down seemingly hold RTNL and are OK:
> >>> - e1000_close (ndo_stop)
> >>> - e1000_change_mtu (ndo_change_mtu)
> >>>
> >>> I'm submitting this is as an RFC because:
> >>> - the e1000_reinit_locked issue appears very similar to commit
> >>> 21f857f0321d ("e1000e: add rtnl_lock() to e1000_reset_task"), which
> >>> fixes a similar issue in e1000e
> >>>
> >>> however
> >>>
> >>> - adding rtnl to e1000_reinit_locked seemingly conflicts with an
> >>> earlier e1000 commit b2f963bfaeba ("e1000: fix lockdep warning in
> >>> e1000_reset_task").
> >>>
> >>> Hopefully Intel can weigh in and shed some light on the correct way to
> >>> go.
> >
> > Regarding the above locations where rtnl_lock may need to be held,
> > comparing to other intel drivers:
> >
> > - e1000_reset_task: it appears that igc, igb, and e100e all hold
> > rtnl_lock in their reset_task functions, so I think adding an
> > rtnl_lock / rtnl_unlock to e1000_reset_task should be OK,
> > despite the existence of commit b2f963bfaeba ("e1000: fix
> > lockdep warning in e1000_reset_task").
> >
> > - e1000_io_error_detected:
> > - e1000e temporarily obtains and drops rtnl in
> > e1000e_pm_freeze
> > - ixgbe holds rtnl in the same path (toward the bottom of
> > ixgbe_io_error_detected)
> > - igb does NOT hold rtnl in this path (as far as I can tell)
> > - it was suggested in another thread to hold rtnl in this path
> > for igc [1].
> >
> > Given that it will be added to igc and is held in this same
> > path in e1000e and ixgbe, I think it is safe to add it for
> > e1000, as well.
> >
> > - e1000_shutdown:
> > - igb holds rtnl in the same path,
> > - e1000e temporarily holds it in this path (via
> > e1000e_pm_freeze)
> > - ixgbe holds rtnl in the same path
> >
> > So based on the recommendation for igc [1], and the precedent set in
> > the other Intel drivers in most cases (except igb and the io_error
> > path), I think adding rtnl to all 3 locations described above is
> > correct.
> >
> > Please let me know if you all agree. Thanks for reviewing this.
> >
> >
> [1]:
> https://lore.kernel.org/netdev/40242f59-139a-4b45-8949-1210039f881b@intel.com/
>
> I agree with this assessment.
Thanks for taking a look. I will send an official iwl-net PATCH with
these changes once the 24 hour timer has expired.
© 2016 - 2026 Red Hat, Inc.