Fix the problems that are triggered by tx_timeout and ice_xdp() calls,
including both pool and program operations.

PF reset can be triggered asynchronously, e.g. by tx_timeout. With some
unfortunate timings both reset and .ndo_bpf will try to access and modify
XDP rings at the same time, causing a system crash, such as the one below:

[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
[ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
[ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
[ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
[ +0.394718] ice 0000:b1:00.0: PTP reset successful
[ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ +0.000045] #PF: supervisor read access in kernel mode
[ +0.000023] #PF: error_code(0x0000) - not-present page
[ +0.000023] PGD 0 P4D 0
[ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
[ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[ +0.000036] Workqueue: ice ice_service_task [ice]
[ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
[...]
[ +0.000013] Call Trace:
[ +0.000016]  <TASK>
[ +0.000014]  ? __die+0x1f/0x70
[ +0.000029]  ? page_fault_oops+0x171/0x4f0
[ +0.000029]  ? schedule+0x3b/0xd0
[ +0.000027]  ? exc_page_fault+0x7b/0x180
[ +0.000022]  ? asm_exc_page_fault+0x22/0x30
[ +0.000031]  ? ice_clean_tx_ring+0xa/0xd0 [ice]
[ +0.000194]  ice_free_tx_ring+0xe/0x60 [ice]
[ +0.000186]  ice_destroy_xdp_rings+0x157/0x310 [ice]
[ +0.000151]  ice_vsi_decfg+0x53/0xe0 [ice]
[ +0.000180]  ice_vsi_rebuild+0x239/0x540 [ice]
[ +0.000186]  ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[ +0.000145]  ice_rebuild+0x18c/0x840 [ice]
[ +0.000145]  ? delay_tsc+0x4a/0xc0
[ +0.000022]  ? delay_tsc+0x92/0xc0
[ +0.000020]  ice_do_reset+0x140/0x180 [ice]
[ +0.000886]  ice_service_task+0x404/0x1030 [ice]
[ +0.000824]  process_one_work+0x171/0x340
[ +0.000685]  worker_thread+0x277/0x3a0
[ +0.000675]  ? preempt_count_add+0x6a/0xa0
[ +0.000677]  ? _raw_spin_lock_irqsave+0x23/0x50
[ +0.000679]  ? __pfx_worker_thread+0x10/0x10
[ +0.000653]  kthread+0xf0/0x120
[ +0.000635]  ? __pfx_kthread+0x10/0x10
[ +0.000616]  ret_from_fork+0x2d/0x50
[ +0.000612]  ? __pfx_kthread+0x10/0x10
[ +0.000604]  ret_from_fork_asm+0x1b/0x30
[ +0.000604]  </TASK>

Larysa Zaremba (3):
  ice: synchronize XDP setup with reset
  ice: fix locking in ice_xsk_pool_setup()
  ice: make NAPI setting code aware that rtnl-locked request is waiting

 drivers/net/ethernet/intel/ice/ice.h      |  2 ++
 drivers/net/ethernet/intel/ice/ice_lib.c  | 23 ++++++++++---
 drivers/net/ethernet/intel/ice/ice_main.c | 39 ++++++++++++++++++++---
 drivers/net/ethernet/intel/ice/ice_xsk.c  | 12 ++-----
 4 files changed, 57 insertions(+), 19 deletions(-)

--
2.43.0
On Mon, 10 Jun 2024 17:37:12 +0200 Larysa Zaremba wrote:
> Fix the problems that are triggered by tx_timeout and ice_xdp() calls,
> including both pool and program operations.

Is there really no way for ice to fix the locking? :(
The busy loops and trylocks() are not great, and seem like duct tape.
On Tue, Jun 11, 2024 at 07:38:37PM -0700, Jakub Kicinski wrote:
> On Mon, 10 Jun 2024 17:37:12 +0200 Larysa Zaremba wrote:
> > Fix the problems that are triggered by tx_timeout and ice_xdp() calls,
> > including both pool and program operations.
> 
> Is there really no way for ice to fix the locking? :(
> The busy loops and trylocks() are not great, and seem like duct tape.
> 

The locking mechanisms I use here do not look pretty, but if I am not missing
anything, the synchronization they provide must be robust.

A prettier way of protecting the same critical sections would be replacing
ICE_CFG_BUSY around ice_vsi_rebuild() with rtnl_lock(), this would eliminate
locking code from .ndo_bpf() altogether, ice_rebuild_pending() logic will have
to stay.

At some point I have decided to avoid using rtnl_lock(), if I do not have to. I
think this is a goal worth pursuing?
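For illustration, a minimal sketch of the two locking schemes being weighed
above. ICE_CFG_BUSY and ice_vsi_rebuild() are real ice symbols, but the wrapper
functions, the vsi_flags argument, the choice of pf->state for the bit, and the
sleep interval are assumptions made only for this sketch, not the actual driver
code:

/* Option A - the approach in this series: an ice-private busy bit,
 * acquired with a sleep-and-retry loop, invisible to lockdep.
 */
static int ice_vsi_rebuild_busy_bit(struct ice_vsi *vsi, u32 vsi_flags)
{
	struct ice_pf *pf = vsi->back;
	int err;

	/* spin until no other configuration path owns the PF */
	while (test_and_set_bit(ICE_CFG_BUSY, pf->state))
		usleep_range(1000, 2000);

	err = ice_vsi_rebuild(vsi, vsi_flags);

	clear_bit(ICE_CFG_BUSY, pf->state);
	return err;
}

/* Option B - the "prettier" alternative mentioned above: serialize the
 * rebuild under rtnl_lock(), which .ndo_bpf() already holds, so the XDP
 * path needs no extra locking of its own.
 */
static int ice_vsi_rebuild_rtnl(struct ice_vsi *vsi, u32 vsi_flags)
{
	int err;

	rtnl_lock();
	err = ice_vsi_rebuild(vsi, vsi_flags);
	rtnl_unlock();
	return err;
}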
On Wed, 12 Jun 2024 08:56:38 +0200 Larysa Zaremba wrote:
> On Tue, Jun 11, 2024 at 07:38:37PM -0700, Jakub Kicinski wrote:
> > On Mon, 10 Jun 2024 17:37:12 +0200 Larysa Zaremba wrote:
> > > Fix the problems that are triggered by tx_timeout and ice_xdp() calls,
> > > including both pool and program operations.
> > 
> > Is there really no way for ice to fix the locking? :(
> > The busy loops and trylocks() are not great, and seem like duct tape.
> 
> The locking mechanisms I use here do not look pretty, but if I am not missing
> anything, the synchronization they provide must be robust.

Robust as in they may be correct here, but you lose lockdep and all
other infra normal mutex would give you.

> A prettier way of protecting the same critical sections would be replacing
> ICE_CFG_BUSY around ice_vsi_rebuild() with rtnl_lock(), this would eliminate
> locking code from .ndo_bpf() altogether, ice_rebuild_pending() logic will have
> to stay.
> 
> At some point I have decided to avoid using rtnl_lock(), if I do not have to. I
> think this is a goal worth pursuing?

Is the reset for failure recovery, rather than reconfiguration?
If so netif_device_detach() is generally the best way of avoiding
getting called (I think I mentioned it to someone @intal recently).
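The detach/attach pattern referred to here, sketched under the assumption that
it would wrap the existing reset path; ice_do_reset() and ice_get_main_vsi()
are real driver symbols, but the wrapper and its placement are hypothetical,
and, as the rest of the thread notes, this does not gate .ndo_bpf() today:

/* Hypothetical illustration: mark the netdev as "not present" for the
 * duration of a recovery reset so that most rtnl-protected ndo callbacks
 * bail out early, then re-attach once the hardware is usable again.
 */
static void ice_do_reset_detached(struct ice_pf *pf, enum ice_reset_req type)
{
	struct net_device *netdev = ice_get_main_vsi(pf)->netdev;

	netif_device_detach(netdev);	/* netif_device_present() now false */

	ice_do_reset(pf, type);		/* existing recovery path */

	netif_device_attach(netdev);	/* ndo callbacks allowed again */
}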
On Wed, Jun 12, 2024 at 02:09:35PM -0700, Jakub Kicinski wrote:
> On Wed, 12 Jun 2024 08:56:38 +0200 Larysa Zaremba wrote:
> > On Tue, Jun 11, 2024 at 07:38:37PM -0700, Jakub Kicinski wrote:
> > > On Mon, 10 Jun 2024 17:37:12 +0200 Larysa Zaremba wrote:
> > > > Fix the problems that are triggered by tx_timeout and ice_xdp() calls,
> > > > including both pool and program operations.
> > > 
> > > Is there really no way for ice to fix the locking? :(
> > > The busy loops and trylocks() are not great, and seem like duct tape.
> > 
> > The locking mechanisms I use here do not look pretty, but if I am not missing
> > anything, the synchronization they provide must be robust.
> 
> Robust as in they may be correct here, but you lose lockdep and all
> other infra normal mutex would give you.
> 

I know, but __netif_queue_set_napi() requires rtnl_lock() inside the potential
critical section and creates a deadlock this way. However, after reading
patches that introduce this function, I think it is called too early in the
configuration. Seems like it should be called somewhere right after
netif_set_real_num_rx/_tx_queues(), much later in the configuration where we
already hold the rtnl_lock(). In such way, ice_vsi_rebuild() could be protected
with an internal mutex. WDYT?

> > A prettier way of protecting the same critical sections would be replacing
> > ICE_CFG_BUSY around ice_vsi_rebuild() with rtnl_lock(), this would eliminate
> > locking code from .ndo_bpf() altogether, ice_rebuild_pending() logic will have
> > to stay.
> > 
> > At some point I have decided to avoid using rtnl_lock(), if I do not have to. I
> > think this is a goal worth pursuing?
> 
> Is the reset for failure recovery, rather than reconfiguration?
> If so netif_device_detach() is generally the best way of avoiding
> getting called (I think I mentioned it to someone @intal recently).

AFAIK, netif_device_detach() does not affect .ndo_bpf() calls. We were trying
such approach with idpf and it does work for ethtool, but not for XDP.
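A rough sketch of the reordering proposed above. netif_set_real_num_rx/_tx_queues()
and netif_queue_set_napi() are the real core APIs; the helper name and the exact
vsi/ring field layout are simplified assumptions for illustration:

/* Hypothetical helper: publish queue counts and the queue<->NAPI mapping
 * to the core in one place, after the rebuild, where rtnl_lock is already
 * held - instead of calling netif_queue_set_napi() deep inside the
 * rebuild, where taking rtnl_lock would deadlock.
 */
static int ice_vsi_publish_queues(struct ice_vsi *vsi)
{
	struct net_device *netdev = vsi->netdev;
	int err, i;

	ASSERT_RTNL();

	err = netif_set_real_num_rx_queues(netdev, vsi->num_rxq);
	if (err)
		return err;
	err = netif_set_real_num_tx_queues(netdev, vsi->num_txq);
	if (err)
		return err;

	for (i = 0; i < vsi->num_rxq; i++)
		netif_queue_set_napi(netdev, i, NETDEV_QUEUE_TYPE_RX,
				     &vsi->rx_rings[i]->q_vector->napi);
	for (i = 0; i < vsi->num_txq; i++)
		netif_queue_set_napi(netdev, i, NETDEV_QUEUE_TYPE_TX,
				     &vsi->tx_rings[i]->q_vector->napi);
	return 0;
}

With the NAPI bookkeeping moved out like this, ice_vsi_rebuild() itself no
longer needs rtnl_lock() and could be guarded by a driver-internal mutex,
which is the direction agreed on later in the thread.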
On Thu, 13 Jun 2024 10:54:12 +0200 Larysa Zaremba wrote:
> > > The locking mechanisms I use here do not look pretty, but if I am not missing
> > > anything, the synchronization they provide must be robust.
> > 
> > Robust as in they may be correct here, but you lose lockdep and all
> > other infra normal mutex would give you.
> 
> I know, but __netif_queue_set_napi() requires rtnl_lock() inside the potential
> critical section and creates a deadlock this way. However, after reading
> patches that introduce this function, I think it is called too early in the
> configuration. Seems like it should be called somewhere right after
> netif_set_real_num_rx/_tx_queues(), much later in the configuration where we
> already hold the rtnl_lock(). In such way, ice_vsi_rebuild() could be protected
> with an internal mutex. WDYT?

On a quick look I think that may work. For setting the NAPI it makes
sense - netif_set_real_num_rx/_tx_queues() and netif_queue_set_napi()
both inform netdev about the queue config, so its logical to keep them
together. I was worried there may be an inconveniently placed
netif_queue_set_napi() call which is clearing the NAPI pointer.
But I don't see one.

> > > A prettier way of protecting the same critical sections would be replacing
> > > ICE_CFG_BUSY around ice_vsi_rebuild() with rtnl_lock(), this would eliminate
> > > locking code from .ndo_bpf() altogether, ice_rebuild_pending() logic will have
> > > to stay.
> > > 
> > > At some point I have decided to avoid using rtnl_lock(), if I do not have to. I
> > > think this is a goal worth pursuing?
> > 
> > Is the reset for failure recovery, rather than reconfiguration?
> > If so netif_device_detach() is generally the best way of avoiding
> > getting called (I think I mentioned it to someone @intal recently).
> 
> AFAIK, netif_device_detach() does not affect .ndo_bpf() calls. We were trying
> such approach with idpf and it does work for ethtool, but not for XDP.

I reckon that's an unintentional omission. In theory XDP is "pure
software" but if the device is running driver will likely have to
touch HW to reconfigure. So, if you're willing, do send a ndo_bpf
patch to add a detached check.
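What such a check might look like, sketched at the driver's .ndo_bpf entry
point purely for illustration; the actual suggestion is about the generic
path, ice_xdp() is the existing handler, the wrapper is hypothetical, and the
next mail explains why a blanket -ENODEV cannot work for program/pool removal:

/* Hypothetical guard: refuse XDP reconfiguration while the netdev is
 * detached. Fine for attaching a program or a pool, but removal requests
 * cannot simply be refused, so this check alone is not enough.
 */
static int ice_xdp_guarded(struct net_device *dev, struct netdev_bpf *xdp)
{
	if (!netif_device_present(dev))
		return -ENODEV;

	return ice_xdp(dev, xdp);	/* the existing .ndo_bpf handler */
}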
On Thu, Jun 13, 2024 at 07:13:43AM -0700, Jakub Kicinski wrote:
> On Thu, 13 Jun 2024 10:54:12 +0200 Larysa Zaremba wrote:
> > > > The locking mechanisms I use here do not look pretty, but if I am not missing
> > > > anything, the synchronization they provide must be robust.
> > > 
> > > Robust as in they may be correct here, but you lose lockdep and all
> > > other infra normal mutex would give you.
> > 
> > I know, but __netif_queue_set_napi() requires rtnl_lock() inside the potential
> > critical section and creates a deadlock this way. However, after reading
> > patches that introduce this function, I think it is called too early in the
> > configuration. Seems like it should be called somewhere right after
> > netif_set_real_num_rx/_tx_queues(), much later in the configuration where we
> > already hold the rtnl_lock(). In such way, ice_vsi_rebuild() could be protected
> > with an internal mutex. WDYT?
> 
> On a quick look I think that may work. For setting the NAPI it makes
> sense - netif_set_real_num_rx/_tx_queues() and netif_queue_set_napi()
> both inform netdev about the queue config, so its logical to keep them
> together. I was worried there may be an inconveniently placed
> netif_queue_set_napi() call which is clearing the NAPI pointer.
> But I don't see one.
> 

Ok, will do this in v2. Thanks for the discussion.

> > > > A prettier way of protecting the same critical sections would be replacing
> > > > ICE_CFG_BUSY around ice_vsi_rebuild() with rtnl_lock(), this would eliminate
> > > > locking code from .ndo_bpf() altogether, ice_rebuild_pending() logic will have
> > > > to stay.
> > > > 
> > > > At some point I have decided to avoid using rtnl_lock(), if I do not have to. I
> > > > think this is a goal worth pursuing?
> > > 
> > > Is the reset for failure recovery, rather than reconfiguration?
> > > If so netif_device_detach() is generally the best way of avoiding
> > > getting called (I think I mentioned it to someone @intal recently).
> > 
> > AFAIK, netif_device_detach() does not affect .ndo_bpf() calls. We were trying
> > such approach with idpf and it does work for ethtool, but not for XDP.
> 
> I reckon that's an unintentional omission. In theory XDP is "pure
> software" but if the device is running driver will likely have to
> touch HW to reconfigure. So, if you're willing, do send a ndo_bpf
> patch to add a detached check.

This does not seem that simple. In cases of program/pool detachment,
.ndo_bpf() does not accept 'no' as an answer, so there is no easy existing way
of handling !netif_device_present() either. And we have to notify the driver
that pool/program is no longer needed no matter what. So what is left is somehow
postpone pool/prog removal to after the netdev gets attached again.
On Thu, 13 Jun 2024 17:36:16 +0200 Larysa Zaremba wrote:
> > > AFAIK, netif_device_detach() does not affect .ndo_bpf() calls. We were trying
> > > such approach with idpf and it does work for ethtool, but not for XDP.
> > 
> > I reckon that's an unintentional omission. In theory XDP is "pure
> > software" but if the device is running driver will likely have to
> > touch HW to reconfigure. So, if you're willing, do send a ndo_bpf
> > patch to add a detached check.
> 
> This does not seem that simple. In cases of program/pool detachment,
> .ndo_bpf() does not accept 'no' as an answer, so there is no easy existing way
> of handling !netif_device_present() either. And we have to notify the driver
> that pool/program is no longer needed no matter what. So what is left is somehow
> postpone pool/prog removal to after the netdev gets attached again.

I see, thanks for investigating!
On 6/13/24 10:54, Larysa Zaremba wrote:
> On Wed, Jun 12, 2024 at 02:09:35PM -0700, Jakub Kicinski wrote:
>> On Wed, 12 Jun 2024 08:56:38 +0200 Larysa Zaremba wrote:
>>> On Tue, Jun 11, 2024 at 07:38:37PM -0700, Jakub Kicinski wrote:
>>>> On Mon, 10 Jun 2024 17:37:12 +0200 Larysa Zaremba wrote:
>>>>> Fix the problems that are triggered by tx_timeout and ice_xdp() calls,
>>>>> including both pool and program operations.
>>>>
>>>> Is there really no way for ice to fix the locking? :(
>>>> The busy loops and trylocks() are not great, and seem like duct tape.
>>>
>>> The locking mechanisms I use here do not look pretty, but if I am not missing
>>> anything, the synchronization they provide must be robust.
>>
>> Robust as in they may be correct here, but you lose lockdep and all
>> other infra normal mutex would give you.
>>
> 
> I know, but __netif_queue_set_napi() requires rtnl_lock() inside the potential
> critical section and creates a deadlock this way. However, after reading
> patches that introduce this function, I think it is called too early in the
> configuration. Seems like it should be called somewhere right after
> netif_set_real_num_rx/_tx_queues(), much later in the configuration where we
> already hold the rtnl_lock(). In such way, ice_vsi_rebuild() could be protected
> with an internal mutex. WDYT?
> 
>>> A prettier way of protecting the same critical sections would be replacing
>>> ICE_CFG_BUSY around ice_vsi_rebuild() with rtnl_lock(), this would eliminate
>>> locking code from .ndo_bpf() altogether, ice_rebuild_pending() logic will have
>>> to stay.
>>>
>>> At some point I have decided to avoid using rtnl_lock(), if I do not have to. I
>>> think this is a goal worth pursuing?
>>
>> Is the reset for failure recovery, rather than reconfiguration?
>> If so netif_device_detach() is generally the best way of avoiding
>> getting called (I think I mentioned it to someone @intal recently).

for the reference, it was to Dawid here:
https://lore.kernel.org/netdev/20240610194756.5be5be90@kernel.org/

> 
> AFAIK, netif_device_detach() does not affect .ndo_bpf() calls. We were trying
> such approach with idpf and it does work for ethtool, but not for XDP.