From: Carolina Jubran <cjubran@nvidia.com>
There's a race between disabling DIM and NAPI callbacks using the dim
pointer on the RQ or SQ.
If NAPI checks the DIM state bit and sees it still set, it assumes
`rq->dim` or `sq->dim` is valid. But if DIM gets disabled right after
that check, the pointer might already be set to NULL, leading to a NULL
pointer dereference in net_dim().
Fix this by calling `synchronize_net()` before freeing the DIM context.
This ensures all in-progress NAPI callbacks are finished before the
pointer is cleared.
Kernel log:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:net_dim+0x23/0x190
...
Call Trace:
<TASK>
? __die+0x20/0x60
? page_fault_oops+0x150/0x3e0
? common_interrupt+0xf/0xa0
? sysvec_call_function_single+0xb/0x90
? exc_page_fault+0x74/0x130
? asm_exc_page_fault+0x22/0x30
? net_dim+0x23/0x190
? mlx5e_poll_ico_cq+0x41/0x6f0 [mlx5_core]
? sysvec_apic_timer_interrupt+0xb/0x90
mlx5e_handle_rx_dim+0x92/0xd0 [mlx5_core]
mlx5e_napi_poll+0x2cd/0xac0 [mlx5_core]
? mlx5e_poll_ico_cq+0xe5/0x6f0 [mlx5_core]
busy_poll_stop+0xa2/0x200
? mlx5e_napi_poll+0x1d9/0xac0 [mlx5_core]
? mlx5e_trigger_irq+0x130/0x130 [mlx5_core]
__napi_busy_loop+0x345/0x3b0
? sysvec_call_function_single+0xb/0x90
? asm_sysvec_call_function_single+0x16/0x20
? sysvec_apic_timer_interrupt+0xb/0x90
? pcpu_free_area+0x1e4/0x2e0
napi_busy_loop+0x11/0x20
xsk_recvmsg+0x10c/0x130
sock_recvmsg+0x44/0x70
__sys_recvfrom+0xbc/0x130
? __schedule+0x398/0x890
__x64_sys_recvfrom+0x20/0x30
do_syscall_64+0x4c/0x100
entry_SYSCALL_64_after_hwframe+0x4b/0x53
...
---[ end trace 0000000000000000 ]---
...
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fixes: 445a25f6e1a2 ("net/mlx5e: Support updating coalescing configuration without resetting channels")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_dim.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
index 298bb74ec5e9..d1d629697e28 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c
@@ -113,7 +113,7 @@ int mlx5e_dim_rx_change(struct mlx5e_rq *rq, bool enable)
__set_bit(MLX5E_RQ_STATE_DIM, &rq->state);
} else {
__clear_bit(MLX5E_RQ_STATE_DIM, &rq->state);
-
+ synchronize_net();
mlx5e_dim_disable(rq->dim);
rq->dim = NULL;
}
@@ -140,7 +140,7 @@ int mlx5e_dim_tx_change(struct mlx5e_txqsq *sq, bool enable)
__set_bit(MLX5E_SQ_STATE_DIM, &sq->state);
} else {
__clear_bit(MLX5E_SQ_STATE_DIM, &sq->state);
-
+ synchronize_net();
mlx5e_dim_disable(sq->dim);
sq->dim = NULL;
}
--
2.31.1
On 7/10/2025 6:53 AM, Tariq Toukan wrote: > From: Carolina Jubran <cjubran@nvidia.com> > > There's a race between disabling DIM and NAPI callbacks using the dim > pointer on the RQ or SQ. > > If NAPI checks the DIM state bit and sees it still set, it assumes > `rq->dim` or `sq->dim` is valid. But if DIM gets disabled right after > that check, the pointer might already be set to NULL, leading to a NULL > pointer dereference in net_dim(). > > Fix this by calling `synchronize_net()` before freeing the DIM context. > This ensures all in-progress NAPI callbacks are finished before the > pointer is cleared. > > Kernel log: > > BUG: kernel NULL pointer dereference, address: 0000000000000000 > ... > RIP: 0010:net_dim+0x23/0x190 > ... > Call Trace: > <TASK> > ? __die+0x20/0x60 > ? page_fault_oops+0x150/0x3e0 > ? common_interrupt+0xf/0xa0 > ? sysvec_call_function_single+0xb/0x90 > ? exc_page_fault+0x74/0x130 > ? asm_exc_page_fault+0x22/0x30 > ? net_dim+0x23/0x190 > ? mlx5e_poll_ico_cq+0x41/0x6f0 [mlx5_core] > ? sysvec_apic_timer_interrupt+0xb/0x90 > mlx5e_handle_rx_dim+0x92/0xd0 [mlx5_core] > mlx5e_napi_poll+0x2cd/0xac0 [mlx5_core] > ? mlx5e_poll_ico_cq+0xe5/0x6f0 [mlx5_core] > busy_poll_stop+0xa2/0x200 > ? mlx5e_napi_poll+0x1d9/0xac0 [mlx5_core] > ? mlx5e_trigger_irq+0x130/0x130 [mlx5_core] > __napi_busy_loop+0x345/0x3b0 > ? sysvec_call_function_single+0xb/0x90 > ? asm_sysvec_call_function_single+0x16/0x20 > ? sysvec_apic_timer_interrupt+0xb/0x90 > ? pcpu_free_area+0x1e4/0x2e0 > napi_busy_loop+0x11/0x20 > xsk_recvmsg+0x10c/0x130 > sock_recvmsg+0x44/0x70 > __sys_recvfrom+0xbc/0x130 > ? __schedule+0x398/0x890 > __x64_sys_recvfrom+0x20/0x30 > do_syscall_64+0x4c/0x100 > entry_SYSCALL_64_after_hwframe+0x4b/0x53 > ... > ---[ end trace 0000000000000000 ]--- > ... > ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> > Fixes: 445a25f6e1a2 ("net/mlx5e: Support updating coalescing configuration without resetting channels") > Signed-off-by: Carolina Jubran <cjubran@nvidia.com> > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com> > --- > drivers/net/ethernet/mellanox/mlx5/core/en_dim.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c > index 298bb74ec5e9..d1d629697e28 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dim.c > @@ -113,7 +113,7 @@ int mlx5e_dim_rx_change(struct mlx5e_rq *rq, bool enable) > __set_bit(MLX5E_RQ_STATE_DIM, &rq->state); > } else { > __clear_bit(MLX5E_RQ_STATE_DIM, &rq->state); > - > + synchronize_net(); We've requested NAPI be disabled by this point, and just need to guarantee that the already running threads close. Makes sense. > mlx5e_dim_disable(rq->dim); > rq->dim = NULL; > } > @@ -140,7 +140,7 @@ int mlx5e_dim_tx_change(struct mlx5e_txqsq *sq, bool enable) > __set_bit(MLX5E_SQ_STATE_DIM, &sq->state); > } else { > __clear_bit(MLX5E_SQ_STATE_DIM, &sq->state); > - > + synchronize_net(); > mlx5e_dim_disable(sq->dim); > sq->dim = NULL; > }
© 2016 - 2025 Red Hat, Inc.