From: Potin Lai <potin.lai@quantatw.com>
This reverts commit 790071347a0a1a89e618eedcd51c687ea783aeb3.
We are seeing a kernel panic when enabling two NCSI interfaces at the same
time. It looks like taking a mutex lock in softirq context is causing the
issue.
Kernel panic log:
```
[ 224.323380] 8021q: adding VLAN 0 to HW filter on device eth0
[ 224.337533] ftgmac100 1e670000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
[ 224.358372] BUG: scheduling while atomic: systemd-network/697/0x00000100
[ 224.373274] Modules linked in:
[ 224.373817] 8021q: adding VLAN 0 to HW filter on device eth1
[ 224.380063] CPU: 0 PID: 697 Comm: systemd-network Tainted: G W 6.6.62-8ea1fc6-dirty-cbd80d0-gcbd80d04d13c #1
[ 224.380081] Hardware name: Generic DT based system
[ 224.380096] unwind_backtrace from show_stack+0x18/0x1c
[ 224.439407] show_stack from dump_stack_lvl+0x40/0x4c
[ 224.450573] dump_stack_lvl from __schedule_bug+0x5c/0x70
[ 224.462492] __schedule_bug from __schedule+0x884/0x968
[ 224.474026] __schedule from schedule+0x58/0xa8
[ 224.484026] schedule from schedule_preempt_disabled+0x14/0x18
[ 224.496906] schedule_preempt_disabled from __mutex_lock.constprop.0+0x350/0x76c
[ 224.513235] __mutex_lock.constprop.0 from ncsi_rsp_handler_oem_gma+0x104/0x1a0
[ 224.529367] ncsi_rsp_handler_oem_gma from ncsi_rcv_rsp+0x120/0x2cc
[ 224.543195] ncsi_rcv_rsp from __netif_receive_skb_one_core+0x60/0x84
[ 224.557413] __netif_receive_skb_one_core from netif_receive_skb+0x38/0x148
[ 224.572779] netif_receive_skb from ftgmac100_poll+0x358/0x444
[ 224.585656] ftgmac100_poll from __napi_poll.constprop.0+0x34/0x1d0
[ 224.599490] __napi_poll.constprop.0 from net_rx_action+0x350/0x43c
[ 224.613325] net_rx_action from handle_softirqs+0x114/0x32c
[ 224.625624] handle_softirqs from irq_exit+0x88/0xb8
[ 224.636575] irq_exit from call_with_stack+0x18/0x20
[ 224.647530] call_with_stack from __irq_usr+0x78/0xa0
[ 224.658675] Exception stack(0xe075dfb0 to 0xe075dff8)
[ 224.669799] dfa0: 00000000 00000000 00000000 00000020
[ 224.687843] dfc0: 00000069 aefde3e0 00000000 00000000 00000000 00000000 00000000 aefde4e4
[ 224.705887] dfe0: 01010101 aefddf20 a6b4331c a6b43618 600f0010 ffffffff
[ 224.721100] ------------[ cut here ]------------
```
Signed-off-by: Potin Lai <potin.lai.pt@gmail.com>
---
net/ncsi/ncsi-rsp.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index e28be33bdf2c..0cd7b916d3f8 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -629,6 +629,7 @@ static int ncsi_rsp_handler_oem_gma(struct ncsi_request *nr, int mfr_id)
{
struct ncsi_dev_priv *ndp = nr->ndp;
struct net_device *ndev = ndp->ndev.dev;
+ const struct net_device_ops *ops = ndev->netdev_ops;
struct ncsi_rsp_oem_pkt *rsp;
struct sockaddr saddr;
u32 mac_addr_off = 0;
@@ -655,9 +656,7 @@ static int ncsi_rsp_handler_oem_gma(struct ncsi_request *nr, int mfr_id)
/* Set the flag for GMA command which should only be called once */
ndp->gma_flag = 1;
- rtnl_lock();
- ret = dev_set_mac_address(ndev, &saddr, NULL);
- rtnl_unlock();
+ ret = ops->ndo_set_mac_address(ndev, &saddr);
if (ret < 0)
netdev_warn(ndev, "NCSI: 'Writing mac address to device failed\n");
---
base-commit: 59b723cd2adbac2a34fc8e12c74ae26ae45bf230
change-id: 20241129-potin-revert-ncsi-set-mac-addr-7122f2896258
Best regards,
--
Potin Lai <potin.lai.pt@gmail.com>
On Fri, 29 Nov 2024 17:12:56 +0800 Potin Lai wrote:
> This reverts commit 790071347a0a1a89e618eedcd51c687ea783aeb3.
>
> We are seeing kernel panic when enabling two NCSI interfaces at same
> time. It looks like mutex lock is being used in softirq caused the
> issue.

I agree with Andrew that revert makes sense. On top of his suggestions
please add a correctly formatted Fixes tag and make sure you CC
the authors of the buggy change, and post a v2.
--
pw-bot: cr
On Fri, Nov 29, 2024 at 05:12:56PM +0800, Potin Lai wrote:
> From: Potin Lai <potin.lai@quantatw.com>
>
> This reverts commit 790071347a0a1a89e618eedcd51c687ea783aeb3.
>
> We are seeing kernel panic when enabling two NCSI interfaces at same
> time. It looks like mutex lock is being used in softirq caused the
> issue.

So a revert does make sense, you are seeing a real problem from that
commit. However with the revert, is the code actually correct? Or is
it missing some locking?

Normally dev_addr_sem is used to protect against two calls to change
the MAC address at once. Is this protection needed? It would also be
typical to hold RTNL while changing the MAC address.

So it would be nice to see an analysis of the locking, and maybe the
revert commit message says this gets you from a broken state to a less
broken state, and the real fix will be submitted soon?

Andrew
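
The tension raised above (the address change wants RTNL held, but the
response handler runs in softirq context where a sleeping lock cannot be
taken) is often resolved by deferring the sleeping part to process
context. The following is only a userspace model of that pattern, not
kernel code and not a fix proposed in this thread. Every name in it
(model_dev, model_rsp_handler, model_set_mac_work and so on) is
hypothetical, and a plain flag stands in for both the RTNL mutex and the
queued work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace model only. All identifiers below are hypothetical. */
#define MAC_LEN 6

struct model_dev {
	unsigned char mac[MAC_LEN];     /* address currently programmed */
	unsigned char pending[MAC_LEN]; /* address captured by the handler */
	bool work_pending;              /* stands in for a queued work item */
	bool rtnl_held;                 /* stands in for the RTNL mutex */
};

/* True while the modeled "softirq" receive path is running. */
static bool in_atomic_ctx;

static void model_rtnl_lock(struct model_dev *d)
{
	/* A sleeping lock must never be taken in atomic context. */
	assert(!in_atomic_ctx);
	d->rtnl_held = true;
}

static void model_rtnl_unlock(struct model_dev *d)
{
	d->rtnl_held = false;
}

/* Response handler: atomic context, so only record the address and defer. */
static void model_rsp_handler(struct model_dev *d, const unsigned char *mac)
{
	memcpy(d->pending, mac, MAC_LEN);
	d->work_pending = true;
}

/* Receive path: marks the atomic section around the handler call. */
static void model_softirq_rx(struct model_dev *d, const unsigned char *mac)
{
	in_atomic_ctx = true;
	model_rsp_handler(d, mac);
	in_atomic_ctx = false;
}

/* Deferred work: process context, so taking the lock is allowed here. */
static void model_set_mac_work(struct model_dev *d)
{
	if (!d->work_pending)
		return;

	model_rtnl_lock(d);
	memcpy(d->mac, d->pending, MAC_LEN);
	model_rtnl_unlock(d);
	d->work_pending = false;
}
```

In this model, calling model_rtnl_lock() directly from
model_rsp_handler() would trip the assertion, which loosely mirrors the
"scheduling while atomic" report in the panic log. Whether deferral is
the right approach for ncsi_rsp_handler_oem_gma(), and what role
dev_addr_sem should play, is exactly the locking analysis requested
above.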