[v1] There are some bugfixes for hibmcge driver

[PATCH net 2/2] net: hibmcge: fix wrong ndo.open() after reset fail issue.

Posted by Jijie Shao 9 months, 2 weeks ago

If the driver reset fails, it may not work properly.
Therefore, the ndo.open() operation should be rejected.

With this patch, ndo.open() returns -EBUSY immediately if a previous
reset failure is detected.

Fixes: 3f5a61f6d504 ("net: hibmcge: Add reset supported in this module")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
 drivers/net/ethernet/hisilicon/hibmcge/hbg_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hibmcge/hbg_main.c b/drivers/net/ethernet/hisilicon/hibmcge/hbg_main.c
index 2e64dc1ab355..6c98f906bf0d 100644
--- a/drivers/net/ethernet/hisilicon/hibmcge/hbg_main.c
+++ b/drivers/net/ethernet/hisilicon/hibmcge/hbg_main.c
@@ -35,6 +35,9 @@ static int hbg_net_open(struct net_device *netdev)
 	struct hbg_priv *priv = netdev_priv(netdev);
 	int ret;
 
+	if (test_bit(HBG_NIC_STATE_RESET_FAIL, &priv->state))
+		return -EBUSY;
+
 	ret = hbg_txrx_init(priv);
 	if (ret)
 		return ret;
-- 
2.33.0
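
For readers who want to try the guard outside the kernel, here is a minimal user-space sketch of the idea in the patch: a failure bit recorded by the reset path is checked at the top of the open path. All `model_*` names are hypothetical stand-ins, not driver symbols. The reset flow (clear the bit on entry, set it on failure) and the `-EIO` return value are assumptions made for illustration; only the `-EBUSY` check mirrors the diff above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's HBG_NIC_STATE_RESET_FAIL bit. */
enum model_nic_state {
	MODEL_STATE_RESET_FAIL = 0,
};

/* Hypothetical stand-in for struct hbg_priv: only the state bitmask. */
struct model_priv {
	unsigned long state;
};

/* Plain (non-atomic) equivalents of the kernel bit helpers. */
static bool model_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

static void model_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static void model_clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

/*
 * Assumed reset flow: forget any earlier failure on entry and record a
 * new one if the hardware does not come back. -EIO is an arbitrary choice.
 */
static int model_reset(struct model_priv *priv, bool hw_ok)
{
	model_clear_bit(MODEL_STATE_RESET_FAIL, &priv->state);
	if (!hw_ok) {
		model_set_bit(MODEL_STATE_RESET_FAIL, &priv->state);
		return -EIO;
	}
	return 0;
}

/* Mirrors the check the patch adds at the top of hbg_net_open(). */
static int model_net_open(struct model_priv *priv)
{
	if (model_test_bit(MODEL_STATE_RESET_FAIL, &priv->state))
		return -EBUSY;

	/* The real driver would continue with hbg_txrx_init() here. */
	return 0;
}
```

In this model, open succeeds on a fresh device, is rejected with `-EBUSY` after a failed reset, and succeeds again once a later reset completes.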

Re: [PATCH net 2/2] net: hibmcge: fix wrong ndo.open() after reset fail issue.

Posted by Jakub Kicinski 9 months, 2 weeks ago

On Wed, 30 Apr 2025 17:31:27 +0800 Jijie Shao wrote:
> If the driver reset fails, it may not work properly.
> Therefore, the ndo.open() operation should be rejected.

Why not call netif_device_detach() if the reset failed and let the core
code handle blocking the callbacks?
-- 
pw-bot: cr

Re: [PATCH net 2/2] net: hibmcge: fix wrong ndo.open() after reset fail issue.

Posted by Jijie Shao 9 months ago

on 2025/5/1 22:23, Jakub Kicinski wrote:
> On Wed, 30 Apr 2025 17:31:27 +0800 Jijie Shao wrote:
>> If the driver reset fails, it may not work properly.
>> Therefore, the ndo.open() operation should be rejected.
> Why not call netif_device_detach() if the reset failed and let the core
> code handle blocking the callbacks?

Hi:

If the driver calls netif_device_detach() after a reset failure,
the network port can no longer be operated, and I can't redo the reset.
So how does the core code handle blocking callbacks?
Is there a good time to call netif_device_attach()?

Or do I need to implement pci_error_handlers.resume()?


[root@localhost sjj]# ethtool --reset enp132s0f1 dedicated
ETHTOOL_RESET 0xffff
Cannot issue ETHTOOL_RESET: Device or resource busy
[root@localhost sjj]# ethtool --reset enp132s0f1 dedicated
ETHTOOL_RESET 0xffff
Cannot issue ETHTOOL_RESET: No such device
[root@localhost sjj]# ifconfig enp132s0f1 up
SIOCSIFFLAGS: No such device

Thanks,
Jijie Shao
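
The "No such device" errors in the log above match how the core treats a detached device: netif_device_detach() clears the "present" state, and the core's open and ethtool paths return -ENODEV before any driver callback runs. The sketch below is a simplified user-space model of that gating, not kernel code. Every `model_*` name is hypothetical, and the struct keeps only the fields needed to show the behaviour.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct net_device. */
struct model_netdev {
	bool present;		/* models the "device present" link state */
	bool running;		/* models netif_running() */
	bool tx_stopped;	/* models stopped TX queues */
};

/* Models netif_device_detach(): mark absent, stop TX if running. */
static void model_device_detach(struct model_netdev *dev)
{
	if (dev->present) {
		dev->present = false;
		if (dev->running)
			dev->tx_stopped = true;
	}
}

/* Models netif_device_attach(): mark present, wake TX if running. */
static void model_device_attach(struct model_netdev *dev)
{
	if (!dev->present) {
		dev->present = true;
		if (dev->running)
			dev->tx_stopped = false;
	}
}

/* Hypothetical driver open callback that always succeeds. */
static int model_ndo_open(struct model_netdev *dev)
{
	(void)dev;
	return 0;
}

/*
 * Models the core open path: a device that is not present is refused
 * with -ENODEV, so the driver callback is never reached.
 */
static int model_dev_open(struct model_netdev *dev,
			  int (*ndo_open)(struct model_netdev *))
{
	int ret;

	if (!dev->present)
		return -ENODEV;

	ret = ndo_open(dev);
	if (!ret)
		dev->running = true;
	return ret;
}
```

The model also shows the cost raised in this message: once the device is detached, nothing on the netdev side can reach the driver again until some other path calls the attach helper.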

Re: [PATCH net 2/2] net: hibmcge: fix wrong ndo.open() after reset fail issue.

Posted by Jakub Kicinski 9 months ago

On Wed, 14 May 2025 10:40:26 +0800 Jijie Shao wrote:
> on 2025/5/1 22:23, Jakub Kicinski wrote:
> > On Wed, 30 Apr 2025 17:31:27 +0800 Jijie Shao wrote:  
> >> If the driver reset fails, it may not work properly.
> >> Therefore, the ndo.open() operation should be rejected.  
> > Why not call netif_device_detach() if the reset failed and let the core
> > code handle blocking the callbacks?  
> 
> If the driver calls netif_device_detach() after a reset failure,
> the network port can no longer be operated, and I can't redo the reset.
> So how does the core code handle blocking callbacks?
> Is there a good time to call netif_device_attach()?
> 
> Or do I need to implement pci_error_handlers.resume()?
> 
> 
> [root@localhost sjj]# ethtool --reset enp132s0f1 dedicated
> ETHTOOL_RESET 0xffff
> Cannot issue ETHTOOL_RESET: Device or resource busy
> [root@localhost sjj]# ethtool --reset enp132s0f1 dedicated
> ETHTOOL_RESET 0xffff
> Cannot issue ETHTOOL_RESET: No such device
> [root@localhost sjj]# ifconfig enp132s0f1 up
> SIOCSIFFLAGS: No such device

netdev APIs may not be the right path to recover the device after reset
failure. Can you use a PCI reset (via sysfs) or devlink ?

Re: [PATCH net 2/2] net: hibmcge: fix wrong ndo.open() after reset fail issue.

Posted by Jijie Shao 9 months ago

on 2025/5/15 0:08, Jakub Kicinski wrote:
> On Wed, 14 May 2025 10:40:26 +0800 Jijie Shao wrote:
>> on 2025/5/1 22:23, Jakub Kicinski wrote:
>>> On Wed, 30 Apr 2025 17:31:27 +0800 Jijie Shao wrote:
>>>> If the driver reset fails, it may not work properly.
>>>> Therefore, the ndo.open() operation should be rejected.
>>> Why not call netif_device_detach() if the reset failed and let the core
>>> code handle blocking the callbacks?
>> If the driver calls netif_device_detach() after a reset failure,
>> the network port can no longer be operated, and I can't redo the reset.
>> So how does the core code handle blocking callbacks?
>> Is there a good time to call netif_device_attach()?
>>
>> Or do I need to implement pci_error_handlers.resume()?
>>
>>
>> [root@localhost sjj]# ethtool --reset enp132s0f1 dedicated
>> ETHTOOL_RESET 0xffff
>> Cannot issue ETHTOOL_RESET: Device or resource busy
>> [root@localhost sjj]# ethtool --reset enp132s0f1 dedicated
>> ETHTOOL_RESET 0xffff
>> Cannot issue ETHTOOL_RESET: No such device
>> [root@localhost sjj]# ifconfig enp132s0f1 up
>> SIOCSIFFLAGS: No such device
> netdev APIs may not be the right path to recover the device after reset
> failure. Can you use a PCI reset (via sysfs) or devlink ?

PCI reset (via sysfs) can be used:
[root@localhost sjj]# ethtool --reset enp132s0f1 dedicated
ETHTOOL_RESET 0xffff
Cannot issue ETHTOOL_RESET: No such device
[root@localhost sjj]# echo 1 > /sys/bus/pci/devices/0000\:84\:00.1/reset
[200643.771030] hibmcge 0000:84:00.1: reset done
[root@localhost sjj]# ethtool --reset enp132s0f1 dedicated
ETHTOOL_RESET 0xffff
Cannot issue ETHTOOL_RESET: No such device

So, do I need to call netif_device_attach() in pci_error_handlers.reset_done()?

In this scenario, only PCI reset can be used, which imposes significant restrictions on users.

Thanks,
Jijie Shao