[v1] There are some bugfix for the HNS3 ethernet driver

[PATCH net 6/7] net: hns3: fix VF reset fail issue

Posted by Jijie Shao 2 years, 3 months ago

Currently the reset process in hns3 and firmware watchdog init process is
asynchronous. We think firmware watchdog initialization is completed
before VF clear the interrupt source. However, firmware initialization
may not complete early. So VF will receive multiple reset interrupts
and fail to reset.

So we add delay before VF interrupt source and 5 ms delay
is enough to avoid second reset interrupt.

Fixes: 427900d27d86 ("net: hns3: fix the timing issue of VF clearing interrupt sources")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c   | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 1c62e58ff6d8..7b87da031be6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1924,8 +1924,14 @@ static void hclgevf_service_task(struct work_struct *work)
 	hclgevf_mailbox_service_task(hdev);
 }
 
-static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr)
+static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr,
+				      bool need_dalay)
 {
+#define HCLGEVF_RESET_DELAY		5
+
+	if (need_dalay)
+		mdelay(HCLGEVF_RESET_DELAY);
+
 	hclgevf_write_dev(&hdev->hw, HCLGE_COMM_VECTOR0_CMDQ_SRC_REG, regclr);
 }
 
@@ -1990,7 +1996,8 @@ static irqreturn_t hclgevf_misc_irq_handle(int irq, void *data)
 	hclgevf_enable_vector(&hdev->misc_vector, false);
 	event_cause = hclgevf_check_evt_cause(hdev, &clearval);
 	if (event_cause != HCLGEVF_VECTOR0_EVENT_OTHER)
-		hclgevf_clear_event_cause(hdev, clearval);
+		hclgevf_clear_event_cause(hdev, clearval,
+					  event_cause == HCLGEVF_VECTOR0_EVENT_RST);
 
 	switch (event_cause) {
 	case HCLGEVF_VECTOR0_EVENT_RST:
@@ -2340,7 +2347,7 @@ static int hclgevf_misc_irq_init(struct hclgevf_dev *hdev)
 		return ret;
 	}
 
-	hclgevf_clear_event_cause(hdev, 0);
+	hclgevf_clear_event_cause(hdev, 0, false);
 
 	/* enable misc. vector(vector 0) */
 	hclgevf_enable_vector(&hdev->misc_vector, true);
-- 
2.30.0

Re: [PATCH net 6/7] net: hns3: fix VF reset fail issue

Posted by Paolo Abeni 2 years, 3 months ago

On Sat, 2023-10-28 at 10:59 +0800, Jijie Shao wrote:
> Currently the reset process in hns3 and firmware watchdog init process is
> asynchronous. We think firmware watchdog initialization is completed
> before VF clear the interrupt source. However, firmware initialization
> may not complete early. So VF will receive multiple reset interrupts
> and fail to reset.
> 
> So we add delay before VF interrupt source and 5 ms delay
> is enough to avoid second reset interrupt.
> 
> Fixes: 427900d27d86 ("net: hns3: fix the timing issue of VF clearing interrupt sources")
> Signed-off-by: Jijie Shao <shaojijie@huawei.com>
> ---
>  .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c   | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
> index 1c62e58ff6d8..7b87da031be6 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
> @@ -1924,8 +1924,14 @@ static void hclgevf_service_task(struct work_struct *work)
>  	hclgevf_mailbox_service_task(hdev);
>  }
>  
> -static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr)
> +static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr,
> +				      bool need_dalay)
>  {
> +#define HCLGEVF_RESET_DELAY		5
> +
> +	if (need_dalay)
> +		mdelay(HCLGEVF_RESET_DELAY);

5ms delay in an interrupt handler is quite a lot. What about scheduling
a timer from the IH to clear the register when such delay is needed?

Thanks!

Paolo

Re: [PATCH net 6/7] net: hns3: fix VF reset fail issue

Posted by Jijie Shao 2 years, 3 months ago

on 2023/11/2 18:45, Paolo Abeni wrote:
> On Sat, 2023-10-28 at 10:59 +0800, Jijie Shao wrote:
>>   
>> -static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr)
>> +static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr,
>> +				      bool need_dalay)
>>   {
>> +#define HCLGEVF_RESET_DELAY		5
>> +
>> +	if (need_dalay)
>> +		mdelay(HCLGEVF_RESET_DELAY);
> 5ms delay in an interrupt handler is quite a lot. What about scheduling
> a timer from the IH to clear the register when such delay is needed?
>
> Thanks!
>
> Paolo

Using timer in this case will complicate the code and make maintenance difficult.

We consider reducing the delay time by polling. For example,
the code cycles every 50 us to check whether the write register takes effect.
If yes, the function returns immediately. or the code cycles until 5 ms.

Is this method appropriate?

Thanks!
Jijie

Re: [PATCH net 6/7] net: hns3: fix VF reset fail issue

Posted by Paolo Abeni 2 years, 3 months ago

On Thu, 2023-11-02 at 20:16 +0800, Jijie Shao wrote:
> on 2023/11/2 18:45, Paolo Abeni wrote:
> > On Sat, 2023-10-28 at 10:59 +0800, Jijie Shao wrote:
> > >   
> > > -static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr)
> > > +static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr,
> > > +				      bool need_dalay)
> > >   {
> > > +#define HCLGEVF_RESET_DELAY		5
> > > +
> > > +	if (need_dalay)
> > > +		mdelay(HCLGEVF_RESET_DELAY);
> > 5ms delay in an interrupt handler is quite a lot. What about scheduling
> > a timer from the IH to clear the register when such delay is needed?
> > 
> > Thanks!
> > 
> > Paolo
> 
> Using timer in this case will complicate the code and make maintenance difficult.

Why? 

Would something alike the following be ok? (plus reset_timer
initialization at vf creation and cleanup at vf removal time):

---
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index a4d68fb216fb..626bc67065fc 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1974,6 +1974,14 @@ static enum hclgevf_evt_cause hclgevf_check_evt_cause(struct hclgevf_dev *hdev,
 	return HCLGEVF_VECTOR0_EVENT_OTHER;
 }
 
+static void hclgevf_reset_timer(struct timer_list *t)
+{
+	struct hclgevf_dev *hdev = from_timer(hclgevf_dev, t, reset_timer);
+
+	hclgevf_clear_event_cause(hdev, HCLGEVF_VECTOR0_EVENT_RST);
+	hclgevf_reset_task_schedule(hdev);
+}
+
 static irqreturn_t hclgevf_misc_irq_handle(int irq, void *data)
 {
 	enum hclgevf_evt_cause event_cause;
@@ -1982,13 +1990,13 @@ static irqreturn_t hclgevf_misc_irq_handle(int irq, void *data)
 
 	hclgevf_enable_vector(&hdev->misc_vector, false);
 	event_cause = hclgevf_check_evt_cause(hdev, &clearval);
+	if (event_cause == HCLGEVF_VECTOR0_EVENT_RST)
+		mod_timer(hdev->reset_timer, jiffies + msecs_to_jiffies(5));
+
 	if (event_cause != HCLGEVF_VECTOR0_EVENT_OTHER)
 		hclgevf_clear_event_cause(hdev, clearval);
 
 	switch (event_cause) {
-	case HCLGEVF_VECTOR0_EVENT_RST:
-		hclgevf_reset_task_schedule(hdev);
-		break;
 	case HCLGEVF_VECTOR0_EVENT_MBX:
 		hclgevf_mbx_handler(hdev);
 		break;
---

> We consider reducing the delay time by polling. For example,
> the code cycles every 50 us to check whether the write register takes effect.
> If yes, the function returns immediately. or the code cycles until 5 ms.
> 
> Is this method appropriate?

IMHO such solution will not remove the problem. How frequent is
expected to be the irq generating such delay?

Thanks

Paolo