[v1] fpga: altera-cvp: allow interrupt to continue next time

[PATCH] fpga: altera-cvp: allow interrupt to continue next time

Posted by tien.sung.ang@intel.com 3 years, 10 months ago

From: Dinh Nguyen <dinh.nguyen@intel.com>

The CFG_READY signal/bit may time out if the firmware does not respond
within the given time-out. The time needed varies with numerous
factors, such as the size of the bitstream.
This time-out error does not impact the result of the previous CvP
transactions. The CvP driver shall then respond with
EAGAIN instead of a time-out error.

Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
---
 drivers/fpga/altera-cvp.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
index 4ffb9da537d8..d74ff63c61e8 100644
--- a/drivers/fpga/altera-cvp.c
+++ b/drivers/fpga/altera-cvp.c
@@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
 	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
 	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
 				     conf->priv->poll_time_us);
-	if (ret)
+	if (ret) {
 		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
+		goto error_path;
+	}
 
 	return ret;
+
+error_path:
+	/* reset CVP_MODE and HIP_CLK_SEL bit */
+	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
+	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
+	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
+	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
+
+	return -EAGAIN;
+
 }
 
 static int altera_cvp_write_init(struct fpga_manager *mgr,
-- 
2.25.1

Re: [PATCH] fpga: altera-cvp: allow interrupt to continue next time

Posted by Xu Yilun 3 years, 10 months ago

On Wed, May 18, 2022 at 03:38:44PM +0800, tien.sung.ang@intel.com wrote:
> From: Dinh Nguyen <dinh.nguyen@intel.com>
> 
> CFG_READY signal/bit may time-out due to firmware not responding
> within the given time-out. This time varies due to numerous
> factors like size of bitstream and others.
> This time-out error does not impact the result of the CvP
> previous transactions. The CvP driver shall then, respond with

Do you mean the reprogramming is successful even if you hit the time-out
in write_complete()? Then wouldn't returning 0 be better?

And could you specify what the time-out means in the write_init() phase?

Thanks,
Yilun

> EAGAIN instead Time out error.
> 
> Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
> Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
> ---
>  drivers/fpga/altera-cvp.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
> index 4ffb9da537d8..d74ff63c61e8 100644
> --- a/drivers/fpga/altera-cvp.c
> +++ b/drivers/fpga/altera-cvp.c
> @@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
>  	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
>  	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
>  				     conf->priv->poll_time_us);
> -	if (ret)
> +	if (ret) {
>  		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
> +		goto error_path;
> +	}
>  
>  	return ret;
> +
> +error_path:
> +	/* reset CVP_MODE and HIP_CLK_SEL bit */
> +	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
> +	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
> +	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
> +	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
> +
> +	return -EAGAIN;
> +
>  }
>  
>  static int altera_cvp_write_init(struct fpga_manager *mgr,
> -- 
> 2.25.1

Re: [PATCH] fpga: altera-cvp: allow interrupt to continue next time

Posted by tien.sung.ang@intel.com 3 years, 10 months ago

>> CFG_READY signal/bit may time-out due to firmware not responding
>> within the given time-out. This time varies due to numerous
>> factors like size of bitstream and others.
>> This time-out error does not impact the result of the CvP
>> previous transactions. The CvP driver shall then, respond with

>Do you mean the reprogramming is successful even if you find the time
>out in write_complete()? Then return 0 is better?

Based on the information given by the Intel FPGA firmware team,
CFG_READY is essential to indicate whether the current FPGA
configuration session is indeed a success. There are
cases we tested in the lab in which CFG_READY stays invalid and
the tests performed subsequently to verify the FPGA functionality
could not detect the failed session. A failed FPGA
configuration session means the new bitstream wasn't
successfully configured, and tests run later will just pass
on the previous working bitstream version. In short, CFG_READY
is essential, and an error indicating the time-out is a must.
Another example: using an incorrect SOF/Design FPGA results
in CFG_READY being invalid. The user must be informed of a
potential error.
I will correct the wording I used earlier, which says that
the time-out error does not impact the results of the previous CvP
transactions. It may do so if the firmware has some sort
of error.

>And could you specify what the time-out mean on write_init() phase?

I could not really understand your question. We set huge
time-outs of ~10 seconds. Every wait for the firmware to respond
is potentially a hazard. The CvP firmware has its limitations,
unfortunately.
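
For readers following the discussion of CFG_READY time-outs, the bounded polling pattern being described can be sketched in user space as follows. This is only an illustration under stated assumptions: `mock_cvp`, `mock_read_status()`, `mock_wait_status()` and the bit position of `MOCK_STATUS_CFG_RDY` are hypothetical stand-ins, not the driver's real register layout or API, and the sleep between polls is left out.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit position; the real register layout is not shown in this thread. */
#define MOCK_STATUS_CFG_RDY (1u << 18)

/* Simulated device: firmware drops CFG_RDY after `clear_after` reads (0 = never). */
struct mock_cvp {
	uint32_t status;
	unsigned int clear_after;
	unsigned int reads;
};

static uint32_t mock_read_status(struct mock_cvp *c)
{
	c->reads++;
	if (c->clear_after && c->reads >= c->clear_after)
		c->status &= ~MOCK_STATUS_CFG_RDY;
	return c->status;
}

/* Poll until (status & mask) == want, giving up once timeout_us is used up. */
static int mock_wait_status(struct mock_cvp *c, uint32_t mask, uint32_t want,
			    unsigned int timeout_us)
{
	/* One poll per 10 us slice, rounded up, and always at least one poll. */
	unsigned int retries = (timeout_us + 9) / 10;

	if (!retries)
		retries = 1;

	do {
		if ((mock_read_status(c) & mask) == want)
			return 0;
		/* A real driver would sleep for roughly 10 us here. */
	} while (--retries);

	return -ETIMEDOUT;
}
```

With a device that never clears the bit, the helper exhausts its retries and reports -ETIMEDOUT, which is the situation the patch tries to recover from.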

Re: [PATCH] fpga: altera-cvp: allow interrupt to continue next time

Posted by Xu Yilun 3 years, 10 months ago

On Tue, May 31, 2022 at 10:20:04AM +0800, tien.sung.ang@intel.com wrote:
> >> CFG_READY signal/bit may time-out due to firmware not responding
> >> within the given time-out. This time varies due to numerous
> >> factors like size of bitstream and others.
> >> This time-out error does not impact the result of the CvP
> >> previous transactions. The CvP driver shall then, respond with
> 
> >Do you mean the reprogramming is successful even if you find the time
> >out in write_complete()? Then return 0 is better?
> Based on the information given by the Intel FPGA firmware team,
> CFG_READY is essential to indicate if the current FPGA 
> configuration session is indeed a success. There are 
> cases we test in the lab whereby, CFG_READY stays invalid and
> the tests performed subsequently to verify the FPGA functionality
> could not detect the failed session. A failed FPGA 
> configuration session means, the new bitstream wasn't 
> successfully configured and tests ran later will just be passing
> on the previous working bitstream version. In short, CFG_READY
> is esential, and an error indicating the time-out is a must.
> Another example, using an incorrect SOF/Design FPGA results
> in CFG_READY being invalid. The user must be informed of a 
> potential error. 
> I will correct the wordings i used earlier that says that
> the timoeut error does not impact the results of the CvP
> previous transactions. It may so if the firmware has some sort
> of error. 

Understood. But given your new comment, why must you change the error
code to -EAGAIN rather than a timeout?

I think you may want to change your commit message. The main change is
adding the error handling. The error code change is minor, and not even
necessary if you don't have a strong reason.

Thanks,
Yilun

> 
> >And could you specify what the time-out mean on write_init() phase?
> I could not really understand your question. We set huge 
> time-outs of ~10seconds. Every wait for the firmware to respond
> is potentially a hazard. The firmware CvP is has it's limitation
> unfortunately.

Re: [PATCH] fpga: altera-cvp: allow interrupt to continue next time

Posted by Tom Rix 3 years, 10 months ago

On 5/18/22 12:38 AM, tien.sung.ang@intel.com wrote:
> From: Dinh Nguyen <dinh.nguyen@intel.com>
>
> CFG_READY signal/bit may time-out due to firmware not responding
> within the given time-out. This time varies due to numerous
> factors like size of bitstream and others.
> This time-out error does not impact the result of the CvP
> previous transactions. The CvP driver shall then, respond with
> EAGAIN instead Time out error.
>
> Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
> Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
> ---
>   drivers/fpga/altera-cvp.c | 14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
> index 4ffb9da537d8..d74ff63c61e8 100644
> --- a/drivers/fpga/altera-cvp.c
> +++ b/drivers/fpga/altera-cvp.c
> @@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
>   	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
>   	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
>   				     conf->priv->poll_time_us);
> -	if (ret)
> +	if (ret) {
>   		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
> +		goto error_path;
> +	}
>   
>   	return ret;
> +
> +error_path:
> +	/* reset CVP_MODE and HIP_CLK_SEL bit */
> +	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
> +	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
> +	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
> +	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
> +
> +	return -EAGAIN;

This will set fpga_mgr->state to *_ERR.

Is this OK, or do you think we need a couple of new *_BUSY enums?

Tom

> +
>   }
>   
>   static int altera_cvp_write_init(struct fpga_manager *mgr,

Re: [PATCH] fpga: altera-cvp: Truncated bitstream error support

Posted by tien.sung.ang@intel.com 3 years, 10 months ago

Thanks for bringing this up. Yes, you are right that the fpga_mgr sees this
as an error irrespective of the value. The CvP driver is now changed to just
indicate the correct error, which recommends a retry. To my understanding,
EAGAIN means this. In short, it looks like the fpga manager is now going to
return a CvP failure.
A BUSY state does not seem to be able to solve this issue.
Even an extended time-out didn't resolve this error state. The current time-out
is set to 10 seconds.
However, the main objective is also to handle the error if the CvP firmware
is not responsive. The error_path flow resets the CVP mode and HIP_CLK_SEL bit,
as recommended by the firmware engineers.
The flow prescribed here is also an identical copy of a working CvP driver
that is also owned by Intel. That driver is a downstream driver which is
not part of the Linux kernel. We are now porting these differences over to
the current upstream CvP driver.

Re: [PATCH] fpga: altera-cvp: Truncated bitstream error support

Posted by Xu Yilun 3 years, 10 months ago

On Thu, May 19, 2022 at 05:39:07PM +0800, tien.sung.ang@intel.com wrote:
> Thanks for bringing this up. Yes, you are right that the fpga_mgr sees this
> as an error irrespective of the value. The CvP driver is changed now to just
> indicate the correct error which recommends a retry. To me understanding,
> EAGAIN was this. The fpga manager now looks like is going to return a CvP
> failure in short. 
> A BUSY state does not seem to be able to solve this issue. 
> Even an extended time-out didn't resolve this error state. The current time-out
> is set to 10seconds.   
> However, the main objective is to also handle the error if the CvP firmware
> is not responsive. The error_path flow is to reset the CVP mode and HIP_CLK_SEL bit

Please add your main objective to commit message.

Thanks,
Yilun

> as recommended by the firmware engineers. 
> The flow prescribed here is also an identical copy of working CvP driver 
> which is also owned by Intel. This driver is a downstream driver which is 
> not part of the Linux kernel. We are now porting this differences over to 
> the current upstream CvP driver.

[PATCH v2] fpga: altera-cvp: allow interrupt to continue next time

Posted by tien.sung.ang@intel.com 3 years, 10 months ago

From: Dinh Nguyen <dinh.nguyen@intel.com>

The main objective of this change is to perform error handling
if the CvP firmware becomes unresponsive. The error_path flow
resets the CvP mode and HIP_CLK_SEL bit.

The CFG_READY signal/bit may time out if the firmware does not respond
within the given time-out. The time needed varies with numerous
factors, such as the size of the bitstream.
This time-out error may or may not impact the result of the previous CvP
transactions. The CvP driver shall then respond with
EAGAIN instead of a time-out error.

Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
---

changelog v2:
* Amend the commit message

---
 drivers/fpga/altera-cvp.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
index 4ffb9da537d8..d74ff63c61e8 100644
--- a/drivers/fpga/altera-cvp.c
+++ b/drivers/fpga/altera-cvp.c
@@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
 	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
 	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
 				     conf->priv->poll_time_us);
-	if (ret)
+	if (ret) {
 		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
+		goto error_path;
+	}
 
 	return ret;
+
+error_path:
+	/* reset CVP_MODE and HIP_CLK_SEL bit */
+	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
+	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
+	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
+	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
+
+	return -EAGAIN;
+
 }
 
 static int altera_cvp_write_init(struct fpga_manager *mgr,
-- 
2.25.1

Re: [PATCH v2] fpga: altera-cvp: allow interrupt to continue next time

Posted by Xu Yilun 3 years, 10 months ago

On Wed, Jun 01, 2022 at 09:40:27AM +0800, tien.sung.ang@intel.com wrote:
> From: Dinh Nguyen <dinh.nguyen@intel.com>
> 
> The main objective of this change is to perform error handling
> if the CvP firmware becomes unresponsive. The error_path flow
> resets the CvP mode and HIP_CLK_SEL bit.
> 
> CFG_READY signal/bit may time-out due to firmware not responding
> within the given time-out. This time varies due to numerous
> factors like size of bitstream and others.
> This time-out error may or may not impact the result of the CvP
> previous transactions. The CvP driver shall then, respond with
> EAGAIN instead Time out error.
> 
> Signed-off-by: Dinh Nguyen <dinh.nguyen@intel.com>
> Signed-off-by: Ang Tien Sung <tien.sung.ang@intel.com>
> ---
> 
> changelog v2:
> * Amend the commit message
> 
> ---
>  drivers/fpga/altera-cvp.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/fpga/altera-cvp.c b/drivers/fpga/altera-cvp.c
> index 4ffb9da537d8..d74ff63c61e8 100644
> --- a/drivers/fpga/altera-cvp.c
> +++ b/drivers/fpga/altera-cvp.c
> @@ -309,10 +309,22 @@ static int altera_cvp_teardown(struct fpga_manager *mgr,
>  	/* STEP 15 - poll CVP_CONFIG_READY bit for 0 with 10us timeout */
>  	ret = altera_cvp_wait_status(conf, VSE_CVP_STATUS_CFG_RDY, 0,
>  				     conf->priv->poll_time_us);
> -	if (ret)
> +	if (ret) {
>  		dev_err(&mgr->dev, "CFG_RDY == 0 timeout\n");
> +		goto error_path;

I assume the error handling is specific to the CFG_RDY timeout, is it? Then it
could be embedded in this code block.

As for the -EAGAIN return value, please also return it only in this code block.

Usually a goto error path is for a common failure exit.

> +	}
>  
>  	return ret;
> +
> +error_path:
> +	/* reset CVP_MODE and HIP_CLK_SEL bit */
> +	altera_read_config_dword(conf, VSE_CVP_MODE_CTRL, &val);
> +	val &= ~VSE_CVP_MODE_CTRL_HIP_CLK_SEL;
> +	val &= ~VSE_CVP_MODE_CTRL_CVP_MODE;
> +	altera_write_config_dword(conf, VSE_CVP_MODE_CTRL, val);
> +
> +	return -EAGAIN;

Please still specify the reason for returning -EAGAIN rather than a timeout error.

Thanks,
Yilun

> +
>  }
>  
>  static int altera_cvp_write_init(struct fpga_manager *mgr,
> -- 
> 2.25.1
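
The restructuring requested in the v2 review above, keeping the timeout-specific recovery and the -EAGAIN return inside the `if (ret)` block instead of behind a goto label, might look roughly like the following stand-alone sketch. The `mock_conf` type, the `MOCK_MODE_CTRL_*` bit positions and `mock_wait_cfg_rdy_clear()` are assumptions made up for illustration; the real driver accesses the mode control register through its own config-space accessors, and whether -EAGAIN is the right code is still an open question in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit positions for the two mode-control bits named in the patch. */
#define MOCK_MODE_CTRL_CVP_MODE    (1u << 0)
#define MOCK_MODE_CTRL_HIP_CLK_SEL (1u << 1)

/* Simulated device state standing in for the driver's config-space accesses. */
struct mock_conf {
	uint32_t mode_ctrl;  /* simulated mode control register */
	int cfg_rdy_stuck;   /* non-zero: firmware never clears CFG_RDY */
};

/* Stand-in for the status poll: reports a timeout when CFG_RDY stays set. */
static int mock_wait_cfg_rdy_clear(struct mock_conf *conf)
{
	return conf->cfg_rdy_stuck ? -ETIMEDOUT : 0;
}

static int mock_teardown(struct mock_conf *conf)
{
	int ret = mock_wait_cfg_rdy_clear(conf);

	if (ret) {
		/*
		 * Recovery that only applies to the CFG_RDY timeout stays in
		 * this block: clear CVP_MODE and HIP_CLK_SEL, then ask the
		 * caller to retry.
		 */
		conf->mode_ctrl &= ~(MOCK_MODE_CTRL_HIP_CLK_SEL |
				     MOCK_MODE_CTRL_CVP_MODE);
		return -EAGAIN;
	}

	return 0;
}
```

Compared with the posted patch, the success path is a plain `return 0` and no label is needed, which matches the reviewer's point that a goto error path is normally reserved for cleanup shared by several failure sites.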