[PATCH 1/2] firmware: smccc: add timeout, touch wdt

Vedashree Vidwans posted 2 patches 1 month, 2 weeks ago
[PATCH 1/2] firmware: smccc: add timeout, touch wdt
Posted by Vedashree Vidwans 1 month, 2 weeks ago
Enhance PRIME/ACTIVATION functions to touch watchdog and implement
timeout mechanism. This update ensures that any potential hangs are
detected promptly and that the LFA process is allocated sufficient
execution time before the watchdog timer expires. These changes improve
overall system reliability by reducing the risk of undetected process
stalls and unexpected watchdog resets.

Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
---
 drivers/firmware/smccc/lfa_fw.c | 40 +++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/firmware/smccc/lfa_fw.c b/drivers/firmware/smccc/lfa_fw.c
index da6b54fe1685..b0ace6fc8dac 100644
--- a/drivers/firmware/smccc/lfa_fw.c
+++ b/drivers/firmware/smccc/lfa_fw.c
@@ -17,6 +17,9 @@
 #include <linux/array_size.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
+#include <linux/nmi.h>
+#include <linux/ktime.h>
+#include <linux/delay.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt) "Arm LFA: " fmt
@@ -37,6 +40,14 @@
 #define LFA_PRIME_CALL_AGAIN		BIT(0)
 #define LFA_ACTIVATE_CALL_AGAIN		BIT(0)
 
+/* Prime loop limits, TODO: tune after testing */
+#define LFA_PRIME_BUDGET_US		30000000	/* 30s cap */
+#define LFA_PRIME_POLL_DELAY_US		10		/* 10us between polls */
+
+/* Activation loop limits, TODO: tune after testing */
+#define LFA_ACTIVATE_BUDGET_US		20000000	/* 20s cap */
+#define LFA_ACTIVATE_POLL_DELAY_US	10		/* 10us between polls */
+
 /* LFA return values */
 #define LFA_SUCCESS			0
 #define LFA_NOT_SUPPORTED		1
@@ -219,6 +230,7 @@ static int call_lfa_activate(void *data)
 	struct image_props *attrs = data;
 	struct arm_smccc_1_2_regs args = { 0 };
 	struct arm_smccc_1_2_regs res = { 0 };
+	ktime_t end = ktime_add_us(ktime_get(), LFA_ACTIVATE_BUDGET_US);
 
 	args.a0 = LFA_1_0_FN_ACTIVATE;
 	args.a1 = attrs->fw_seq_id; /* fw_seq_id under consideration */
@@ -232,6 +244,8 @@ static int call_lfa_activate(void *data)
 	args.a2 = !(attrs->cpu_rendezvous_forced || attrs->cpu_rendezvous);
 
 	for (;;) {
+		/* Touch watchdog, ACTIVATE shouldn't take longer than watchdog_thresh */
+		touch_nmi_watchdog();
 		arm_smccc_1_2_invoke(&args, &res);
 
 		if ((long)res.a0 < 0) {
@@ -241,6 +255,15 @@ static int call_lfa_activate(void *data)
 		}
 		if (!(res.a1 & LFA_ACTIVATE_CALL_AGAIN))
 			break; /* ACTIVATE successful */
+
+		/* SMC returned with call_again flag set */
+		if (ktime_before(ktime_get(), end)) {
+			udelay(LFA_ACTIVATE_POLL_DELAY_US);
+			continue;
+		}
+
+		pr_err("ACTIVATE for image %s timed out", attrs->image_name);
+		return -ETIMEDOUT;
 	}
 
 	return res.a0;
@@ -290,6 +313,7 @@ static int prime_fw_image(struct image_props *attrs)
 {
 	struct arm_smccc_1_2_regs args = { 0 };
 	struct arm_smccc_1_2_regs res = { 0 };
+	ktime_t end = ktime_add_us(ktime_get(), LFA_PRIME_BUDGET_US);
 	int ret;
 
 	mutex_lock(&lfa_lock);
@@ -317,6 +341,8 @@ static int prime_fw_image(struct image_props *attrs)
 	args.a0 = LFA_1_0_FN_PRIME;
 	args.a1 = attrs->fw_seq_id; /* fw_seq_id under consideration */
 	for (;;) {
+		/* Touch watchdog, PRIME shouldn't take longer than watchdog_thresh */
+		touch_nmi_watchdog();
 		arm_smccc_1_2_invoke(&args, &res);
 
 		if ((long)res.a0 < 0) {
@@ -328,6 +354,20 @@ static int prime_fw_image(struct image_props *attrs)
 		}
 		if (!(res.a1 & LFA_PRIME_CALL_AGAIN))
 			break; /* PRIME successful */
+
+		/* SMC returned with call_again flag set */
+		if (ktime_before(ktime_get(), end)) {
+			udelay(LFA_PRIME_POLL_DELAY_US);
+			continue;
+		}
+
+		pr_err("LFA_PRIME for image %s timed out", attrs->image_name);
+		mutex_unlock(&lfa_lock);
+
+		ret = lfa_cancel(attrs);
+		if (ret != 0)
+			return ret;
+		return -ETIMEDOUT;
 	}
 
 	mutex_unlock(&lfa_lock);
-- 
2.43.0
Re: [PATCH 1/2] firmware: smccc: add timeout, touch wdt
Posted by Andre Przywara 1 month, 1 week ago
Hi Veda,

On 2/10/26 23:40, Vedashree Vidwans wrote:
> Enhance PRIME/ACTIVATION functions to touch watchdog and implement
> timeout mechanism. This update ensures that any potential hangs are
> detected promptly and that the LFA process is allocated sufficient
> execution time before the watchdog timer expires. These changes improve
> overall system reliability by reducing the risk of undetected process
> stalls and unexpected watchdog resets.

Many thanks for that, I think it's a very good idea to take care of the 
watchdog and to avoid an infinite loop in the AGAIN case.
I have some comments about some details below ....

> Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
> ---
>   drivers/firmware/smccc/lfa_fw.c | 40 +++++++++++++++++++++++++++++++++
>   1 file changed, 40 insertions(+)
> 
> diff --git a/drivers/firmware/smccc/lfa_fw.c b/drivers/firmware/smccc/lfa_fw.c
> index da6b54fe1685..b0ace6fc8dac 100644
> --- a/drivers/firmware/smccc/lfa_fw.c
> +++ b/drivers/firmware/smccc/lfa_fw.c
> @@ -17,6 +17,9 @@
>   #include <linux/array_size.h>
>   #include <linux/list.h>
>   #include <linux/mutex.h>
> +#include <linux/nmi.h>
> +#include <linux/ktime.h>
> +#include <linux/delay.h>
>   
>   #undef pr_fmt
>   #define pr_fmt(fmt) "Arm LFA: " fmt
> @@ -37,6 +40,14 @@
>   #define LFA_PRIME_CALL_AGAIN		BIT(0)
>   #define LFA_ACTIVATE_CALL_AGAIN		BIT(0)
>   
> +/* Prime loop limits, TODO: tune after testing */
> +#define LFA_PRIME_BUDGET_US		30000000	/* 30s cap */
> +#define LFA_PRIME_POLL_DELAY_US		10		/* 10us between polls */
> +
> +/* Activation loop limits, TODO: tune after testing */
> +#define LFA_ACTIVATE_BUDGET_US		20000000	/* 20s cap */
> +#define LFA_ACTIVATE_POLL_DELAY_US	10		/* 10us between polls */
> +
>   /* LFA return values */
>   #define LFA_SUCCESS			0
>   #define LFA_NOT_SUPPORTED		1
> @@ -219,6 +230,7 @@ static int call_lfa_activate(void *data)
>   	struct image_props *attrs = data;
>   	struct arm_smccc_1_2_regs args = { 0 };
>   	struct arm_smccc_1_2_regs res = { 0 };
> +	ktime_t end = ktime_add_us(ktime_get(), LFA_ACTIVATE_BUDGET_US);
>   
>   	args.a0 = LFA_1_0_FN_ACTIVATE;
>   	args.a1 = attrs->fw_seq_id; /* fw_seq_id under consideration */
> @@ -232,6 +244,8 @@ static int call_lfa_activate(void *data)
>   	args.a2 = !(attrs->cpu_rendezvous_forced || attrs->cpu_rendezvous);
>   
>   	for (;;) {
> +		/* Touch watchdog, ACTIVATE shouldn't take longer than watchdog_thresh */
> +		touch_nmi_watchdog();
>   		arm_smccc_1_2_invoke(&args, &res);
>   
>   		if ((long)res.a0 < 0) {
> @@ -241,6 +255,15 @@ static int call_lfa_activate(void *data)
>   		}
>   		if (!(res.a1 & LFA_ACTIVATE_CALL_AGAIN))
>   			break; /* ACTIVATE successful */
> +
> +		/* SMC returned with call_again flag set */
> +		if (ktime_before(ktime_get(), end)) {
> +			udelay(LFA_ACTIVATE_POLL_DELAY_US);

I don't think we should wait here at all, and definitely not with 
udelay: https://docs.kernel.org/timers/delay_sleep_functions.html

Instead we should move the "call again" (and timeout) mechanism out of 
this function, into activate_fw_image(), so that we exit the 
stop_machine(). Otherwise we would still block everything. Doing it 
there, where we should be preemptible, would give the kernel a chance to 
do some housekeeping. If there is nothing for the kernel to do, then I 
think it's fine to immediately call lfa_activate() again, after a 
cond_resched(), for instance.

> +			continue;
> +		}
> +
> +		pr_err("ACTIVATE for image %s timed out", attrs->image_name);
> +		return -ETIMEDOUT;
>   	}
>   
>   	return res.a0;
> @@ -290,6 +313,7 @@ static int prime_fw_image(struct image_props *attrs)
>   {
>   	struct arm_smccc_1_2_regs args = { 0 };
>   	struct arm_smccc_1_2_regs res = { 0 };
> +	ktime_t end = ktime_add_us(ktime_get(), LFA_PRIME_BUDGET_US);
>   	int ret;
>   
>   	mutex_lock(&lfa_lock);
> @@ -317,6 +341,8 @@ static int prime_fw_image(struct image_props *attrs)
>   	args.a0 = LFA_1_0_FN_PRIME;
>   	args.a1 = attrs->fw_seq_id; /* fw_seq_id under consideration */
>   	for (;;) {
> +		/* Touch watchdog, PRIME shouldn't take longer than watchdog_thresh */
> +		touch_nmi_watchdog();
>   		arm_smccc_1_2_invoke(&args, &res);
>   
>   		if ((long)res.a0 < 0) {
> @@ -328,6 +354,20 @@ static int prime_fw_image(struct image_props *attrs)
>   		}
>   		if (!(res.a1 & LFA_PRIME_CALL_AGAIN))
>   			break; /* PRIME successful */
> +
> +		/* SMC returned with call_again flag set */
> +		if (ktime_before(ktime_get(), end)) {
> +			udelay(LFA_PRIME_POLL_DELAY_US);

same comment here, please no udelay().
This should also avoid the discussion about the exact values of the 
sleep periods.
I'd just have one generous timeout (a few seconds, basically what your 
BUDGET values do above), to avoid looping forever in case of a firmware 
bug, for instance.

Cheers,
Andre

> +			continue;
> +		}
> +
> +		pr_err("LFA_PRIME for image %s timed out", attrs->image_name);
> +		mutex_unlock(&lfa_lock);
> +
> +		ret = lfa_cancel(attrs);
> +		if (ret != 0)
> +			return ret;
> +		return -ETIMEDOUT;
>   	}
>   
>   	mutex_unlock(&lfa_lock);
Re: [PATCH 1/2] firmware: smccc: add timeout, touch wdt
Posted by Trilok Soni 1 month, 2 weeks ago
On 2/10/2026 2:40 PM, Vedashree Vidwans wrote:
> Enhance PRIME/ACTIVATION functions to touch watchdog and implement
> timeout mechanism. This update ensures that any potential hangs are
> detected promptly and that the LFA process is allocated sufficient
> execution time before the watchdog timer expires. These changes improve
> overall system reliability by reducing the risk of undetected process
> stalls and unexpected watchdog resets.
> 
> Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
> ---
>  drivers/firmware/smccc/lfa_fw.c | 40 +++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/drivers/firmware/smccc/lfa_fw.c b/drivers/firmware/smccc/lfa_fw.c
> index da6b54fe1685..b0ace6fc8dac 100644
> --- a/drivers/firmware/smccc/lfa_fw.c
> +++ b/drivers/firmware/smccc/lfa_fw.c
> @@ -17,6 +17,9 @@
>  #include <linux/array_size.h>
>  #include <linux/list.h>
>  #include <linux/mutex.h>
> +#include <linux/nmi.h>
> +#include <linux/ktime.h>
> +#include <linux/delay.h>
>  
>  #undef pr_fmt
>  #define pr_fmt(fmt) "Arm LFA: " fmt
> @@ -37,6 +40,14 @@
>  #define LFA_PRIME_CALL_AGAIN		BIT(0)
>  #define LFA_ACTIVATE_CALL_AGAIN		BIT(0)
>  
> +/* Prime loop limits, TODO: tune after testing */

Do you want to keep this TODO? Your patches are not marked as RFC. 

> +#define LFA_PRIME_BUDGET_US		30000000	/* 30s cap */
> +#define LFA_PRIME_POLL_DELAY_US		10		/* 10us between polls */

Are these values going to be tunable from the userspace or kernel module parameters? 

> +
> +/* Activation loop limits, TODO: tune after testing */

Ditto.

> +#define LFA_ACTIVATE_BUDGET_US		20000000	/* 20s cap */
> +#define LFA_ACTIVATE_POLL_DELAY_US	10		/* 10us between polls */
...

---Trilok Soni
Re: [PATCH 1/2] firmware: smccc: add timeout, touch wdt
Posted by Vedashree Vidwans 1 month, 2 weeks ago

On 2/10/26 15:10, Trilok Soni wrote:
> On 2/10/2026 2:40 PM, Vedashree Vidwans wrote:
>> Enhance PRIME/ACTIVATION functions to touch watchdog and implement
>> timeout mechanism. This update ensures that any potential hangs are
>> detected promptly and that the LFA process is allocated sufficient
>> execution time before the watchdog timer expires. These changes improve
>> overall system reliability by reducing the risk of undetected process
>> stalls and unexpected watchdog resets.
>>
>> Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
>> ---
>>   drivers/firmware/smccc/lfa_fw.c | 40 +++++++++++++++++++++++++++++++++
>>   1 file changed, 40 insertions(+)
>>
>> diff --git a/drivers/firmware/smccc/lfa_fw.c b/drivers/firmware/smccc/lfa_fw.c
>> index da6b54fe1685..b0ace6fc8dac 100644
>> --- a/drivers/firmware/smccc/lfa_fw.c
>> +++ b/drivers/firmware/smccc/lfa_fw.c
>> @@ -17,6 +17,9 @@
>>   #include <linux/array_size.h>
>>   #include <linux/list.h>
>>   #include <linux/mutex.h>
>> +#include <linux/nmi.h>
>> +#include <linux/ktime.h>
>> +#include <linux/delay.h>
>>   
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "Arm LFA: " fmt
>> @@ -37,6 +40,14 @@
>>   #define LFA_PRIME_CALL_AGAIN		BIT(0)
>>   #define LFA_ACTIVATE_CALL_AGAIN		BIT(0)
>>   
>> +/* Prime loop limits, TODO: tune after testing */
> 
> Do you want to keep this TODO? Your patches are not marked as RFC.
> 
>> +#define LFA_PRIME_BUDGET_US		30000000	/* 30s cap */
>> +#define LFA_PRIME_POLL_DELAY_US		10		/* 10us between polls */
> 
> Are these values going to be tunable from the userspace or kernel module parameters?
> 
>> +
>> +/* Activation loop limits, TODO: tune after testing */
> 
> Ditto.
> 
>> +#define LFA_ACTIVATE_BUDGET_US		20000000	/* 20s cap */
>> +#define LFA_ACTIVATE_POLL_DELAY_US	10		/* 10us between polls */
> ...
> 
> ---Trilok Soni

Thanks for pointing this out.

The "TODO: tune after testing" comment was left in by mistake; it should 
not have been included in a non‑RFC posting.

Regarding tunability: the current series uses fixed values, but I agree 
it would be useful to make these configurable. Adding module parameter 
to adjust the timeout values would make it easier to tune them for 
different platforms and workloads.

I’ll address both of these points in the next revision of the series.

Veda