[PATCH] accel/amdxdna: Treat power-off failure as unrecoverable error

Posted by Lizhi Hou 1 month, 1 week ago
Failing to set power off indicates an unrecoverable hardware or firmware
error. Update the driver to treat such a failure as a fatal condition
and stop further operations that depend on successful power state
transition.

This prevents undefined behavior when the hardware remains in an
unexpected state after a failed power-off attempt.

Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
---
 drivers/accel/amdxdna/aie2_smu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_smu.c
index 11c0e9e7b03a..bd94ee96c2bc 100644
--- a/drivers/accel/amdxdna/aie2_smu.c
+++ b/drivers/accel/amdxdna/aie2_smu.c
@@ -147,6 +147,16 @@ int aie2_smu_init(struct amdxdna_dev_hdl *ndev)
 {
 	int ret;
 
+	/*
+	 * Failing to set power off indicates an unrecoverable hardware or
+	 * firmware error.
+	 */
+	ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_OFF, 0, NULL);
+	if (ret) {
+		XDNA_ERR(ndev->xdna, "Access power failed, ret %d", ret);
+		return ret;
+	}
+
 	ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_ON, 0, NULL);
 	if (ret) {
 		XDNA_ERR(ndev->xdna, "Power on failed, ret %d", ret);
-- 
2.34.1
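
For readers without the driver tree at hand, the control flow this patch adds to aie2_smu_init() can be approximated with a small user-space mock. This is only a sketch under stated assumptions: mock_dev, mock_smu_exec(), mock_smu_init() and the error codes are hypothetical stand-ins invented for illustration, not the driver's real aie2_smu_exec() or XDNA_ERR(). The mock also assumes that a power-off request sent to an already powered-off device succeeds, which the thread below reports as verified for the aie2 platforms.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's SMU commands. */
enum mock_smu_cmd { MOCK_SMU_POWER_ON, MOCK_SMU_POWER_OFF };

struct mock_dev {
	int powered;        /* 1 while the mock device is powered on */
	int fail_power_off; /* non-zero: simulate a failing power-off */
};

/* Mock command executor: returns 0 on success, a negative value on error. */
static int mock_smu_exec(struct mock_dev *dev, enum mock_smu_cmd cmd)
{
	assert(dev != NULL);

	switch (cmd) {
	case MOCK_SMU_POWER_OFF:
		if (dev->fail_power_off)
			return -5; /* arbitrary negative error code */
		dev->powered = 0; /* assumed harmless when already off */
		return 0;
	case MOCK_SMU_POWER_ON:
		dev->powered = 1;
		return 0;
	}
	return -22; /* unknown command */
}

/* Mirrors the patched ordering: power off first, bail out on failure. */
static int mock_smu_init(struct mock_dev *dev)
{
	int ret;

	ret = mock_smu_exec(dev, MOCK_SMU_POWER_OFF);
	if (ret) {
		fprintf(stderr, "Power off failed, ret %d\n", ret);
		return ret; /* treated as fatal: power-on is never attempted */
	}

	ret = mock_smu_exec(dev, MOCK_SMU_POWER_ON);
	if (ret) {
		fprintf(stderr, "Power on failed, ret %d\n", ret);
		return ret;
	}

	return 0;
}
```

Calling mock_smu_init() on a device with fail_power_off set returns the power-off error and leaves the power state untouched, while a healthy device ends up powered on regardless of its state at entry.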
Re: [PATCH] accel/amdxdna: Treat power-off failure as unrecoverable error
Posted by Falkowski, Maciej 1 month ago
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>

On 11/6/2025 7:05 PM, Lizhi Hou wrote:
> Failing to set power off indicates an unrecoverable hardware or firmware
> error. Update the driver to treat such a failure as a fatal condition
> and stop further operations that depend on successful power state
> transition.
>
> This prevents undefined behavior when the hardware remains in an
> unexpected state after a failed power-off attempt.
>
> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
> ---
>   drivers/accel/amdxdna/aie2_smu.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_smu.c
> index 11c0e9e7b03a..bd94ee96c2bc 100644
> --- a/drivers/accel/amdxdna/aie2_smu.c
> +++ b/drivers/accel/amdxdna/aie2_smu.c
> @@ -147,6 +147,16 @@ int aie2_smu_init(struct amdxdna_dev_hdl *ndev)
>   {
>   	int ret;
>   
> +	/*
> +	 * Failing to set power off indicates an unrecoverable hardware or
> +	 * firmware error.
> +	 */
> +	ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_OFF, 0, NULL);
> +	if (ret) {
> +		XDNA_ERR(ndev->xdna, "Access power failed, ret %d", ret);
> +		return ret;
> +	}
> +
>   	ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_ON, 0, NULL);
>   	if (ret) {
>   		XDNA_ERR(ndev->xdna, "Power on failed, ret %d", ret);
Re: [PATCH] accel/amdxdna: Treat power-off failure as unrecoverable error
Posted by Mario Limonciello 1 month, 1 week ago
On 11/6/25 12:05 PM, Lizhi Hou wrote:
> Failing to set power off indicates an unrecoverable hardware or firmware
> error. Update the driver to treat such a failure as a fatal condition
> and stop further operations that depend on successful power state
> transition.
> 
> This prevents undefined behavior when the hardware remains in an
> unexpected state after a failed power-off attempt.
> 
> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>

Presumably all versions of hardware in the wild can handle receiving a 
power off command if they're already powered off?

> ---
>   drivers/accel/amdxdna/aie2_smu.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_smu.c
> index 11c0e9e7b03a..bd94ee96c2bc 100644
> --- a/drivers/accel/amdxdna/aie2_smu.c
> +++ b/drivers/accel/amdxdna/aie2_smu.c
> @@ -147,6 +147,16 @@ int aie2_smu_init(struct amdxdna_dev_hdl *ndev)
>   {
>   	int ret;
>   
> +	/*
> +	 * Failing to set power off indicates an unrecoverable hardware or
> +	 * firmware error.
> +	 */
> +	ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_OFF, 0, NULL);
> +	if (ret) {
> +		XDNA_ERR(ndev->xdna, "Access power failed, ret %d", ret);
> +		return ret;
> +	}
> +
>   	ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_ON, 0, NULL);
>   	if (ret) {
>   		XDNA_ERR(ndev->xdna, "Power on failed, ret %d", ret);
Re: [PATCH] accel/amdxdna: Treat power-off failure as unrecoverable error
Posted by Lizhi Hou 1 month, 1 week ago
On 11/6/25 10:12, Mario Limonciello wrote:
> On 11/6/25 12:05 PM, Lizhi Hou wrote:
>> Failing to set power off indicates an unrecoverable hardware or firmware
>> error. Update the driver to treat such a failure as a fatal condition
>> and stop further operations that depend on successful power state
>> transition.
>>
>> This prevents undefined behavior when the hardware remains in an
>> unexpected state after a failed power-off attempt.
>>
>> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
>
> Presumably all versions of hardware in the wild can handle receiving a 
> power off command if they're already powered off?

Yes for the aie2 platforms. This was verified by xdna-driver pipeline tests.


Lizhi

>
>> ---
>>   drivers/accel/amdxdna/aie2_smu.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_smu.c
>> index 11c0e9e7b03a..bd94ee96c2bc 100644
>> --- a/drivers/accel/amdxdna/aie2_smu.c
>> +++ b/drivers/accel/amdxdna/aie2_smu.c
>> @@ -147,6 +147,16 @@ int aie2_smu_init(struct amdxdna_dev_hdl *ndev)
>>   {
>>       int ret;
>>
>> +    /*
>> +     * Failing to set power off indicates an unrecoverable hardware or
>> +     * firmware error.
>> +     */
>> +    ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_OFF, 0, NULL);
>> +    if (ret) {
>> +        XDNA_ERR(ndev->xdna, "Access power failed, ret %d", ret);
>> +        return ret;
>> +    }
>> +
>>       ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_ON, 0, NULL);
>>       if (ret) {
>>           XDNA_ERR(ndev->xdna, "Power on failed, ret %d", ret);
>
Re: [PATCH] accel/amdxdna: Treat power-off failure as unrecoverable error
Posted by Mario Limonciello 1 month, 1 week ago
On 11/6/25 12:19 PM, Lizhi Hou wrote:
> 
> On 11/6/25 10:12, Mario Limonciello wrote:
>> On 11/6/25 12:05 PM, Lizhi Hou wrote:
>>> Failing to set power off indicates an unrecoverable hardware or firmware
>>> error. Update the driver to treat such a failure as a fatal condition
>>> and stop further operations that depend on successful power state
>>> transition.
>>>
>>> This prevents undefined behavior when the hardware remains in an
>>> unexpected state after a failed power-off attempt.
>>>
>>> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
>>
>> Presumably all versions of hardware in the wild can handle receiving a 
>> power off command if they're already powered off?
> 
> Yes for the aie2 platforms. This was verified by xdna-driver pipeline 
> tests.
> 
> 

OK LGTM then.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>

> Lizhi
> 
>>
>>> ---
>>>   drivers/accel/amdxdna/aie2_smu.c | 10 ++++++++++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_smu.c
>>> index 11c0e9e7b03a..bd94ee96c2bc 100644
>>> --- a/drivers/accel/amdxdna/aie2_smu.c
>>> +++ b/drivers/accel/amdxdna/aie2_smu.c
>>> @@ -147,6 +147,16 @@ int aie2_smu_init(struct amdxdna_dev_hdl *ndev)
>>>   {
>>>       int ret;
>>>
>>> +    /*
>>> +     * Failing to set power off indicates an unrecoverable hardware or
>>> +     * firmware error.
>>> +     */
>>> +    ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_OFF, 0, NULL);
>>> +    if (ret) {
>>> +        XDNA_ERR(ndev->xdna, "Access power failed, ret %d", ret);
>>> +        return ret;
>>> +    }
>>> +
>>>       ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_ON, 0, NULL);
>>>       if (ret) {
>>>           XDNA_ERR(ndev->xdna, "Power on failed, ret %d", ret);
>>

Re: [PATCH] accel/amdxdna: Treat power-off failure as unrecoverable error
Posted by Lizhi Hou 1 month, 1 week ago
Applied to drm-misc-next.

On 11/6/25 10:31, Mario Limonciello wrote:
> On 11/6/25 12:19 PM, Lizhi Hou wrote:
>>
>> On 11/6/25 10:12, Mario Limonciello wrote:
>>> On 11/6/25 12:05 PM, Lizhi Hou wrote:
>>>> Failing to set power off indicates an unrecoverable hardware or 
>>>> firmware
>>>> error. Update the driver to treat such a failure as a fatal condition
>>>> and stop further operations that depend on successful power state
>>>> transition.
>>>>
>>>> This prevents undefined behavior when the hardware remains in an
>>>> unexpected state after a failed power-off attempt.
>>>>
>>>> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
>>>
>>> Presumably all versions of hardware in the wild can handle receiving 
>>> a power off command if they're already powered off?
>>
>> Yes for the aie2 platforms. This was verified by xdna-driver pipeline 
>> tests.
>>
>>
>
> OK LGTM then.
>
> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
>
>> Lizhi
>>
>>>
>>>> ---
>>>>   drivers/accel/amdxdna/aie2_smu.c | 10 ++++++++++
>>>>   1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/drivers/accel/amdxdna/aie2_smu.c b/drivers/accel/amdxdna/aie2_smu.c
>>>> index 11c0e9e7b03a..bd94ee96c2bc 100644
>>>> --- a/drivers/accel/amdxdna/aie2_smu.c
>>>> +++ b/drivers/accel/amdxdna/aie2_smu.c
>>>> @@ -147,6 +147,16 @@ int aie2_smu_init(struct amdxdna_dev_hdl *ndev)
>>>>   {
>>>>       int ret;
>>>>
>>>> +    /*
>>>> +     * Failing to set power off indicates an unrecoverable hardware or
>>>> +     * firmware error.
>>>> +     */
>>>> +    ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_OFF, 0, NULL);
>>>> +    if (ret) {
>>>> +        XDNA_ERR(ndev->xdna, "Access power failed, ret %d", ret);
>>>> +        return ret;
>>>> +    }
>>>> +
>>>>       ret = aie2_smu_exec(ndev, AIE2_SMU_POWER_ON, 0, NULL);
>>>>       if (ret) {
>>>>           XDNA_ERR(ndev->xdna, "Power on failed, ret %d", ret);
>>>
>