[RESEND][PATCH] drm/xe/hwmon: Return early on power limit read failure

zhaoguohan@kylinos.cn posted 1 patch 1 month, 3 weeks ago
There is a newer version of this series
drivers/gpu/drm/xe/xe_hwmon.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
[RESEND][PATCH] drm/xe/hwmon: Return early on power limit read failure
Posted by zhaoguohan@kylinos.cn 1 month, 3 weeks ago
From: GuoHan Zhao <zhaoguohan@kylinos.cn>

In xe_hwmon_pcode_rmw_power_limit(), when xe_pcode_read() fails,
the function logs the error but continues to execute the subsequent
logic. This can result in undefined behavior as the values val0 and
val1 may contain invalid data.

Fix this by adding an early return after logging the read failure,
ensuring that we don't proceed with potentially corrupted data.

Signed-off-by: GuoHan Zhao <zhaoguohan@kylinos.cn>
---
 drivers/gpu/drm/xe/xe_hwmon.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
index f08fc4377d25..eb410c5293e7 100644
--- a/drivers/gpu/drm/xe/xe_hwmon.c
+++ b/drivers/gpu/drm/xe/xe_hwmon.c
@@ -190,9 +190,11 @@ static int xe_hwmon_pcode_rmw_power_limit(const struct xe_hwmon *hwmon, u32 attr
 						  READ_PL_FROM_PCODE : READ_PL_FROM_FW),
 						  &val0, &val1);
 
-	if (ret)
+	if (ret) {
 		drm_dbg(&hwmon->xe->drm, "read failed ch %d val0 0x%08x, val1 0x%08x, ret %d\n",
 			channel, val0, val1, ret);
+			return ret;
+	}
 
 	if (attr == PL1_HWMON_ATTR)
 		val0 = (val0 & ~clr) | set;
-- 
2.43.0
Re: [RESEND][PATCH] drm/xe/hwmon: Return early on power limit read failure
Posted by Rodrigo Vivi 1 month, 3 weeks ago
On Tue, Aug 12, 2025 at 02:59:30PM +0800, zhaoguohan@kylinos.cn wrote:
> From: GuoHan Zhao <zhaoguohan@kylinos.cn>
> 
> In xe_hwmon_pcode_rmw_power_limit(), when xe_pcode_read() fails,
> the function logs the error but continues to execute the subsequent
> logic. This can result in undefined behavior as the values val0 and
> val1 may contain invalid data.
> 
> Fix this by adding an early return after logging the read failure,
> ensuring that we don't proceed with potentially corrupted data.
> 

Fixes: 8aa7306631f0 ("drm/xe/hwmon: Fix xe_hwmon_power_max_write")
Cc: Riana Tauro <riana.tauro@intel.com>
Cc: Karthik Poosa <karthik.poosa@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>

It looks like the original idea was to try to write even if the
read failed, but it was not a RMW function. Then, when it got
moved to RMW this ignored error was forgotten.

> Signed-off-by: GuoHan Zhao <zhaoguohan@kylinos.cn>
> ---
>  drivers/gpu/drm/xe/xe_hwmon.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c
> index f08fc4377d25..eb410c5293e7 100644
> --- a/drivers/gpu/drm/xe/xe_hwmon.c
> +++ b/drivers/gpu/drm/xe/xe_hwmon.c
> @@ -190,9 +190,11 @@ static int xe_hwmon_pcode_rmw_power_limit(const struct xe_hwmon *hwmon, u32 attr
>  						  READ_PL_FROM_PCODE : READ_PL_FROM_FW),
>  						  &val0, &val1);
>  
> -	if (ret)
> +	if (ret) {
>  		drm_dbg(&hwmon->xe->drm, "read failed ch %d val0 0x%08x, val1 0x%08x, ret %d\n",
>  			channel, val0, val1, ret);
> +			return ret;

Please change this to drm_err. Now this is an error that needs to be visible.

But also, I believe this error needs to be propagated up to the caller so anyone
using hwmon will have a clear indication that the write failed.

> +	}
>  
>  	if (attr == PL1_HWMON_ATTR)
>  		val0 = (val0 & ~clr) | set;
> -- 
> 2.43.0
>