Fix a race condition in the SCMI raw mode implementation by clearing
the xfer flags in __scmi_xfer_put(), under the xfer_lock, instead of in
scmi_xfer_raw_put(). Multiple tests in the SCMI test suite [1] were
failing because scmi_xfer_raw_put() cleared the SCMI_XFER_FLAG_IS_RAW
flag too early.
TRANS=raw
PROTOCOLS=base,clock,power_domain,performance,system_power,sensor,
voltage,reset,powercap,pin_control VERBOSE=5
The root cause:
Tests were failing in poll() because the reply was dropped by this check:
if (!raw || (idx == SCMI_RAW_REPLY_QUEUE && !SCMI_XFER_IS_RAW(xfer)))
return;
The SCMI_XFER_FLAG_IS_RAW flag was being cleared prematurely before
the transfer completion was properly acknowledged, causing the poll
to return on timeout and tests to fail.
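
A minimal userspace sketch of this failure mode, not the kernel code.
The race_* names and RACE_FLAG_IS_RAW are hypothetical stand-ins, and
the guard only mirrors the shape of the condition quoted above. It
shows that the reply is dropped when the flag is cleared before the
report step runs:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SCMI_XFER_FLAG_IS_RAW; the value is arbitrary. */
#define RACE_FLAG_IS_RAW (1u << 0)

struct race_xfer {
	unsigned int flags;
};

/*
 * Mirrors the shape of the guard quoted above: a reply is queued for the
 * raw reader only while the transfer is still marked as raw.
 */
static bool race_report_reply(const struct race_xfer *xfer)
{
	if (!(xfer->flags & RACE_FLAG_IS_RAW))
		return false;	/* dropped: the poll()ing reader is never woken */
	return true;		/* queued: the poll()ing reader sees the reply */
}

/* Failing order: the put path clears the flag before the report runs. */
static bool race_put_then_report(void)
{
	struct race_xfer xfer = { .flags = RACE_FLAG_IS_RAW };

	xfer.flags &= ~RACE_FLAG_IS_RAW;	/* early clear in the put path */
	return race_report_reply(&xfer);
}

/* Passing order: the report runs while the flag is still set. */
static bool race_report_then_put(void)
{
	struct race_xfer xfer = { .flags = RACE_FLAG_IS_RAW };
	bool queued = race_report_reply(&xfer);

	xfer.flags &= ~RACE_FLAG_IS_RAW;
	return queued;
}

/* Usage: the two orders give opposite results for the same reply. */
static void race_demo(void)
{
	assert(!race_put_then_report());
	assert(race_report_then_put());
}
```

The sketch is single-threaded on purpose: it replays the two
interleavings seen in the ftrace logs deterministically rather than
trying to reproduce the race itself.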
The fix ensures:
- Proper synchronization between transfer completion and flag clearing
- Stable test execution by maintaining correct flag states
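
A minimal userspace sketch of the synchronization described above, not
the kernel implementation. It assumes that the flags are reset only
when the last user of the transfer drops its reference, and that this
happens with the lock held. The fix_* names, the pthread mutex (standing
in for the xfer_lock spinlock) and the plain counter (standing in for
the xfer refcount) are all hypothetical:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the SCMI xfer flags; values are arbitrary. */
#define FIX_FLAG_IS_RAW   (1u << 0)
#define FIX_FLAG_CHAN_SET (1u << 1)

struct fix_xfer {
	pthread_mutex_t lock;	/* stands in for the xfer_lock spinlock */
	unsigned int users;	/* stands in for the xfer reference count */
	unsigned int flags;
};

/* Drop one reference; only the final put resets the flags, under the lock. */
static void fix_xfer_put(struct fix_xfer *xfer)
{
	pthread_mutex_lock(&xfer->lock);
	assert(xfer->users > 0);
	if (--xfer->users == 0)
		xfer->flags = 0;
	pthread_mutex_unlock(&xfer->lock);
}

/*
 * Start with two users (for example a waiter and an RX path), drop
 * nr_puts references and return the flags that remain visible.
 */
static unsigned int fix_flags_after_puts(unsigned int nr_puts)
{
	struct fix_xfer xfer = {
		.users = 2,
		.flags = FIX_FLAG_IS_RAW | FIX_FLAG_CHAN_SET,
	};
	unsigned int flags;

	pthread_mutex_init(&xfer.lock, NULL);
	while (nr_puts--)
		fix_xfer_put(&xfer);
	flags = xfer.flags;
	pthread_mutex_destroy(&xfer.lock);

	return flags;
}
```

In this model, fix_flags_after_puts(1) still returns both flags, so a
second user that has not yet finished continues to see the transfer as
raw, while fix_flags_after_puts(2) returns 0.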
An example of a random test failure:
817: Voltage get ext name for invalid domain
[Check 1] Get extended name for invalid domain
MSG HDR : 0x04585c09
NUM PARAM : 1
PARAMETER[00] : 0x0000000c
CHECK STATUS : PASSED [SCMI_NOT_FOUND_ERR]
CHECK HEADER : PASSED [0x04585c09]
RETURN COUNT : 0
NUM DOMAINS : 11
VOLTAGE DOMAIN : 0
[Check 2] Get extended name for unsupp. domain
MSG HDR : 0x045c5c09
NUM PARAM : 1
PARAMETER[00] : 0x00000000
CHECK STATUS : FAILED
EXPECTED : SCMI_NOT_FOUND_ERR
RECEIVED : SCMI_GENERIC_ERROR : NON CONFORMANT
After making these changes, the tests stopped failing.
$mount -t debugfs none /sys/kernel/debug
$scmi_test_agent
[ 127.865032] arm-scmi arm-scmi.1.auto: Resetting SCMI Raw stack.
[ 128.360503] arm-scmi arm-scmi.1.auto: Using Base channel for protocol 0x12
$tail -n 6 arm_scmi_test_log.txt
****************************************************
TOTAL TESTS: 167 PASSED: 120 FAILED: 0 SKIPPED: 47
****************************************************
An ftrace log of a passing test:
0) | scmi_rx_callback()
0) | scmi_raw_message_report()
7) | scmi_xfer_raw_wait_for_message_response()
7) + 22.000 us | scmi_wait_for_reply();
0) | /* scmi_raw_message_report*/
7) | scmi_xfer_raw_put()
An ftrace log of a failing test:
0) | scmi_rx_callback() {
0) | scmi_raw_message_report()
5) | scmi_xfer_raw_wait_for_message_response()
5) ! 383.000 us | scmi_wait_for_reply();
5) | scmi_xfer_raw_put() {
0) | /* scmi_raw_message_report*/
Link [1] https://gitlab.arm.com/tests/scmi-tests/-/releases
Fixes: 3095a3e25d8f7 ("firmware: arm_scmi: Add xfer helpers to provide raw access")
Suggested-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Artem Shimko <a.shimko.dev@gmail.com>
---
Hi Cristian,
Good point about CONFIG_ARM_SCMI_RAW_MODE_SUPPORT_COEX.
I can confirm this setting doesn't impact the test failures in my environment.
The issue reproduces consistently with COEX both enabled and disabled.
Thank you!
Best regards,
Artem Shimko
ChangeLog:
v1:
* https://lore.kernel.org/arm-scmi/20250929142856.540590-1-a.shimko.dev@gmail.com/
v2:
* Use simpler approach suggested by Cristian Marussi
* Clear all xfer flags in __scmi_xfer_put() under spinlock protection
* Add Fixes tag as requested
* Drop completion timeout mechanism from v1
drivers/firmware/arm_scmi/driver.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index bd56a877fdfc..0976bfdbb44b 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -821,6 +821,7 @@ __scmi_xfer_put(struct scmi_xfers_info *minfo, struct scmi_xfer *xfer)
scmi_dec_count(info->dbg->counters, XFERS_INFLIGHT);
}
+ xfer->flags = 0;
hlist_add_head(&xfer->node, &minfo->free_xfers);
}
spin_unlock_irqrestore(&minfo->xfer_lock, flags);
@@ -839,8 +840,6 @@ void scmi_xfer_raw_put(const struct scmi_handle *handle, struct scmi_xfer *xfer)
{
struct scmi_info *info = handle_to_scmi_info(handle);
- xfer->flags &= ~SCMI_XFER_FLAG_IS_RAW;
- xfer->flags &= ~SCMI_XFER_FLAG_CHAN_SET;
return __scmi_xfer_put(&info->tx_minfo, xfer);
}
--
2.43.0