[PATCH v4 0/4] platform/x86: intel_scu_ipc: Timeout fixes

Stephen Boyd posted 4 patches 2 years, 3 months ago
drivers/platform/x86/intel_scu_ipc.c | 66 +++++++++++++++++-----------
1 file changed, 40 insertions(+), 26 deletions(-)
[PATCH v4 0/4] platform/x86: intel_scu_ipc: Timeout fixes
Posted by Stephen Boyd 2 years, 3 months ago
I recently looked at some crash reports on ChromeOS devices that call
into this intel_scu_ipc driver. They were hitting timeouts, and it
certainly looks possible for those timeouts to be triggering because of
scheduling issues. Once things started going south, the timeouts kept
coming. Maybe that's because the other side got seriously confused? I
don't know.

I added some sleeps to these paths to trigger the timeout behavior to
make sure the code works. Simply sleeping for a long time in busy_loop()
hits the timeout, which could happen if the system is scheduling lots of
other things at the time.

I couldn't really test the last patch because forcing a timeout or
returning immediately wasn't fast enough to trigger the second
transaction to run into the first one being processed.

Changes from v3 (https://lore.kernel.org/r/20230911193937.302552-1-swboyd@chromium.org):
 * Use readx_poll_timeout() to shorten a line

Changes from v2 (https://lore.kernel.org/r/20230906180944.2197111-1-swboyd@chromium.org):
 * Use read_poll_timeout() helper in patch #1 (again)
 * New patch #3 to fix bug pointed out by Andy
 * Consolidate more code into busy check in patch #4

Changes from v1 (https://lore.kernel.org/r/20230831011405.3246849-1-swboyd@chromium.org):
 * Don't use read_poll_timeout() helper in patch 1, just add code
 * Rewrite patch 2 to be simpler
 * Make intel_scu_ipc_busy() return -EBUSY when busy
 * Downgrade dev_err() to dev_dbg() in intel_scu_ipc_busy()

Stephen Boyd (4):
  platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
  platform/x86: intel_scu_ipc: Check status upon timeout in
    ipc_wait_for_interrupt()
  platform/x86: intel_scu_ipc: Don't override scu in
    intel_scu_ipc_dev_simple_command()
  platform/x86: intel_scu_ipc: Fail IPC send if still busy

 drivers/platform/x86/intel_scu_ipc.c | 66 +++++++++++++++++-----------
 1 file changed, 40 insertions(+), 26 deletions(-)

Cc: Prashant Malani <pmalani@chromium.org>

base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
-- 
https://chromeos.dev
Re: [PATCH v4 0/4] platform/x86: intel_scu_ipc: Timeout fixes
Posted by Hans de Goede 2 years, 3 months ago
Hi,

On 9/13/23 23:27, Stephen Boyd wrote:
> I recently looked at some crash reports on ChromeOS devices that call
> into this intel_scu_ipc driver. They were hitting timeouts, and it
> certainly looks possible for those timeouts to be triggering because of
> scheduling issues. Once things started going south, the timeouts kept
> coming. Maybe that's because the other side got seriously confused? I
> don't know.
> 
> I added some sleeps to these paths to trigger the timeout behavior to
> make sure the code works. Simply sleeping for a long time in busy_loop()
> hits the timeout, which could happen if the system is scheduling lots of
> other things at the time.
> 
> I couldn't really test the last patch because forcing a timeout or
> returning immediately wasn't fast enough to trigger the second
> transaction to run into the first one being processed.

Thank you for your patch, I've applied this patch to my fixes
branch:
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=fixes

Note it will show up in my fixes branch once I've pushed my
local branch there, which might take a while.

I will include this patch in my next fixes pull-req to Linus
for the current kernel development cycle.

Regards,

Hans




> Changes from v3 (https://lore.kernel.org/r/20230911193937.302552-1-swboyd@chromium.org):
>  * Use readx_poll_timeout() to shorten a line
> 
> Changes from v2 (https://lore.kernel.org/r/20230906180944.2197111-1-swboyd@chromium.org):
>  * Use read_poll_timeout() helper in patch #1 (again)
>  * New patch #3 to fix bug pointed out by Andy
>  * Consolidate more code into busy check in patch #4
> 
> Changes from v1 (https://lore.kernel.org/r/20230831011405.3246849-1-swboyd@chromium.org):
>  * Don't use read_poll_timeout() helper in patch 1, just add code
>  * Rewrite patch 2 to be simpler
>  * Make intel_scu_ipc_busy() return -EBUSY when busy
>  * Downgrade dev_err() to dev_dbg() in intel_scu_ipc_busy()
> 
> Stephen Boyd (4):
>   platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
>   platform/x86: intel_scu_ipc: Check status upon timeout in
>     ipc_wait_for_interrupt()
>   platform/x86: intel_scu_ipc: Don't override scu in
>     intel_scu_ipc_dev_simple_command()
>   platform/x86: intel_scu_ipc: Fail IPC send if still busy
> 
>  drivers/platform/x86/intel_scu_ipc.c | 66 +++++++++++++++++-----------
>  1 file changed, 40 insertions(+), 26 deletions(-)
> 
> Cc: Prashant Malani <pmalani@chromium.org>
> 
> base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c