[PATCH 0/3] platform/x86: intel_scu_ipc: Timeout fixes

Stephen Boyd posted 3 patches 2 years, 3 months ago
There is a newer version of this series
Posted by Stephen Boyd 2 years, 3 months ago
I recently looked at some crash reports on ChromeOS devices that call
into this intel_scu_ipc driver. They were hitting timeouts, and it
certainly looks possible for those timeouts to be triggering because of
scheduling issues. Once things started going south, the timeouts kept
coming. Maybe that's because the other side got seriously confused? I
don't know. I'll poke at it some more by injecting timeouts on the
kernel side.

The first two patches are only lightly tested (normal functions keep
working), while the third one is purely speculation. I was going to make
the interrupt delay for a long time to see if I could hit the timeout.

Stephen Boyd (3):
  platform/x86: intel_scu_ipc: Check status after timeouts in
    busy_loop()
  platform/x86: intel_scu_ipc: Check status upon timeout in
    ipc_wait_for_interrupt()
  platform/x86: intel_scu_ipc: Fail IPC send if still busy

 drivers/platform/x86/intel_scu_ipc.c | 59 ++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 17 deletions(-)


base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
-- 
https://chromeos.dev
Re: [PATCH 0/3] platform/x86: intel_scu_ipc: Timeout fixes
Posted by Kuppuswamy Sathyanarayanan 2 years, 3 months ago

On 8/30/2023 6:14 PM, Stephen Boyd wrote:
> I recently looked at some crash reports on ChromeOS devices that call
> into this intel_scu_ipc driver. They were hitting timeouts, and it
> certainly looks possible for those timeouts to be triggering because of
> scheduling issues. Once things started going south, the timeouts kept

Are you talking about timeouts during IPC command?

> coming. Maybe that's because the other side got seriously confused? I
> don't know. I'll poke at it some more by injecting timeouts on the
> kernel side.

Do you think it is possible due to a firmware issue?

> 
> The first two patches are only lightly tested (normal functions keep
> working), while the third one is purely speculation. I was going to make
> the interrupt delay for a long time to see if I could hit the timeout.
> 
> Stephen Boyd (3):
>   platform/x86: intel_scu_ipc: Check status after timeouts in
>     busy_loop()
>   platform/x86: intel_scu_ipc: Check status upon timeout in
>     ipc_wait_for_interrupt()
>   platform/x86: intel_scu_ipc: Fail IPC send if still busy
> 
>  drivers/platform/x86/intel_scu_ipc.c | 59 ++++++++++++++++++++--------
>  1 file changed, 42 insertions(+), 17 deletions(-)
> 
> 
> base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Re: [PATCH 0/3] platform/x86: intel_scu_ipc: Timeout fixes
Posted by Stephen Boyd 2 years, 3 months ago
Quoting Kuppuswamy Sathyanarayanan (2023-08-30 20:28:57)
>
>
> On 8/30/2023 6:14 PM, Stephen Boyd wrote:
> > I recently looked at some crash reports on ChromeOS devices that call
> > into this intel_scu_ipc driver. They were hitting timeouts, and it
> > certainly looks possible for those timeouts to be triggering because of
> > scheduling issues. Once things started going south, the timeouts kept
>
> Are you talking about timeouts during IPC command?

Yes? I see messages like this

	intel_scu_ipc intel_scu_ipc: IPC command 0x200a7 failed with -110

which led me to this driver and I wrote these patches based on that
failure message. I was trying to figure out how that could happen, and
it seems that it could simply be scheduling delays while nothing is
really timing out.

>
> > coming. Maybe that's because the other side got seriously confused? I
> > don't know. I'll poke at it some more by injecting timeouts on the
> > kernel side.
>
> Do you think it is possible due to a firmware issue?

I have no idea. Is there some way to figure that out? I'm not able to
reproduce the problem locally.