[v2] tee: optee: Fix supplicant wait loop

[PATCH v2] tee: optee: Fix supplicant wait loop

Posted by Sumit Garg 1 year ago

OP-TEE supplicant is a user-space daemon and it's possible for it
being hung or crashed or killed in the middle of processing an OP-TEE
RPC call. It becomes more complicated when there is incorrect shutdown
ordering of the supplicant process vs the OP-TEE client application which
can eventually lead to system hang-up waiting for the closure of the
client application.

Allow the client process waiting in kernel for supplicant response to
be killed rather than indefinitetly waiting in an unkillable state. This
fixes issues observed during system reboot/shutdown when supplicant got
hung for some reason or gets crashed/killed which lead to client getting
hung in an unkillable state. It in turn lead to system being in hung up
state requiring hard power off/on to recover.

Fixes: 4fb0a5eb364d ("tee: add OP-TEE driver")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
---

Changes in v2:
- Switch to killable wait instead as suggested by Arnd instead
  of supplicant timeout. It atleast allow the client to wait for
  supplicant in killable state which in turn allows system to reboot
  or shutdown gracefully.

 drivers/tee/optee/supp.c | 32 +++++++-------------------------
 1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
index 322a543b8c27..3fbfa9751931 100644
--- a/drivers/tee/optee/supp.c
+++ b/drivers/tee/optee/supp.c
@@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
 	struct optee *optee = tee_get_drvdata(ctx->teedev);
 	struct optee_supp *supp = &optee->supp;
 	struct optee_supp_req *req;
-	bool interruptable;
 	u32 ret;
 
 	/*
@@ -111,36 +110,19 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
 	/*
 	 * Wait for supplicant to process and return result, once we've
 	 * returned from wait_for_completion(&req->c) successfully we have
-	 * exclusive access again.
+	 * exclusive access again. Allow the wait to be killable such that
+	 * the wait doesn't turn into an indefinite state if the supplicant
+	 * gets hung for some reason.
 	 */
-	while (wait_for_completion_interruptible(&req->c)) {
-		mutex_lock(&supp->mutex);
-		interruptable = !supp->ctx;
-		if (interruptable) {
-			/*
-			 * There's no supplicant available and since the
-			 * supp->mutex currently is held none can
-			 * become available until the mutex released
-			 * again.
-			 *
-			 * Interrupting an RPC to supplicant is only
-			 * allowed as a way of slightly improving the user
-			 * experience in case the supplicant hasn't been
-			 * started yet. During normal operation the supplicant
-			 * will serve all requests in a timely manner and
-			 * interrupting then wouldn't make sense.
-			 */
+	if (wait_for_completion_killable(&req->c)) {
+		if (!mutex_lock_killable(&supp->mutex))	{
 			if (req->in_queue) {
 				list_del(&req->link);
 				req->in_queue = false;
 			}
+			mutex_unlock(&supp->mutex);
 		}
-		mutex_unlock(&supp->mutex);
-
-		if (interruptable) {
-			req->ret = TEEC_ERROR_COMMUNICATION;
-			break;
-		}
+		req->ret = TEEC_ERROR_COMMUNICATION;
 	}
 
 	ret = req->ret;
-- 
2.43.0

Re: [PATCH v2] tee: optee: Fix supplicant wait loop

Posted by Jens Wiklander 1 year ago

On Mon, Feb 3, 2025 at 9:00 AM Sumit Garg <sumit.garg@linaro.org> wrote:
>
> OP-TEE supplicant is a user-space daemon and it's possible for it
> being hung or crashed or killed in the middle of processing an OP-TEE
> RPC call. It becomes more complicated when there is incorrect shutdown
> ordering of the supplicant process vs the OP-TEE client application which
> can eventually lead to system hang-up waiting for the closure of the
> client application.
>
> Allow the client process waiting in kernel for supplicant response to
> be killed rather than indefinitetly waiting in an unkillable state. This
> fixes issues observed during system reboot/shutdown when supplicant got
> hung for some reason or gets crashed/killed which lead to client getting
> hung in an unkillable state. It in turn lead to system being in hung up
> state requiring hard power off/on to recover.
>
> Fixes: 4fb0a5eb364d ("tee: add OP-TEE driver")
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
> ---
>
> Changes in v2:
> - Switch to killable wait instead as suggested by Arnd instead
>   of supplicant timeout. It atleast allow the client to wait for
>   supplicant in killable state which in turn allows system to reboot
>   or shutdown gracefully.
>
>  drivers/tee/optee/supp.c | 32 +++++++-------------------------
>  1 file changed, 7 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
> index 322a543b8c27..3fbfa9751931 100644
> --- a/drivers/tee/optee/supp.c
> +++ b/drivers/tee/optee/supp.c
> @@ -80,7 +80,6 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>         struct optee *optee = tee_get_drvdata(ctx->teedev);
>         struct optee_supp *supp = &optee->supp;
>         struct optee_supp_req *req;
> -       bool interruptable;
>         u32 ret;
>
>         /*
> @@ -111,36 +110,19 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>         /*
>          * Wait for supplicant to process and return result, once we've
>          * returned from wait_for_completion(&req->c) successfully we have
> -        * exclusive access again.
> +        * exclusive access again. Allow the wait to be killable such that
> +        * the wait doesn't turn into an indefinite state if the supplicant
> +        * gets hung for some reason.
>          */
> -       while (wait_for_completion_interruptible(&req->c)) {
> -               mutex_lock(&supp->mutex);
> -               interruptable = !supp->ctx;
> -               if (interruptable) {
> -                       /*
> -                        * There's no supplicant available and since the
> -                        * supp->mutex currently is held none can
> -                        * become available until the mutex released
> -                        * again.
> -                        *
> -                        * Interrupting an RPC to supplicant is only
> -                        * allowed as a way of slightly improving the user
> -                        * experience in case the supplicant hasn't been
> -                        * started yet. During normal operation the supplicant
> -                        * will serve all requests in a timely manner and
> -                        * interrupting then wouldn't make sense.
> -                        */
> +       if (wait_for_completion_killable(&req->c)) {
> +               if (!mutex_lock_killable(&supp->mutex)) {

Why not mutex_lock()? If we fail to acquire the mutex here, we will
quite likely free the req list item below at the end of this function
while it remains in the list.

Cheers,
Jens

>                         if (req->in_queue) {
>                                 list_del(&req->link);
>                                 req->in_queue = false;
>                         }
> +                       mutex_unlock(&supp->mutex);
>                 }
> -               mutex_unlock(&supp->mutex);
> -
> -               if (interruptable) {
> -                       req->ret = TEEC_ERROR_COMMUNICATION;
> -                       break;
> -               }
> +               req->ret = TEEC_ERROR_COMMUNICATION;
>         }
>
>         ret = req->ret;
> --
> 2.43.0
>

Re: [PATCH v2] tee: optee: Fix supplicant wait loop

Posted by Arnd Bergmann 1 year ago

On Mon, Feb 3, 2025, at 09:00, Sumit Garg wrote:
> OP-TEE supplicant is a user-space daemon and it's possible for it
> being hung or crashed or killed in the middle of processing an OP-TEE
> RPC call. It becomes more complicated when there is incorrect shutdown
> ordering of the supplicant process vs the OP-TEE client application which
> can eventually lead to system hang-up waiting for the closure of the
> client application.
>
> Allow the client process waiting in kernel for supplicant response to
> be killed rather than indefinitetly waiting in an unkillable state.

It would be good to mention that the existing version ends up in
a busy-loop here because of the broken wait_for_completion_interruptible()
loop.

A normal uninterruptible wait should not have resulted in the hung-task
watchdog getting triggered, but the endless loop would.

> +	if (wait_for_completion_killable(&req->c)) {
> +		if (!mutex_lock_killable(&supp->mutex))	{
>  			if (req->in_queue) {

Using mutex_trylock() here is probably clearer than
mutex_lock_killable(), since a task that is already in the
process of getting killed won't ever wait for the mutex.
mutex_lock_killable() does try to get the lock first though
if nobody else holds it already, which is the same as trylock.

     Arnd

Re: [PATCH v2] tee: optee: Fix supplicant wait loop

Posted by Sumit Garg 1 year ago

Hi Arnd,

On Mon, 3 Feb 2025 at 13:46, Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Feb 3, 2025, at 09:00, Sumit Garg wrote:
> > OP-TEE supplicant is a user-space daemon and it's possible for it
> > being hung or crashed or killed in the middle of processing an OP-TEE
> > RPC call. It becomes more complicated when there is incorrect shutdown
> > ordering of the supplicant process vs the OP-TEE client application which
> > can eventually lead to system hang-up waiting for the closure of the
> > client application.
> >
> > Allow the client process waiting in kernel for supplicant response to
> > be killed rather than indefinitetly waiting in an unkillable state.
>
> It would be good to mention that the existing version ends up in
> a busy-loop here because of the broken wait_for_completion_interruptible()
> loop.
>
> A normal uninterruptible wait should not have resulted in the hung-task
> watchdog getting triggered, but the endless loop would.

Sure, I will add that.

>
> > +     if (wait_for_completion_killable(&req->c)) {
> > +             if (!mutex_lock_killable(&supp->mutex)) {
> >                       if (req->in_queue) {
>
> Using mutex_trylock() here is probably clearer than
> mutex_lock_killable(), since a task that is already in the
> process of getting killed won't ever wait for the mutex.
> mutex_lock_killable() does try to get the lock first though
> if nobody else holds it already, which is the same as trylock.

Let's follow-up on this in the other thread.

-Sumit

>
>      Arnd