[PATCH v3] tee: optee: prevent use-after-free when the client exits before the supplicant

Posted by Amirreza Zarrabi 1 week, 2 days ago
Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
client wait as killable so it can be interrupted during shutdown or
after a supplicant crash. This changes the original lifetime expectations:
the client task can now terminate while the supplicant is still processing
its request.

If the client exits first, it removes the request from its queue and
kfree()s it, while the request ID remains in supp->idr. A subsequent
lookup on the supplicant path then dereferences freed memory, leading to
a use-after-free.

Serialise access to the request with supp->mutex:

  * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
    looking up and touching the request.
  * Let optee_supp_thrd_req() notice that the client has terminated and
    signal optee_supp_send() accordingly.

With these changes the request cannot be freed while the supplicant still
has a reference, eliminating the race.

Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
---
Changes in v3:
- Introduce processed flag instead of -1 for req->id.
- Update optee_supp_release() as reported by Michael Wu.
- Use mutex instead of guard.
- Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com

Changes in v2:
- Replace the static variable with a sentinel value.
- Fix the issue with returning the popped request to the supplicant.
- Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
---
 drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 86 insertions(+), 36 deletions(-)

diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
index d0f397c90242..0ec66008df19 100644
--- a/drivers/tee/optee/supp.c
+++ b/drivers/tee/optee/supp.c
@@ -10,7 +10,11 @@
 struct optee_supp_req {
 	struct list_head link;
 
+	int id;
+
 	bool in_queue;
+	bool processed;
+
 	u32 func;
 	u32 ret;
 	size_t num_params;
@@ -19,6 +23,9 @@ struct optee_supp_req {
 	struct completion c;
 };
 
+/* Placeholder in supp->idr for a pending request that has become invalid. */
+#define INVALID_REQ_PTR ((struct optee_supp_req *)ERR_PTR(-ENOENT))
+
 void optee_supp_init(struct optee_supp *supp)
 {
 	memset(supp, 0, sizeof(*supp));
@@ -46,6 +53,10 @@ void optee_supp_release(struct optee_supp *supp)
 	/* Abort all request retrieved by supplicant */
 	idr_for_each_entry(&supp->idr, req, id) {
 		idr_remove(&supp->idr, id);
+		/* Skip if request was already marked invalid */
+		if (IS_ERR(req))
+			continue;
+
 		req->ret = TEEC_ERROR_COMMUNICATION;
 		complete(&req->c);
 	}
@@ -102,6 +113,7 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
 	mutex_lock(&supp->mutex);
 	list_add_tail(&req->link, &supp->reqs);
 	req->in_queue = true;
+	req->processed = false;
 	mutex_unlock(&supp->mutex);
 
 	/* Tell an eventual waiter there's a new request */
@@ -117,21 +129,40 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
 	if (wait_for_completion_killable(&req->c)) {
 		mutex_lock(&supp->mutex);
 		if (req->in_queue) {
+			/* Supplicant has not seen this request yet. */
 			list_del(&req->link);
 			req->in_queue = false;
+
+			ret = TEEC_ERROR_COMMUNICATION;
+		} else if (req->processed) {
+			/*
+			 * Supplicant has processed this request. Ignore the
+			 * kill signal for now and submit the result.
+			 */
+			ret = req->ret;
+		} else {
+			/*
+			 * Supplicant is in the middle of processing this
+			 * request. Replace req with INVALID_REQ_PTR so that
+			 * the ID remains busy, causing optee_supp_send() to
+			 * fail on the next call to supp_pop_req() with this ID.
+			 */
+			idr_replace(&supp->idr, INVALID_REQ_PTR, req->id);
+			ret = TEEC_ERROR_COMMUNICATION;
 		}
+
 		mutex_unlock(&supp->mutex);
-		req->ret = TEEC_ERROR_COMMUNICATION;
+	} else {
+		ret = req->ret;
 	}
 
-	ret = req->ret;
 	kfree(req);
 
 	return ret;
 }
 
 static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
-					      int num_params, int *id)
+					      int num_params)
 {
 	struct optee_supp_req *req;
 
@@ -153,8 +184,8 @@ static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
 		return ERR_PTR(-EINVAL);
 	}
 
-	*id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
-	if (*id < 0)
+	req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
+	if (req->id < 0)
 		return ERR_PTR(-ENOMEM);
 
 	list_del(&req->link);
@@ -214,7 +245,6 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
 	struct optee *optee = tee_get_drvdata(teedev);
 	struct optee_supp *supp = &optee->supp;
 	struct optee_supp_req *req = NULL;
-	int id;
 	size_t num_meta;
 	int rc;
 
@@ -224,15 +254,48 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
 
 	while (true) {
 		mutex_lock(&supp->mutex);
-		req = supp_pop_entry(supp, *num_params - num_meta, &id);
-		mutex_unlock(&supp->mutex);
 
-		if (req) {
-			if (IS_ERR(req))
-				return PTR_ERR(req);
-			break;
+		req = supp_pop_entry(supp, *num_params - num_meta);
+		if (!req) {
+			mutex_unlock(&supp->mutex);
+			goto wait_for_request;
+		}
+
+		if (IS_ERR(req)) {
+			rc = PTR_ERR(req);
+			mutex_unlock(&supp->mutex);
+
+			return rc;
 		}
 
+		/*
+		 * Process the request while holding the lock, so that
+		 * optee_supp_thrd_req() doesn't pull the request from under us.
+		 */
+
+		if (num_meta) {
+			/*
+			 * tee-supplicant supports meta parameters ->
+			 * requests can be processed asynchronously.
+			 */
+			param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
+				      TEE_IOCTL_PARAM_ATTR_META;
+			param->u.value.a = req->id;
+			param->u.value.b = 0;
+			param->u.value.c = 0;
+		} else {
+			supp->req_id = req->id;
+		}
+
+		*func = req->func;
+		*num_params = req->num_params + num_meta;
+		memcpy(param + num_meta, req->param,
+		       sizeof(struct tee_param) * req->num_params);
+
+		mutex_unlock(&supp->mutex);
+		return 0;
+
+wait_for_request:
 		/*
 		 * If we didn't get a request we'll block in
 		 * wait_for_completion() to avoid needless spinning.
@@ -243,29 +306,10 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
 		 */
 		if (wait_for_completion_interruptible(&supp->reqs_c))
 			return -ERESTARTSYS;
-	}
 
-	if (num_meta) {
-		/*
-		 * tee-supplicant support meta parameters -> requsts can be
-		 * processed asynchronously.
-		 */
-		param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
-			      TEE_IOCTL_PARAM_ATTR_META;
-		param->u.value.a = id;
-		param->u.value.b = 0;
-		param->u.value.c = 0;
-	} else {
-		mutex_lock(&supp->mutex);
-		supp->req_id = id;
-		mutex_unlock(&supp->mutex);
+		/* Check for the next request in the queue. */
 	}
 
-	*func = req->func;
-	*num_params = req->num_params + num_meta;
-	memcpy(param + num_meta, req->param,
-	       sizeof(struct tee_param) * req->num_params);
-
 	return 0;
 }
 
@@ -297,12 +341,18 @@ static struct optee_supp_req *supp_pop_req(struct optee_supp *supp,
 	if (!req)
 		return ERR_PTR(-ENOENT);
 
+	/* optee_supp_thrd_req() already returned to optee. */
+	if (IS_ERR(req))
+		goto failed_req;
+
 	if ((num_params - nm) != req->num_params)
 		return ERR_PTR(-EINVAL);
 
+	*num_meta = nm;
+failed_req:
 	idr_remove(&supp->idr, id);
 	supp->req_id = -1;
-	*num_meta = nm;
+
 
 	return req;
 }
@@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
 
 	mutex_lock(&supp->mutex);
 	req = supp_pop_req(supp, num_params, param, &num_meta);
-	mutex_unlock(&supp->mutex);
-
 	if (IS_ERR(req)) {
+		mutex_unlock(&supp->mutex);
 		/* Something is wrong, let supplicant restart. */
 		return PTR_ERR(req);
 	}
@@ -355,9 +404,10 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
 		}
 	}
 	req->ret = ret;
-
+	req->processed = true;
 	/* Let the requesting thread continue */
 	complete(&req->c);
+	mutex_unlock(&supp->mutex);
 
 	return 0;
 }

---
base-commit: 3f24e4edcd1b8981c6b448ea2680726dedd87279
change-id: 20250604-fix-use-after-free-8ff1b5d5d774

Best regards,
-- 
Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
Re: [PATCH v3] tee: optee: prevent use-after-free when the client exits before the supplicant
Posted by Jens Wiklander 4 days, 21 hours ago
Hi Amir,

On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi
<amirreza.zarrabi@oss.qualcomm.com> wrote:
>
> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
> client wait as killable so it can be interrupted during shutdown or
> after a supplicant crash. This changes the original lifetime expectations:
> the client task can now terminate while the supplicant is still processing
> its request.
>
> If the client exits first it removes the request from its queue and
> kfree()s it, while the request ID remains in supp->idr. A subsequent
> lookup on the supplicant path then dereferences freed memory, leading to
> a use-after-free.
>
> Serialise access to the request with supp->mutex:
>
>   * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
>     looking up and touching the request.
>   * Let optee_supp_thrd_req() notice that the client has terminated and
>     signal optee_supp_send() accordingly.
>
> With these changes the request cannot be freed while the supplicant still
> has a reference, eliminating the race.
>
> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
> ---
> Changes in v3:
> - Introduce processed flag instead of -1 for req->id.
> - Update optee_supp_release() as reported by Michael Wu.
> - Use mutex instead of guard.
> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com
>
> Changes in v2:
> - Replace the static variable with a sentinel value.
> - Fix the issue with returning the popped request to the supplicant.
> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
> ---
>  drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 86 insertions(+), 36 deletions(-)

I had forgotten about this. I'd like to prioritize getting this fixed
soon. By the way, how did you test this?

>
> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
> index d0f397c90242..0ec66008df19 100644
> --- a/drivers/tee/optee/supp.c
> +++ b/drivers/tee/optee/supp.c
> @@ -10,7 +10,11 @@
>  struct optee_supp_req {
>         struct list_head link;
>
> +       int id;
> +
>         bool in_queue;
> +       bool processed;
> +
>         u32 func;
>         u32 ret;
>         size_t num_params;
> @@ -19,6 +23,9 @@ struct optee_supp_req {
>         struct completion c;
>  };
>
> +/* It is temporary request used for invalid pending request in supp->idr. */
> +#define INVALID_REQ_PTR ((struct optee_supp_req *)ERR_PTR(-ENOENT))
> +
>  void optee_supp_init(struct optee_supp *supp)
>  {
>         memset(supp, 0, sizeof(*supp));
> @@ -46,6 +53,10 @@ void optee_supp_release(struct optee_supp *supp)
>         /* Abort all request retrieved by supplicant */
>         idr_for_each_entry(&supp->idr, req, id) {
>                 idr_remove(&supp->idr, id);
> +               /* Skip if request was already marked invalid */
> +               if (IS_ERR(req))
> +                       continue;
> +
>                 req->ret = TEEC_ERROR_COMMUNICATION;
>                 complete(&req->c);
>         }
> @@ -102,6 +113,7 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>         mutex_lock(&supp->mutex);
>         list_add_tail(&req->link, &supp->reqs);
>         req->in_queue = true;
> +       req->processed = false;
>         mutex_unlock(&supp->mutex);
>
>         /* Tell an eventual waiter there's a new request */
> @@ -117,21 +129,40 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>         if (wait_for_completion_killable(&req->c)) {
>                 mutex_lock(&supp->mutex);
>                 if (req->in_queue) {
> +                       /* Supplicant has not seen this request yet. */
>                         list_del(&req->link);
>                         req->in_queue = false;
> +
> +                       ret = TEEC_ERROR_COMMUNICATION;
> +               } else if (req->processed) {
> +                       /*
> +                        * Supplicant has processed this request. Ignore the
> +                        * kill signal for now and submit the result.
> +                        */
> +                       ret = req->ret;
> +               } else {
> +                       /*
> +                        * Supplicant is in the middle of processing this
> +                        * request. Replace req with INVALID_REQ_PTR so that
> +                        * the ID remains busy, causing optee_supp_send() to
> +                        * fail on the next call to supp_pop_req() with this ID.
> +                        */
> +                       idr_replace(&supp->idr, INVALID_REQ_PTR, req->id);
> +                       ret = TEEC_ERROR_COMMUNICATION;
>                 }
> +
>                 mutex_unlock(&supp->mutex);
> -               req->ret = TEEC_ERROR_COMMUNICATION;
> +       } else {
> +               ret = req->ret;
>         }
>
> -       ret = req->ret;
>         kfree(req);
>
>         return ret;
>  }
>
>  static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
> -                                             int num_params, int *id)
> +                                             int num_params)
>  {
>         struct optee_supp_req *req;
>
> @@ -153,8 +184,8 @@ static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
>                 return ERR_PTR(-EINVAL);
>         }
>
> -       *id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
> -       if (*id < 0)
> +       req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
> +       if (req->id < 0)
>                 return ERR_PTR(-ENOMEM);

Since we're now storing the supplicant request ID, wouldn't it make
sense to pre-allocate the ID when allocating the request to avoid this
error case?
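
For illustration, a rough sketch of that idea, reusing the names from this
patch (not tested; error handling abbreviated, and TEEC_ERROR_OUT_OF_MEMORY is
assumed to match the existing allocation-failure path in
optee_supp_thrd_req()):

	mutex_lock(&supp->mutex);
	/* Reserve the ID while queuing, so supp_pop_entry() cannot fail. */
	req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
	if (req->id < 0) {
		mutex_unlock(&supp->mutex);
		kfree(req);
		return TEEC_ERROR_OUT_OF_MEMORY;
	}
	list_add_tail(&req->link, &supp->reqs);
	req->in_queue = true;
	req->processed = false;
	mutex_unlock(&supp->mutex);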

>
>         list_del(&req->link);
> @@ -214,7 +245,6 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>         struct optee *optee = tee_get_drvdata(teedev);
>         struct optee_supp *supp = &optee->supp;
>         struct optee_supp_req *req = NULL;
> -       int id;
>         size_t num_meta;
>         int rc;
>
> @@ -224,15 +254,48 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>
>         while (true) {
>                 mutex_lock(&supp->mutex);
> -               req = supp_pop_entry(supp, *num_params - num_meta, &id);
> -               mutex_unlock(&supp->mutex);
>
> -               if (req) {
> -                       if (IS_ERR(req))
> -                               return PTR_ERR(req);
> -                       break;
> +               req = supp_pop_entry(supp, *num_params - num_meta);
> +               if (!req) {
> +                       mutex_unlock(&supp->mutex);
> +                       goto wait_for_request;
> +               }
> +
> +               if (IS_ERR(req)) {
> +                       rc = PTR_ERR(req);
> +                       mutex_unlock(&supp->mutex);
> +
> +                       return rc;
>                 }
>
> +               /*
> +                * Process the request while holding the lock, so that
> +                * optee_supp_thrd_req() doesn't pull the request from under us.
> +                */
> +
> +               if (num_meta) {
> +                       /*
> +                        * tee-supplicant support meta parameters ->
> +                        * requests can be processed asynchronously.
> +                        */
> +                       param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
> +                                     TEE_IOCTL_PARAM_ATTR_META;
> +                       param->u.value.a = req->id;
> +                       param->u.value.b = 0;
> +                       param->u.value.c = 0;
> +               } else {
> +                       supp->req_id = req->id;
> +               }
> +
> +               *func = req->func;
> +               *num_params = req->num_params + num_meta;
> +               memcpy(param + num_meta, req->param,
> +                      sizeof(struct tee_param) * req->num_params);
> +
> +               mutex_unlock(&supp->mutex);
> +               return 0;

Do we really need to move this into the loop? The structure of the
function becomes a bit unusual and harder to read.

> +
> +wait_for_request:
>                 /*
>                  * If we didn't get a request we'll block in
>                  * wait_for_completion() to avoid needless spinning.
> @@ -243,29 +306,10 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>                  */
>                 if (wait_for_completion_interruptible(&supp->reqs_c))
>                         return -ERESTARTSYS;
> -       }
>
> -       if (num_meta) {
> -               /*
> -                * tee-supplicant support meta parameters -> requsts can be
> -                * processed asynchronously.
> -                */
> -               param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
> -                             TEE_IOCTL_PARAM_ATTR_META;
> -               param->u.value.a = id;
> -               param->u.value.b = 0;
> -               param->u.value.c = 0;
> -       } else {
> -               mutex_lock(&supp->mutex);
> -               supp->req_id = id;
> -               mutex_unlock(&supp->mutex);
> +               /* Check for the next request in the queue. */
>         }
>
> -       *func = req->func;
> -       *num_params = req->num_params + num_meta;
> -       memcpy(param + num_meta, req->param,
> -              sizeof(struct tee_param) * req->num_params);
> -
>         return 0;
>  }
>
> @@ -297,12 +341,18 @@ static struct optee_supp_req *supp_pop_req(struct optee_supp *supp,
>         if (!req)
>                 return ERR_PTR(-ENOENT);
>
> +       /* optee_supp_thrd_req() already returned to optee. */
> +       if (IS_ERR(req))
> +               goto failed_req;
> +
>         if ((num_params - nm) != req->num_params)
>                 return ERR_PTR(-EINVAL);
>
> +       *num_meta = nm;
> +failed_req:
>         idr_remove(&supp->idr, id);
>         supp->req_id = -1;
> -       *num_meta = nm;
> +
>
>         return req;
>  }
> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>
>         mutex_lock(&supp->mutex);
>         req = supp_pop_req(supp, num_params, param, &num_meta);
> -       mutex_unlock(&supp->mutex);
> -
>         if (IS_ERR(req)) {
> +               mutex_unlock(&supp->mutex);

We need a way to tell the difference between an id not found and an id
removed because of a killed requester.
How about storing NULL for revoked requests instead of an err-pointer?

Cheers,
Jens

>                 /* Something is wrong, let supplicant restart. */
>                 return PTR_ERR(req);
>         }
> @@ -355,9 +404,10 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>                 }
>         }
>         req->ret = ret;
> -
> +       req->processed = true;
>         /* Let the requesting thread continue */
>         complete(&req->c);
> +       mutex_unlock(&supp->mutex);
>
>         return 0;
>  }
>
> ---
> base-commit: 3f24e4edcd1b8981c6b448ea2680726dedd87279
> change-id: 20250604-fix-use-after-free-8ff1b5d5d774
>
> Best regards,
> --
> Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
>
Re: [PATCH v3] tee: optee: prevent use-after-free when the client exits before the supplicant
Posted by Amirreza Zarrabi 4 days, 6 hours ago
Hi Jens,

On 2/2/2026 10:36 PM, Jens Wiklander wrote:
> Hi Amir,
> 
> On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi
> <amirreza.zarrabi@oss.qualcomm.com> wrote:
>>
>> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
>> client wait as killable so it can be interrupted during shutdown or
>> after a supplicant crash. This changes the original lifetime expectations:
>> the client task can now terminate while the supplicant is still processing
>> its request.
>>
>> If the client exits first it removes the request from its queue and
>> kfree()s it, while the request ID remains in supp->idr. A subsequent
>> lookup on the supplicant path then dereferences freed memory, leading to
>> a use-after-free.
>>
>> Serialise access to the request with supp->mutex:
>>
>>   * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
>>     looking up and touching the request.
>>   * Let optee_supp_thrd_req() notice that the client has terminated and
>>     signal optee_supp_send() accordingly.
>>
>> With these changes the request cannot be freed while the supplicant still
>> has a reference, eliminating the race.
>>
>> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
>> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
>> ---
>> Changes in v3:
>> - Introduce processed flag instead of -1 for req->id.
>> - Update optee_supp_release() as reported by Michael Wu.
>> - Use mutex instead of guard.
>> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com
>>
>> Changes in v2:
>> - Replace the static variable with a sentinel value.
>> - Fix the issue with returning the popped request to the supplicant.
>> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
>> ---
>>  drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
>>  1 file changed, 86 insertions(+), 36 deletions(-)
> 
> I had forgotten about this. I'd like to prioritize getting this fixed
> soon. By the way, how did you test this?
> 

Thanks for the update. I currently don't have access to the setup required to run
the tests myself. My plan is to finalize the design and implementation, then
ask Michael Wu to run his use case. Based on his earlier feedback, the patch
appears to be working as expected.

https://lore.kernel.org/all/292653ba-3836-00f1-acd4-a28b1c54388c@allwinnertech.com/

>>
>> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
>> index d0f397c90242..0ec66008df19 100644
>> --- a/drivers/tee/optee/supp.c
>> +++ b/drivers/tee/optee/supp.c
>> @@ -10,7 +10,11 @@
>>  struct optee_supp_req {
>>         struct list_head link;
>>
>> +       int id;
>> +
>>         bool in_queue;
>> +       bool processed;
>> +
>>         u32 func;
>>         u32 ret;
>>         size_t num_params;
>> @@ -19,6 +23,9 @@ struct optee_supp_req {
>>         struct completion c;
>>  };
>>
>> +/* It is temporary request used for invalid pending request in supp->idr. */
>> +#define INVALID_REQ_PTR ((struct optee_supp_req *)ERR_PTR(-ENOENT))
>> +
>>  void optee_supp_init(struct optee_supp *supp)
>>  {
>>         memset(supp, 0, sizeof(*supp));
>> @@ -46,6 +53,10 @@ void optee_supp_release(struct optee_supp *supp)
>>         /* Abort all request retrieved by supplicant */
>>         idr_for_each_entry(&supp->idr, req, id) {
>>                 idr_remove(&supp->idr, id);
>> +               /* Skip if request was already marked invalid */
>> +               if (IS_ERR(req))
>> +                       continue;
>> +
>>                 req->ret = TEEC_ERROR_COMMUNICATION;
>>                 complete(&req->c);
>>         }
>> @@ -102,6 +113,7 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>>         mutex_lock(&supp->mutex);
>>         list_add_tail(&req->link, &supp->reqs);
>>         req->in_queue = true;
>> +       req->processed = false;
>>         mutex_unlock(&supp->mutex);
>>
>>         /* Tell an eventual waiter there's a new request */
>> @@ -117,21 +129,40 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>>         if (wait_for_completion_killable(&req->c)) {
>>                 mutex_lock(&supp->mutex);
>>                 if (req->in_queue) {
>> +                       /* Supplicant has not seen this request yet. */
>>                         list_del(&req->link);
>>                         req->in_queue = false;
>> +
>> +                       ret = TEEC_ERROR_COMMUNICATION;
>> +               } else if (req->processed) {
>> +                       /*
>> +                        * Supplicant has processed this request. Ignore the
>> +                        * kill signal for now and submit the result.
>> +                        */
>> +                       ret = req->ret;
>> +               } else {
>> +                       /*
>> +                        * Supplicant is in the middle of processing this
>> +                        * request. Replace req with INVALID_REQ_PTR so that
>> +                        * the ID remains busy, causing optee_supp_send() to
>> +                        * fail on the next call to supp_pop_req() with this ID.
>> +                        */
>> +                       idr_replace(&supp->idr, INVALID_REQ_PTR, req->id);
>> +                       ret = TEEC_ERROR_COMMUNICATION;
>>                 }
>> +
>>                 mutex_unlock(&supp->mutex);
>> -               req->ret = TEEC_ERROR_COMMUNICATION;
>> +       } else {
>> +               ret = req->ret;
>>         }
>>
>> -       ret = req->ret;
>>         kfree(req);
>>
>>         return ret;
>>  }
>>
>>  static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
>> -                                             int num_params, int *id)
>> +                                             int num_params)
>>  {
>>         struct optee_supp_req *req;
>>
>> @@ -153,8 +184,8 @@ static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
>>                 return ERR_PTR(-EINVAL);
>>         }
>>
>> -       *id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
>> -       if (*id < 0)
>> +       req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
>> +       if (req->id < 0)
>>                 return ERR_PTR(-ENOMEM);
> 
> Since we're now storing the supplicant request ID, wouldn't it make
> sense to pre-allocate the ID when allocating the request to avoid this
> error case?
> 

True, but allocating the ID at this stage has one advantage.
If an ID is not available, the request can remain on the request list,
allowing the supplicant to retry later when resources become available.
If ID allocation fails during request creation, I have no choice but
to drop the request and report an error to optee.

>>
>>         list_del(&req->link);
>> @@ -214,7 +245,6 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>         struct optee *optee = tee_get_drvdata(teedev);
>>         struct optee_supp *supp = &optee->supp;
>>         struct optee_supp_req *req = NULL;
>> -       int id;
>>         size_t num_meta;
>>         int rc;
>>
>> @@ -224,15 +254,48 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>
>>         while (true) {
>>                 mutex_lock(&supp->mutex);
>> -               req = supp_pop_entry(supp, *num_params - num_meta, &id);
>> -               mutex_unlock(&supp->mutex);
>>
>> -               if (req) {
>> -                       if (IS_ERR(req))
>> -                               return PTR_ERR(req);
>> -                       break;
>> +               req = supp_pop_entry(supp, *num_params - num_meta);
>> +               if (!req) {
>> +                       mutex_unlock(&supp->mutex);
>> +                       goto wait_for_request;
>> +               }
>> +
>> +               if (IS_ERR(req)) {
>> +                       rc = PTR_ERR(req);
>> +                       mutex_unlock(&supp->mutex);
>> +
>> +                       return rc;
>>                 }
>>
>> +               /*
>> +                * Process the request while holding the lock, so that
>> +                * optee_supp_thrd_req() doesn't pull the request from under us.
>> +                */
>> +
>> +               if (num_meta) {
>> +                       /*
>> +                        * tee-supplicant support meta parameters ->
>> +                        * requests can be processed asynchronously.
>> +                        */
>> +                       param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
>> +                                     TEE_IOCTL_PARAM_ATTR_META;
>> +                       param->u.value.a = req->id;
>> +                       param->u.value.b = 0;
>> +                       param->u.value.c = 0;
>> +               } else {
>> +                       supp->req_id = req->id;
>> +               }
>> +
>> +               *func = req->func;
>> +               *num_params = req->num_params + num_meta;
>> +               memcpy(param + num_meta, req->param,
>> +                      sizeof(struct tee_param) * req->num_params);
>> +
>> +               mutex_unlock(&supp->mutex);
>> +               return 0;
> 
> Do we really need to move this into the loop? The structure of the
> function becomes a bit unusual and harder to read.
> 

Ack. I'll reorganize this function.
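
Something like this is what I have in mind (just a sketch; supp_handover_req()
is a made-up helper name, the rest follows this patch):

static void supp_handover_req(struct optee_supp *supp,
			      struct optee_supp_req *req, size_t num_meta,
			      u32 *func, u32 *num_params,
			      struct tee_param *param)
{
	/* Called with supp->mutex held so the request cannot be freed. */
	if (num_meta) {
		param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
			      TEE_IOCTL_PARAM_ATTR_META;
		param->u.value.a = req->id;
		param->u.value.b = 0;
		param->u.value.c = 0;
	} else {
		supp->req_id = req->id;
	}

	*func = req->func;
	*num_params = req->num_params + num_meta;
	memcpy(param + num_meta, req->param,
	       sizeof(struct tee_param) * req->num_params);
}

	while (true) {
		mutex_lock(&supp->mutex);
		req = supp_pop_entry(supp, *num_params - num_meta);
		if (!IS_ERR_OR_NULL(req))
			supp_handover_req(supp, req, num_meta, func,
					  num_params, param);
		mutex_unlock(&supp->mutex);

		if (IS_ERR(req))
			return PTR_ERR(req);
		if (req)
			return 0;

		if (wait_for_completion_interruptible(&supp->reqs_c))
			return -ERESTARTSYS;
	}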

>> +
>> +wait_for_request:
>>                 /*
>>                  * If we didn't get a request we'll block in
>>                  * wait_for_completion() to avoid needless spinning.
>> @@ -243,29 +306,10 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>                  */
>>                 if (wait_for_completion_interruptible(&supp->reqs_c))
>>                         return -ERESTARTSYS;
>> -       }
>>
>> -       if (num_meta) {
>> -               /*
>> -                * tee-supplicant support meta parameters -> requsts can be
>> -                * processed asynchronously.
>> -                */
>> -               param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
>> -                             TEE_IOCTL_PARAM_ATTR_META;
>> -               param->u.value.a = id;
>> -               param->u.value.b = 0;
>> -               param->u.value.c = 0;
>> -       } else {
>> -               mutex_lock(&supp->mutex);
>> -               supp->req_id = id;
>> -               mutex_unlock(&supp->mutex);
>> +               /* Check for the next request in the queue. */
>>         }
>>
>> -       *func = req->func;
>> -       *num_params = req->num_params + num_meta;
>> -       memcpy(param + num_meta, req->param,
>> -              sizeof(struct tee_param) * req->num_params);
>> -
>>         return 0;
>>  }
>>
>> @@ -297,12 +341,18 @@ static struct optee_supp_req *supp_pop_req(struct optee_supp *supp,
>>         if (!req)
>>                 return ERR_PTR(-ENOENT);
>>
>> +       /* optee_supp_thrd_req() already returned to optee. */
>> +       if (IS_ERR(req))
>> +               goto failed_req;
>> +
>>         if ((num_params - nm) != req->num_params)
>>                 return ERR_PTR(-EINVAL);
>>
>> +       *num_meta = nm;
>> +failed_req:
>>         idr_remove(&supp->idr, id);
>>         supp->req_id = -1;
>> -       *num_meta = nm;
>> +
>>
>>         return req;
>>  }
>> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>>
>>         mutex_lock(&supp->mutex);
>>         req = supp_pop_req(supp, num_params, param, &num_meta);
>> -       mutex_unlock(&supp->mutex);
>> -
>>         if (IS_ERR(req)) {
>> +               mutex_unlock(&supp->mutex);
> 
> We need a way to tell the difference between an id not found and an id
> removed because of a killed requester.
> How about storing NULL for revoked requests instead of an err-pointer?
> 

Not sure I'm following correctly. Are you expecting supp_pop_req()
to return NULL instead of an err-pointer when a request has been revoked?

Best Regards,
Amir

> Cheers,
> Jens
> 
>>                 /* Something is wrong, let supplicant restart. */
>>                 return PTR_ERR(req);
>>         }
>> @@ -355,9 +404,10 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>>                 }
>>         }
>>         req->ret = ret;
>> -
>> +       req->processed = true;
>>         /* Let the requesting thread continue */
>>         complete(&req->c);
>> +       mutex_unlock(&supp->mutex);
>>
>>         return 0;
>>  }
>>
>> ---
>> base-commit: 3f24e4edcd1b8981c6b448ea2680726dedd87279
>> change-id: 20250604-fix-use-after-free-8ff1b5d5d774
>>
>> Best regards,
>> --
>> Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
>>

Re: [PATCH v3] tee: optee: prevent use-after-free when the client exits before the supplicant
Posted by Jens Wiklander 4 days, 1 hour ago
Hi,

On Tue, Feb 3, 2026 at 3:09 AM Amirreza Zarrabi
<amirreza.zarrabi@oss.qualcomm.com> wrote:
>
> Hi Jens,
>
> On 2/2/2026 10:36 PM, Jens Wiklander wrote:
> > Hi Amir,
> >
> > On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi
> > <amirreza.zarrabi@oss.qualcomm.com> wrote:
> >>
> >> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
> >> client wait as killable so it can be interrupted during shutdown or
> >> after a supplicant crash. This changes the original lifetime expectations:
> >> the client task can now terminate while the supplicant is still processing
> >> its request.
> >>
> >> If the client exits first it removes the request from its queue and
> >> kfree()s it, while the request ID remains in supp->idr. A subsequent
> >> lookup on the supplicant path then dereferences freed memory, leading to
> >> a use-after-free.
> >>
> >> Serialise access to the request with supp->mutex:
> >>
> >>   * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
> >>     looking up and touching the request.
> >>   * Let optee_supp_thrd_req() notice that the client has terminated and
> >>     signal optee_supp_send() accordingly.
> >>
> >> With these changes the request cannot be freed while the supplicant still
> >> has a reference, eliminating the race.
> >>
> >> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
> >> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
> >> ---
> >> Changes in v3:
> >> - Introduce processed flag instead of -1 for req->id.
> >> - Update optee_supp_release() as reported by Michael Wu.
> >> - Use mutex instead of guard.
> >> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com
> >>
> >> Changes in v2:
> >> - Replace the static variable with a sentinel value.
> >> - Fix the issue with returning the popped request to the supplicant.
> >> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
> >> ---
> >>  drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
> >>  1 file changed, 86 insertions(+), 36 deletions(-)
> >
> > I had forgotten about this. I'd like to prioritize getting this fixed
> > soon. By the way, how did you test this?
> >
>
> Thanks for the update. I currently don't have access to the setup required to run
> the tests myself. My plan is to finalize the design and implementation, then
> ask Michael Wu to run his use case. Based on his earlier feedback, the patch
> appears to be working as expected.
>
> https://lore.kernel.org/all/292653ba-3836-00f1-acd4-a28b1c54388c@allwinnertech.com/

OK

>
> >>
> >> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
> >> index d0f397c90242..0ec66008df19 100644
> >> --- a/drivers/tee/optee/supp.c
> >> +++ b/drivers/tee/optee/supp.c
> >> @@ -10,7 +10,11 @@
> >>  struct optee_supp_req {
> >>         struct list_head link;
> >>
> >> +       int id;
> >> +
> >>         bool in_queue;
> >> +       bool processed;
> >> +
> >>         u32 func;
> >>         u32 ret;
> >>         size_t num_params;
> >> @@ -19,6 +23,9 @@ struct optee_supp_req {
> >>         struct completion c;
> >>  };
> >>
> >> +/* It is temporary request used for invalid pending request in supp->idr. */
> >> +#define INVALID_REQ_PTR ((struct optee_supp_req *)ERR_PTR(-ENOENT))
> >> +
> >>  void optee_supp_init(struct optee_supp *supp)
> >>  {
> >>         memset(supp, 0, sizeof(*supp));
> >> @@ -46,6 +53,10 @@ void optee_supp_release(struct optee_supp *supp)
> >>         /* Abort all request retrieved by supplicant */
> >>         idr_for_each_entry(&supp->idr, req, id) {
> >>                 idr_remove(&supp->idr, id);
> >> +               /* Skip if request was already marked invalid */
> >> +               if (IS_ERR(req))
> >> +                       continue;
> >> +
> >>                 req->ret = TEEC_ERROR_COMMUNICATION;
> >>                 complete(&req->c);
> >>         }
> >> @@ -102,6 +113,7 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
> >>         mutex_lock(&supp->mutex);
> >>         list_add_tail(&req->link, &supp->reqs);
> >>         req->in_queue = true;
> >> +       req->processed = false;
> >>         mutex_unlock(&supp->mutex);
> >>
> >>         /* Tell an eventual waiter there's a new request */
> >> @@ -117,21 +129,40 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
> >>         if (wait_for_completion_killable(&req->c)) {
> >>                 mutex_lock(&supp->mutex);
> >>                 if (req->in_queue) {
> >> +                       /* Supplicant has not seen this request yet. */
> >>                         list_del(&req->link);
> >>                         req->in_queue = false;
> >> +
> >> +                       ret = TEEC_ERROR_COMMUNICATION;
> >> +               } else if (req->processed) {
> >> +                       /*
> >> +                        * Supplicant has processed this request. Ignore the
> >> +                        * kill signal for now and submit the result.
> >> +                        */
> >> +                       ret = req->ret;
> >> +               } else {
> >> +                       /*
> >> +                        * Supplicant is in the middle of processing this
> >> +                        * request. Replace req with INVALID_REQ_PTR so that
> >> +                        * the ID remains busy, causing optee_supp_send() to
> >> +                        * fail on the next call to supp_pop_req() with this ID.
> >> +                        */
> >> +                       idr_replace(&supp->idr, INVALID_REQ_PTR, req->id);
> >> +                       ret = TEEC_ERROR_COMMUNICATION;
> >>                 }
> >> +
> >>                 mutex_unlock(&supp->mutex);
> >> -               req->ret = TEEC_ERROR_COMMUNICATION;
> >> +       } else {
> >> +               ret = req->ret;
> >>         }
> >>
> >> -       ret = req->ret;
> >>         kfree(req);
> >>
> >>         return ret;
> >>  }
> >>
> >>  static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
> >> -                                             int num_params, int *id)
> >> +                                             int num_params)
> >>  {
> >>         struct optee_supp_req *req;
> >>
> >> @@ -153,8 +184,8 @@ static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
> >>                 return ERR_PTR(-EINVAL);
> >>         }
> >>
> >> -       *id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
> >> -       if (*id < 0)
> >> +       req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
> >> +       if (req->id < 0)
> >>                 return ERR_PTR(-ENOMEM);
> >
> > Since we're now storing the supplicant request ID, wouldn't it make
> > sense to pre-allocate the ID when allocating the request to avoid this
> > error case?
> >
>
> True, but allocating the ID at this stage has one advantage.
> If an ID is not available, the request can remain on the request list,
> allowing the supplicant to retry later when resources become available.
> If ID allocation fails during request creation, I have no choice but
> to drop the request and report an error to optee.

We're allocating in the range 1..INT_MAX, and not more than a handful
are expected to be active at a time. If we run out of IDs, we have
bigger problems.

>
> >>
> >>         list_del(&req->link);
> >> @@ -214,7 +245,6 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
> >>         struct optee *optee = tee_get_drvdata(teedev);
> >>         struct optee_supp *supp = &optee->supp;
> >>         struct optee_supp_req *req = NULL;
> >> -       int id;
> >>         size_t num_meta;
> >>         int rc;
> >>
> >> @@ -224,15 +254,48 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
> >>
> >>         while (true) {
> >>                 mutex_lock(&supp->mutex);
> >> -               req = supp_pop_entry(supp, *num_params - num_meta, &id);
> >> -               mutex_unlock(&supp->mutex);
> >>
> >> -               if (req) {
> >> -                       if (IS_ERR(req))
> >> -                               return PTR_ERR(req);
> >> -                       break;
> >> +               req = supp_pop_entry(supp, *num_params - num_meta);
> >> +               if (!req) {
> >> +                       mutex_unlock(&supp->mutex);
> >> +                       goto wait_for_request;
> >> +               }
> >> +
> >> +               if (IS_ERR(req)) {
> >> +                       rc = PTR_ERR(req);
> >> +                       mutex_unlock(&supp->mutex);
> >> +
> >> +                       return rc;
> >>                 }
> >>
> >> +               /*
> >> +                * Process the request while holding the lock, so that
> >> +                * optee_supp_thrd_req() doesn't pull the request from under us.
> >> +                */
> >> +
> >> +               if (num_meta) {
> >> +                       /*
> >> +                        * tee-supplicant support meta parameters ->
> >> +                        * requests can be processed asynchronously.
> >> +                        */
> >> +                       param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
> >> +                                     TEE_IOCTL_PARAM_ATTR_META;
> >> +                       param->u.value.a = req->id;
> >> +                       param->u.value.b = 0;
> >> +                       param->u.value.c = 0;
> >> +               } else {
> >> +                       supp->req_id = req->id;
> >> +               }
> >> +
> >> +               *func = req->func;
> >> +               *num_params = req->num_params + num_meta;
> >> +               memcpy(param + num_meta, req->param,
> >> +                      sizeof(struct tee_param) * req->num_params);
> >> +
> >> +               mutex_unlock(&supp->mutex);
> >> +               return 0;
> >
> > Do we really need to move this into the loop? The structure of the
> > function becomes a bit unusual and harder to read.
> >
>
> Ack. I'll reorganize this function.
>
> >> +
> >> +wait_for_request:
> >>                 /*
> >>                  * If we didn't get a request we'll block in
> >>                  * wait_for_completion() to avoid needless spinning.
> >> @@ -243,29 +306,10 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
> >>                  */
> >>                 if (wait_for_completion_interruptible(&supp->reqs_c))
> >>                         return -ERESTARTSYS;
> >> -       }
> >>
> >> -       if (num_meta) {
> >> -               /*
> >> -                * tee-supplicant support meta parameters -> requsts can be
> >> -                * processed asynchronously.
> >> -                */
> >> -               param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
> >> -                             TEE_IOCTL_PARAM_ATTR_META;
> >> -               param->u.value.a = id;
> >> -               param->u.value.b = 0;
> >> -               param->u.value.c = 0;
> >> -       } else {
> >> -               mutex_lock(&supp->mutex);
> >> -               supp->req_id = id;
> >> -               mutex_unlock(&supp->mutex);
> >> +               /* Check for the next request in the queue. */
> >>         }
> >>
> >> -       *func = req->func;
> >> -       *num_params = req->num_params + num_meta;
> >> -       memcpy(param + num_meta, req->param,
> >> -              sizeof(struct tee_param) * req->num_params);
> >> -
> >>         return 0;
> >>  }
> >>
> >> @@ -297,12 +341,18 @@ static struct optee_supp_req *supp_pop_req(struct optee_supp *supp,
> >>         if (!req)
> >>                 return ERR_PTR(-ENOENT);
> >>
> >> +       /* optee_supp_thrd_req() already returned to optee. */
> >> +       if (IS_ERR(req))
> >> +               goto failed_req;
> >> +
> >>         if ((num_params - nm) != req->num_params)
> >>                 return ERR_PTR(-EINVAL);
> >>
> >> +       *num_meta = nm;
> >> +failed_req:
> >>         idr_remove(&supp->idr, id);
> >>         supp->req_id = -1;
> >> -       *num_meta = nm;
> >> +
> >>
> >>         return req;
> >>  }
> >> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
> >>
> >>         mutex_lock(&supp->mutex);
> >>         req = supp_pop_req(supp, num_params, param, &num_meta);
> >> -       mutex_unlock(&supp->mutex);
> >> -
> >>         if (IS_ERR(req)) {
> >> +               mutex_unlock(&supp->mutex);
> >
> > We need a way to tell the difference between an id not found and an id
> > removed because of a killed requester.
> > How about storing NULL for revoked requests instead of an err-pointer?
> >
>
> Not sure I'm following correctly. Are you expecting supp_pop_req()
> to return NULL instead of an err-pointer when a request has been revoked?

I was looking at it again, and storing an err-pointer as you do in
this patch has the advantage that we can tell whether the ID has been
revoked or was never supplied. In the latter case, it suggests that
the supplicant is doing something wrong and might as well restart in
an attempt to recover. So, please keep using an err-pointer as a
placeholder, but we must be able to distinguish a revoked request from
other errors to make sure that the supplicant doesn't restart due to a
revoked request.
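
For example, something along these lines (sketch only; the name and the exact
errno are just illustrative):

/* Would replace INVALID_REQ_PTR from the patch, with its own error code. */
#define REVOKED_REQ_PTR	((struct optee_supp_req *)ERR_PTR(-ECANCELED))

and in optee_supp_send():

	req = supp_pop_req(supp, num_params, param, &num_meta);
	if (IS_ERR(req)) {
		mutex_unlock(&supp->mutex);

		/* The requester was killed; silently drop the answer. */
		if (PTR_ERR(req) == -ECANCELED)
			return 0;

		/* Something is wrong, let supplicant restart. */
		return PTR_ERR(req);
	}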

Cheers,
Jens

>
> Best Rearads,
> Amir
>
> > Cheers,
> > Jens
> >
> >>                 /* Something is wrong, let supplicant restart. */
> >>                 return PTR_ERR(req);
> >>         }
> >> @@ -355,9 +404,10 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
> >>                 }
> >>         }
> >>         req->ret = ret;
> >> -
> >> +       req->processed = true;
> >>         /* Let the requesting thread continue */
> >>         complete(&req->c);
> >> +       mutex_unlock(&supp->mutex);
> >>
> >>         return 0;
> >>  }
> >>
> >> ---
> >> base-commit: 3f24e4edcd1b8981c6b448ea2680726dedd87279
> >> change-id: 20250604-fix-use-after-free-8ff1b5d5d774
> >>
> >> Best regards,
> >> --
> >> Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
> >>
>
Re: [PATCH v3] tee: optee: prevent use-after-free when the client exits before the supplicant
Posted by Amirreza Zarrabi 3 days, 9 hours ago
Hi Jens,

On 2/3/2026 5:59 PM, Jens Wiklander wrote:
> Hi,
> 
> On Tue, Feb 3, 2026 at 3:09 AM Amirreza Zarrabi
> <amirreza.zarrabi@oss.qualcomm.com> wrote:
>>
>> Hi Jens,
>>
>> On 2/2/2026 10:36 PM, Jens Wiklander wrote:
>>> Hi Amir,
>>>
>>> On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi
>>> <amirreza.zarrabi@oss.qualcomm.com> wrote:
>>>>
>>>> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
>>>> client wait as killable so it can be interrupted during shutdown or
>>>> after a supplicant crash. This changes the original lifetime expectations:
>>>> the client task can now terminate while the supplicant is still processing
>>>> its request.
>>>>
>>>> If the client exits first it removes the request from its queue and
>>>> kfree()s it, while the request ID remains in supp->idr. A subsequent
>>>> lookup on the supplicant path then dereferences freed memory, leading to
>>>> a use-after-free.
>>>>
>>>> Serialise access to the request with supp->mutex:
>>>>
>>>>   * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
>>>>     looking up and touching the request.
>>>>   * Let optee_supp_thrd_req() notice that the client has terminated and
>>>>     signal optee_supp_send() accordingly.
>>>>
>>>> With these changes the request cannot be freed while the supplicant still
>>>> has a reference, eliminating the race.
>>>>
>>>> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
>>>> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
>>>> ---
>>>> Changes in v3:
>>>> - Introduce processed flag instead of -1 for req->id.
>>>> - Update optee_supp_release() as reported by Michael Wu.
>>>> - Use mutex instead of guard.
>>>> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com
>>>>
>>>> Changes in v2:
>>>> - Replace the static variable with a sentinel value.
>>>> - Fix the issue with returning the popped request to the supplicant.
>>>> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
>>>> ---
>>>>  drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
>>>>  1 file changed, 86 insertions(+), 36 deletions(-)
>>>
>>> I had forgotten about this. I'd like to prioritize getting this fixed
>>> soon. By the way, how did you test this?
>>>
>>
>> Thanks for the update. I currently don't have access to the setup required to run
>> the tests myself. My plan is to finalize the design and implementation, then
>> ask Michael Wu to run his use case. Based on his earlier feedback, the patch
>> appears to be working as expected.
>>
>> https://lore.kernel.org/all/292653ba-3836-00f1-acd4-a28b1c54388c@allwinnertech.com/
> 
> OK
> 
>>
>>>>
>>>> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
>>>> index d0f397c90242..0ec66008df19 100644
>>>> --- a/drivers/tee/optee/supp.c
>>>> +++ b/drivers/tee/optee/supp.c
>>>> @@ -10,7 +10,11 @@
>>>>  struct optee_supp_req {
>>>>         struct list_head link;
>>>>
>>>> +       int id;
>>>> +
>>>>         bool in_queue;
>>>> +       bool processed;
>>>> +
>>>>         u32 func;
>>>>         u32 ret;
>>>>         size_t num_params;
>>>> @@ -19,6 +23,9 @@ struct optee_supp_req {
>>>>         struct completion c;
>>>>  };
>>>>
>>>> +/* It is temporary request used for invalid pending request in supp->idr. */
>>>> +#define INVALID_REQ_PTR ((struct optee_supp_req *)ERR_PTR(-ENOENT))
>>>> +
>>>>  void optee_supp_init(struct optee_supp *supp)
>>>>  {
>>>>         memset(supp, 0, sizeof(*supp));
>>>> @@ -46,6 +53,10 @@ void optee_supp_release(struct optee_supp *supp)
>>>>         /* Abort all request retrieved by supplicant */
>>>>         idr_for_each_entry(&supp->idr, req, id) {
>>>>                 idr_remove(&supp->idr, id);
>>>> +               /* Skip if request was already marked invalid */
>>>> +               if (IS_ERR(req))
>>>> +                       continue;
>>>> +
>>>>                 req->ret = TEEC_ERROR_COMMUNICATION;
>>>>                 complete(&req->c);
>>>>         }
>>>> @@ -102,6 +113,7 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>>>>         mutex_lock(&supp->mutex);
>>>>         list_add_tail(&req->link, &supp->reqs);
>>>>         req->in_queue = true;
>>>> +       req->processed = false;
>>>>         mutex_unlock(&supp->mutex);
>>>>
>>>>         /* Tell an eventual waiter there's a new request */
>>>> @@ -117,21 +129,40 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>>>>         if (wait_for_completion_killable(&req->c)) {
>>>>                 mutex_lock(&supp->mutex);
>>>>                 if (req->in_queue) {
>>>> +                       /* Supplicant has not seen this request yet. */
>>>>                         list_del(&req->link);
>>>>                         req->in_queue = false;
>>>> +
>>>> +                       ret = TEEC_ERROR_COMMUNICATION;
>>>> +               } else if (req->processed) {
>>>> +                       /*
>>>> +                        * Supplicant has processed this request. Ignore the
>>>> +                        * kill signal for now and submit the result.
>>>> +                        */
>>>> +                       ret = req->ret;
>>>> +               } else {
>>>> +                       /*
>>>> +                        * Supplicant is in the middle of processing this
>>>> +                        * request. Replace req with INVALID_REQ_PTR so that
>>>> +                        * the ID remains busy, causing optee_supp_send() to
>>>> +                        * fail on the next call to supp_pop_req() with this ID.
>>>> +                        */
>>>> +                       idr_replace(&supp->idr, INVALID_REQ_PTR, req->id);
>>>> +                       ret = TEEC_ERROR_COMMUNICATION;
>>>>                 }
>>>> +
>>>>                 mutex_unlock(&supp->mutex);
>>>> -               req->ret = TEEC_ERROR_COMMUNICATION;
>>>> +       } else {
>>>> +               ret = req->ret;
>>>>         }
>>>>
>>>> -       ret = req->ret;
>>>>         kfree(req);
>>>>
>>>>         return ret;
>>>>  }
>>>>
>>>>  static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
>>>> -                                             int num_params, int *id)
>>>> +                                             int num_params)
>>>>  {
>>>>         struct optee_supp_req *req;
>>>>
>>>> @@ -153,8 +184,8 @@ static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
>>>>                 return ERR_PTR(-EINVAL);
>>>>         }
>>>>
>>>> -       *id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
>>>> -       if (*id < 0)
>>>> +       req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
>>>> +       if (req->id < 0)
>>>>                 return ERR_PTR(-ENOMEM);
>>>
>>> Since we're now storing the supplicant request ID, wouldn't it make
>>> sense to pre-allocate the ID when allocating the request to avoid this
>>> error case?
>>>
>>
>> True, but allocating the ID at this stage has one advantage.
>> If an ID is not available, the request can remain on the request list,
>> allowing the supplicant to retry later when resources become available.
>> If ID allocation fails during request creation, I have no choice but
>> to drop the request and report an error to optee.
> 
> We're allocating in the range 1..INT_MAX, and not more than a handful
> are expected to be active at a time. If we run out of IDs, we have
> bigger problems.
> 

Ack.

>>
>>>>
>>>>         list_del(&req->link);
>>>> @@ -214,7 +245,6 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>>>         struct optee *optee = tee_get_drvdata(teedev);
>>>>         struct optee_supp *supp = &optee->supp;
>>>>         struct optee_supp_req *req = NULL;
>>>> -       int id;
>>>>         size_t num_meta;
>>>>         int rc;
>>>>
>>>> @@ -224,15 +254,48 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>>>
>>>>         while (true) {
>>>>                 mutex_lock(&supp->mutex);
>>>> -               req = supp_pop_entry(supp, *num_params - num_meta, &id);
>>>> -               mutex_unlock(&supp->mutex);
>>>>
>>>> -               if (req) {
>>>> -                       if (IS_ERR(req))
>>>> -                               return PTR_ERR(req);
>>>> -                       break;
>>>> +               req = supp_pop_entry(supp, *num_params - num_meta);
>>>> +               if (!req) {
>>>> +                       mutex_unlock(&supp->mutex);
>>>> +                       goto wait_for_request;
>>>> +               }
>>>> +
>>>> +               if (IS_ERR(req)) {
>>>> +                       rc = PTR_ERR(req);
>>>> +                       mutex_unlock(&supp->mutex);
>>>> +
>>>> +                       return rc;
>>>>                 }
>>>>
>>>> +               /*
>>>> +                * Process the request while holding the lock, so that
>>>> +                * optee_supp_thrd_req() doesn't pull the request from under us.
>>>> +                */
>>>> +
>>>> +               if (num_meta) {
>>>> +                       /*
>>>> +                        * tee-supplicant support meta parameters ->
>>>> +                        * requests can be processed asynchronously.
>>>> +                        */
>>>> +                       param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
>>>> +                                     TEE_IOCTL_PARAM_ATTR_META;
>>>> +                       param->u.value.a = req->id;
>>>> +                       param->u.value.b = 0;
>>>> +                       param->u.value.c = 0;
>>>> +               } else {
>>>> +                       supp->req_id = req->id;
>>>> +               }
>>>> +
>>>> +               *func = req->func;
>>>> +               *num_params = req->num_params + num_meta;
>>>> +               memcpy(param + num_meta, req->param,
>>>> +                      sizeof(struct tee_param) * req->num_params);
>>>> +
>>>> +               mutex_unlock(&supp->mutex);
>>>> +               return 0;
>>>
>>> Do we really need to move this into the loop? The structure of the
>>> function becomes a bit unusual and harder to read.
>>>
>>
>> Ack. I'll reorganize this function.
>>
>>>> +
>>>> +wait_for_request:
>>>>                 /*
>>>>                  * If we didn't get a request we'll block in
>>>>                  * wait_for_completion() to avoid needless spinning.
>>>> @@ -243,29 +306,10 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>>>                  */
>>>>                 if (wait_for_completion_interruptible(&supp->reqs_c))
>>>>                         return -ERESTARTSYS;
>>>> -       }
>>>>
>>>> -       if (num_meta) {
>>>> -               /*
>>>> -                * tee-supplicant support meta parameters -> requsts can be
>>>> -                * processed asynchronously.
>>>> -                */
>>>> -               param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
>>>> -                             TEE_IOCTL_PARAM_ATTR_META;
>>>> -               param->u.value.a = id;
>>>> -               param->u.value.b = 0;
>>>> -               param->u.value.c = 0;
>>>> -       } else {
>>>> -               mutex_lock(&supp->mutex);
>>>> -               supp->req_id = id;
>>>> -               mutex_unlock(&supp->mutex);
>>>> +               /* Check for the next request in the queue. */
>>>>         }
>>>>
>>>> -       *func = req->func;
>>>> -       *num_params = req->num_params + num_meta;
>>>> -       memcpy(param + num_meta, req->param,
>>>> -              sizeof(struct tee_param) * req->num_params);
>>>> -
>>>>         return 0;
>>>>  }
>>>>
>>>> @@ -297,12 +341,18 @@ static struct optee_supp_req *supp_pop_req(struct optee_supp *supp,
>>>>         if (!req)
>>>>                 return ERR_PTR(-ENOENT);
>>>>
>>>> +       /* optee_supp_thrd_req() already returned to optee. */
>>>> +       if (IS_ERR(req))
>>>> +               goto failed_req;
>>>> +
>>>>         if ((num_params - nm) != req->num_params)
>>>>                 return ERR_PTR(-EINVAL);
>>>>
>>>> +       *num_meta = nm;
>>>> +failed_req:
>>>>         idr_remove(&supp->idr, id);
>>>>         supp->req_id = -1;
>>>> -       *num_meta = nm;
>>>> +
>>>>
>>>>         return req;
>>>>  }
>>>> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>>>>
>>>>         mutex_lock(&supp->mutex);
>>>>         req = supp_pop_req(supp, num_params, param, &num_meta);
>>>> -       mutex_unlock(&supp->mutex);
>>>> -
>>>>         if (IS_ERR(req)) {
>>>> +               mutex_unlock(&supp->mutex);
>>>
>>> We need a way to tell the difference between an id not found and an id
>>> removed because of a killed requester.
>>> How about storing NULL for revoked requests instead of an err-pointer?
>>>
>>
>> Not sure I'm following correctly. Are you expecting supp_pop_req()
>> to return NULL instead of an err-pointer when a request has been revoked?
> 
> I was looking at it again, and storing an err-pointer as you do in
> this patch has the advantage that we can tell whether the ID has been
> revoked or was never supplied. In the latter case, it suggests that
> the supplicant is doing something wrong and might as well restart in
> an attempt to recover. So, please keep using an err-pointer as a
> placeholder, but we must be able to distinguish a revoked request from
> other errors to make sure that the supplicant doesn't restart due to a
> revoked request.
> 

Understood. What if I switch the stored err-pointer from ENOENT to EBADF (which
seems more natural)? That way it doesn't overlap with the other supp_pop_req()
error codes and the supplicant can reliably detect a revoked request.
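
A minimal sketch of what I mean, reusing the names from this patch (untested,
only to show the intent):

#define INVALID_REQ_PTR ((struct optee_supp_req *)ERR_PTR(-EBADF))

        /* In optee_supp_send(), after supp_pop_req(): */
        if (IS_ERR(req)) {
                mutex_unlock(&supp->mutex);
                /*
                 * -EBADF means the request was revoked because the client
                 * was killed; any other error still means the supplicant
                 * did something wrong and may as well restart.
                 */
                return PTR_ERR(req);
        }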

Best Regards,
Amir

> Cheers,
> Jens
> 
>>
>> Best Regards,
>> Amir
>>
>>> Cheers,
>>> Jens
>>>
>>>>                 /* Something is wrong, let supplicant restart. */
>>>>                 return PTR_ERR(req);
>>>>         }
>>>> @@ -355,9 +404,10 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>>>>                 }
>>>>         }
>>>>         req->ret = ret;
>>>> -
>>>> +       req->processed = true;
>>>>         /* Let the requesting thread continue */
>>>>         complete(&req->c);
>>>> +       mutex_unlock(&supp->mutex);
>>>>
>>>>         return 0;
>>>>  }
>>>>
>>>> ---
>>>> base-commit: 3f24e4edcd1b8981c6b448ea2680726dedd87279
>>>> change-id: 20250604-fix-use-after-free-8ff1b5d5d774
>>>>
>>>> Best regards,
>>>> --
>>>> Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
>>>>
>>

Re: [PATCH v3] tee: optee: prevent use-after-free when the client exits before the supplicant
Posted by Jens Wiklander 3 days, 1 hour ago
Hi Amir,

On Tue, Feb 3, 2026 at 11:56 PM Amirreza Zarrabi
<amirreza.zarrabi@oss.qualcomm.com> wrote:
>
> Hi Jens,
>
> On 2/3/2026 5:59 PM, Jens Wiklander wrote:
> > Hi,
> >
> > On Tue, Feb 3, 2026 at 3:09 AM Amirreza Zarrabi
> > <amirreza.zarrabi@oss.qualcomm.com> wrote:
> >>
> >> Hi Jens,
> >>
> >> On 2/2/2026 10:36 PM, Jens Wiklander wrote:
> >>> Hi Amir,
> >>>
> >>> On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi
> >>> <amirreza.zarrabi@oss.qualcomm.com> wrote:
> >>>>
> >>>> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
> >>>> client wait as killable so it can be interrupted during shutdown or
> >>>> after a supplicant crash. This changes the original lifetime expectations:
> >>>> the client task can now terminate while the supplicant is still processing
> >>>> its request.
> >>>>
> >>>> If the client exits first it removes the request from its queue and
> >>>> kfree()s it, while the request ID remains in supp->idr. A subsequent
> >>>> lookup on the supplicant path then dereferences freed memory, leading to
> >>>> a use-after-free.
> >>>>
> >>>> Serialise access to the request with supp->mutex:
> >>>>
> >>>>   * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
> >>>>     looking up and touching the request.
> >>>>   * Let optee_supp_thrd_req() notice that the client has terminated and
> >>>>     signal optee_supp_send() accordingly.
> >>>>
> >>>> With these changes the request cannot be freed while the supplicant still
> >>>> has a reference, eliminating the race.
> >>>>
> >>>> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
> >>>> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
> >>>> ---
> >>>> Changes in v3:
> >>>> - Introduce processed flag instead of -1 for req->id.
> >>>> - Update optee_supp_release() as reported by Michael Wu.
> >>>> - Use mutex instead of guard.
> >>>> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com
> >>>>
> >>>> Changes in v2:
> >>>> - Replace the static variable with a sentinel value.
> >>>> - Fix the issue with returning the popped request to the supplicant.
> >>>> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
> >>>> ---
> >>>>  drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
> >>>>  1 file changed, 86 insertions(+), 36 deletions(-)
> >>>
> >>> I had forgotten about this. I'd like to prioritize getting this fixed
> >>> soon. By the way, how did you test this?
> >>>
> >>
> >> Thanks for the update. I currently don't have access to the setup required to run
> >> the tests myself. My plan is to finalize the design and implementation, then
> >> ask Michael Wu to run his use case. Based on his earlier feedback, the patch
> >> appears to be working as expected.
> >>
> >> https://lore.kernel.org/all/292653ba-3836-00f1-acd4-a28b1c54388c@allwinnertech.com/
> >
> > OK
> >
> >>
> >>>>
> >>>> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
> >>>> index d0f397c90242..0ec66008df19 100644
> >>>> --- a/drivers/tee/optee/supp.c
> >>>> +++ b/drivers/tee/optee/supp.c
> >>>> @@ -10,7 +10,11 @@
> >>>>  struct optee_supp_req {
> >>>>         struct list_head link;
> >>>>
> >>>> +       int id;
> >>>> +
> >>>>         bool in_queue;
> >>>> +       bool processed;
> >>>> +
> >>>>         u32 func;
> >>>>         u32 ret;
> >>>>         size_t num_params;
> >>>> @@ -19,6 +23,9 @@ struct optee_supp_req {
> >>>>         struct completion c;
> >>>>  };
> >>>>
> >>>> +/* It is temporary request used for invalid pending request in supp->idr. */
> >>>> +#define INVALID_REQ_PTR ((struct optee_supp_req *)ERR_PTR(-ENOENT))
> >>>> +
> >>>>  void optee_supp_init(struct optee_supp *supp)
> >>>>  {
> >>>>         memset(supp, 0, sizeof(*supp));
> >>>> @@ -46,6 +53,10 @@ void optee_supp_release(struct optee_supp *supp)
> >>>>         /* Abort all request retrieved by supplicant */
> >>>>         idr_for_each_entry(&supp->idr, req, id) {
> >>>>                 idr_remove(&supp->idr, id);
> >>>> +               /* Skip if request was already marked invalid */
> >>>> +               if (IS_ERR(req))
> >>>> +                       continue;
> >>>> +
> >>>>                 req->ret = TEEC_ERROR_COMMUNICATION;
> >>>>                 complete(&req->c);
> >>>>         }
> >>>> @@ -102,6 +113,7 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
> >>>>         mutex_lock(&supp->mutex);
> >>>>         list_add_tail(&req->link, &supp->reqs);
> >>>>         req->in_queue = true;
> >>>> +       req->processed = false;
> >>>>         mutex_unlock(&supp->mutex);
> >>>>
> >>>>         /* Tell an eventual waiter there's a new request */
> >>>> @@ -117,21 +129,40 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
> >>>>         if (wait_for_completion_killable(&req->c)) {
> >>>>                 mutex_lock(&supp->mutex);
> >>>>                 if (req->in_queue) {
> >>>> +                       /* Supplicant has not seen this request yet. */
> >>>>                         list_del(&req->link);
> >>>>                         req->in_queue = false;
> >>>> +
> >>>> +                       ret = TEEC_ERROR_COMMUNICATION;
> >>>> +               } else if (req->processed) {
> >>>> +                       /*
> >>>> +                        * Supplicant has processed this request. Ignore the
> >>>> +                        * kill signal for now and submit the result.
> >>>> +                        */
> >>>> +                       ret = req->ret;
> >>>> +               } else {
> >>>> +                       /*
> >>>> +                        * Supplicant is in the middle of processing this
> >>>> +                        * request. Replace req with INVALID_REQ_PTR so that
> >>>> +                        * the ID remains busy, causing optee_supp_send() to
> >>>> +                        * fail on the next call to supp_pop_req() with this ID.
> >>>> +                        */
> >>>> +                       idr_replace(&supp->idr, INVALID_REQ_PTR, req->id);
> >>>> +                       ret = TEEC_ERROR_COMMUNICATION;
> >>>>                 }
> >>>> +
> >>>>                 mutex_unlock(&supp->mutex);
> >>>> -               req->ret = TEEC_ERROR_COMMUNICATION;
> >>>> +       } else {
> >>>> +               ret = req->ret;
> >>>>         }
> >>>>
> >>>> -       ret = req->ret;
> >>>>         kfree(req);
> >>>>
> >>>>         return ret;
> >>>>  }
> >>>>
> >>>>  static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
> >>>> -                                             int num_params, int *id)
> >>>> +                                             int num_params)
> >>>>  {
> >>>>         struct optee_supp_req *req;
> >>>>
> >>>> @@ -153,8 +184,8 @@ static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
> >>>>                 return ERR_PTR(-EINVAL);
> >>>>         }
> >>>>
> >>>> -       *id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
> >>>> -       if (*id < 0)
> >>>> +       req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
> >>>> +       if (req->id < 0)
> >>>>                 return ERR_PTR(-ENOMEM);
> >>>
> >>> Since we're now storing the supplicant request ID, wouldn't it make
> >>> sense to pre-allocate the ID when allocating the request to avoid this
> >>> error case?
> >>>
> >>
> >> True, but allocating the ID at this stage has one advantage.
> >> If an ID is not available, the request can remain on the request list,
> >> allowing the supplicant to retry later when resources become available.
> >> If ID allocation fails during request creation, I have no choice but
> >> to drop the request and report an error to optee.
> >
> > We're allocating in the range 1..INT_MAX, and not more than a handful
> > are expected to be active at a time. If we run out of IDs, we have
> > bigger problems.
> >
>
> Ack.
>
> >>
> >>>>
> >>>>         list_del(&req->link);
> >>>> @@ -214,7 +245,6 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
> >>>>         struct optee *optee = tee_get_drvdata(teedev);
> >>>>         struct optee_supp *supp = &optee->supp;
> >>>>         struct optee_supp_req *req = NULL;
> >>>> -       int id;
> >>>>         size_t num_meta;
> >>>>         int rc;
> >>>>
> >>>> @@ -224,15 +254,48 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
> >>>>
> >>>>         while (true) {
> >>>>                 mutex_lock(&supp->mutex);
> >>>> -               req = supp_pop_entry(supp, *num_params - num_meta, &id);
> >>>> -               mutex_unlock(&supp->mutex);
> >>>>
> >>>> -               if (req) {
> >>>> -                       if (IS_ERR(req))
> >>>> -                               return PTR_ERR(req);
> >>>> -                       break;
> >>>> +               req = supp_pop_entry(supp, *num_params - num_meta);
> >>>> +               if (!req) {
> >>>> +                       mutex_unlock(&supp->mutex);
> >>>> +                       goto wait_for_request;
> >>>> +               }
> >>>> +
> >>>> +               if (IS_ERR(req)) {
> >>>> +                       rc = PTR_ERR(req);
> >>>> +                       mutex_unlock(&supp->mutex);
> >>>> +
> >>>> +                       return rc;
> >>>>                 }
> >>>>
> >>>> +               /*
> >>>> +                * Process the request while holding the lock, so that
> >>>> +                * optee_supp_thrd_req() doesn't pull the request from under us.
> >>>> +                */
> >>>> +
> >>>> +               if (num_meta) {
> >>>> +                       /*
> >>>> +                        * tee-supplicant support meta parameters ->
> >>>> +                        * requests can be processed asynchronously.
> >>>> +                        */
> >>>> +                       param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
> >>>> +                                     TEE_IOCTL_PARAM_ATTR_META;
> >>>> +                       param->u.value.a = req->id;
> >>>> +                       param->u.value.b = 0;
> >>>> +                       param->u.value.c = 0;
> >>>> +               } else {
> >>>> +                       supp->req_id = req->id;
> >>>> +               }
> >>>> +
> >>>> +               *func = req->func;
> >>>> +               *num_params = req->num_params + num_meta;
> >>>> +               memcpy(param + num_meta, req->param,
> >>>> +                      sizeof(struct tee_param) * req->num_params);
> >>>> +
> >>>> +               mutex_unlock(&supp->mutex);
> >>>> +               return 0;
> >>>
> >>> Do we really need to move this into the loop? The structure of the
> >>> function becomes a bit unusual and harder to read.
> >>>
> >>
> >> Ack. I'll reorganize this function.
> >>
> >>>> +
> >>>> +wait_for_request:
> >>>>                 /*
> >>>>                  * If we didn't get a request we'll block in
> >>>>                  * wait_for_completion() to avoid needless spinning.
> >>>> @@ -243,29 +306,10 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
> >>>>                  */
> >>>>                 if (wait_for_completion_interruptible(&supp->reqs_c))
> >>>>                         return -ERESTARTSYS;
> >>>> -       }
> >>>>
> >>>> -       if (num_meta) {
> >>>> -               /*
> >>>> -                * tee-supplicant support meta parameters -> requsts can be
> >>>> -                * processed asynchronously.
> >>>> -                */
> >>>> -               param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
> >>>> -                             TEE_IOCTL_PARAM_ATTR_META;
> >>>> -               param->u.value.a = id;
> >>>> -               param->u.value.b = 0;
> >>>> -               param->u.value.c = 0;
> >>>> -       } else {
> >>>> -               mutex_lock(&supp->mutex);
> >>>> -               supp->req_id = id;
> >>>> -               mutex_unlock(&supp->mutex);
> >>>> +               /* Check for the next request in the queue. */
> >>>>         }
> >>>>
> >>>> -       *func = req->func;
> >>>> -       *num_params = req->num_params + num_meta;
> >>>> -       memcpy(param + num_meta, req->param,
> >>>> -              sizeof(struct tee_param) * req->num_params);
> >>>> -
> >>>>         return 0;
> >>>>  }
> >>>>
> >>>> @@ -297,12 +341,18 @@ static struct optee_supp_req *supp_pop_req(struct optee_supp *supp,
> >>>>         if (!req)
> >>>>                 return ERR_PTR(-ENOENT);
> >>>>
> >>>> +       /* optee_supp_thrd_req() already returned to optee. */
> >>>> +       if (IS_ERR(req))
> >>>> +               goto failed_req;
> >>>> +
> >>>>         if ((num_params - nm) != req->num_params)
> >>>>                 return ERR_PTR(-EINVAL);
> >>>>
> >>>> +       *num_meta = nm;
> >>>> +failed_req:
> >>>>         idr_remove(&supp->idr, id);
> >>>>         supp->req_id = -1;
> >>>> -       *num_meta = nm;
> >>>> +
> >>>>
> >>>>         return req;
> >>>>  }
> >>>> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
> >>>>
> >>>>         mutex_lock(&supp->mutex);
> >>>>         req = supp_pop_req(supp, num_params, param, &num_meta);
> >>>> -       mutex_unlock(&supp->mutex);
> >>>> -
> >>>>         if (IS_ERR(req)) {
> >>>> +               mutex_unlock(&supp->mutex);
> >>>
> >>> We need a way to tell the difference between an id not found and an id
> >>> removed because of a killed requester.
> >>> How about storing NULL for revoked requests instead of an err-pointer?
> >>>
> >>
> >> Not sure I'm following correctly. Are you expecting supp_pop_req()
> >> to return NULL instead of an err-pointer when a request has been revoked?
> >
> > I was looking at it again, and storing an err-pointer as you do in
> > this patch has the advantage that we can tell whether the ID has been
> > revoked or was never supplied. In the latter case, it suggests that
> > the supplicant is doing something wrong and might as well restart in
> > an attempt to recover. So, please keep using an err-pointer as a
> > placeholder, but we must be able to distinguish a revoked request from
> > other errors to make sure that the supplicant doesn't restart due to a
> > revoked request.
> >
>
> Understood. What if I switch the stored err-pointer to EBADF instead of ENOENT
> (which seems more natural), so it doesn't overlap with other supp_pop_req() error
> codes and the supplicant can reliably detect it.

Any error returned by TEE_IOC_SUPPL_SEND (or TEE_IOC_SUPPL_RECV) will
cause the tee-supplicant to exit. Even if we update it to ignore
certain codes, we must also consider the installed base. There's not
much the tee-supplicant could do with this error except log it, and I
don't think that is very useful either. Unless the tee-supplicant does
something wrong or the device isn't working any longer, we shouldn't
return an error.
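
As a rough illustration (a simplified sketch only, not the actual
tee-supplicant sources; read_request(), process_request(), write_response()
and struct supp_request are hypothetical stand-ins for the wrappers around
the TEE_IOC_SUPPL_RECV/SEND ioctls), the main loop treats any failure as
fatal:

        /* Illustrative only -- not the real tee-supplicant code. */
        static void supplicant_loop(int fd)
        {
                struct supp_request request;  /* hypothetical request holder */

                for (;;) {
                        if (read_request(fd, &request))    /* TEE_IOC_SUPPL_RECV */
                                break;  /* any error: the supplicant exits */
                        process_request(fd, &request);
                        if (write_response(fd, &request))  /* TEE_IOC_SUPPL_SEND */
                                break;  /* any error: the supplicant exits */
                }
        }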

Cheers,
Jens

>
> Best Regards,
> Amir
>
> > Cheers,
> > Jens
> >
> >>
> >> Best Regards,
> >> Amir
> >>
> >>> Cheers,
> >>> Jens
> >>>
> >>>>                 /* Something is wrong, let supplicant restart. */
> >>>>                 return PTR_ERR(req);
> >>>>         }
> >>>> @@ -355,9 +404,10 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
> >>>>                 }
> >>>>         }
> >>>>         req->ret = ret;
> >>>> -
> >>>> +       req->processed = true;
> >>>>         /* Let the requesting thread continue */
> >>>>         complete(&req->c);
> >>>> +       mutex_unlock(&supp->mutex);
> >>>>
> >>>>         return 0;
> >>>>  }
> >>>>
> >>>> ---
> >>>> base-commit: 3f24e4edcd1b8981c6b448ea2680726dedd87279
> >>>> change-id: 20250604-fix-use-after-free-8ff1b5d5d774
> >>>>
> >>>> Best regards,
> >>>> --
> >>>> Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
> >>>>
> >>
>
Re: [PATCH v3] tee: optee: prevent use-after-free when the client exits before the supplicant
Posted by Amirreza Zarrabi 2 days, 6 hours ago
Hi Jens,

On 2/4/2026 6:46 PM, Jens Wiklander wrote:
> Hi Amir,
> 
> On Tue, Feb 3, 2026 at 11:56 PM Amirreza Zarrabi
> <amirreza.zarrabi@oss.qualcomm.com> wrote:
>>
>> Hi Jens,
>>
>> On 2/3/2026 5:59 PM, Jens Wiklander wrote:
>>> Hi,
>>>
>>> On Tue, Feb 3, 2026 at 3:09 AM Amirreza Zarrabi
>>> <amirreza.zarrabi@oss.qualcomm.com> wrote:
>>>>
>>>> Hi Jens,
>>>>
>>>> On 2/2/2026 10:36 PM, Jens Wiklander wrote:
>>>>> Hi Amir,
>>>>>
>>>>> On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi
>>>>> <amirreza.zarrabi@oss.qualcomm.com> wrote:
>>>>>>
>>>>>> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
>>>>>> client wait as killable so it can be interrupted during shutdown or
>>>>>> after a supplicant crash. This changes the original lifetime expectations:
>>>>>> the client task can now terminate while the supplicant is still processing
>>>>>> its request.
>>>>>>
>>>>>> If the client exits first it removes the request from its queue and
>>>>>> kfree()s it, while the request ID remains in supp->idr. A subsequent
>>>>>> lookup on the supplicant path then dereferences freed memory, leading to
>>>>>> a use-after-free.
>>>>>>
>>>>>> Serialise access to the request with supp->mutex:
>>>>>>
>>>>>>   * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
>>>>>>     looking up and touching the request.
>>>>>>   * Let optee_supp_thrd_req() notice that the client has terminated and
>>>>>>     signal optee_supp_send() accordingly.
>>>>>>
>>>>>> With these changes the request cannot be freed while the supplicant still
>>>>>> has a reference, eliminating the race.
>>>>>>
>>>>>> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
>>>>>> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
>>>>>> ---
>>>>>> Changes in v3:
>>>>>> - Introduce processed flag instead of -1 for req->id.
>>>>>> - Update optee_supp_release() as reported by Michael Wu.
>>>>>> - Use mutex instead of guard.
>>>>>> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com
>>>>>>
>>>>>> Changes in v2:
>>>>>> - Replace the static variable with a sentinel value.
>>>>>> - Fix the issue with returning the popped request to the supplicant.
>>>>>> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
>>>>>> ---
>>>>>>  drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
>>>>>>  1 file changed, 86 insertions(+), 36 deletions(-)
>>>>>
>>>>> I had forgotten about this. I'd like to prioritize getting this fixed
>>>>> soon. By the way, how did you test this?
>>>>>
>>>>
>>>> Thanks for the update. I currently don't have access to the setup required to run
>>>> the tests myself. My plan is to finalize the design and implementation, then
>>>> ask Michael Wu to run his use case. Based on his earlier feedback, the patch
>>>> appears to be working as expected.
>>>>
>>>> https://lore.kernel.org/all/292653ba-3836-00f1-acd4-a28b1c54388c@allwinnertech.com/
>>>
>>> OK
>>>
>>>>
>>>>>>
>>>>>> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
>>>>>> index d0f397c90242..0ec66008df19 100644
>>>>>> --- a/drivers/tee/optee/supp.c
>>>>>> +++ b/drivers/tee/optee/supp.c
>>>>>> @@ -10,7 +10,11 @@
>>>>>>  struct optee_supp_req {
>>>>>>         struct list_head link;
>>>>>>
>>>>>> +       int id;
>>>>>> +
>>>>>>         bool in_queue;
>>>>>> +       bool processed;
>>>>>> +
>>>>>>         u32 func;
>>>>>>         u32 ret;
>>>>>>         size_t num_params;
>>>>>> @@ -19,6 +23,9 @@ struct optee_supp_req {
>>>>>>         struct completion c;
>>>>>>  };
>>>>>>
>>>>>> +/* It is temporary request used for invalid pending request in supp->idr. */
>>>>>> +#define INVALID_REQ_PTR ((struct optee_supp_req *)ERR_PTR(-ENOENT))
>>>>>> +
>>>>>>  void optee_supp_init(struct optee_supp *supp)
>>>>>>  {
>>>>>>         memset(supp, 0, sizeof(*supp));
>>>>>> @@ -46,6 +53,10 @@ void optee_supp_release(struct optee_supp *supp)
>>>>>>         /* Abort all request retrieved by supplicant */
>>>>>>         idr_for_each_entry(&supp->idr, req, id) {
>>>>>>                 idr_remove(&supp->idr, id);
>>>>>> +               /* Skip if request was already marked invalid */
>>>>>> +               if (IS_ERR(req))
>>>>>> +                       continue;
>>>>>> +
>>>>>>                 req->ret = TEEC_ERROR_COMMUNICATION;
>>>>>>                 complete(&req->c);
>>>>>>         }
>>>>>> @@ -102,6 +113,7 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>>>>>>         mutex_lock(&supp->mutex);
>>>>>>         list_add_tail(&req->link, &supp->reqs);
>>>>>>         req->in_queue = true;
>>>>>> +       req->processed = false;
>>>>>>         mutex_unlock(&supp->mutex);
>>>>>>
>>>>>>         /* Tell an eventual waiter there's a new request */
>>>>>> @@ -117,21 +129,40 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>>>>>>         if (wait_for_completion_killable(&req->c)) {
>>>>>>                 mutex_lock(&supp->mutex);
>>>>>>                 if (req->in_queue) {
>>>>>> +                       /* Supplicant has not seen this request yet. */
>>>>>>                         list_del(&req->link);
>>>>>>                         req->in_queue = false;
>>>>>> +
>>>>>> +                       ret = TEEC_ERROR_COMMUNICATION;
>>>>>> +               } else if (req->processed) {
>>>>>> +                       /*
>>>>>> +                        * Supplicant has processed this request. Ignore the
>>>>>> +                        * kill signal for now and submit the result.
>>>>>> +                        */
>>>>>> +                       ret = req->ret;
>>>>>> +               } else {
>>>>>> +                       /*
>>>>>> +                        * Supplicant is in the middle of processing this
>>>>>> +                        * request. Replace req with INVALID_REQ_PTR so that
>>>>>> +                        * the ID remains busy, causing optee_supp_send() to
>>>>>> +                        * fail on the next call to supp_pop_req() with this ID.
>>>>>> +                        */
>>>>>> +                       idr_replace(&supp->idr, INVALID_REQ_PTR, req->id);
>>>>>> +                       ret = TEEC_ERROR_COMMUNICATION;
>>>>>>                 }
>>>>>> +
>>>>>>                 mutex_unlock(&supp->mutex);
>>>>>> -               req->ret = TEEC_ERROR_COMMUNICATION;
>>>>>> +       } else {
>>>>>> +               ret = req->ret;
>>>>>>         }
>>>>>>
>>>>>> -       ret = req->ret;
>>>>>>         kfree(req);
>>>>>>
>>>>>>         return ret;
>>>>>>  }
>>>>>>
>>>>>>  static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
>>>>>> -                                             int num_params, int *id)
>>>>>> +                                             int num_params)
>>>>>>  {
>>>>>>         struct optee_supp_req *req;
>>>>>>
>>>>>> @@ -153,8 +184,8 @@ static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
>>>>>>                 return ERR_PTR(-EINVAL);
>>>>>>         }
>>>>>>
>>>>>> -       *id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
>>>>>> -       if (*id < 0)
>>>>>> +       req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
>>>>>> +       if (req->id < 0)
>>>>>>                 return ERR_PTR(-ENOMEM);
>>>>>
>>>>> Since we're now storing the supplicant request ID, wouldn't it make
>>>>> sense to pre-allocate the ID when allocating the request to avoid this
>>>>> error case?
>>>>>
>>>>
>>>> True, but allocating the ID at this stage has one advantage.
>>>> If an ID is not available, the request can remain on the request list,
>>>> allowing the supplicant to retry later when resources become available.
>>>> If ID allocation fails during request creation, I have no choice but
>>>> to drop the request and report an error to optee.
>>>
>>> We're allocating in the range 1..INT_MAX, and not more than a handful
>>> are expected to be active at a time. If we run out of IDs, we have
>>> bigger problems.
>>>
>>
>> Ack.
>>
>>>>
>>>>>>
>>>>>>         list_del(&req->link);
>>>>>> @@ -214,7 +245,6 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>>>>>         struct optee *optee = tee_get_drvdata(teedev);
>>>>>>         struct optee_supp *supp = &optee->supp;
>>>>>>         struct optee_supp_req *req = NULL;
>>>>>> -       int id;
>>>>>>         size_t num_meta;
>>>>>>         int rc;
>>>>>>
>>>>>> @@ -224,15 +254,48 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>>>>>
>>>>>>         while (true) {
>>>>>>                 mutex_lock(&supp->mutex);
>>>>>> -               req = supp_pop_entry(supp, *num_params - num_meta, &id);
>>>>>> -               mutex_unlock(&supp->mutex);
>>>>>>
>>>>>> -               if (req) {
>>>>>> -                       if (IS_ERR(req))
>>>>>> -                               return PTR_ERR(req);
>>>>>> -                       break;
>>>>>> +               req = supp_pop_entry(supp, *num_params - num_meta);
>>>>>> +               if (!req) {
>>>>>> +                       mutex_unlock(&supp->mutex);
>>>>>> +                       goto wait_for_request;
>>>>>> +               }
>>>>>> +
>>>>>> +               if (IS_ERR(req)) {
>>>>>> +                       rc = PTR_ERR(req);
>>>>>> +                       mutex_unlock(&supp->mutex);
>>>>>> +
>>>>>> +                       return rc;
>>>>>>                 }
>>>>>>
>>>>>> +               /*
>>>>>> +                * Process the request while holding the lock, so that
>>>>>> +                * optee_supp_thrd_req() doesn't pull the request from under us.
>>>>>> +                */
>>>>>> +
>>>>>> +               if (num_meta) {
>>>>>> +                       /*
>>>>>> +                        * tee-supplicant support meta parameters ->
>>>>>> +                        * requests can be processed asynchronously.
>>>>>> +                        */
>>>>>> +                       param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
>>>>>> +                                     TEE_IOCTL_PARAM_ATTR_META;
>>>>>> +                       param->u.value.a = req->id;
>>>>>> +                       param->u.value.b = 0;
>>>>>> +                       param->u.value.c = 0;
>>>>>> +               } else {
>>>>>> +                       supp->req_id = req->id;
>>>>>> +               }
>>>>>> +
>>>>>> +               *func = req->func;
>>>>>> +               *num_params = req->num_params + num_meta;
>>>>>> +               memcpy(param + num_meta, req->param,
>>>>>> +                      sizeof(struct tee_param) * req->num_params);
>>>>>> +
>>>>>> +               mutex_unlock(&supp->mutex);
>>>>>> +               return 0;
>>>>>
>>>>> Do we really need to move this into the loop? The structure of the
>>>>> function becomes a bit unusual and harder to read.
>>>>>
>>>>
>>>> Ack. I'll reorganize this function.
>>>>
>>>>>> +
>>>>>> +wait_for_request:
>>>>>>                 /*
>>>>>>                  * If we didn't get a request we'll block in
>>>>>>                  * wait_for_completion() to avoid needless spinning.
>>>>>> @@ -243,29 +306,10 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>>>>>                  */
>>>>>>                 if (wait_for_completion_interruptible(&supp->reqs_c))
>>>>>>                         return -ERESTARTSYS;
>>>>>> -       }
>>>>>>
>>>>>> -       if (num_meta) {
>>>>>> -               /*
>>>>>> -                * tee-supplicant support meta parameters -> requsts can be
>>>>>> -                * processed asynchronously.
>>>>>> -                */
>>>>>> -               param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
>>>>>> -                             TEE_IOCTL_PARAM_ATTR_META;
>>>>>> -               param->u.value.a = id;
>>>>>> -               param->u.value.b = 0;
>>>>>> -               param->u.value.c = 0;
>>>>>> -       } else {
>>>>>> -               mutex_lock(&supp->mutex);
>>>>>> -               supp->req_id = id;
>>>>>> -               mutex_unlock(&supp->mutex);
>>>>>> +               /* Check for the next request in the queue. */
>>>>>>         }
>>>>>>
>>>>>> -       *func = req->func;
>>>>>> -       *num_params = req->num_params + num_meta;
>>>>>> -       memcpy(param + num_meta, req->param,
>>>>>> -              sizeof(struct tee_param) * req->num_params);
>>>>>> -
>>>>>>         return 0;
>>>>>>  }
>>>>>>
>>>>>> @@ -297,12 +341,18 @@ static struct optee_supp_req *supp_pop_req(struct optee_supp *supp,
>>>>>>         if (!req)
>>>>>>                 return ERR_PTR(-ENOENT);
>>>>>>
>>>>>> +       /* optee_supp_thrd_req() already returned to optee. */
>>>>>> +       if (IS_ERR(req))
>>>>>> +               goto failed_req;
>>>>>> +
>>>>>>         if ((num_params - nm) != req->num_params)
>>>>>>                 return ERR_PTR(-EINVAL);
>>>>>>
>>>>>> +       *num_meta = nm;
>>>>>> +failed_req:
>>>>>>         idr_remove(&supp->idr, id);
>>>>>>         supp->req_id = -1;
>>>>>> -       *num_meta = nm;
>>>>>> +
>>>>>>
>>>>>>         return req;
>>>>>>  }
>>>>>> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>>>>>>
>>>>>>         mutex_lock(&supp->mutex);
>>>>>>         req = supp_pop_req(supp, num_params, param, &num_meta);
>>>>>> -       mutex_unlock(&supp->mutex);
>>>>>> -
>>>>>>         if (IS_ERR(req)) {
>>>>>> +               mutex_unlock(&supp->mutex);
>>>>>
>>>>> We need a way to tell the difference between an id not found and an id
>>>>> removed because of a killed requester.
>>>>> How about storing NULL for revoked requests instead of an err-pointer?
>>>>>
>>>>
>>>> Not sure I'm following correctly. Are you expecting supp_pop_req()
>>>> to return NULL instead of an err-pointer when a request has been revoked?
>>>
>>> I was looking at it again, and storing an err-pointer as you do in
>>> this patch has the advantage that we can tell whether the ID has been
>>> revoked or was never supplied. In the latter case, it suggests that
>>> the supplicant is doing something wrong and might as well restart in
>>> an attempt to recover. So, please keep using an err-pointer as a
>>> placeholder, but we must be able to distinguish a revoked request from
>>> other errors to make sure that the supplicant doesn't restart due to a
>>> revoked request.
>>>
>>
>> Understood. What if I switch the stored err-pointer to EBADF instead of ENOENT
>> (which seems more natural), so it doesn't overlap with other supp_pop_req() error
>> codes and the supplicant can reliably detect it.
> 
> Any error returned by TEE_IOC_SUPPL_SEND (or TEE_IOC_SUPPL_RECV) will
> cause the tee-supplicant to exit. Even if we update it to ignore
> certain codes, we must also consider the installed base. There's not
> much tee-supplicant could do with this error, except logging it. But I
> don't think that is very useful either. Unless the tee-supplicant does
> anything wrong or if the device isn't working any longer, we shouldn't
> return an error.
> 

The direction of data flow in optee_supp_send is from the supplicant to optee,
so the only way I can return meaningful information back to the supplicant is
through the return value. I suppose I could simply ignore the revoked request
and return success, but it might be useful for the supplicant to know about it
in case it needs to roll back something.
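
For instance, the "ignore and return success" variant would look roughly like
the untested sketch below, using the err-pointer placeholder from the patch
and assuming it is switched to carry -EBADF as discussed above:

        /* In optee_supp_send(): */
        mutex_lock(&supp->mutex);
        req = supp_pop_req(supp, num_params, param, &num_meta);
        if (IS_ERR(req)) {
                mutex_unlock(&supp->mutex);
                if (PTR_ERR(req) == -EBADF)
                        return 0;  /* revoked request: silently drop the answer */
                /* Something is wrong, let supplicant restart. */
                return PTR_ERR(req);
        }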

At this point I'm out of ideas :). Do you have any suggestions on how I can
inform the supplicant about a revoked request in optee_supp_send while still
returning a success value?

Best regards,
Amir

> Cheers,
> Jens
> 
>>
>> Best Regards,
>> Amir
>>
>>> Cheers,
>>> Jens
>>>
>>>>
>>>> Best Regards,
>>>> Amir
>>>>
>>>>> Cheers,
>>>>> Jens
>>>>>
>>>>>>                 /* Something is wrong, let supplicant restart. */
>>>>>>                 return PTR_ERR(req);
>>>>>>         }
>>>>>> @@ -355,9 +404,10 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>>>>>>                 }
>>>>>>         }
>>>>>>         req->ret = ret;
>>>>>> -
>>>>>> +       req->processed = true;
>>>>>>         /* Let the requesting thread continue */
>>>>>>         complete(&req->c);
>>>>>> +       mutex_unlock(&supp->mutex);
>>>>>>
>>>>>>         return 0;
>>>>>>  }
>>>>>>
>>>>>> ---
>>>>>> base-commit: 3f24e4edcd1b8981c6b448ea2680726dedd87279
>>>>>> change-id: 20250604-fix-use-after-free-8ff1b5d5d774
>>>>>>
>>>>>> Best regards,
>>>>>> --
>>>>>> Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
>>>>>>
>>>>
>>

Re: [PATCH v3] tee: optee: prevent use-after-free when the client exits before the supplicant
Posted by Jens Wiklander 1 day ago
Hi Amir,

On Thu, Feb 5, 2026 at 3:13 AM Amirreza Zarrabi
<amirreza.zarrabi@oss.qualcomm.com> wrote:
>
> Hi Jens,
>
> On 2/4/2026 6:46 PM, Jens Wiklander wrote:
> > Hi Amir,
> >
> > On Tue, Feb 3, 2026 at 11:56 PM Amirreza Zarrabi
> > <amirreza.zarrabi@oss.qualcomm.com> wrote:
> >>
> >> Hi Jens,
> >>
> >> On 2/3/2026 5:59 PM, Jens Wiklander wrote:
> >>> Hi,
> >>>
> >>> On Tue, Feb 3, 2026 at 3:09 AM Amirreza Zarrabi
> >>> <amirreza.zarrabi@oss.qualcomm.com> wrote:
> >>>>
> >>>> Hi Jens,
> >>>>
> >>>> On 2/2/2026 10:36 PM, Jens Wiklander wrote:
> >>>>> Hi Amir,
> >>>>>
> >>>>> On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi
> >>>>> <amirreza.zarrabi@oss.qualcomm.com> wrote:
> >>>>>>
> >>>>>> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
> >>>>>> client wait as killable so it can be interrupted during shutdown or
> >>>>>> after a supplicant crash. This changes the original lifetime expectations:
> >>>>>> the client task can now terminate while the supplicant is still processing
> >>>>>> its request.
> >>>>>>
> >>>>>> If the client exits first it removes the request from its queue and
> >>>>>> kfree()s it, while the request ID remains in supp->idr. A subsequent
> >>>>>> lookup on the supplicant path then dereferences freed memory, leading to
> >>>>>> a use-after-free.
> >>>>>>
> >>>>>> Serialise access to the request with supp->mutex:
> >>>>>>
> >>>>>>   * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
> >>>>>>     looking up and touching the request.
> >>>>>>   * Let optee_supp_thrd_req() notice that the client has terminated and
> >>>>>>     signal optee_supp_send() accordingly.
> >>>>>>
> >>>>>> With these changes the request cannot be freed while the supplicant still
> >>>>>> has a reference, eliminating the race.
> >>>>>>
> >>>>>> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
> >>>>>> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
> >>>>>> ---
> >>>>>> Changes in v3:
> >>>>>> - Introduce processed flag instead of -1 for req->id.
> >>>>>> - Update optee_supp_release() as reported by Michael Wu.
> >>>>>> - Use mutex instead of guard.
> >>>>>> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com
> >>>>>>
> >>>>>> Changes in v2:
> >>>>>> - Replace the static variable with a sentinel value.
> >>>>>> - Fix the issue with returning the popped request to the supplicant.
> >>>>>> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
> >>>>>> ---
> >>>>>>  drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
> >>>>>>  1 file changed, 86 insertions(+), 36 deletions(-)
> >>>>>
[snip]


> >>>>>> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
> >>>>>>
> >>>>>>         mutex_lock(&supp->mutex);
> >>>>>>         req = supp_pop_req(supp, num_params, param, &num_meta);
> >>>>>> -       mutex_unlock(&supp->mutex);
> >>>>>> -
> >>>>>>         if (IS_ERR(req)) {
> >>>>>> +               mutex_unlock(&supp->mutex);
> >>>>>
> >>>>> We need a way to tell the difference between an id not found and an id
> >>>>> removed because of a killed requester.
> >>>>> How about storing NULL for revoked requests instead of an err-pointer?
> >>>>>
> >>>>
> >>>> Not sure I'm following correctly. Are you expecting supp_pop_req()
> >>>> to return NULL instead of an err-pointer when a request has been revoked?
> >>>
> >>> I was looking at it again, and storing an err-pointer as you do in
> >>> this patch has the advantage that we can tell whether the ID has been
> >>> revoked or was never supplied. In the latter case, it suggests that
> >>> the supplicant is doing something wrong and might as well restart in
> >>> an attempt to recover. So, please keep using an err-pointer as a
> >>> placeholder, but we must be able to distinguish a revoked request from
> >>> other errors to make sure that the supplicant doesn't restart due to a
> >>> revoked request.
> >>>
> >>
> >> Understood. What if I switch the stored err-pointer to EBADF instead of ENOENT
> >> (which seems more natural), so it doesn't overlap with other supp_pop_req() error
> >> codes and the supplicant can reliably detect it.
> >
> > Any error returned by TEE_IOC_SUPPL_SEND (or TEE_IOC_SUPPL_RECV) will
> > cause the tee-supplicant to exit. Even if we update it to ignore
> > certain codes, we must also consider the installed base. There's not
> > much tee-supplicant could do with this error, except logging it. But I
> > don't think that is very useful either. Unless the tee-supplicant does
> > anything wrong or if the device isn't working any longer, we shouldn't
> > return an error.
> >
>
> The direction of data flow in optee_supp_send is from the supplicant to optee,
> so the only way I can return meaningful information back to the supplicant is
> through the return value. I suppose I could simply ignore the revoked request
> and return success, but it might be useful for the supplicant to know about it
> in case it needs to roll back something.
>
> At this point I'm out of ideas :). Do you have any suggestions on how I can
> inform the supplicant about a revoked request in optee_supp_send while returning
> success return value?

This became a bit harder than I first thought. At this point, to fix
the possible use-after-free, we have two options:
1. Returning an error code: tee-supplicant will exit
2. Returning OK: worst case, tee-supplicant can leak memory

During reboot, neither case is a problem. During normal operation,
it's annoying if tee-supplicant exits, but you still need some
privileges to kill the client. If we return an error, it's enough to
update tee-supplicant to handle that error case, and we're done. The
advantage is no added code to the kernel.

I think we should do what you suggested and return an error. This will
not happen during normal operation. We'll fix tee-supplicant to handle
the returned error properly. tee-supplicant doesn't care about what
error code it gets. If there's an error in TEE_IOC_SUPPL_SEND, it
knows that no one will receive whatever was sent, and cleanup is
needed.
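
On the supplicant side that could look roughly like the fragment below, inside
its request loop (illustrative only; write_response() stands in for the
TEE_IOC_SUPPL_SEND wrapper and release_request_resources() is a hypothetical
cleanup helper):

        if (write_response(fd, &request)) {
                /*
                 * No one will receive this answer; release whatever was
                 * allocated for the request and keep serving requests
                 * instead of exiting.
                 */
                release_request_resources(&request);
                continue;
        }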

Sumit and Jerome, what do you think?

Cheers,
Jens
Re: [PATCH v3] tee: optee: prevent use-after-free when the client exits before the supplicant
Posted by Jérôme Forissier 22 hours ago
Hi,


On Fri, Feb 6, 2026 at 9:54 AM Jens Wiklander <jens.wiklander@linaro.org> wrote:
>
> Hi Amir,
>
> On Thu, Feb 5, 2026 at 3:13 AM Amirreza Zarrabi
> <amirreza.zarrabi@oss.qualcomm.com> wrote:
> >
> > Hi Jens,
> >
> > On 2/4/2026 6:46 PM, Jens Wiklander wrote:
> > > Hi Amir,
> > >
> > > On Tue, Feb 3, 2026 at 11:56 PM Amirreza Zarrabi
> > > <amirreza.zarrabi@oss.qualcomm.com> wrote:
> > >>
> > >> Hi Jens,
> > >>
> > >> On 2/3/2026 5:59 PM, Jens Wiklander wrote:
> > >>> Hi,
> > >>>
> > >>> On Tue, Feb 3, 2026 at 3:09 AM Amirreza Zarrabi
> > >>> <amirreza.zarrabi@oss.qualcomm.com> wrote:
> > >>>>
> > >>>> Hi Jens,
> > >>>>
> > >>>> On 2/2/2026 10:36 PM, Jens Wiklander wrote:
> > >>>>> Hi Amir,
> > >>>>>
> > >>>>> On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi
> > >>>>> <amirreza.zarrabi@oss.qualcomm.com> wrote:
> > >>>>>>
> > >>>>>> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
> > >>>>>> client wait as killable so it can be interrupted during shutdown or
> > >>>>>> after a supplicant crash. This changes the original lifetime expectations:
> > >>>>>> the client task can now terminate while the supplicant is still processing
> > >>>>>> its request.
> > >>>>>>
> > >>>>>> If the client exits first it removes the request from its queue and
> > >>>>>> kfree()s it, while the request ID remains in supp->idr. A subsequent
> > >>>>>> lookup on the supplicant path then dereferences freed memory, leading to
> > >>>>>> a use-after-free.
> > >>>>>>
> > >>>>>> Serialise access to the request with supp->mutex:
> > >>>>>>
> > >>>>>>   * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
> > >>>>>>     looking up and touching the request.
> > >>>>>>   * Let optee_supp_thrd_req() notice that the client has terminated and
> > >>>>>>     signal optee_supp_send() accordingly.
> > >>>>>>
> > >>>>>> With these changes the request cannot be freed while the supplicant still
> > >>>>>> has a reference, eliminating the race.
> > >>>>>>
> > >>>>>> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
> > >>>>>> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@oss.qualcomm.com>
> > >>>>>> ---
> > >>>>>> Changes in v3:
> > >>>>>> - Introduce processed flag instead of -1 for req->id.
> > >>>>>> - Update optee_supp_release() as reported by Michael Wu.
> > >>>>>> - Use mutex instead of guard.
> > >>>>>> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com
> > >>>>>>
> > >>>>>> Changes in v2:
> > >>>>>> - Replace the static variable with a sentinel value.
> > >>>>>> - Fix the issue with returning the popped request to the supplicant.
> > >>>>>> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
> > >>>>>> ---
> > >>>>>>  drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
> > >>>>>>  1 file changed, 86 insertions(+), 36 deletions(-)
> > >>>>>
> [snip]
>
>
> > >>>>>> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
> > >>>>>>
> > >>>>>>         mutex_lock(&supp->mutex);
> > >>>>>>         req = supp_pop_req(supp, num_params, param, &num_meta);
> > >>>>>> -       mutex_unlock(&supp->mutex);
> > >>>>>> -
> > >>>>>>         if (IS_ERR(req)) {
> > >>>>>> +               mutex_unlock(&supp->mutex);
> > >>>>>
> > >>>>> We need a way to tell the difference between an id not found and an id
> > >>>>> removed because of a killed requester.
> > >>>>> How about storing NULL for revoked requests instead of an err-pointer?
> > >>>>>
> > >>>>
> > >>>> Not sure I'm following correctly. Are you expecting supp_pop_req()
> > >>>> to return NULL instead of an err-pointer when a request has been revoked?
> > >>>
> > >>> I was looking at it again, and storing an err-pointer as you do in
> > >>> this patch has the advantage that we can tell whether the ID has been
> > >>> revoked or was never supplied. In the latter case, it suggests that
> > >>> the supplicant is doing something wrong and might as well restart in
> > >>> an attempt to recover. So, please keep using an err-pointer as a
> > >>> placeholder, but we must be able to distinguish a revoked request from
> > >>> other errors to make sure that the supplicant doesn't restart due to a
> > >>> revoked request.
> > >>>
> > >>
> > >> Understood. What if I switch the stored err-pointer to EBADF instead of ENOENT
> > >> (which seems more natural), so it doesn't overlap with other supp_pop_req() error
> > >> codes and the supplicant can reliably detect it.
> > >
> > > Any error returned by TEE_IOC_SUPPL_SEND (or TEE_IOC_SUPPL_RECV) will
> > > cause the tee-supplicant to exit. Even if we update it to ignore
> > > certain codes, we must also consider the installed base. There's not
> > > much tee-supplicant could do with this error, except logging it. But I
> > > don't think that is very useful either. Unless the tee-supplicant does
> > > anything wrong or if the device isn't working any longer, we shouldn't
> > > return an error.
> > >
> >
> > The direction of data flow in optee_supp_send is from the supplicant to optee,
> > so the only way I can return meaningful information back to the supplicant is
> > through the return value. I suppose I could simply ignore the revoked request
> > and return success, but it might be useful for the supplicant to know about it
> > in case it needs to roll back something.
> >
> > At this point I'm out of ideas :). Do you have any suggestions on how I can
> > inform the supplicant about a revoked request in optee_supp_send while returning
> > success return value?
>
> This became a bit harder than I first thought. At this point, to fix
> the possible use-after-free, we have two options:
> 1. Returning an error code: tee-supplicant will exit
> 2. Returning OK: worst case, tee-supplicant can leak memory
>
> During reboot, neither case is a problem. During normal operation,
> it's annoying if tee-supplicant exits, but you still need some
> privileges to kill the client. If we return an error, it's enough to
> update tee-supplicant to handle that error case, and we're done. The
> advantage is no added code to the kernel.
>
> I think we should do what you suggested and return an error. This will
> not happen during normal operation. We'll fix tee-supplicant to handle
> the return error properly. tee-supplicant doesn't care about what
> error code it gets. If there's an error in TEE_IOC_SUPPL_SEND, it
> knows that no one will receive whatever was sent, and cleanup is
> needed.
>
> Sumit and Jerome, what do you think?

Sounds good.

Thanks,
-- 
Jerome

>
> Cheers,
> Jens