[v1] xen/arm: ffa: reclaim shared memory on guest destroy

[RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy

Posted by Jens Wiklander 4 months, 3 weeks ago

When an FF-A enabled guest is destroyed it may leave behind memory
shared with SPs. This memory must be reclaimed before it's reused or an
SP may make changes to memory used by a new unrelated guest. So when the
domain is teared down add FF-A requests to reclaim all remaining shared
memory.

SPs in the secure world are notified using VM_DESTROYED that a guest has
been destroyed. An SP is supposed to relinquish all shared memory to allow
reclaiming the memory. The relinquish operation may need to be delayed if
the shared memory is for instance part of a DMA operation.

If the FF-A memory reclaim request fails, return -ERESTART to retry
again. This will effectively block the destruction of the guest until
all memory has been reclaimed.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
Hi,

This patch is a bit crude, but gets the job done. In a well designed
system this might even be good enough since the SP or the secure world
will let the memory be reclaimed and we can move on. But, if for some
reason reclaiming the memory is refused it must not be possible to reuse
the memory.

These shared memory ranges are typically quite small compared to the
total memory usage of a guest so it would be an improvement if only
refused shared memory ranges where set aside from future reuse while the
guest was destroyed and other resources made available for reuse. This
could be done by for instance assign the refused shared memory ranges
to a dummy VM like DOMID_IO.

Thanks,
Jens
---
 xen/arch/arm/tee/ffa.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
index 183528d13388..9c596462a8a2 100644
--- a/xen/arch/arm/tee/ffa.c
+++ b/xen/arch/arm/tee/ffa.c
@@ -1539,6 +1539,7 @@ static bool is_in_subscr_list(const uint16_t *subscr, uint16_t start,
 static int ffa_domain_teardown(struct domain *d)
 {
     struct ffa_ctx *ctx = d->arch.tee;
+    struct ffa_shm_mem *shm, *tmp;
     unsigned int n;
     int32_t res;
 
@@ -1564,10 +1565,45 @@ static int ffa_domain_teardown(struct domain *d)
             printk(XENLOG_ERR "ffa: Failed to report destruction of vm_id %u to  %u: res %d\n",
                    get_vm_id(d), subscr_vm_destroyed[n], res);
     }
+    /*
+     * If this function is called again due to -ERESTART below, make sure
+     * not to send the FFA_MSG_SEND_VM_DESTROYED's.
+     */
+    subscr_vm_destroyed_count = 0;
 
     if ( ctx->rx )
         rxtx_unmap(ctx);
 
+
+    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
+    {
+        register_t handle_hi;
+        register_t handle_lo;
+
+        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
+        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);
+        if ( res )
+        {
+            printk(XENLOG_INFO, "ffa: Failed to reclaim handle %#lx : %d\n",
+                   shm->handle, res);
+        }
+        else
+        {
+            printk(XENLOG_DEBUG, "ffa: Reclaimed handle %#lx\n", shm->handle);
+            ctx->shm_count--;
+            list_del(&shm->list);
+        }
+    }
+    if ( !list_empty(&ctx->shm_list) )
+    {
+        printk(XENLOG_INFO, "ffa: Remaining unclaimed handles, retrying\n");
+        /*
+         * TODO: add a timeout where we either panic or let the guest be
+         * fully destroyed.
+         */
+        return -ERESTART;
+    }
+
     XFREE(d->arch.tee);
 
     return 0;
-- 
2.34.1

Re: [RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy

Posted by Julien Grall 4 months, 3 weeks ago

Hi Jens,

On 04/12/2023 07:55, Jens Wiklander wrote:
> When an FF-A enabled guest is destroyed it may leave behind memory
> shared with SPs. This memory must be reclaimed before it's reused or an
> SP may make changes to memory used by a new unrelated guest. So when the
> domain is teared down add FF-A requests to reclaim all remaining shared
> memory.
> 
> SPs in the secure world are notified using VM_DESTROYED that a guest has
> been destroyed. An SP is supposed to relinquish all shared memory to allow
> reclaiming the memory. The relinquish operation may need to be delayed if
> the shared memory is for instance part of a DMA operation.
> 
> If the FF-A memory reclaim request fails, return -ERESTART to retry
> again. This will effectively block the destruction of the guest until
> all memory has been reclaimed.
> 
> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
> ---
> Hi,
> 
> This patch is a bit crude, but gets the job done. In a well designed
> system this might even be good enough since the SP or the secure world
> will let the memory be reclaimed and we can move on. But, if for some
> reason reclaiming the memory is refused it must not be possible to reuse
> the memory.

IIUC, we are trying to harden against a buggy SP. Is that correct?

> 
> These shared memory ranges are typically quite small compared to the
> total memory usage of a guest so it would be an improvement if only
> refused shared memory ranges where set aside from future reuse while the
> guest was destroyed and other resources made available for reuse. This
> could be done by for instance assign the refused shared memory ranges
> to a dummy VM like DOMID_IO.

I like the idea to use a dummy VM, but I don't think DOMID_IO is right. 
Once teardown has completed, the domain will stay around until the last 
reference on all pages are dropped. At this point, the amount of memory 
left-over is minimum (this is mostly bookeeping in Xen).

 From the userland PoV, the domain will still show-up in the list but 
tools like "xl list" will show "(null)". They are called zombie domains.

So I would consider to keep the same domain around. The advantage is you 
can call "xl destroy" again to retry the operation.

> 
> Thanks,
> Jens
> ---
>   xen/arch/arm/tee/ffa.c | 36 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 36 insertions(+)
> 
> diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
> index 183528d13388..9c596462a8a2 100644
> --- a/xen/arch/arm/tee/ffa.c
> +++ b/xen/arch/arm/tee/ffa.c
> @@ -1539,6 +1539,7 @@ static bool is_in_subscr_list(const uint16_t *subscr, uint16_t start,
>   static int ffa_domain_teardown(struct domain *d)
>   {
>       struct ffa_ctx *ctx = d->arch.tee;
> +    struct ffa_shm_mem *shm, *tmp;
>       unsigned int n;
>       int32_t res;
>   
> @@ -1564,10 +1565,45 @@ static int ffa_domain_teardown(struct domain *d)
>               printk(XENLOG_ERR "ffa: Failed to report destruction of vm_id %u to  %u: res %d\n",
>                      get_vm_id(d), subscr_vm_destroyed[n], res);
>       }
> +    /*
> +     * If this function is called again due to -ERESTART below, make sure
> +     * not to send the FFA_MSG_SEND_VM_DESTROYED's.
> +     */
> +    subscr_vm_destroyed_count = 0;

AFAICT, this variable is global. So wouldn't you effectively break other 
domain if let say the unmapping error is temporary?

>   
>       if ( ctx->rx )
>           rxtx_unmap(ctx);
>   
> +
> +    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
> +    {
> +        register_t handle_hi;
> +        register_t handle_lo;
> +
> +        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
> +        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);

Is this call expensive? If so, we may need to handle continuation here.

> +        if ( res )
> +        {
> +            printk(XENLOG_INFO, "ffa: Failed to reclaim handle %#lx : %d\n",
> +                   shm->handle, res);

I think you want to use XENLOG_G_INFO to use the guest ratelimit. Also, 
I would suggest to print the domain ID in the logs (see '%pd').


> +        }
> +        else
> +        {
> +            printk(XENLOG_DEBUG, "ffa: Reclaimed handle %#lx\n", shm->handle);

Same here. You want to use XENLOG_G_DEBUG and print the domain ID.

> +            ctx->shm_count--;
> +            list_del(&shm->list);
> +        }
> +    }

NIT: New line here please for clarity.

> +    if ( !list_empty(&ctx->shm_list) )
> +    {
> +        printk(XENLOG_INFO, "ffa: Remaining unclaimed handles, retrying\n");

Same as the other printks.

> +        /*
> +         * TODO: add a timeout where we either panic or let the guest be
> +         * fully destroyed.
> +         */
Timeout with proper handling would be a solution. I am not sure about 
panic-ing. Do you think the TEE would be in a bad state if we can't 
release memory?

> +        return -ERESTART;
> +    }
> +
>       XFREE(d->arch.tee);
>   
>       return 0;

Cheers,

-- 
Julien Grall

Re: [RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy

Posted by Jens Wiklander 4 months, 3 weeks ago

Hi Julien,

Thanks for the feedback. I'm answering the straightforward issues here
and saving the rest for the emerging thread.

On Mon, Dec 4, 2023 at 8:24 PM Julien Grall <julien@xen.org> wrote:
>
> Hi Jens,
>
> On 04/12/2023 07:55, Jens Wiklander wrote:
> > When an FF-A enabled guest is destroyed it may leave behind memory
> > shared with SPs. This memory must be reclaimed before it's reused or an
> > SP may make changes to memory used by a new unrelated guest. So when the
> > domain is teared down add FF-A requests to reclaim all remaining shared
> > memory.
> >
> > SPs in the secure world are notified using VM_DESTROYED that a guest has
> > been destroyed. An SP is supposed to relinquish all shared memory to allow
> > reclaiming the memory. The relinquish operation may need to be delayed if
> > the shared memory is for instance part of a DMA operation.
> >
> > If the FF-A memory reclaim request fails, return -ERESTART to retry
> > again. This will effectively block the destruction of the guest until
> > all memory has been reclaimed.
> >
> > Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
> > ---
> > Hi,
> >
> > This patch is a bit crude, but gets the job done. In a well designed
> > system this might even be good enough since the SP or the secure world
> > will let the memory be reclaimed and we can move on. But, if for some
> > reason reclaiming the memory is refused it must not be possible to reuse
> > the memory.
>
> IIUC, we are trying to harden against a buggy SP. Is that correct?
>
> >
> > These shared memory ranges are typically quite small compared to the
> > total memory usage of a guest so it would be an improvement if only
> > refused shared memory ranges where set aside from future reuse while the
> > guest was destroyed and other resources made available for reuse. This
> > could be done by for instance assign the refused shared memory ranges
> > to a dummy VM like DOMID_IO.
>
> I like the idea to use a dummy VM, but I don't think DOMID_IO is right.
> Once teardown has completed, the domain will stay around until the last
> reference on all pages are dropped. At this point, the amount of memory
> left-over is minimum (this is mostly bookeeping in Xen).
>
>  From the userland PoV, the domain will still show-up in the list but
> tools like "xl list" will show "(null)". They are called zombie domains.
>
> So I would consider to keep the same domain around. The advantage is you
> can call "xl destroy" again to retry the operation.
>
> >
> > Thanks,
> > Jens
> > ---
> >   xen/arch/arm/tee/ffa.c | 36 ++++++++++++++++++++++++++++++++++++
> >   1 file changed, 36 insertions(+)
> >
> > diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
> > index 183528d13388..9c596462a8a2 100644
> > --- a/xen/arch/arm/tee/ffa.c
> > +++ b/xen/arch/arm/tee/ffa.c
> > @@ -1539,6 +1539,7 @@ static bool is_in_subscr_list(const uint16_t *subscr, uint16_t start,
> >   static int ffa_domain_teardown(struct domain *d)
> >   {
> >       struct ffa_ctx *ctx = d->arch.tee;
> > +    struct ffa_shm_mem *shm, *tmp;
> >       unsigned int n;
> >       int32_t res;
> >
> > @@ -1564,10 +1565,45 @@ static int ffa_domain_teardown(struct domain *d)
> >               printk(XENLOG_ERR "ffa: Failed to report destruction of vm_id %u to  %u: res %d\n",
> >                      get_vm_id(d), subscr_vm_destroyed[n], res);
> >       }
> > +    /*
> > +     * If this function is called again due to -ERESTART below, make sure
> > +     * not to send the FFA_MSG_SEND_VM_DESTROYED's.
> > +     */
> > +    subscr_vm_destroyed_count = 0;
>
> AFAICT, this variable is global. So wouldn't you effectively break other
> domain if let say the unmapping error is temporary?

You're right! I'll have to change this part a bit.

>
> >
> >       if ( ctx->rx )
> >           rxtx_unmap(ctx);
> >
> > +
> > +    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
> > +    {
> > +        register_t handle_hi;
> > +        register_t handle_lo;
> > +
> > +        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
> > +        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);
>
> Is this call expensive? If so, we may need to handle continuation here.
>
> > +        if ( res )
> > +        {
> > +            printk(XENLOG_INFO, "ffa: Failed to reclaim handle %#lx : %d\n",
> > +                   shm->handle, res);
>
> I think you want to use XENLOG_G_INFO to use the guest ratelimit. Also,
> I would suggest to print the domain ID in the logs (see '%pd').

Thanks for the tip, I'll update accordingly here and at the other places.

>
>
> > +        }
> > +        else
> > +        {
> > +            printk(XENLOG_DEBUG, "ffa: Reclaimed handle %#lx\n", shm->handle);
>
> Same here. You want to use XENLOG_G_DEBUG and print the domain ID.
>
> > +            ctx->shm_count--;
> > +            list_del(&shm->list);
> > +        }
> > +    }
>
> NIT: New line here please for clarity.

OK

>
> > +    if ( !list_empty(&ctx->shm_list) )
> > +    {
> > +        printk(XENLOG_INFO, "ffa: Remaining unclaimed handles, retrying\n");
>
> Same as the other printks.
>
> > +        /*
> > +         * TODO: add a timeout where we either panic or let the guest be
> > +         * fully destroyed.
> > +         */
> Timeout with proper handling would be a solution. I am not sure about
> panic-ing. Do you think the TEE would be in a bad state if we can't
> release memory?

No, that's not likely.

Thanks,
Jens

>
> > +        return -ERESTART;
> > +    }
> > +
> >       XFREE(d->arch.tee);
> >
> >       return 0;
>
> Cheers,
>
> --
> Julien Grall

Re: [RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy

Posted by Bertrand Marquis 4 months, 3 weeks ago

Hi Julien,

Thanks a lot for your review and comment, this is very helpful.

> On 4 Dec 2023, at 20:24, Julien Grall <julien@xen.org> wrote:
> 
> Hi Jens,
> 
> On 04/12/2023 07:55, Jens Wiklander wrote:
>> When an FF-A enabled guest is destroyed it may leave behind memory
>> shared with SPs. This memory must be reclaimed before it's reused or an
>> SP may make changes to memory used by a new unrelated guest. So when the
>> domain is teared down add FF-A requests to reclaim all remaining shared
>> memory.
>> SPs in the secure world are notified using VM_DESTROYED that a guest has
>> been destroyed. An SP is supposed to relinquish all shared memory to allow
>> reclaiming the memory. The relinquish operation may need to be delayed if
>> the shared memory is for instance part of a DMA operation.
>> If the FF-A memory reclaim request fails, return -ERESTART to retry
>> again. This will effectively block the destruction of the guest until
>> all memory has been reclaimed.
>> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
>> ---
>> Hi,
>> This patch is a bit crude, but gets the job done. In a well designed
>> system this might even be good enough since the SP or the secure world
>> will let the memory be reclaimed and we can move on. But, if for some
>> reason reclaiming the memory is refused it must not be possible to reuse
>> the memory.
> 
> IIUC, we are trying to harden against a buggy SP. Is that correct?

This is not hardening as this is a possible scenario with a correctly implemented SP.
This is valid for the SP to not be able to relinquish the memory directly so we have
to take this possibility into account and retry.

What is not expected if for the SP to never release the memory hence the possible
"todo" at the end of the code where i think we might have to implement a counter
to bound the possible number of loops but as always the question is how many...

In this case the only solution would be to park the memory as suggested after
but we are not completely sure where hence the RFC.

> 
>> These shared memory ranges are typically quite small compared to the
>> total memory usage of a guest so it would be an improvement if only
>> refused shared memory ranges where set aside from future reuse while the
>> guest was destroyed and other resources made available for reuse. This
>> could be done by for instance assign the refused shared memory ranges
>> to a dummy VM like DOMID_IO.
> 
> I like the idea to use a dummy VM, but I don't think DOMID_IO is right. Once teardown has completed, the domain will stay around until the last reference on all pages are dropped. At this point, the amount of memory left-over is minimum (this is mostly bookeeping in Xen).
> 
> From the userland PoV, the domain will still show-up in the list but tools like "xl list" will show "(null)". They are called zombie domains.
> 
> So I would consider to keep the same domain around. The advantage is you can call "xl destroy" again to retry the operation.

In this scenario the "restart" implementation here is right but how could we park the VM as "zombie" instead of busy looping in
the "kill" loop of userland ?

Also we need to release all the memory of the VM but the one shared with the SP. 

I will let Jens answer the more implementation questions here after and try to help on the more "system" ones.

> 
>> Thanks,
>> Jens
>> ---
>>  xen/arch/arm/tee/ffa.c | 36 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 36 insertions(+)
>> diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
>> index 183528d13388..9c596462a8a2 100644
>> --- a/xen/arch/arm/tee/ffa.c
>> +++ b/xen/arch/arm/tee/ffa.c
>> @@ -1539,6 +1539,7 @@ static bool is_in_subscr_list(const uint16_t *subscr, uint16_t start,
>>  static int ffa_domain_teardown(struct domain *d)
>>  {
>>      struct ffa_ctx *ctx = d->arch.tee;
>> +    struct ffa_shm_mem *shm, *tmp;
>>      unsigned int n;
>>      int32_t res;
>>  @@ -1564,10 +1565,45 @@ static int ffa_domain_teardown(struct domain *d)
>>              printk(XENLOG_ERR "ffa: Failed to report destruction of vm_id %u to  %u: res %d\n",
>>                     get_vm_id(d), subscr_vm_destroyed[n], res);
>>      }
>> +    /*
>> +     * If this function is called again due to -ERESTART below, make sure
>> +     * not to send the FFA_MSG_SEND_VM_DESTROYED's.
>> +     */
>> +    subscr_vm_destroyed_count = 0;
> 
> AFAICT, this variable is global. So wouldn't you effectively break other domain if let say the unmapping error is temporary?
> 
>>        if ( ctx->rx )
>>          rxtx_unmap(ctx);
>>  +
>> +    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
>> +    {
>> +        register_t handle_hi;
>> +        register_t handle_lo;
>> +
>> +        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
>> +        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);
> 
> Is this call expensive? If so, we may need to handle continuation here.

This call should not be expensive in the normal case as memory is reclaimable
so there is no processing required in the SP and all is done in the SPMC which
should basically just return a yes or no depending on a state for the handle.

So I think this is the best trade.

@Jens: One thing to consider is that a Destroy might get a retry or busy answer and we
will have to issue it again and this is not considered in the current implementation.

After discussing the subject internally we could in fact consider that if an SP cannot release
some memory shared with the VM destroyed, it should tell it by returning "retry" to the message.
Here that could simplify things by doing a strategy where:
- we retry on the VM_DESTROY message if required
- if some memory is not reclaimable we check if we could park it and make the VM a zombie.
What do you think ?


> 
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_INFO, "ffa: Failed to reclaim handle %#lx : %d\n",
>> +                   shm->handle, res);
> 
> I think you want to use XENLOG_G_INFO to use the guest ratelimit. Also, I would suggest to print the domain ID in the logs (see '%pd').
> 
> 
>> +        }
>> +        else
>> +        {
>> +            printk(XENLOG_DEBUG, "ffa: Reclaimed handle %#lx\n", shm->handle);
> 
> Same here. You want to use XENLOG_G_DEBUG and print the domain ID.
> 
>> +            ctx->shm_count--;
>> +            list_del(&shm->list);
>> +        }
>> +    }
> 
> NIT: New line here please for clarity.
> 
>> +    if ( !list_empty(&ctx->shm_list) )
>> +    {
>> +        printk(XENLOG_INFO, "ffa: Remaining unclaimed handles, retrying\n");
> 
> Same as the other printks.
> 
>> +        /*
>> +         * TODO: add a timeout where we either panic or let the guest be
>> +         * fully destroyed.
>> +         */
> Timeout with proper handling would be a solution. I am not sure about panic-ing. Do you think the TEE would be in a bad state if we can't release memory?
> 
>> +        return -ERESTART;
>> +    }
>> +
>>      XFREE(d->arch.tee);
>>        return 0;
> 
> Cheers,
> 

Cheers
Bertrand

> -- 
> Julien Grall

Re: [RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy

Posted by Jens Wiklander 4 months, 3 weeks ago

Hi,

On Tue, Dec 5, 2023 at 9:14 AM Bertrand Marquis
<Bertrand.Marquis@arm.com> wrote:
>
> Hi Julien,
>
> Thanks a lot for your review and comment, this is very helpful.
>
> > On 4 Dec 2023, at 20:24, Julien Grall <julien@xen.org> wrote:
> >
> > Hi Jens,
> >
> > On 04/12/2023 07:55, Jens Wiklander wrote:
> >> When an FF-A enabled guest is destroyed it may leave behind memory
> >> shared with SPs. This memory must be reclaimed before it's reused or an
> >> SP may make changes to memory used by a new unrelated guest. So when the
> >> domain is teared down add FF-A requests to reclaim all remaining shared
> >> memory.
> >> SPs in the secure world are notified using VM_DESTROYED that a guest has
> >> been destroyed. An SP is supposed to relinquish all shared memory to allow
> >> reclaiming the memory. The relinquish operation may need to be delayed if
> >> the shared memory is for instance part of a DMA operation.
> >> If the FF-A memory reclaim request fails, return -ERESTART to retry
> >> again. This will effectively block the destruction of the guest until
> >> all memory has been reclaimed.
> >> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
> >> ---
> >> Hi,
> >> This patch is a bit crude, but gets the job done. In a well designed
> >> system this might even be good enough since the SP or the secure world
> >> will let the memory be reclaimed and we can move on. But, if for some
> >> reason reclaiming the memory is refused it must not be possible to reuse
> >> the memory.
> >
> > IIUC, we are trying to harden against a buggy SP. Is that correct?
>
> This is not hardening as this is a possible scenario with a correctly implemented SP.
> This is valid for the SP to not be able to relinquish the memory directly so we have
> to take this possibility into account and retry.
>
> What is not expected if for the SP to never release the memory hence the possible
> "todo" at the end of the code where i think we might have to implement a counter
> to bound the possible number of loops but as always the question is how many...
>
> In this case the only solution would be to park the memory as suggested after
> but we are not completely sure where hence the RFC.
>
> >
> >> These shared memory ranges are typically quite small compared to the
> >> total memory usage of a guest so it would be an improvement if only
> >> refused shared memory ranges where set aside from future reuse while the
> >> guest was destroyed and other resources made available for reuse. This
> >> could be done by for instance assign the refused shared memory ranges
> >> to a dummy VM like DOMID_IO.
> >
> > I like the idea to use a dummy VM, but I don't think DOMID_IO is right. Once teardown has completed, the domain will stay around until the last reference on all pages are dropped. At this point, the amount of memory left-over is minimum (this is mostly bookeeping in Xen).
> >
> > From the userland PoV, the domain will still show-up in the list but tools like "xl list" will show "(null)". They are called zombie domains.
> >
> > So I would consider to keep the same domain around. The advantage is you can call "xl destroy" again to retry the operation.
>
> In this scenario the "restart" implementation here is right but how could we park the VM as "zombie" instead of busy looping in
> the "kill" loop of userland ?
>
> Also we need to release all the memory of the VM but the one shared with the SP.
>
> I will let Jens answer the more implementation questions here after and try to help on the more "system" ones.
>
> >
> >> Thanks,
> >> Jens
> >> ---
> >>  xen/arch/arm/tee/ffa.c | 36 ++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 36 insertions(+)
> >> diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
> >> index 183528d13388..9c596462a8a2 100644
> >> --- a/xen/arch/arm/tee/ffa.c
> >> +++ b/xen/arch/arm/tee/ffa.c
> >> @@ -1539,6 +1539,7 @@ static bool is_in_subscr_list(const uint16_t *subscr, uint16_t start,
> >>  static int ffa_domain_teardown(struct domain *d)
> >>  {
> >>      struct ffa_ctx *ctx = d->arch.tee;
> >> +    struct ffa_shm_mem *shm, *tmp;
> >>      unsigned int n;
> >>      int32_t res;
> >>  @@ -1564,10 +1565,45 @@ static int ffa_domain_teardown(struct domain *d)
> >>              printk(XENLOG_ERR "ffa: Failed to report destruction of vm_id %u to  %u: res %d\n",
> >>                     get_vm_id(d), subscr_vm_destroyed[n], res);
> >>      }
> >> +    /*
> >> +     * If this function is called again due to -ERESTART below, make sure
> >> +     * not to send the FFA_MSG_SEND_VM_DESTROYED's.
> >> +     */
> >> +    subscr_vm_destroyed_count = 0;
> >
> > AFAICT, this variable is global. So wouldn't you effectively break other domain if let say the unmapping error is temporary?
> >
> >>        if ( ctx->rx )
> >>          rxtx_unmap(ctx);
> >>  +
> >> +    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
> >> +    {
> >> +        register_t handle_hi;
> >> +        register_t handle_lo;
> >> +
> >> +        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
> >> +        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);
> >
> > Is this call expensive? If so, we may need to handle continuation here.
>
> This call should not be expensive in the normal case as memory is reclaimable
> so there is no processing required in the SP and all is done in the SPMC which
> should basically just return a yes or no depending on a state for the handle.

I agree, this should only be a thing between the hypervisor and the
SPMC in the secure world.

>
> So I think this is the best trade.
>
> @Jens: One thing to consider is that a Destroy might get a retry or busy answer and we
> will have to issue it again and this is not considered in the current implementation.

You're right, we'll need to keep track of which SPs we've been able to
send a VM_DESTROY message to.

>
> After discussing the subject internally we could in fact consider that if an SP cannot release
> some memory shared with the VM destroyed, it should tell it by returning "retry" to the message.
> Here that could simplify things by doing a strategy where:
> - we retry on the VM_DESTROY message if required

We should keep a record of which SPs remain to be signaled with
VM_DESTROY. An SP may have other reasons to return an error so this
call can be retried later.

> - if some memory is not reclaimable we check if we could park it and make the VM a zombie.
> What do you think ?

The zombie option sounds like a good fallback when automatic reclaim
(reasonable timeouts have expired etc) has failed.

Thanks,
Jens

>
>
> >
> >> +        if ( res )
> >> +        {
> >> +            printk(XENLOG_INFO, "ffa: Failed to reclaim handle %#lx : %d\n",
> >> +                   shm->handle, res);
> >
> > I think you want to use XENLOG_G_INFO to use the guest ratelimit. Also, I would suggest to print the domain ID in the logs (see '%pd').
> >
> >
> >> +        }
> >> +        else
> >> +        {
> >> +            printk(XENLOG_DEBUG, "ffa: Reclaimed handle %#lx\n", shm->handle);
> >
> > Same here. You want to use XENLOG_G_DEBUG and print the domain ID.
> >
> >> +            ctx->shm_count--;
> >> +            list_del(&shm->list);
> >> +        }
> >> +    }
> >
> > NIT: New line here please for clarity.
> >
> >> +    if ( !list_empty(&ctx->shm_list) )
> >> +    {
> >> +        printk(XENLOG_INFO, "ffa: Remaining unclaimed handles, retrying\n");
> >
> > Same as the other printks.
> >
> >> +        /*
> >> +         * TODO: add a timeout where we either panic or let the guest be
> >> +         * fully destroyed.
> >> +         */
> > Timeout with proper handling would be a solution. I am not sure about panic-ing. Do you think the TEE would be in a bad state if we can't release memory?
> >
> >> +        return -ERESTART;
> >> +    }
> >> +
> >>      XFREE(d->arch.tee);
> >>        return 0;
> >
> > Cheers,
> >
>
> Cheers
> Bertrand
>
> > --
> > Julien Grall
>
>

Re: [RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy

Posted by Andrew Cooper 4 months, 3 weeks ago

On 05/12/2023 8:14 am, Bertrand Marquis wrote:
> Hi Julien,
>
> Thanks a lot for your review and comment, this is very helpful.
>
>> On 4 Dec 2023, at 20:24, Julien Grall <julien@xen.org> wrote:
>>
>> Hi Jens,
>>
>> On 04/12/2023 07:55, Jens Wiklander wrote:
>>>        if ( ctx->rx )
>>>          rxtx_unmap(ctx);
>>>  +
>>> +    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
>>> +    {
>>> +        register_t handle_hi;
>>> +        register_t handle_lo;
>>> +
>>> +        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
>>> +        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);
>> Is this call expensive? If so, we may need to handle continuation here.
> This call should not be expensive in the normal case as memory is reclaimable
> so there is no processing required in the SP and all is done in the SPMC which
> should basically just return a yes or no depending on a state for the handle.
>
> So I think this is the best trade.
>
> @Jens: One thing to consider is that a Destroy might get a retry or busy answer and we
> will have to issue it again and this is not considered in the current implementation.
>
> After discussing the subject internally we could in fact consider that if an SP cannot release
> some memory shared with the VM destroyed, it should tell it by returning "retry" to the message.
> Here that could simplify things by doing a strategy where:
> - we retry on the VM_DESTROY message if required
> - if some memory is not reclaimable we check if we could park it and make the VM a zombie.
> What do you think ?

This is the cleanup issue discussed at XenSummit, isn't it?

You cannot feasibly implement this cleanup by having
ffa_domain_teardown() return -ERESTART.

Yes, it will probably function - but now you're now bouncing in/out of
Xen as fast as the CPU will allow, rechecking a condition which will
take an unbounded quantity of time.  Meanwhile, you've tied up a
userspace thread (the invocation of `xl destroy`) to do so, and one of
dom0's vCPU for however long the scheduler is willing to schedule the
destroy invocation, which will be 100% of the time as it's always busy
in the hypervisor.

The teardown/kill infrastructure is intended and expected to always make
forward progress.

The closest thing to this patch which will work sanely is this:

Hold a single domain reference for any non-zero amount of magic memory
held.  See domain_adjust_tot_pages() and how it interacts with
{get,put}_domain(), and copy it.  Importantly, this prevents the domain
being freed until the final piece of magic memory has been released.

Have some way (can be early on the teardown/kill path, or a separate
hypercall - assuming the VM can't ever be scheduled again) to kick Xen
into being responsible for trying to reclaim the memory.  (Start a
timer, or reclaim in the idle loop, whatever.)

This way, you can `xl destroy` a VM in an arbitrary state, *and* the
invocation will terminate when Xen has nothing deterministic left to do,
*and* in the case that the secure world or Xen has an issue, the VM will
stay around as a zombie holding minimal resources.

~Andrew

Re: [RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy

Posted by Jens Wiklander 4 months, 3 weeks ago

Hi Andrew,

On Tue, Dec 5, 2023 at 11:53 AM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 05/12/2023 8:14 am, Bertrand Marquis wrote:
> > Hi Julien,
> >
> > Thanks a lot for your review and comment, this is very helpful.
> >
> >> On 4 Dec 2023, at 20:24, Julien Grall <julien@xen.org> wrote:
> >>
> >> Hi Jens,
> >>
> >> On 04/12/2023 07:55, Jens Wiklander wrote:
> >>>        if ( ctx->rx )
> >>>          rxtx_unmap(ctx);
> >>>  +
> >>> +    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
> >>> +    {
> >>> +        register_t handle_hi;
> >>> +        register_t handle_lo;
> >>> +
> >>> +        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
> >>> +        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);
> >> Is this call expensive? If so, we may need to handle continuation here.
> > This call should not be expensive in the normal case as memory is reclaimable
> > so there is no processing required in the SP and all is done in the SPMC which
> > should basically just return a yes or no depending on a state for the handle.
> >
> > So I think this is the best trade.
> >
> > @Jens: One thing to consider is that a Destroy might get a retry or busy answer and we
> > will have to issue it again and this is not considered in the current implementation.
> >
> > After discussing the subject internally we could in fact consider that if an SP cannot release
> > some memory shared with the VM destroyed, it should tell it by returning "retry" to the message.
> > Here that could simplify things by doing a strategy where:
> > - we retry on the VM_DESTROY message if required
> > - if some memory is not reclaimable we check if we could park it and make the VM a zombie.
> > What do you think ?
>
> This is the cleanup issue discussed at XenSummit, isn't it?
>
> You cannot feasibly implement this cleanup by having
> ffa_domain_teardown() return -ERESTART.
>
> Yes, it will probably function - but now you're now bouncing in/out of
> Xen as fast as the CPU will allow, rechecking a condition which will
> take an unbounded quantity of time.  Meanwhile, you've tied up a
> userspace thread (the invocation of `xl destroy`) to do so, and one of
> dom0's vCPU for however long the scheduler is willing to schedule the
> destroy invocation, which will be 100% of the time as it's always busy
> in the hypervisor.
>
> The teardown/kill infrastructure is intended and expected to always make
> forward progress.
>

OK

>
> The closest thing to this patch which will work sanely is this:
>
> Hold a single domain reference for any non-zero amount of magic memory
> held.  See domain_adjust_tot_pages() and how it interacts with
> {get,put}_domain(), and copy it.  Importantly, this prevents the domain
> being freed until the final piece of magic memory has been released.
>
> Have some way (can be early on the teardown/kill path, or a separate
> hypercall - assuming the VM can't ever be scheduled again) to kick Xen
> into being responsible for trying to reclaim the memory.  (Start a
> timer, or reclaim in the idle loop, whatever.)
>
> This way, you can `xl destroy` a VM in an arbitrary state, *and* the
> invocation will terminate when Xen has nothing deterministic left to do,
> *and* in the case that the secure world or Xen has an issue, the VM will
> stay around as a zombie holding minimal resources.

Thanks for the pointers, very helpful, and now I at least know where
to start looking.

Cheers,
Jens

>
> ~Andrew