[PATCH RFC 04/10] domain: update GADDR based runstate guest area

Jan Beulich posted 10 patches 3 years, 3 months ago
There is a newer version of this series
[PATCH RFC 04/10] domain: update GADDR based runstate guest area
Posted by Jan Beulich 3 years, 3 months ago
Before adding a new vCPU operation to register the runstate area by
guest-physical address, add code to actually keep such areas up-to-date.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: Pages aren't marked dirty when written to (matching the handling of
     space mapped by map_vcpu_info() afaict), on the basis that the
     registrations are lost anyway across migration. Plus the contents
     of the areas in question have to be deemed volatile in the first
     place (so saving a "most recent" value is pretty meaningless even
     for e.g. snapshotting).

RFC: Can we perhaps avoid the VM-assist conditionals, assuming the more
     modern behavior to apply uniformly for gaddr-based registrations?

RFC: HVM guests (on x86) can change bitness and hence layout (and size!
     and alignment) of the runstate area. I don't think it is an option
     to require 32-bit code to pass a range such that even the 64-bit
     layout wouldn't cross a page boundary (and be suitably aligned). I
     also don't see any other good solution, so for now a crude approach
     with an extra boolean is used (using has_32bit_shinfo() isn't race
     free and could hence lead to overrunning the mapped space).

--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1599,14 +1599,55 @@ bool update_runstate_area(struct vcpu *v
     struct guest_memory_policy policy = { };
     void __user *guest_handle = NULL;
     struct vcpu_runstate_info runstate;
+    struct vcpu_runstate_info *map = v->runstate_guest_area.map;
+
+    memcpy(&runstate, &v->runstate, sizeof(runstate));
+
+    if ( map )
+    {
+        uint64_t *pset = NULL;
+#ifdef CONFIG_COMPAT
+        struct compat_vcpu_runstate_info *cmap = NULL;
+
+        if ( v->runstate_guest_area_compat )
+            cmap = (void *)map;
+#endif
+
+        if ( VM_ASSIST(v->domain, runstate_update_flag) )
+        {
+#ifdef CONFIG_COMPAT
+            if ( cmap )
+                pset = &cmap->state_entry_time;
+            else
+#endif
+                pset = &map->state_entry_time;
+            runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+            write_atomic(pset, runstate.state_entry_time);
+            smp_wmb();
+        }
+
+#ifdef CONFIG_COMPAT
+        if ( cmap )
+            XLAT_vcpu_runstate_info(cmap, &runstate);
+        else
+#endif
+            *map = runstate;
+
+        if ( pset )
+        {
+            smp_wmb();
+            runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+            write_atomic(pset, runstate.state_entry_time);
+        }
+
+        return true;
+    }
 
     if ( guest_handle_is_null(runstate_guest(v)) )
         return true;
 
     update_guest_memory_policy(v, &policy);
 
-    memcpy(&runstate, &v->runstate, sizeof(runstate));
-
     if ( VM_ASSIST(v->domain, runstate_update_flag) )
     {
 #ifdef CONFIG_COMPAT
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -231,6 +231,8 @@ struct vcpu
 #ifdef CONFIG_COMPAT
     /* A hypercall is using the compat ABI? */
     bool             hcall_compat;
+    /* Physical runstate area registered via compat ABI? */
+    bool             runstate_guest_area_compat;
 #endif
 
 #ifdef CONFIG_IOREQ_SERVER
Re: [PATCH RFC 04/10] domain: update GADDR based runstate guest area
Posted by Julien Grall 3 years, 1 month ago
Hi Jan,

On 19/10/2022 08:41, Jan Beulich wrote:
> Before adding a new vCPU operation to register the runstate area by
> guest-physical address, add code to actually keep such areas up-to-date.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: Pages aren't marked dirty when written to (matching the handling of
>       space mapped by map_vcpu_info() afaict), on the basis that the
>       registrations are lost anyway across migration. 

So I agree for the existing migration. But I wonder whether we would 
need to dirty those pages if we live-migrated a guest without its help 
(IOW the registrations would still be present).

Anyway, nothing to worry about yet as this is not supported upstream.

> Plus the contents
>       of the areas in question have to be deemed volatile in the first
>       place (so saving a "most recent" value is pretty meaningless even
>       for e.g. snapshotting).
> 
> RFC: Can we perhaps avoid the VM-assist conditionals, assuming the more
>       modern behavior to apply uniformly for gaddr-based registrations?

It is not clear why someone would want to use the old behavior with the 
new gaddr-based registrations. So I would say yes.

> 
> RFC: HVM guests (on x86) can change bitness and hence layout (and size!
>       and alignment) of the runstate area. I don't think it is an option
>       to require 32-bit code to pass a range such that even the 64-bit
>       layout wouldn't cross a page boundary (and be suitably aligned). I
>       also don't see any other good solution, so for now a crude approach
>       with an extra boolean is used (using has_32bit_shinfo() isn't race
>       free and could hence lead to overrunning the mapped space).

I think the extra check for 32-bit code to pass the check for 64-bit 
layout would be better.

> 
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1599,14 +1599,55 @@ bool update_runstate_area(struct vcpu *v
>       struct guest_memory_policy policy = { };
>       void __user *guest_handle = NULL;
>       struct vcpu_runstate_info runstate;
> +    struct vcpu_runstate_info *map = v->runstate_guest_area.map;
> +
> +    memcpy(&runstate, &v->runstate, sizeof(runstate));
> +
> +    if ( map )
> +    {
> +        uint64_t *pset = NULL;
> +#ifdef CONFIG_COMPAT
> +        struct compat_vcpu_runstate_info *cmap = NULL;
> +
> +        if ( v->runstate_guest_area_compat )
> +            cmap = (void *)map;
> +#endif
> +
> +        if ( VM_ASSIST(v->domain, runstate_update_flag) )
> +        {
> +#ifdef CONFIG_COMPAT
> +            if ( cmap )
> +                pset = &cmap->state_entry_time;
> +            else
> +#endif
> +                pset = &map->state_entry_time;
> +            runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> +            write_atomic(pset, runstate.state_entry_time);
> +            smp_wmb();
> +        }
> +
> +#ifdef CONFIG_COMPAT
> +        if ( cmap )
> +            XLAT_vcpu_runstate_info(cmap, &runstate);
> +        else
> +#endif
> +            *map = runstate;
> +
> +        if ( pset )
> +        {
> +            smp_wmb();
> +            runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> +            write_atomic(pset, runstate.state_entry_time);
> +        }
> +
> +        return true;
> +    }
>   
>       if ( guest_handle_is_null(runstate_guest(v)) )
>           return true;
>   
>       update_guest_memory_policy(v, &policy);
>   
> -    memcpy(&runstate, &v->runstate, sizeof(runstate));
> -
>       if ( VM_ASSIST(v->domain, runstate_update_flag) )
>       {
>   #ifdef CONFIG_COMPAT
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -231,6 +231,8 @@ struct vcpu
>   #ifdef CONFIG_COMPAT
>       /* A hypercall is using the compat ABI? */
>       bool             hcall_compat;
> +    /* Physical runstate area registered via compat ABI? */
> +    bool             runstate_guest_area_compat;
>   #endif
>   
>   #ifdef CONFIG_IOREQ_SERVER
> 

Cheers,

-- 
Julien Grall
Re: [PATCH RFC 04/10] domain: update GADDR based runstate guest area
Posted by Jan Beulich 3 years, 1 month ago
On 16.12.2022 13:26, Julien Grall wrote:
> On 19/10/2022 08:41, Jan Beulich wrote:
>> Before adding a new vCPU operation to register the runstate area by
>> guest-physical address, add code to actually keep such areas up-to-date.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> RFC: Pages aren't marked dirty when written to (matching the handling of
>>       space mapped by map_vcpu_info() afaict), on the basis that the
>>       registrations are lost anyway across migration. 
> 
> So I agree for the existing migration. But I wonder whether we would 
> need to dirty those pages if we live-migrated a guest without its help 
> (IOW the registrations would still be present).

Even then I'd expect the area to be re-populated at the target, so the
page contents would need moving over (perhaps multiple times) only if
any other parts of such a page were written to.

> Anyway, nothing to worry about yet as this is not supported upstream.

I assume you've taken note of this for the transparent migration work.
One question after all is how you'd make handling of the area at the
new target transparent, i.e. without any anomalies in the values the
guest gets to see. It may very well be that every such area needs
special treatment in the course of migrating, such that the most up-
to-date values are reported as part of the migration stream, but
separate from all the pages' contents.

>> Plus the contents
>>       of the areas in question have to be deemed volatile in the first
>>       place (so saving a "most recent" value is pretty meaningless even
>>       for e.g. snapshotting).
>>
>> RFC: Can we perhaps avoid the VM-assist conditionals, assuming the more
>>       modern behavior to apply uniformly for gaddr-based registrations?
> 
> It is not clear why someone would want to use the old behavior with the 
> new gaddr-based registrations. So I would say yes.

Okay, will do.

>> RFC: HVM guests (on x86) can change bitness and hence layout (and size!
>>       and alignment) of the runstate area. I don't think it is an option
>>       to require 32-bit code to pass a range such that even the 64-bit
>>       layout wouldn't cross a page boundary (and be suitably aligned). I
>>       also don't see any other good solution, so for now a crude approach
>>       with an extra boolean is used (using has_32bit_shinfo() isn't race
>>       free and could hence lead to overrunning the mapped space).
> 
> I think the extra check for 32-bit code to pass the check for 64-bit 
> layout would be better.

I'm afraid I can't derive from your reply what it is you actually want.

Jan
Re: [PATCH RFC 04/10] domain: update GADDR based runstate guest area
Posted by Julien Grall 3 years, 1 month ago
Hi,

On 19/12/2022 12:48, Jan Beulich wrote:
> On 16.12.2022 13:26, Julien Grall wrote:
>> On 19/10/2022 08:41, Jan Beulich wrote:
>>> Before adding a new vCPU operation to register the runstate area by
>>> guest-physical address, add code to actually keep such areas up-to-date.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> RFC: Pages aren't marked dirty when written to (matching the handling of
>>>        space mapped by map_vcpu_info() afaict), on the basis that the
>>>        registrations are lost anyway across migration.
>>
>> So I agree for the existing migration. But I wonder whether we would
>> need to dirty those pages if we live-migrated a guest without its help
>> (IOW the registrations would still be present).
> 
> Even then I'd expect the area to be re-populated at the target, so the
> page contents would need moving over (perhaps multiple times) only if
> any other parts of such a page were written to.
You are right. I had another look how this was implemented and we indeed 
create a record for each vcpu_info/runstate to transfer the content.

> 
>> Anyway, nothing to worry about yet as this is not supported upstream.
> 
> I assume you've taken note of this for the transparent migration work.

Yes.

> One question after all is how you'd make handling of the area at the
> new target transparent, i.e. without any anomalies in the values the
> guest gets to see. It may very well be that every such area needs
> special treatment in the course of migrating, such that the most up-
> to-date values are reported as part of the migration stream, but
> separate from all the pages' contents.

AFAIK, vcpu_info doesn't need special treatment but the runstate needs 
because it contains a field 'state_entry_time' that is a system time in 
nanoseconds.

This could be solved by never exposing the system time to the guest and 
instead a relative value from when it starts.

> 
>>> Plus the contents
>>>        of the areas in question have to be deemed volatile in the first
>>>        place (so saving a "most recent" value is pretty meaningless even
>>>        for e.g. snapshotting).
>>>
>>> RFC: Can we perhaps avoid the VM-assist conditionals, assuming the more
>>>        modern behavior to apply uniformly for gaddr-based registrations?
>>
>> It is not clear why someone would want to use the old behavior with the
>> new gaddr-based registrations. So I would say yes.
> 
> Okay, will do.
> 
>>> RFC: HVM guests (on x86) can change bitness and hence layout (and size!
>>>        and alignment) of the runstate area. I don't think it is an option
>>>        to require 32-bit code to pass a range such that even the 64-bit
>>>        layout wouldn't cross a page boundary (and be suitably aligned). I
>>>        also don't see any other good solution, so for now a crude approach
>>>        with an extra boolean is used (using has_32bit_shinfo() isn't race
>>>        free and could hence lead to overrunning the mapped space).
>>
>> I think the extra check for 32-bit code to pass the check for 64-bit
>> layout would be better.
> 
> I'm afraid I can't derive from your reply what it is you actually want.

I think for 32-bit call, we also want to check the address provide will 
also pass the 64-bit check (i.e. if used as a 64-bit layout, the area 
would not cross a page boundary and be suitably aligned).

Cheers,

-- 
Julien Grall
Re: [PATCH RFC 04/10] domain: update GADDR based runstate guest area
Posted by Jan Beulich 3 years, 1 month ago
On 20.12.2022 09:40, Julien Grall wrote:
> On 19/12/2022 12:48, Jan Beulich wrote:
>> On 16.12.2022 13:26, Julien Grall wrote:
>>> On 19/10/2022 08:41, Jan Beulich wrote:
>>>> RFC: HVM guests (on x86) can change bitness and hence layout (and size!
>>>>        and alignment) of the runstate area. I don't think it is an option
>>>>        to require 32-bit code to pass a range such that even the 64-bit
>>>>        layout wouldn't cross a page boundary (and be suitably aligned). I
>>>>        also don't see any other good solution, so for now a crude approach
>>>>        with an extra boolean is used (using has_32bit_shinfo() isn't race
>>>>        free and could hence lead to overrunning the mapped space).
>>>
>>> I think the extra check for 32-bit code to pass the check for 64-bit
>>> layout would be better.
>>
>> I'm afraid I can't derive from your reply what it is you actually want.
> 
> I think for 32-bit call, we also want to check the address provide will 
> also pass the 64-bit check (i.e. if used as a 64-bit layout, the area 
> would not cross a page boundary and be suitably aligned).

But that's specifically what I say I don't think is an option. First and
foremost because of the implication on 32-bit callers: They're need to
use magic to get hold of the size of the 64-bit variant of the struct.

Jan
Re: [PATCH RFC 04/10] domain: update GADDR based runstate guest area
Posted by Julien Grall 3 years, 1 month ago
Hi Jan,

On 20/12/2022 08:45, Jan Beulich wrote:
> On 20.12.2022 09:40, Julien Grall wrote:
>> On 19/12/2022 12:48, Jan Beulich wrote:
>>> On 16.12.2022 13:26, Julien Grall wrote:
>>>> On 19/10/2022 08:41, Jan Beulich wrote:
>>>>> RFC: HVM guests (on x86) can change bitness and hence layout (and size!
>>>>>         and alignment) of the runstate area. I don't think it is an option
>>>>>         to require 32-bit code to pass a range such that even the 64-bit
>>>>>         layout wouldn't cross a page boundary (and be suitably aligned). I
>>>>>         also don't see any other good solution, so for now a crude approach
>>>>>         with an extra boolean is used (using has_32bit_shinfo() isn't race
>>>>>         free and could hence lead to overrunning the mapped space).
>>>>
>>>> I think the extra check for 32-bit code to pass the check for 64-bit
>>>> layout would be better.
>>>
>>> I'm afraid I can't derive from your reply what it is you actually want.
>>
>> I think for 32-bit call, we also want to check the address provide will
>> also pass the 64-bit check (i.e. if used as a 64-bit layout, the area
>> would not cross a page boundary and be suitably aligned).
> 
> But that's specifically what I say I don't think is an option. First and
> foremost because of the implication on 32-bit callers: They're need to
> use magic to get hold of the size of the 64-bit variant of the struct.

I understand that. But I am not aware of any other (simple) approach 
where you could have race free code.

So between a non-race free code and exposing the restriction to the 
guest, I would chose the latter.

Cheers,

-- 
Julien Grall
Re: [PATCH RFC 04/10] domain: update GADDR based runstate guest area
Posted by Jan Beulich 3 years, 1 month ago
On 20.12.2022 09:48, Julien Grall wrote:
> Hi Jan,
> 
> On 20/12/2022 08:45, Jan Beulich wrote:
>> On 20.12.2022 09:40, Julien Grall wrote:
>>> On 19/12/2022 12:48, Jan Beulich wrote:
>>>> On 16.12.2022 13:26, Julien Grall wrote:
>>>>> On 19/10/2022 08:41, Jan Beulich wrote:
>>>>>> RFC: HVM guests (on x86) can change bitness and hence layout (and size!
>>>>>>         and alignment) of the runstate area. I don't think it is an option
>>>>>>         to require 32-bit code to pass a range such that even the 64-bit
>>>>>>         layout wouldn't cross a page boundary (and be suitably aligned). I
>>>>>>         also don't see any other good solution, so for now a crude approach
>>>>>>         with an extra boolean is used (using has_32bit_shinfo() isn't race
>>>>>>         free and could hence lead to overrunning the mapped space).
>>>>>
>>>>> I think the extra check for 32-bit code to pass the check for 64-bit
>>>>> layout would be better.
>>>>
>>>> I'm afraid I can't derive from your reply what it is you actually want.
>>>
>>> I think for 32-bit call, we also want to check the address provide will
>>> also pass the 64-bit check (i.e. if used as a 64-bit layout, the area
>>> would not cross a page boundary and be suitably aligned).
>>
>> But that's specifically what I say I don't think is an option. First and
>> foremost because of the implication on 32-bit callers: They're need to
>> use magic to get hold of the size of the 64-bit variant of the struct.
> 
> I understand that. But I am not aware of any other (simple) approach 
> where you could have race free code.

My reference to using has_32bit_shinfo() not being race free was to avoid
suggestions in that direction. There's no use of this function in the
patch here, nor in the one where the new boolean field is actually written
to. The solution as presented is, afaict, both simple and race free. It
merely isn't very nice.

Jan

> So between a non-race free code and exposing the restriction to the 
> guest, I would chose the latter.
> 
> Cheers,
>
Re: [PATCH RFC 04/10] domain: update GADDR based runstate guest area
Posted by Julien Grall 3 years, 1 month ago

On 20/12/2022 09:59, Jan Beulich wrote:
> On 20.12.2022 09:48, Julien Grall wrote:
>> Hi Jan,
>>
>> On 20/12/2022 08:45, Jan Beulich wrote:
>>> On 20.12.2022 09:40, Julien Grall wrote:
>>>> On 19/12/2022 12:48, Jan Beulich wrote:
>>>>> On 16.12.2022 13:26, Julien Grall wrote:
>>>>>> On 19/10/2022 08:41, Jan Beulich wrote:
>>>>>>> RFC: HVM guests (on x86) can change bitness and hence layout (and size!
>>>>>>>          and alignment) of the runstate area. I don't think it is an option
>>>>>>>          to require 32-bit code to pass a range such that even the 64-bit
>>>>>>>          layout wouldn't cross a page boundary (and be suitably aligned). I
>>>>>>>          also don't see any other good solution, so for now a crude approach
>>>>>>>          with an extra boolean is used (using has_32bit_shinfo() isn't race
>>>>>>>          free and could hence lead to overrunning the mapped space).
>>>>>>
>>>>>> I think the extra check for 32-bit code to pass the check for 64-bit
>>>>>> layout would be better.
>>>>>
>>>>> I'm afraid I can't derive from your reply what it is you actually want.
>>>>
>>>> I think for 32-bit call, we also want to check the address provide will
>>>> also pass the 64-bit check (i.e. if used as a 64-bit layout, the area
>>>> would not cross a page boundary and be suitably aligned).
>>>
>>> But that's specifically what I say I don't think is an option. First and
>>> foremost because of the implication on 32-bit callers: They're need to
>>> use magic to get hold of the size of the 64-bit variant of the struct.
>>
>> I understand that. But I am not aware of any other (simple) approach
>> where you could have race free code.
> 
> My reference to using has_32bit_shinfo() not being race free was to avoid
> suggestions in that direction. There's no use of this function in the
> patch here, nor in the one where the new boolean field is actually written
> to. The solution as presented is, afaict, both simple and race free. It
> merely isn't very nice.

Ah! I misunderstood what you original wrote. Thanks for the clarification.

Cheers,

-- 
Julien Grall