Before adding a new vCPU operation to register the runstate area by
guest-physical address, add code to actually keep such areas up-to-date.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: Pages aren't marked dirty when written to (matching the handling of
space mapped by map_vcpu_info() afaict), on the basis that the
registrations are lost anyway across migration. Plus the contents
of the areas in question have to be deemed volatile in the first
place (so saving a "most recent" value is pretty meaningless even
for e.g. snapshotting).
RFC: Can we perhaps avoid the VM-assist conditionals, assuming the more
modern behavior to apply uniformly for gaddr-based registrations?
RFC: HVM guests (on x86) can change bitness and hence layout (and size!
and alignment) of the runstate area. I don't think it is an option
to require 32-bit code to pass a range such that even the 64-bit
layout wouldn't cross a page boundary (and be suitably aligned). I
also don't see any other good solution, so for now a crude approach
with an extra boolean is used (using has_32bit_shinfo() isn't race
free and could hence lead to overrunning the mapped space).
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1599,14 +1599,55 @@ bool update_runstate_area(struct vcpu *v
struct guest_memory_policy policy = { };
void __user *guest_handle = NULL;
struct vcpu_runstate_info runstate;
+ struct vcpu_runstate_info *map = v->runstate_guest_area.map;
+
+ memcpy(&runstate, &v->runstate, sizeof(runstate));
+
+ if ( map )
+ {
+ uint64_t *pset = NULL;
+#ifdef CONFIG_COMPAT
+ struct compat_vcpu_runstate_info *cmap = NULL;
+
+ if ( v->runstate_guest_area_compat )
+ cmap = (void *)map;
+#endif
+
+ if ( VM_ASSIST(v->domain, runstate_update_flag) )
+ {
+#ifdef CONFIG_COMPAT
+ if ( cmap )
+ pset = &cmap->state_entry_time;
+ else
+#endif
+ pset = &map->state_entry_time;
+ runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
+ write_atomic(pset, runstate.state_entry_time);
+ smp_wmb();
+ }
+
+#ifdef CONFIG_COMPAT
+ if ( cmap )
+ XLAT_vcpu_runstate_info(cmap, &runstate);
+ else
+#endif
+ *map = runstate;
+
+ if ( pset )
+ {
+ smp_wmb();
+ runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
+ write_atomic(pset, runstate.state_entry_time);
+ }
+
+ return true;
+ }
if ( guest_handle_is_null(runstate_guest(v)) )
return true;
update_guest_memory_policy(v, &policy);
- memcpy(&runstate, &v->runstate, sizeof(runstate));
-
if ( VM_ASSIST(v->domain, runstate_update_flag) )
{
#ifdef CONFIG_COMPAT
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -231,6 +231,8 @@ struct vcpu
#ifdef CONFIG_COMPAT
/* A hypercall is using the compat ABI? */
bool hcall_compat;
+ /* Physical runstate area registered via compat ABI? */
+ bool runstate_guest_area_compat;
#endif
#ifdef CONFIG_IOREQ_SERVER
Hi Jan,
On 19/10/2022 08:41, Jan Beulich wrote:
> Before adding a new vCPU operation to register the runstate area by
> guest-physical address, add code to actually keep such areas up-to-date.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: Pages aren't marked dirty when written to (matching the handling of
> space mapped by map_vcpu_info() afaict), on the basis that the
> registrations are lost anyway across migration.
So I agree for the existing migration. But I wonder whether we would
need to dirty those pages if we live-migrated a guest without its help
(IOW the registrations would still be present).
Anyway, nothing to worry about yet as this is not supported upstream.
> Plus the contents
> of the areas in question have to be deemed volatile in the first
> place (so saving a "most recent" value is pretty meaningless even
> for e.g. snapshotting).
>
> RFC: Can we perhaps avoid the VM-assist conditionals, assuming the more
> modern behavior to apply uniformly for gaddr-based registrations?
It is not clear why someone would want to use the old behavior with the
new gaddr-based registrations. So I would say yes.
>
> RFC: HVM guests (on x86) can change bitness and hence layout (and size!
> and alignment) of the runstate area. I don't think it is an option
> to require 32-bit code to pass a range such that even the 64-bit
> layout wouldn't cross a page boundary (and be suitably aligned). I
> also don't see any other good solution, so for now a crude approach
> with an extra boolean is used (using has_32bit_shinfo() isn't race
> free and could hence lead to overrunning the mapped space).
I think the extra check for 32-bit code to pass the check for 64-bit
layout would be better.
>
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1599,14 +1599,55 @@ bool update_runstate_area(struct vcpu *v
> struct guest_memory_policy policy = { };
> void __user *guest_handle = NULL;
> struct vcpu_runstate_info runstate;
> + struct vcpu_runstate_info *map = v->runstate_guest_area.map;
> +
> + memcpy(&runstate, &v->runstate, sizeof(runstate));
> +
> + if ( map )
> + {
> + uint64_t *pset = NULL;
> +#ifdef CONFIG_COMPAT
> + struct compat_vcpu_runstate_info *cmap = NULL;
> +
> + if ( v->runstate_guest_area_compat )
> + cmap = (void *)map;
> +#endif
> +
> + if ( VM_ASSIST(v->domain, runstate_update_flag) )
> + {
> +#ifdef CONFIG_COMPAT
> + if ( cmap )
> + pset = &cmap->state_entry_time;
> + else
> +#endif
> + pset = &map->state_entry_time;
> + runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
> + write_atomic(pset, runstate.state_entry_time);
> + smp_wmb();
> + }
> +
> +#ifdef CONFIG_COMPAT
> + if ( cmap )
> + XLAT_vcpu_runstate_info(cmap, &runstate);
> + else
> +#endif
> + *map = runstate;
> +
> + if ( pset )
> + {
> + smp_wmb();
> + runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
> + write_atomic(pset, runstate.state_entry_time);
> + }
> +
> + return true;
> + }
>
> if ( guest_handle_is_null(runstate_guest(v)) )
> return true;
>
> update_guest_memory_policy(v, &policy);
>
> - memcpy(&runstate, &v->runstate, sizeof(runstate));
> -
> if ( VM_ASSIST(v->domain, runstate_update_flag) )
> {
> #ifdef CONFIG_COMPAT
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -231,6 +231,8 @@ struct vcpu
> #ifdef CONFIG_COMPAT
> /* A hypercall is using the compat ABI? */
> bool hcall_compat;
> + /* Physical runstate area registered via compat ABI? */
> + bool runstate_guest_area_compat;
> #endif
>
> #ifdef CONFIG_IOREQ_SERVER
>
Cheers,
--
Julien Grall
On 16.12.2022 13:26, Julien Grall wrote: > On 19/10/2022 08:41, Jan Beulich wrote: >> Before adding a new vCPU operation to register the runstate area by >> guest-physical address, add code to actually keep such areas up-to-date. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> >> --- >> RFC: Pages aren't marked dirty when written to (matching the handling of >> space mapped by map_vcpu_info() afaict), on the basis that the >> registrations are lost anyway across migration. > > So I agree for the existing migration. But I wonder whether we would > need to dirty those pages if we live-migrated a guest without its help > (IOW the registrations would still be present). Even then I'd expect the area to be re-populated at the target, so the page contents would need moving over (perhaps multiple times) only if any other parts of such a page were written to. > Anyway, nothing to worry about yet as this is not supported upstream. I assume you've taken note of this for the transparent migration work. One question after all is how you'd make handling of the area at the new target transparent, i.e. without any anomalies in the values the guest gets to see. It may very well be that every such area needs special treatment in the course of migrating, such that the most up- to-date values are reported as part of the migration stream, but separate from all the pages' contents. >> Plus the contents >> of the areas in question have to be deemed volatile in the first >> place (so saving a "most recent" value is pretty meaningless even >> for e.g. snapshotting). >> >> RFC: Can we perhaps avoid the VM-assist conditionals, assuming the more >> modern behavior to apply uniformly for gaddr-based registrations? > > It is not clear why someone would want to use the old behavior with the > new gaddr-based registrations. So I would say yes. Okay, will do. >> RFC: HVM guests (on x86) can change bitness and hence layout (and size! >> and alignment) of the runstate area. I don't think it is an option >> to require 32-bit code to pass a range such that even the 64-bit >> layout wouldn't cross a page boundary (and be suitably aligned). I >> also don't see any other good solution, so for now a crude approach >> with an extra boolean is used (using has_32bit_shinfo() isn't race >> free and could hence lead to overrunning the mapped space). > > I think the extra check for 32-bit code to pass the check for 64-bit > layout would be better. I'm afraid I can't derive from your reply what it is you actually want. Jan
Hi, On 19/12/2022 12:48, Jan Beulich wrote: > On 16.12.2022 13:26, Julien Grall wrote: >> On 19/10/2022 08:41, Jan Beulich wrote: >>> Before adding a new vCPU operation to register the runstate area by >>> guest-physical address, add code to actually keep such areas up-to-date. >>> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>> --- >>> RFC: Pages aren't marked dirty when written to (matching the handling of >>> space mapped by map_vcpu_info() afaict), on the basis that the >>> registrations are lost anyway across migration. >> >> So I agree for the existing migration. But I wonder whether we would >> need to dirty those pages if we live-migrated a guest without its help >> (IOW the registrations would still be present). > > Even then I'd expect the area to be re-populated at the target, so the > page contents would need moving over (perhaps multiple times) only if > any other parts of such a page were written to. You are right. I had another look how this was implemented and we indeed create a record for each vcpu_info/runstate to transfer the content. > >> Anyway, nothing to worry about yet as this is not supported upstream. > > I assume you've taken note of this for the transparent migration work. Yes. > One question after all is how you'd make handling of the area at the > new target transparent, i.e. without any anomalies in the values the > guest gets to see. It may very well be that every such area needs > special treatment in the course of migrating, such that the most up- > to-date values are reported as part of the migration stream, but > separate from all the pages' contents. AFAIK, vcpu_info doesn't need special treatment but the runstate needs because it contains a field 'state_entry_time' that is a system time in nanoseconds. This could be solved by never exposing the system time to the guest and instead a relative value from when it starts. > >>> Plus the contents >>> of the areas in question have to be deemed volatile in the first >>> place (so saving a "most recent" value is pretty meaningless even >>> for e.g. snapshotting). >>> >>> RFC: Can we perhaps avoid the VM-assist conditionals, assuming the more >>> modern behavior to apply uniformly for gaddr-based registrations? >> >> It is not clear why someone would want to use the old behavior with the >> new gaddr-based registrations. So I would say yes. > > Okay, will do. > >>> RFC: HVM guests (on x86) can change bitness and hence layout (and size! >>> and alignment) of the runstate area. I don't think it is an option >>> to require 32-bit code to pass a range such that even the 64-bit >>> layout wouldn't cross a page boundary (and be suitably aligned). I >>> also don't see any other good solution, so for now a crude approach >>> with an extra boolean is used (using has_32bit_shinfo() isn't race >>> free and could hence lead to overrunning the mapped space). >> >> I think the extra check for 32-bit code to pass the check for 64-bit >> layout would be better. > > I'm afraid I can't derive from your reply what it is you actually want. I think for 32-bit call, we also want to check the address provide will also pass the 64-bit check (i.e. if used as a 64-bit layout, the area would not cross a page boundary and be suitably aligned). Cheers, -- Julien Grall
On 20.12.2022 09:40, Julien Grall wrote: > On 19/12/2022 12:48, Jan Beulich wrote: >> On 16.12.2022 13:26, Julien Grall wrote: >>> On 19/10/2022 08:41, Jan Beulich wrote: >>>> RFC: HVM guests (on x86) can change bitness and hence layout (and size! >>>> and alignment) of the runstate area. I don't think it is an option >>>> to require 32-bit code to pass a range such that even the 64-bit >>>> layout wouldn't cross a page boundary (and be suitably aligned). I >>>> also don't see any other good solution, so for now a crude approach >>>> with an extra boolean is used (using has_32bit_shinfo() isn't race >>>> free and could hence lead to overrunning the mapped space). >>> >>> I think the extra check for 32-bit code to pass the check for 64-bit >>> layout would be better. >> >> I'm afraid I can't derive from your reply what it is you actually want. > > I think for 32-bit call, we also want to check the address provide will > also pass the 64-bit check (i.e. if used as a 64-bit layout, the area > would not cross a page boundary and be suitably aligned). But that's specifically what I say I don't think is an option. First and foremost because of the implication on 32-bit callers: They're need to use magic to get hold of the size of the 64-bit variant of the struct. Jan
Hi Jan, On 20/12/2022 08:45, Jan Beulich wrote: > On 20.12.2022 09:40, Julien Grall wrote: >> On 19/12/2022 12:48, Jan Beulich wrote: >>> On 16.12.2022 13:26, Julien Grall wrote: >>>> On 19/10/2022 08:41, Jan Beulich wrote: >>>>> RFC: HVM guests (on x86) can change bitness and hence layout (and size! >>>>> and alignment) of the runstate area. I don't think it is an option >>>>> to require 32-bit code to pass a range such that even the 64-bit >>>>> layout wouldn't cross a page boundary (and be suitably aligned). I >>>>> also don't see any other good solution, so for now a crude approach >>>>> with an extra boolean is used (using has_32bit_shinfo() isn't race >>>>> free and could hence lead to overrunning the mapped space). >>>> >>>> I think the extra check for 32-bit code to pass the check for 64-bit >>>> layout would be better. >>> >>> I'm afraid I can't derive from your reply what it is you actually want. >> >> I think for 32-bit call, we also want to check the address provide will >> also pass the 64-bit check (i.e. if used as a 64-bit layout, the area >> would not cross a page boundary and be suitably aligned). > > But that's specifically what I say I don't think is an option. First and > foremost because of the implication on 32-bit callers: They're need to > use magic to get hold of the size of the 64-bit variant of the struct. I understand that. But I am not aware of any other (simple) approach where you could have race free code. So between a non-race free code and exposing the restriction to the guest, I would chose the latter. Cheers, -- Julien Grall
On 20.12.2022 09:48, Julien Grall wrote: > Hi Jan, > > On 20/12/2022 08:45, Jan Beulich wrote: >> On 20.12.2022 09:40, Julien Grall wrote: >>> On 19/12/2022 12:48, Jan Beulich wrote: >>>> On 16.12.2022 13:26, Julien Grall wrote: >>>>> On 19/10/2022 08:41, Jan Beulich wrote: >>>>>> RFC: HVM guests (on x86) can change bitness and hence layout (and size! >>>>>> and alignment) of the runstate area. I don't think it is an option >>>>>> to require 32-bit code to pass a range such that even the 64-bit >>>>>> layout wouldn't cross a page boundary (and be suitably aligned). I >>>>>> also don't see any other good solution, so for now a crude approach >>>>>> with an extra boolean is used (using has_32bit_shinfo() isn't race >>>>>> free and could hence lead to overrunning the mapped space). >>>>> >>>>> I think the extra check for 32-bit code to pass the check for 64-bit >>>>> layout would be better. >>>> >>>> I'm afraid I can't derive from your reply what it is you actually want. >>> >>> I think for 32-bit call, we also want to check the address provide will >>> also pass the 64-bit check (i.e. if used as a 64-bit layout, the area >>> would not cross a page boundary and be suitably aligned). >> >> But that's specifically what I say I don't think is an option. First and >> foremost because of the implication on 32-bit callers: They're need to >> use magic to get hold of the size of the 64-bit variant of the struct. > > I understand that. But I am not aware of any other (simple) approach > where you could have race free code. My reference to using has_32bit_shinfo() not being race free was to avoid suggestions in that direction. There's no use of this function in the patch here, nor in the one where the new boolean field is actually written to. The solution as presented is, afaict, both simple and race free. It merely isn't very nice. Jan > So between a non-race free code and exposing the restriction to the > guest, I would chose the latter. > > Cheers, >
On 20/12/2022 09:59, Jan Beulich wrote: > On 20.12.2022 09:48, Julien Grall wrote: >> Hi Jan, >> >> On 20/12/2022 08:45, Jan Beulich wrote: >>> On 20.12.2022 09:40, Julien Grall wrote: >>>> On 19/12/2022 12:48, Jan Beulich wrote: >>>>> On 16.12.2022 13:26, Julien Grall wrote: >>>>>> On 19/10/2022 08:41, Jan Beulich wrote: >>>>>>> RFC: HVM guests (on x86) can change bitness and hence layout (and size! >>>>>>> and alignment) of the runstate area. I don't think it is an option >>>>>>> to require 32-bit code to pass a range such that even the 64-bit >>>>>>> layout wouldn't cross a page boundary (and be suitably aligned). I >>>>>>> also don't see any other good solution, so for now a crude approach >>>>>>> with an extra boolean is used (using has_32bit_shinfo() isn't race >>>>>>> free and could hence lead to overrunning the mapped space). >>>>>> >>>>>> I think the extra check for 32-bit code to pass the check for 64-bit >>>>>> layout would be better. >>>>> >>>>> I'm afraid I can't derive from your reply what it is you actually want. >>>> >>>> I think for 32-bit call, we also want to check the address provide will >>>> also pass the 64-bit check (i.e. if used as a 64-bit layout, the area >>>> would not cross a page boundary and be suitably aligned). >>> >>> But that's specifically what I say I don't think is an option. First and >>> foremost because of the implication on 32-bit callers: They're need to >>> use magic to get hold of the size of the 64-bit variant of the struct. >> >> I understand that. But I am not aware of any other (simple) approach >> where you could have race free code. > > My reference to using has_32bit_shinfo() not being race free was to avoid > suggestions in that direction. There's no use of this function in the > patch here, nor in the one where the new boolean field is actually written > to. The solution as presented is, afaict, both simple and race free. It > merely isn't very nice. Ah! I misunderstood what you original wrote. Thanks for the clarification. Cheers, -- Julien Grall
© 2016 - 2026 Red Hat, Inc.