[PATCH 2/2][4.17?] x86: wire up VCPUOP_register_vcpu_time_memory_area for 32-bit guests

Jan Beulich posted 2 patches 3 years, 4 months ago
[PATCH 2/2][4.17?] x86: wire up VCPUOP_register_vcpu_time_memory_area for 32-bit guests
Posted by Jan Beulich 3 years, 4 months ago
Forever sinced its introduction VCPUOP_register_vcpu_time_memory_area
was available only to native domains. Linux, for example, would attempt
to use it irrespective of guest bitness (including in its so called
PVHVM mode) as long as it finds XEN_PVCLOCK_TSC_STABLE_BIT set (which we
set only for clocksource=tsc, which in turn needs engaging via command
line option).

Fixes: a5d39947cb89 ("Allow guests to register secondary vcpu_time_info")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Is it actually correct for us to do cross-vCPU updates of the area here
(and also in the native counterpart as well as for runstate area
updates)? The virtual address may be valid for the given vCPU only, but
may be mapped to something else on the current vCPU (yet we only deal
with it not being mapped at all). Note how HVM code already calls
update_vcpu_system_time() only when v == current.

I'm surprised by Linux not using the secondary area in a broader
fashion. But I'm also surprised that they would only ever register an
area for vCPU 0.

--- a/xen/arch/x86/x86_64/domain.c
+++ b/xen/arch/x86/x86_64/domain.c
@@ -58,6 +58,26 @@ compat_vcpu_op(int cmd, unsigned int vcp
         break;
     }
 
+    case VCPUOP_register_vcpu_time_memory_area:
+    {
+        struct compat_vcpu_register_time_memory_area area = { .addr.p = 0 };
+
+        rc = -EFAULT;
+        if ( copy_from_guest(&area.addr.h, arg, 1) )
+            break;
+
+        if ( area.addr.h.c != area.addr.p ||
+             !compat_handle_okay(area.addr.h, 1) )
+            break;
+
+        rc = 0;
+        guest_from_compat_handle(v->arch.time_info_guest, area.addr.h);
+
+        force_update_vcpu_system_time(v);
+
+        break;
+    }
+
     case VCPUOP_send_nmi:
     case VCPUOP_get_physid:
         rc = do_vcpu_op(cmd, vcpuid, arg);
Re: [PATCH 2/2][4.17?] x86: wire up VCPUOP_register_vcpu_time_memory_area for 32-bit guests
Posted by Roger Pau Monné 3 years, 4 months ago
On Thu, Sep 29, 2022 at 11:51:40AM +0200, Jan Beulich wrote:
> Forever sinced its introduction VCPUOP_register_vcpu_time_memory_area
> was available only to native domains. Linux, for example, would attempt
> to use it irrespective of guest bitness (including in its so called
> PVHVM mode) as long as it finds XEN_PVCLOCK_TSC_STABLE_BIT set (which we
> set only for clocksource=tsc, which in turn needs engaging via command
> line option).
> 
> Fixes: a5d39947cb89 ("Allow guests to register secondary vcpu_time_info")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Albeit I have concerns with the notes you raise below, not sure we
also want to introduce a (broken') compat version of the same
hypercall wrt v != current.

> ---
> Is it actually correct for us to do cross-vCPU updates of the area here
> (and also in the native counterpart as well as for runstate area
> updates)? The virtual address may be valid for the given vCPU only, but
> may be mapped to something else on the current vCPU (yet we only deal
> with it not being mapped at all). Note how HVM code already calls
> update_vcpu_system_time() only when v == current.
> 
> I'm surprised by Linux not using the secondary area in a broader
> fashion. But I'm also surprised that they would only ever register an
> area for vCPU 0.

Would be better to update locally just when v == current, otherwise
issue an IPI to the remote vCPU dirty mask and force an update on
resume to guest path?

> 
> --- a/xen/arch/x86/x86_64/domain.c
> +++ b/xen/arch/x86/x86_64/domain.c
> @@ -58,6 +58,26 @@ compat_vcpu_op(int cmd, unsigned int vcp
>          break;
>      }
>  
> +    case VCPUOP_register_vcpu_time_memory_area:
> +    {
> +        struct compat_vcpu_register_time_memory_area area = { .addr.p = 0 };

Why not just use { } to initialize?

Thanks, Roger.

Re: [PATCH 2/2][4.17?] x86: wire up VCPUOP_register_vcpu_time_memory_area for 32-bit guests
Posted by Jan Beulich 3 years, 4 months ago
On 29.09.2022 13:37, Roger Pau Monné wrote:
> On Thu, Sep 29, 2022 at 11:51:40AM +0200, Jan Beulich wrote:
>> Forever sinced its introduction VCPUOP_register_vcpu_time_memory_area
>> was available only to native domains. Linux, for example, would attempt
>> to use it irrespective of guest bitness (including in its so called
>> PVHVM mode) as long as it finds XEN_PVCLOCK_TSC_STABLE_BIT set (which we
>> set only for clocksource=tsc, which in turn needs engaging via command
>> line option).
>>
>> Fixes: a5d39947cb89 ("Allow guests to register secondary vcpu_time_info")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> Albeit I have concerns with the notes you raise below, not sure we
> also want to introduce a (broken') compat version of the same
> hypercall wrt v != current.

Since "compat" is ambiguous in this context - I guess you don't mean
just the variant for compat guests? Independent of that I'm afraid I
don't see how a separate variant would help: If the cross-vCPU
copying is not okay, then existing users as affected already, and
would rather have the single variant work correctly. And if the newly
added variant would be the one with the broken behavior, why would
anyone switch to using it?

>> ---
>> Is it actually correct for us to do cross-vCPU updates of the area here
>> (and also in the native counterpart as well as for runstate area
>> updates)? The virtual address may be valid for the given vCPU only, but
>> may be mapped to something else on the current vCPU (yet we only deal
>> with it not being mapped at all). Note how HVM code already calls
>> update_vcpu_system_time() only when v == current.
>>
>> I'm surprised by Linux not using the secondary area in a broader
>> fashion. But I'm also surprised that they would only ever register an
>> area for vCPU 0.
> 
> Would be better to update locally just when v == current, otherwise
> issue an IPI to the remote vCPU dirty mask and force an update on
> resume to guest path?

Yes, that's the outline of how to deal with this _if_ we determine the
present behavior is flawed. Determination is difficult since, like with
so many things, this is something that's not spelled out anywhere.

>> --- a/xen/arch/x86/x86_64/domain.c
>> +++ b/xen/arch/x86/x86_64/domain.c
>> @@ -58,6 +58,26 @@ compat_vcpu_op(int cmd, unsigned int vcp
>>          break;
>>      }
>>  
>> +    case VCPUOP_register_vcpu_time_memory_area:
>> +    {
>> +        struct compat_vcpu_register_time_memory_area area = { .addr.p = 0 };
> 
> Why not just use { } to initialize?

I wanted to match (a) the VCPUOP_register_runstate_memory_area handling
further up, just without using a separate assignment and (b) this line

        if ( area.addr.h.c != area.addr.p ||

And yes, initially I did consider using just { }.

Jan

RE: [PATCH 2/2][4.17?] x86: wire up VCPUOP_register_vcpu_time_memory_area for 32-bit guests
Posted by Henry Wang 3 years, 4 months ago
Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: [PATCH 2/2][4.17?] x86: wire up
> VCPUOP_register_vcpu_time_memory_area for 32-bit guests
> 
> Forever sinced its introduction VCPUOP_register_vcpu_time_memory_area
> was available only to native domains. Linux, for example, would attempt
> to use it irrespective of guest bitness (including in its so called
> PVHVM mode) as long as it finds XEN_PVCLOCK_TSC_STABLE_BIT set (which
> we
> set only for clocksource=tsc, which in turn needs engaging via command
> line option).
> 
> Fixes: a5d39947cb89 ("Allow guests to register secondary vcpu_time_info")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Is it actually correct for us to do cross-vCPU updates of the area here
> (and also in the native counterpart as well as for runstate area
> updates)? The virtual address may be valid for the given vCPU only, but
> may be mapped to something else on the current vCPU (yet we only deal
> with it not being mapped at all). Note how HVM code already calls
> update_vcpu_system_time() only when v == current.
> 
> I'm surprised by Linux not using the secondary area in a broader
> fashion. But I'm also surprised that they would only ever register an
> area for vCPU 0.

I re-read the guide for release manager, and it tells me that "in feature
freeze and early stage of code freeze, bug fixes are encouraged to be
merged, while in the late stage of code freeze, complex bug fixes might
be rejected if the risk of accepting is higher than the risk of rejecting it".

Hence I guess in current stage, I would not block this patch for release.
If this patch is acked/reviewed by other x86 maintainers, feel free to add:

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry