From: Paul Durrant <pdurrant@amazon.com>
The VMM should only need to set the KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
attribute in response to a VCPUOP_register_vcpu_info hypercall. We can
handle the default case internally since we already know where the
shared_info page is. Modify get_vcpu_info_cache() to pass back a pointer
to the shared info pfn cache (and appropriate offset) for any of the
first 32 vCPUs if the attribute has not been set.
A VMM will be able to determine whether it needs to set up default
vcpu_info using the previously defined KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA
Xen capability flag, which will be advertized in a subsequent patch.
If the VMM subsequently sets a vcpu_info in response to the aforementioned
hypercall then the content of the default vcpu_info will be copied to the
new location.
Also update the KVM API documentation to describe the new behaviour.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: x86@kernel.org
v4:
- Copy the default vcpu_info content when an explicit vcpu_info is set.
- Amend API documentation.
v3:
- Add a note to the API documentation discussing vcpu_info copying.
v2:
- Dispense with the KVM_XEN_HVM_CONFIG_DEFAULT_VCPU_INFO capability.
- Add API documentation.
---
Documentation/virt/kvm/api.rst | 22 ++++---
arch/x86/kvm/xen.c | 110 ++++++++++++++++++++++++++++++---
2 files changed, 117 insertions(+), 15 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 8d85fd7709f0..48c86108efca 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5442,13 +5442,7 @@ KVM_XEN_ATTR_TYPE_LONG_MODE
KVM_XEN_ATTR_TYPE_SHARED_INFO
Sets the guest physical frame number at which the Xen shared_info
- page resides. Note that although Xen places vcpu_info for the first
- 32 vCPUs in the shared_info page, KVM does not automatically do so
- and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
- explicitly even when the vcpu_info for a given vCPU resides at the
- "default" location in the shared_info page. This is because KVM may
- not be aware of the Xen CPU id which is used as the index into the
- vcpu_info[] array, so may know the correct default location.
+ page resides.
Note that the shared_info page may be constantly written to by KVM;
it contains the event channel bitmap used to deliver interrupts to
@@ -5564,12 +5558,26 @@ type values:
KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
Sets the guest physical address of the vcpu_info for a given vCPU.
+ The location of the shared_info page for the VM must be specified before
+ this attribute is set.
+
+ If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the
+ Xen capabilities then the VMM is not required to set this default
+ location for any of the first 32 vCPUs; KVM will handle that
+ internally (using the vcpu_info embedded in the shared_info page).
+ Otherwise this attribute must be set for all vCPUs.
+
As with the shared_info page for the VM, the corresponding page may be
dirtied at any time if event channel interrupt delivery is enabled, so
userspace should always assume that the page is dirty without relying
on dirty logging. Setting the gpa to KVM_XEN_INVALID_GPA will disable
the vcpu_info.
+ Note that once vcpu_info is set then it may not be set again unless
+ it is explicitly disabled beforehand. If it is set for any of the
+ first 32 vCPUS then the content of the default vcpu_info will be copied
+ into the specified location.
+
KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
Sets the guest physical address of an additional pvclock structure
for a given vCPU. This is typically used for guest vsyscall support.
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index a9aada47e9b8..f0ac535300dd 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -491,6 +491,21 @@ static void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v)
static struct gfn_to_pfn_cache *get_vcpu_info_cache(struct kvm_vcpu *v, unsigned long *offset)
{
+ if (!v->arch.xen.vcpu_info_cache.active && v->arch.xen.vcpu_id < MAX_VIRT_CPUS) {
+ struct kvm *kvm = v->kvm;
+
+ if (offset) {
+ if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode)
+ *offset = offsetof(struct shared_info,
+ vcpu_info[v->arch.xen.vcpu_id]);
+ else
+ *offset = offsetof(struct compat_shared_info,
+ vcpu_info[v->arch.xen.vcpu_id]);
+ }
+
+ return &kvm->arch.xen.shinfo_cache;
+ }
+
if (offset)
*offset = 0;
@@ -764,6 +779,92 @@ static int kvm_xen_set_vcpu_id(struct kvm_vcpu *vcpu, unsigned int vcpu_id)
return 0;
}
+static int kvm_xen_set_vcpu_info(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct gfn_to_pfn_cache *si_gpc = &kvm->arch.xen.shinfo_cache;
+ struct gfn_to_pfn_cache *vi_gpc = &vcpu->arch.xen.vcpu_info_cache;
+ unsigned long flags;
+ unsigned long offset;
+ int ret;
+
+ if (gpa == KVM_XEN_INVALID_GPA) {
+ kvm_gpc_deactivate(vi_gpc);
+ return 0;
+ }
+
+ /*
+ * In Xen it is not possible for an explicit vcpu_info to be set
+ * before the shared_info exists since the former is done in response
+ * to a hypercall and the latter is set up as part of domain creation.
+ * The first 32 vCPUs have a default vcpu_info embedded in shared_info
+ * the content of which is copied across when an explicit vcpu_info is
+ * set, which can also clearly not be done if we don't know where the
+ * shared_info is. Hence we need to enforce that the shared_info cache
+ * is active here.
+ */
+ if (!si_gpc->active)
+ return -EINVAL;
+
+ /* Setting an explicit vcpu_info is a one-off operation */
+ if (vi_gpc->active)
+ return -EINVAL;
+
+ ret = kvm_gpc_activate(vi_gpc, gpa, sizeof(struct vcpu_info));
+ if (ret)
+ return ret;
+
+ /* Nothing more to do if the vCPU is not among the first 32 */
+ if (vcpu->arch.xen.vcpu_id >= MAX_VIRT_CPUS)
+ return 0;
+
+ /*
+ * It's possible that the vcpu_info cache has been invalidated since
+ * we activated it so we need to go through the check-refresh dance.
+ */
+ read_lock_irqsave(&vi_gpc->lock, flags);
+ while (!kvm_gpc_check(vi_gpc, sizeof(struct vcpu_info))) {
+ read_unlock_irqrestore(&vi_gpc->lock, flags);
+
+ ret = kvm_gpc_refresh(vi_gpc, sizeof(struct vcpu_info));
+ if (ret) {
+ kvm_gpc_deactivate(vi_gpc);
+ return ret;
+ }
+
+ read_lock_irqsave(&vi_gpc->lock, flags);
+ }
+
+ /* Now lock the shared_info cache so we can copy the vcpu_info */
+ read_lock(&si_gpc->lock);
+ while (!kvm_gpc_check(si_gpc, PAGE_SIZE)) {
+ read_unlock(&si_gpc->lock);
+
+ ret = kvm_gpc_refresh(si_gpc, PAGE_SIZE);
+ if (ret) {
+ read_unlock_irqrestore(&vi_gpc->lock, flags);
+ kvm_gpc_deactivate(vi_gpc);
+ return ret;
+ }
+
+ read_lock(&si_gpc->lock);
+ }
+
+ if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode)
+ offset = offsetof(struct shared_info,
+ vcpu_info[vcpu->arch.xen.vcpu_id]);
+ else
+ offset = offsetof(struct compat_shared_info,
+ vcpu_info[vcpu->arch.xen.vcpu_id]);
+
+ memcpy(vi_gpc->khva, si_gpc->khva + offset, sizeof(struct vcpu_info));
+
+ read_unlock(&si_gpc->lock);
+ read_unlock_irqrestore(&vi_gpc->lock, flags);
+
+ return 0;
+}
+
int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
{
int idx, r = -ENOENT;
@@ -779,14 +880,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
BUILD_BUG_ON(offsetof(struct vcpu_info, time) !=
offsetof(struct compat_vcpu_info, time));
- if (data->u.gpa == KVM_XEN_INVALID_GPA) {
- kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
- r = 0;
- break;
- }
-
- r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
- data->u.gpa, sizeof(struct vcpu_info));
+ r = kvm_xen_set_vcpu_info(vcpu, data->u.gpa);
if (!r)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
--
2.39.2
On Tue, 2023-09-19 at 13:41 +0000, Paul Durrant wrote:
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -491,6 +491,21 @@ static void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v)
>
> static struct gfn_to_pfn_cache *get_vcpu_info_cache(struct kvm_vcpu *v, unsigned long *offset)
> {
> + if (!v->arch.xen.vcpu_info_cache.active && v->arch.xen.vcpu_id < MAX_VIRT_CPUS) {
> + struct kvm *kvm = v->kvm;
> +
> + if (offset) {
> + if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode)
> + *offset = offsetof(struct shared_info,
> + vcpu_info[v->arch.xen.vcpu_id]);
> + else
> + *offset = offsetof(struct compat_shared_info,
> + vcpu_info[v->arch.xen.vcpu_id]);
> + }
> +
> + return &kvm->arch.xen.shinfo_cache;
> + }
> +
> if (offset)
> *offset = 0;
>
> @@ -764,6 +779,92 @@ static int kvm_xen_set_vcpu_id(struct kvm_vcpu *vcpu, unsigned int vcpu_id)
> return 0;
> }
>
> +static int kvm_xen_set_vcpu_info(struct kvm_vcpu *vcpu, gpa_t gpa)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + struct gfn_to_pfn_cache *si_gpc = &kvm->arch.xen.shinfo_cache;
> + struct gfn_to_pfn_cache *vi_gpc = &vcpu->arch.xen.vcpu_info_cache;
> + unsigned long flags;
> + unsigned long offset;
> + int ret;
> +
> + if (gpa == KVM_XEN_INVALID_GPA) {
> + kvm_gpc_deactivate(vi_gpc);
> + return 0;
> + }
> +
> + /*
> + * In Xen it is not possible for an explicit vcpu_info to be set
> + * before the shared_info exists since the former is done in response
> + * to a hypercall and the latter is set up as part of domain creation.
> + * The first 32 vCPUs have a default vcpu_info embedded in shared_info
> + * the content of which is copied across when an explicit vcpu_info is
> + * set, which can also clearly not be done if we don't know where the
> + * shared_info is. Hence we need to enforce that the shared_info cache
> + * is active here.
> + */
> + if (!si_gpc->active)
> + return -EINVAL;
> +
> + /* Setting an explicit vcpu_info is a one-off operation */
> + if (vi_gpc->active)
> + return -EINVAL;
Is that the errno that Xen will return to the hypercall if a guest
tries it? I.e. if the VMM simply returns the errno that it gets from
the kernel, is that OK?
> + ret = kvm_gpc_activate(vi_gpc, gpa, sizeof(struct vcpu_info));
From this moment, can't interrupts be delivered to the new vcpu_info,
even though the memcpy hasn't happened yet?
I think we need to ensure that any kvm_xen_set_evtchn_fast() which
happens at this point cannot proceed, and falls back to the slow path.
Can we set a flag before we activate the vcpu_info and clear it after
the memcpy is done, then make kvm_xen_set_evtchn_fast() return
EWOULDBLOCK whenever that flag is set?
The slow path in kvm_xen_set_evtchn() takes kvm->arch.xen.xen_lock and
I think kvm_xen_vcpu_set_attr() has taken that same lock before you get
to this code, so it works out nicely?
> + if (ret)
> + return ret;
> +
> + /* Nothing more to do if the vCPU is not among the first 32 */
> + if (vcpu->arch.xen.vcpu_id >= MAX_VIRT_CPUS)
> + return 0;
> +
> + /*
> + * It's possible that the vcpu_info cache has been invalidated since
> + * we activated it so we need to go through the check-refresh dance.
> + */
> + read_lock_irqsave(&vi_gpc->lock, flags);
> + while (!kvm_gpc_check(vi_gpc, sizeof(struct vcpu_info))) {
> + read_unlock_irqrestore(&vi_gpc->lock, flags);
> +
> + ret = kvm_gpc_refresh(vi_gpc, sizeof(struct vcpu_info));
> + if (ret) {
> + kvm_gpc_deactivate(vi_gpc);
> + return ret;
> + }
> +
> + read_lock_irqsave(&vi_gpc->lock, flags);
> + }
> +
> + /* Now lock the shared_info cache so we can copy the vcpu_info */
> + read_lock(&si_gpc->lock);
This adds a new lock ordering rule of the vcpu_info lock(s) before the
shared_info lock. I don't know that it's *wrong* but it seems weird to
me; I expected the shared_info to come first?
I avoided taking both at once in kvm_xen_set_evtchn_fast(), although
maybe if we are going to have a rule that allows both, we could revisit
that. Suspect it isn't needed.
Either way it is worth a clear comment somewhere to document the lock
ordering, and I'd also like to know this has been tested with lockdep,
which is often cleverer than me.
> + while (!kvm_gpc_check(si_gpc, PAGE_SIZE)) {
> + read_unlock(&si_gpc->lock);
> +
> + ret = kvm_gpc_refresh(si_gpc, PAGE_SIZE);
> + if (ret) {
> + read_unlock_irqrestore(&vi_gpc->lock, flags);
> + kvm_gpc_deactivate(vi_gpc);
> + return ret;
> + }
> +
> + read_lock(&si_gpc->lock);
> + }
> +
> + if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode)
> + offset = offsetof(struct shared_info,
> + vcpu_info[vcpu->arch.xen.vcpu_id]);
> + else
> + offset = offsetof(struct compat_shared_info,
> + vcpu_info[vcpu->arch.xen.vcpu_id]);
> +
> + memcpy(vi_gpc->khva, si_gpc->khva + offset, sizeof(struct vcpu_info));
> +
> + read_unlock(&si_gpc->lock);
> + read_unlock_irqrestore(&vi_gpc->lock, flags);
> +
> + return 0;
> +}
> +
> int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
> {
> int idx, r = -ENOENT;
> @@ -779,14 +880,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
> BUILD_BUG_ON(offsetof(struct vcpu_info, time) !=
> offsetof(struct compat_vcpu_info, time));
>
> - if (data->u.gpa == KVM_XEN_INVALID_GPA) {
> - kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
> - r = 0;
> - break;
> - }
> -
> - r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
> - data->u.gpa, sizeof(struct vcpu_info));
> + r = kvm_xen_set_vcpu_info(vcpu, data->u.gpa);
> if (!r)
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>
On 19/09/2023 15:18, David Woodhouse wrote:
> On Tue, 2023-09-19 at 13:41 +0000, Paul Durrant wrote:
>> --- a/arch/x86/kvm/xen.c
>> +++ b/arch/x86/kvm/xen.c
>> @@ -491,6 +491,21 @@ static void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v)
>>
>> static struct gfn_to_pfn_cache *get_vcpu_info_cache(struct kvm_vcpu *v, unsigned long *offset)
>> {
>> + if (!v->arch.xen.vcpu_info_cache.active && v->arch.xen.vcpu_id < MAX_VIRT_CPUS) {
>> + struct kvm *kvm = v->kvm;
>> +
>> + if (offset) {
>> + if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode)
>> + *offset = offsetof(struct shared_info,
>> + vcpu_info[v->arch.xen.vcpu_id]);
>> + else
>> + *offset = offsetof(struct compat_shared_info,
>> + vcpu_info[v->arch.xen.vcpu_id]);
>> + }
>> +
>> + return &kvm->arch.xen.shinfo_cache;
>> + }
>> +
>> if (offset)
>> *offset = 0;
>>
>> @@ -764,6 +779,92 @@ static int kvm_xen_set_vcpu_id(struct kvm_vcpu *vcpu, unsigned int vcpu_id)
>> return 0;
>> }
>>
>> +static int kvm_xen_set_vcpu_info(struct kvm_vcpu *vcpu, gpa_t gpa)
>> +{
>> + struct kvm *kvm = vcpu->kvm;
>> + struct gfn_to_pfn_cache *si_gpc = &kvm->arch.xen.shinfo_cache;
>> + struct gfn_to_pfn_cache *vi_gpc = &vcpu->arch.xen.vcpu_info_cache;
>> + unsigned long flags;
>> + unsigned long offset;
>> + int ret;
>> +
>> + if (gpa == KVM_XEN_INVALID_GPA) {
>> + kvm_gpc_deactivate(vi_gpc);
>> + return 0;
>> + }
>> +
>> + /*
>> + * In Xen it is not possible for an explicit vcpu_info to be set
>> + * before the shared_info exists since the former is done in response
>> + * to a hypercall and the latter is set up as part of domain creation.
>> + * The first 32 vCPUs have a default vcpu_info embedded in shared_info
>> + * the content of which is copied across when an explicit vcpu_info is
>> + * set, which can also clearly not be done if we don't know where the
>> + * shared_info is. Hence we need to enforce that the shared_info cache
>> + * is active here.
>> + */
>> + if (!si_gpc->active)
>> + return -EINVAL;
>> +
>> + /* Setting an explicit vcpu_info is a one-off operation */
>> + if (vi_gpc->active)
>> + return -EINVAL;
>
> Is that the errno that Xen will return to the hypercall if a guest
> tries it? I.e. if the VMM simply returns the errno that it gets from
> the kernel, is that OK?
>
Yes, I checked. Xen returns -EINVAL.
>> + ret = kvm_gpc_activate(vi_gpc, gpa, sizeof(struct vcpu_info));
>
> From this moment, can't interrupts be delivered to the new vcpu_info,
> even though the memcpy hasn't happened yet?
>
Hmm, that's a good point. TBH it would be nice to have an 'activate and
leave locked' primitive to avoid this.
> I think we need to ensure that any kvm_xen_set_evtchn_fast() which
> happens at this point cannot proceed, and falls back to the slow path.
>
> Can we set a flag before we activate the vcpu_info and clear it after
> the memcpy is done, then make kvm_xen_set_evtchn_fast() return
> EWOULDBLOCK whenever that flag is set?
>
> The slow path in kvm_xen_set_evtchn() takes kvm->arch.xen.xen_lock and
> I think kvm_xen_vcpu_set_attr() has taken that same lock before you get
> to this code, so it works out nicely?
>
Yes, I think that is safe... but if we didn't have the window between
activating the vcpu_info cache and doing the copy we'd also be ok I
think... Or perhaps we could simply preserve evtchn_pending_sel and copy
the rest of it?
>
>
>> + if (ret)
>> + return ret;
>> +
>> + /* Nothing more to do if the vCPU is not among the first 32 */
>> + if (vcpu->arch.xen.vcpu_id >= MAX_VIRT_CPUS)
>> + return 0;
>> +
>> + /*
>> + * It's possible that the vcpu_info cache has been invalidated since
>> + * we activated it so we need to go through the check-refresh dance.
>> + */
>> + read_lock_irqsave(&vi_gpc->lock, flags);
>> + while (!kvm_gpc_check(vi_gpc, sizeof(struct vcpu_info))) {
>> + read_unlock_irqrestore(&vi_gpc->lock, flags);
>> +
>> + ret = kvm_gpc_refresh(vi_gpc, sizeof(struct vcpu_info));
>> + if (ret) {
>> + kvm_gpc_deactivate(vi_gpc);
>> + return ret;
>> + }
>> +
>> + read_lock_irqsave(&vi_gpc->lock, flags);
>> + }
>> +
>> + /* Now lock the shared_info cache so we can copy the vcpu_info */
>> + read_lock(&si_gpc->lock);
>
> This adds a new lock ordering rule of the vcpu_info lock(s) before the
> shared_info lock. I don't know that it's *wrong* but it seems weird to
> me; I expected the shared_info to come first?
>
> I avoided taking both at once in kvm_xen_set_evtchn_fast(), although
> maybe if we are going to have a rule that allows both, we could revisit
> that. Suspect it isn't needed.
>
> Either way it is worth a clear comment somewhere to document the lock
> ordering, and I'd also like to know this has been tested with lockdep,
> which is often cleverer than me.
>
Ok. I agree that shared_info before vcpu_info does seem more intuitive
and maybe it would be better given the code in
kvm_xen_set_evtchn_fast(). I'll seem how messy it gets in re-ordering
and add a comment as you suggest.
Paul
>> + while (!kvm_gpc_check(si_gpc, PAGE_SIZE)) {
>> + read_unlock(&si_gpc->lock);
>> +
>> + ret = kvm_gpc_refresh(si_gpc, PAGE_SIZE);
>> + if (ret) {
>> + read_unlock_irqrestore(&vi_gpc->lock, flags);
>> + kvm_gpc_deactivate(vi_gpc);
>> + return ret;
>> + }
>> +
>> + read_lock(&si_gpc->lock);
>> + }
>> +
>> + if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode)
>> + offset = offsetof(struct shared_info,
>> + vcpu_info[vcpu->arch.xen.vcpu_id]);
>> + else
>> + offset = offsetof(struct compat_shared_info,
>> + vcpu_info[vcpu->arch.xen.vcpu_id]);
>> +
>> + memcpy(vi_gpc->khva, si_gpc->khva + offset, sizeof(struct vcpu_info));
>> +
>> + read_unlock(&si_gpc->lock);
>> + read_unlock_irqrestore(&vi_gpc->lock, flags);
>> +
>> + return 0;
>> +}
>> +
>> int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
>> {
>> int idx, r = -ENOENT;
>> @@ -779,14 +880,7 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
>> BUILD_BUG_ON(offsetof(struct vcpu_info, time) !=
>> offsetof(struct compat_vcpu_info, time));
>>
>> - if (data->u.gpa == KVM_XEN_INVALID_GPA) {
>> - kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
>> - r = 0;
>> - break;
>> - }
>> -
>> - r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
>> - data->u.gpa, sizeof(struct vcpu_info));
>> + r = kvm_xen_set_vcpu_info(vcpu, data->u.gpa);
>> if (!r)
>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>>
>
On Tue, 2023-09-19 at 15:34 +0100, Paul Durrant wrote: > > > + ret = kvm_gpc_activate(vi_gpc, gpa, sizeof(struct vcpu_info)); > > > > From this moment, can't interrupts be delivered to the new vcpu_info, > > even though the memcpy hasn't happened yet? > > > > Hmm, that's a good point. TBH it would be nice to have an 'activate and > leave locked' primitive to avoid this. I suppose so from the caller's point of view in this case, but I'm somewhat disinclined to add that complexity to the pfncache code. We take the refresh_lock *mutex* in __kvm_gpc_refresh() so it's not as simple as declaring that said function is called with the gpc rwlock already held. We also do the final gpc_unmap_khva() of the old mapping after dropping the lock; *could* we call that with a write lock held? A write lock which is going to be taken the MM notifier callbacks? Well, maybe not in the case of the first *activate* which isn't really a 'refresh' per se but the whole thing is making my skin itch. I don't like it. > > I think we need to ensure that any kvm_xen_set_evtchn_fast() which > > happens at this point cannot proceed, and falls back to the slow path. > > > > Can we set a flag before we activate the vcpu_info and clear it after > > the memcpy is done, then make kvm_xen_set_evtchn_fast() return > > EWOULDBLOCK whenever that flag is set? > > > > The slow path in kvm_xen_set_evtchn() takes kvm->arch.xen.xen_lock and > > I think kvm_xen_vcpu_set_attr() has taken that same lock before you get > > to this code, so it works out nicely? > > > > Yes, I think that is safe... but if we didn't have the window between > activating the vcpu_info cache and doing the copy we'd also be ok I > think... Or perhaps we could simply preserve evtchn_pending_sel and copy > the rest of it? > I suppose you could just write the evtchn_pending_sel word in the new vcpu_info GPA to zero before setting up the pfncache for it. When when you do the memcpy, you don't *just* memcpy the evtchn_pending_sel word; you use the bitwise OR of the old and new, so you catch any bits which got set in the new word in the interim? But then again, who moves the vcpu_info while there are actually interrupts in-flight to the vCPU in question? Maybe we just declare that we don't care, and that interrupts may be lost in that case? Even if *Xen* wouldn't have lost them (and I don't even know that part is true). > > This adds a new lock ordering rule of the vcpu_info lock(s) before the > > shared_info lock. I don't know that it's *wrong* but it seems weird to > > me; I expected the shared_info to come first? > > > > I avoided taking both at once in kvm_xen_set_evtchn_fast(), although > > maybe if we are going to have a rule that allows both, we could revisit > > that. Suspect it isn't needed. > > > > Either way it is worth a clear comment somewhere to document the lock > > ordering, and I'd also like to know this has been tested with lockdep, > > which is often cleverer than me. > > > > Ok. I agree that shared_info before vcpu_info does seem more intuitive > and maybe it would be better given the code in > kvm_xen_set_evtchn_fast(). I'll seem how messy it gets in re-ordering > and add a comment as you suggest. > I think they look interchangeable in this case. If we *do* take them both in kvm_xen_set_evtchn_fast() then maybe we can simplify the slow path where it set the bits in shared_info but then the vcpu_info gpc was invalid. That currently uses a kvm->arch.xen.evtchn_pending_sel shadow of the bits, and just kicks the vCPU to deliver them for itself... but maybe that whole thing could be dropped, and kvm_xen_set_evtchn_fast() can just return EWOULDBLOCK if it fails to lock *both* shared_info and vcpu_info at the same time? I didn't do that before, because I didn't want to introduce lock ordering rules. But I'm happier to do so now. And I think we can ditch a lot of hairy asm in kvm_xen_inject_pending_events() ?
On 19/09/2023 16:38, David Woodhouse wrote: > On Tue, 2023-09-19 at 15:34 +0100, Paul Durrant wrote: >>>> + ret = kvm_gpc_activate(vi_gpc, gpa, sizeof(struct vcpu_info)); >>> >>> From this moment, can't interrupts be delivered to the new vcpu_info, >>> even though the memcpy hasn't happened yet? >>> >> >> Hmm, that's a good point. TBH it would be nice to have an 'activate and >> leave locked' primitive to avoid this. > > I suppose so from the caller's point of view in this case, but I'm > somewhat disinclined to add that complexity to the pfncache code. > > We take the refresh_lock *mutex* in __kvm_gpc_refresh() so it's not as > simple as declaring that said function is called with the gpc rwlock > already held. > > We also do the final gpc_unmap_khva() of the old mapping after dropping > the lock; *could* we call that with a write lock held? A write lock > which is going to be taken the MM notifier callbacks? Well, maybe not > in the case of the first *activate* which isn't really a 'refresh' per > se but the whole thing is making my skin itch. I don't like it. > >>> I think we need to ensure that any kvm_xen_set_evtchn_fast() which >>> happens at this point cannot proceed, and falls back to the slow path. >>> >>> Can we set a flag before we activate the vcpu_info and clear it after >>> the memcpy is done, then make kvm_xen_set_evtchn_fast() return >>> EWOULDBLOCK whenever that flag is set? >>> >>> The slow path in kvm_xen_set_evtchn() takes kvm->arch.xen.xen_lock and >>> I think kvm_xen_vcpu_set_attr() has taken that same lock before you get >>> to this code, so it works out nicely? >>> >> >> Yes, I think that is safe... but if we didn't have the window between >> activating the vcpu_info cache and doing the copy we'd also be ok I >> think... Or perhaps we could simply preserve evtchn_pending_sel and copy >> the rest of it? > >> > I suppose you could just write the evtchn_pending_sel word in the new > vcpu_info GPA to zero before setting up the pfncache for it. > > When when you do the memcpy, you don't *just* memcpy the > evtchn_pending_sel word; you use the bitwise OR of the old and new, so > you catch any bits which got set in the new word in the interim? > > But then again, who moves the vcpu_info while there are actually > interrupts in-flight to the vCPU in question? Maybe we just declare > that we don't care, and that interrupts may be lost in that case? Even > if *Xen* wouldn't have lost them (and I don't even know that part is > true). > >>> This adds a new lock ordering rule of the vcpu_info lock(s) before the >>> shared_info lock. I don't know that it's *wrong* but it seems weird to >>> me; I expected the shared_info to come first? >>> >>> I avoided taking both at once in kvm_xen_set_evtchn_fast(), although >>> maybe if we are going to have a rule that allows both, we could revisit >>> that. Suspect it isn't needed. >>> >>> Either way it is worth a clear comment somewhere to document the lock >>> ordering, and I'd also like to know this has been tested with lockdep, >>> which is often cleverer than me. >>> >> >> Ok. I agree that shared_info before vcpu_info does seem more intuitive >> and maybe it would be better given the code in >> kvm_xen_set_evtchn_fast(). I'll seem how messy it gets in re-ordering >> and add a comment as you suggest. >> > > I think they look interchangeable in this case. If we *do* take them > both in kvm_xen_set_evtchn_fast() then maybe we can simplify the slow > path where it set the bits in shared_info but then the vcpu_info gpc > was invalid. That currently uses a kvm->arch.xen.evtchn_pending_sel > shadow of the bits, and just kicks the vCPU to deliver them for > itself... but maybe that whole thing could be dropped, and > kvm_xen_set_evtchn_fast() can just return EWOULDBLOCK if it fails to > lock *both* shared_info and vcpu_info at the same time? > Yes, I think that sounds like a neater approach. > I didn't do that before, because I didn't want to introduce lock > ordering rules. But I'm happier to do so now. And I think we can ditch > a lot of hairy asm in kvm_xen_inject_pending_events() ? > Messing with the asm sounds like something for a follow-up though. Paul
On Tue, 2023-09-19 at 16:47 +0100, Paul Durrant wrote: > > > > > I think they look interchangeable in this case. If we *do* take them > > both in kvm_xen_set_evtchn_fast() then maybe we can simplify the slow > > path where it set the bits in shared_info but then the vcpu_info gpc > > was invalid. That currently uses a kvm->arch.xen.evtchn_pending_sel > > shadow of the bits, and just kicks the vCPU to deliver them for > > itself... but maybe that whole thing could be dropped, and > > kvm_xen_set_evtchn_fast() can just return EWOULDBLOCK if it fails to > > lock *both* shared_info and vcpu_info at the same time? > > > > Yes, I think that sounds like a neater approach. > > > I didn't do that before, because I didn't want to introduce lock > > ordering rules. But I'm happier to do so now. And I think we can ditch > > a lot of hairy asm in kvm_xen_inject_pending_events() ? > > > > Messing with the asm sounds like something for a follow-up though. AFAICT we can just delete the whole bloody lot. But yes, it's definitely a later cleanup, enabled by the fact that we (will) now have an agreed way of taking both locks at once, which I didn't want to do in the first place.
© 2016 - 2025 Red Hat, Inc.