[PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled

Sean Christopherson posted 28 patches 3 years, 6 months ago
There is a newer version of this series
Posted by Sean Christopherson 3 years, 6 months ago
Free the APIC access page memslot if any vCPU enables x2APIC and SVM's
AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with
x2APIC enabled.  On AMD, due to its "hybrid" mode where AVIC is enabled
when x2APIC is enabled even without x2AVIC support, keeping the APIC
access page memslot results in the guest being able to access the virtual
APIC page as x2APIC is fully emulated by KVM.  I.e. hardware isn't aware
that the guest is operating in x2APIC mode.

Intel doesn't suffer from the same issue as APICv has fully independent
VMCS controls for xAPIC vs. x2APIC virtualization.  Technically, KVM
should provide bus error semantics and not memory semantics for the APIC
page when x2APIC is enabled, but KVM already provides memory semantics in
other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware
disabled (via APIC_BASE MSR).

Reserve an inhibit bit so that common code can detect whether or not the
"x2APIC inhibit" applies, but use a dedicated flag to track the inhibit
so that it doesn't need to be stripped from apicv_inhibit_reasons (since
it's not a "full" inhibit).

Note, setting apic_access_memslot_inhibited without taking locks relies
on it being sticky, and also relies on apic_access_memslot_enabled being
set during vCPU creation (before kvm_vcpu_reset()).  vCPUs can race to
set the inhibit and delete the memslot, i.e. can get false positives, but
can't get false negatives as apic_access_memslot_enabled can't be toggled "on"
once any vCPU reaches kvm_lapic_set_base().

Opportunistically drop the "can" while updating avic_activate_vmcb()'s
comment, i.e. to state that KVM _does_ support the hybrid mode.  Move
the "Note:" down a line to conform to preferred kernel/KVM multi-line
comment style.

Opportunistically update the apicv_update_lock comment, as it isn't
actually used to protect apic_access_memslot_enabled (it's protected by
slots_lock).

Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++---
 arch/x86/kvm/lapic.c            | 34 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/lapic.h            |  1 +
 arch/x86/kvm/svm/avic.c         | 15 ++++++++-------
 arch/x86/kvm/x86.c              |  7 +++++++
 5 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c96c43c313a..6475c882b359 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
 	 * AVIC is disabled because SEV doesn't support it.
 	 */
 	APICV_INHIBIT_REASON_SEV,
+
+	/*
+	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
+	 * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
+	 * independent controls for AVIC vs. x2AVIC, and also because SVM
+	 * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
+	 * x2AVIC.  Note, this isn't a "full" inhibit and is tracked separately.
+	 * AVIC can still be activated, but KVM must not create SPTEs for the
+	 * APIC base.  For simplicity, this is sticky.
+	 */
+	APICV_INHIBIT_REASON_X2APIC,
 };
 
 struct kvm_arch {
@@ -1169,10 +1180,11 @@ struct kvm_arch {
 	struct kvm_apic_map __rcu *apic_map;
 	atomic_t apic_map_dirty;
 
-	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
-	struct rw_semaphore apicv_update_lock;
-
 	bool apic_access_memslot_enabled;
+	bool apic_access_memslot_inhibited;
+
+	/* Protects apicv_inhibit_reasons */
+	struct rw_semaphore apicv_update_lock;
 	unsigned long apicv_inhibit_reasons;
 
 	gpa_t wall_clock;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 99994d2470a2..70f00eda75b2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 		}
 	}
 
-	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
+	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
 		kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
 
+		/*
+		 * Mark the APIC memslot as inhibited if x2APIC is enabled and
+		 * the x2APIC inhibit is required.  The actual deletion of the
+		 * memslot is handled by vcpu_run() as SRCU may or may not be
+		 * held at this time, i.e. updating memslots isn't safe.  Don't
+		 * check apic_access_memslot_inhibited, this vCPU needs to
+		 * ensure the memslot is deleted before re-entering the guest,
+		 * i.e. needs to make the request even if the inhibit flag was
+		 * already set by a different vCPU.
+		 */
+		if (vcpu->kvm->arch.apic_access_memslot_enabled &&
+		    static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
+			vcpu->kvm->arch.apic_access_memslot_inhibited = true;
+			kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
+		}
+	}
+
 	if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
 		kvm_vcpu_update_apicv(vcpu);
 		static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
@@ -2440,7 +2457,8 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
 	int ret = 0;
 
 	mutex_lock(&kvm->slots_lock);
-	if (kvm->arch.apic_access_memslot_enabled)
+	if (kvm->arch.apic_access_memslot_enabled ||
+	    kvm->arch.apic_access_memslot_inhibited)
 		goto out;
 
 	hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
@@ -2468,6 +2486,18 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
 
+void kvm_free_apic_access_page(struct kvm *kvm)
+{
+	mutex_lock(&kvm->slots_lock);
+
+	if (kvm->arch.apic_access_memslot_enabled) {
+		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
+		kvm->arch.apic_access_memslot_enabled = false;
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+}
+
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6d06397683d0..e2271ffa7ac0 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -112,6 +112,7 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
 int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
 void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
 int kvm_alloc_apic_access_page(struct kvm *kvm);
+void kvm_free_apic_access_page(struct kvm *kvm);
 
 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 0424a5e664bb..8f9426f21bbf 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -72,12 +72,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
 
 	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
 
-	/* Note:
-	 * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
-	 * MSR accesses, while interrupt injection to a running vCPU
-	 * can be achieved using AVIC doorbell. The AVIC hardware still
-	 * accelerate MMIO accesses, but this does not cause any harm
-	 * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
+	/*
+	 * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
+	 * accesses, while interrupt injection to a running vCPU can be
+	 * achieved using AVIC doorbell.  KVM disables the APIC access page
+	 * (deletes the memslot) if any vCPU has x2APIC enabled, thus enabling
+	 * AVIC in hybrid mode activates only the doorbell mechanism.
 	 */
 	if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
 	    avic_mode == AVIC_MODE_X2) {
@@ -987,7 +987,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
 			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
 			  BIT(APICV_INHIBIT_REASON_SEV)      |
 			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
-			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
+			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
+			  BIT(APICV_INHIBIT_REASON_X2APIC);
 
 	return supported & BIT(reason);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7374d768296..aa5ab0c620de 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10705,6 +10705,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 			break;
 		}
 
+		if (vcpu->kvm->arch.apic_access_memslot_inhibited &&
+		    vcpu->kvm->arch.apic_access_memslot_enabled) {
+			kvm_vcpu_srcu_read_unlock(vcpu);
+			kvm_free_apic_access_page(vcpu->kvm);
+			kvm_vcpu_srcu_read_lock(vcpu);
+		}
+
 		if (__xfer_to_guest_mode_work_pending()) {
 			kvm_vcpu_srcu_read_unlock(vcpu);
 			r = xfer_to_guest_mode_handle_work(vcpu);
-- 
2.37.3.968.ga6b4b080e4-goog
Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
Posted by Maxim Levitsky 3 years, 6 months ago
On Tue, 2022-09-20 at 23:31 +0000, Sean Christopherson wrote:
> Free the APIC access page memslot if any vCPU enables x2APIC and SVM's
> AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with
> x2APIC enabled.   On AMD, due to its "hybrid" mode where AVIC is enabled
> when x2APIC is enabled even without x2AVIC support, keeping the APIC
> access page memslot results in the guest being able to access the virtual
> APIC page as x2APIC is fully emulated by KVM.  I.e. hardware isn't aware
> that the guest is operating in x2APIC mode.
> 
> Intel doesn't suffer from the same issue as APICv has fully independent
> VMCS controls for xAPIC vs. x2APIC virtualization.  Technically, KVM
> should provide bus error semantics and not memory semantics for the APIC
> page when x2APIC is enabled, but KVM already provides memory semantics in
> other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware
> disabled (via APIC_BASE MSR).
> 
> Reserve an inhibit bit so that common code can detect whether or not the
> "x2APIC inhibit" applies, but use a dedicated flag to track the inhibit
> so that it doesn't need to be stripped from apicv_inhibit_reasons (since
> it's not a "full" inhibit).
> 
> Note, setting apic_access_memslot_inhibited without taking locks relies
> on it being sticky, and also relies on apic_access_memslot_enabled being
> set during vCPU creation (before kvm_vcpu_reset()).  vCPUs can race to
> set the inhibit and delete the memslot, i.e. can get false positives, but
> can't get false negatives as apic_access_memslot_enabled can't be toggled "on"
> once any vCPU reaches kvm_lapic_set_base().
> 
> Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> comment, i.e. to state that KVM _does_ support the hybrid mode.  Move
> the "Note:" down a line to conform to preferred kernel/KVM multi-line
> comment style.
> 
> Opportunistically update the apicv_update_lock comment, as it isn't
> actually used to protect apic_access_memslot_enabled (it's protected by
> slots_lock).
> 
> Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++---
>  arch/x86/kvm/lapic.c            | 34 +++++++++++++++++++++++++++++++--
>  arch/x86/kvm/lapic.h            |  1 +
>  arch/x86/kvm/svm/avic.c         | 15 ++++++++-------
>  arch/x86/kvm/x86.c              |  7 +++++++
>  5 files changed, 63 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2c96c43c313a..6475c882b359 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
>  	 * AVIC is disabled because SEV doesn't support it.
>  	 */
>  	APICV_INHIBIT_REASON_SEV,
> +
> +	/*
> +	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> +	 * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> +	 * independent controls for AVIC vs. x2AVIC, and also because SVM
> +	 * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> +	 * x2AVIC.  Note, this isn't a "full" inhibit and is tracked separately.
> +	 * AVIC can still be activated, but KVM must not create SPTEs for the
> +	 * APIC base.  For simplicity, this is sticky.
> +	 */
> +	APICV_INHIBIT_REASON_X2APIC,

Hi Sean!

So assuming that I won't object to making it SVM specific (I still think
that VMX should also inhibit this memslot because this is closer to x86 spec,
but if you really want it this way, I won't fight over it):

I somewhat don't like this inhibit, because now it is used just to say
'I am AVIC'.

What do you think about just moving the code that removes the memslot to SVM,
into avic_set_virtual_apic_mode?


>  };
>  
>  struct kvm_arch {
> @@ -1169,10 +1180,11 @@ struct kvm_arch {
>  	struct kvm_apic_map __rcu *apic_map;
>  	atomic_t apic_map_dirty;
>  
> -	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> -	struct rw_semaphore apicv_update_lock;
> -
>  	bool apic_access_memslot_enabled;
> +	bool apic_access_memslot_inhibited;

So the apic_access_memslot_enabled currently tracks if the memslot is enabled.
As I see later in the patch when you free the memslot, you set it to false,
which means that if a vCPU is created after that (it can happen in theory),
the memslot will be created again :(

I say we need 'enabled' and 'allocated' booleans instead. The inhibit will set
enabled to false, and then the next vCPU run will free the memslot.

When enabled == false, the code needs to be changed to not allocate it again.

> +
> +	/* Protects apicv_inhibit_reasons */
> +	struct rw_semaphore apicv_update_lock;
>  	unsigned long apicv_inhibit_reasons;
>  
>  	gpa_t wall_clock;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 99994d2470a2..70f00eda75b2 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
>  		}
>  	}
>  
> -	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> +	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
>  		kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
>  
> +		/*
> +		 * Mark the APIC memslot as inhibited if x2APIC is enabled and
> +		 * the x2APIC inhibit is required.  The actual deletion of the
> +		 * memslot is handled by vcpu_run() as SRCU may or may not be
> +		 * held at this time, i.e. updating memslots isn't safe.  Don't
> +		 * check apic_access_memslot_inhibited, this vCPU needs to
> +		 * ensure the memslot is deleted before re-entering the guest,
> +		 * i.e. needs to make the request even if the inhibit flag was
> +		 * already set by a different vCPU.
> +		 */
> +		if (vcpu->kvm->arch.apic_access_memslot_enabled &&
> +		    static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
> +			vcpu->kvm->arch.apic_access_memslot_inhibited = true;
> +			kvm_make_request(KVM_REQ_UNBLOCK, vcpu);

You are about to remove KVM_REQ_UNBLOCK in another patch series.

How about just raising KVM_REQ_APICV_UPDATE on current vCPU
and having a special case in kvm_vcpu_update_apicv of 

if (apic_access_memslot_enabled == false && apic_access_memslot_allocated == true) {
	drop srcu lock
	free the memslot
	take srcu lock
}

I tried that with the regular AVIC inhibit and it wasn't possible: the memslot
has to be toggled before any vCPU re-enters the guest, so KVM_REQ_APICV_UPDATE
has to be raised on all vCPUs, and then a 'winning' vCPU would toggle the
memslot. That caused a lot of races that were impossible to solve, so I
abandoned that approach back then.

Here the accuracy is not that critical, so we can raise the request on the
current vCPU, as you do in your patch anyway, so it should work.


> +		}
> +	}
> +
>  	if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
>  		kvm_vcpu_update_apicv(vcpu);
>  		static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
> @@ -2440,7 +2457,8 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
>  	int ret = 0;
>  
>  	mutex_lock(&kvm->slots_lock);
> -	if (kvm->arch.apic_access_memslot_enabled)
> +	if (kvm->arch.apic_access_memslot_enabled ||
> +	    kvm->arch.apic_access_memslot_inhibited)
>  		goto out;
>  
>  	hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> @@ -2468,6 +2486,18 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
>  }
>  EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
>  
> +void kvm_free_apic_access_page(struct kvm *kvm)
> +{
> +	mutex_lock(&kvm->slots_lock);
> +
> +	if (kvm->arch.apic_access_memslot_enabled) {
> +		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
> +		kvm->arch.apic_access_memslot_enabled = false;
> +	}
> +
> +	mutex_unlock(&kvm->slots_lock);
> +}
> +
>  void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 6d06397683d0..e2271ffa7ac0 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -112,6 +112,7 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
>  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
>  void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
>  int kvm_alloc_apic_access_page(struct kvm *kvm);
> +void kvm_free_apic_access_page(struct kvm *kvm);
>  
>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>  		struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 0424a5e664bb..8f9426f21bbf 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -72,12 +72,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
>  
>  	vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
>  
> -	/* Note:
> -	 * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
> -	 * MSR accesses, while interrupt injection to a running vCPU
> -	 * can be achieved using AVIC doorbell. The AVIC hardware still
> -	 * accelerate MMIO accesses, but this does not cause any harm
> -	 * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
> +	/*
> +	 * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
> +	 * accesses, while interrupt injection to a running vCPU can be
> +	 * achieved using AVIC doorbell.  KVM disables the APIC access page
> +	 * (deletes the memslot) if any vCPU has x2APIC enabled, thus enabling
> +	 * AVIC in hybrid mode activates only the doorbell mechanism.
>  	 */
>  	if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
>  	    avic_mode == AVIC_MODE_X2) {
> @@ -987,7 +987,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
>  			  BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
>  			  BIT(APICV_INHIBIT_REASON_SEV)      |
>  			  BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
> -			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
> +			  BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
> +			  BIT(APICV_INHIBIT_REASON_X2APIC);
>  
>  	return supported & BIT(reason);
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7374d768296..aa5ab0c620de 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10705,6 +10705,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  			break;
>  		}
>  
> +		if (vcpu->kvm->arch.apic_access_memslot_inhibited &&
> +		    vcpu->kvm->arch.apic_access_memslot_enabled) {
> +			kvm_vcpu_srcu_read_unlock(vcpu);
> +			kvm_free_apic_access_page(vcpu->kvm);
> +			kvm_vcpu_srcu_read_lock(vcpu);
> +		}
> +
>  		if (__xfer_to_guest_mode_work_pending()) {
>  			kvm_vcpu_srcu_read_unlock(vcpu);
>  			r = xfer_to_guest_mode_handle_work(vcpu);

I will review the rest of the patch series next Wednesday (after
our holidays).
From a quick glance, they look good to me.


Thanks,
Best regards,
	Maxim Levitsky
Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
Posted by Sean Christopherson 3 years, 6 months ago
On Fri, Sep 23, 2022, Maxim Levitsky wrote:
> On Tue, 2022-09-20 at 23:31 +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 2c96c43c313a..6475c882b359 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
> >  	 * AVIC is disabled because SEV doesn't support it.
> >  	 */
> >  	APICV_INHIBIT_REASON_SEV,
> > +
> > +	/*
> > +	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> > +	 * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> > +	 * independent controls for AVIC vs. x2AVIC, and also because SVM
> > +	 * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> > +	 * x2AVIC.  Note, this isn't a "full" inhibit and is tracked separately.
> > +	 * AVIC can still be activated, but KVM must not create SPTEs for the
> > +	 * APIC base.  For simplicity, this is sticky.
> > +	 */
> > +	APICV_INHIBIT_REASON_X2APIC,
> 
> Hi Sean!
> 
> So assuming that I won't object to making it SVM specific (I still think
> that VMX should also inhibit this memslot because this is closer to x86 spec,
> but if you really want it this way, I won't fight over it):

Heh, I don't necessarily "want" it this way, it's more that I don't see a compelling
reason to change KVM's behavior and risk silently causing a performance regression.
If KVM didn't already have the "APIC base may have RAM semantics" quirk, and/or if
this were the initial APICv implementation and thus no possible users, then I would
probably also vote to give APICv the same treatment.

> I somewhat don't like this inhibit, because now it is used just to say
> 'I am AVIC'.
> 
> What do you think if you just move the code that removes the memslot to SVM,
> to avic_set_virtual_apic_mode?

Suffers the same SRCU issue (see below) :-/

Given the SRCU problem, I'd prefer to keep the management of the memslot in common
code, even though I agree it's a bit silly.  And KVM_REQ_UNBLOCK is a perfect fit
for dealing with the SRCU issue, i.e. handling this in AVIC code would require
another hook on top of spreading the memslot management across x86 and SVM code.

> > @@ -1169,10 +1180,11 @@ struct kvm_arch {
> >  	struct kvm_apic_map __rcu *apic_map;
> >  	atomic_t apic_map_dirty;
> >  
> > -	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> > -	struct rw_semaphore apicv_update_lock;
> > -
> >  	bool apic_access_memslot_enabled;
> > +	bool apic_access_memslot_inhibited;
> 
> So the apic_access_memslot_enabled currently tracks if the memslot is enabled.
> As I see later in the patch when you free the memslot, you set it to false,
> which means that if a vCPU is created after that (it can happen in theory),
> the memslot will be created again :(
> 
> I say we need 'enabled', and 'allocated' booleans instead. Inhibit will set
> enabled to false, and then on next vcpu run, that will free the memslot.
> 
> when enabled == false, the code needs to be changed to not allocate it again.

This should be handled already.  apic_access_memslot_enabled is toggled from
true=>false if and only if apic_access_memslot_inhibited is set, and the "enabled"
flag is protected by slots_lock.  Thus, newly created vCPUs are guaranteed to
either see apic_access_memslot_enabled==true or apic_access_memslot_inhibited==true.

  int kvm_alloc_apic_access_page(struct kvm *kvm)
  {
	struct page *page;
	void __user *hva;
	int ret = 0;

	mutex_lock(&kvm->slots_lock);
	if (kvm->arch.apic_access_memslot_enabled ||
	    kvm->arch.apic_access_memslot_inhibited)  <=== prevents reallocation
		goto out;

  out:
	mutex_unlock(&kvm->slots_lock);
	return ret;
  }

That could be made more obvious by adding a WARN in kvm_free_apic_access_page(), i.e.

  void kvm_free_apic_access_page(struct kvm *kvm)
  {
	WARN_ON_ONCE(!kvm->arch.apic_access_memslot_inhibited);

	mutex_lock(&kvm->slots_lock);

	if (kvm->arch.apic_access_memslot_enabled) {
		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
		kvm->arch.apic_access_memslot_enabled = false;
	}

	mutex_unlock(&kvm->slots_lock);
  }
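
The invariant described above can be sketched as a small user-space model.
This is only an illustration under stated assumptions: struct vm_model,
model_alloc(), model_inhibit() and model_free() are hypothetical stand-ins, not
KVM code, and the slots_lock that serializes the real alloc/free paths is
omitted because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant kvm->arch fields. */
struct vm_model {
	bool memslot_enabled;	/* models apic_access_memslot_enabled */
	bool memslot_inhibited;	/* models apic_access_memslot_inhibited (sticky) */
	int allocations;	/* counts how often the "memslot" was created */
};

/* Models kvm_alloc_apic_access_page(): a no-op if enabled or inhibited. */
static void model_alloc(struct vm_model *vm)
{
	if (vm->memslot_enabled || vm->memslot_inhibited)
		return;

	vm->memslot_enabled = true;
	vm->allocations++;
}

/* Models the sticky inhibit: it is set once and never cleared. */
static void model_inhibit(struct vm_model *vm)
{
	vm->memslot_inhibited = true;
}

/* Models kvm_free_apic_access_page(); assert() stands in for the WARN. */
static void model_free(struct vm_model *vm)
{
	assert(vm->memslot_inhibited);

	if (vm->memslot_enabled)
		vm->memslot_enabled = false;
}
```

Calling model_alloc(), then model_inhibit() and model_free(), then model_alloc()
again leaves the allocation count at one in this model, i.e. a late "vCPU
creation" does not bring the memslot back once the inhibit is set.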

> > +
> > +	/* Protects apicv_inhibit_reasons */
> > +	struct rw_semaphore apicv_update_lock;
> >  	unsigned long apicv_inhibit_reasons;
> >  
> >  	gpa_t wall_clock;
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 99994d2470a2..70f00eda75b2 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> >  		}
> >  	}
> >  
> > -	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> > +	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> >  		kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> >  
> > +		/*
> > +		 * Mark the APIC memslot as inhibited if x2APIC is enabled and
> > +		 * the x2APIC inhibit is required.  The actual deletion of the
> > +		 * memslot is handled by vcpu_run() as SRCU may or may not be
> > +		 * held at this time, i.e. updating memslots isn't safe.  Don't
> > +		 * check apic_access_memslot_inhibited, this vCPU needs to
> > +		 * ensure the memslot is deleted before re-entering the guest,
> > +		 * i.e. needs to make the request even if the inhibit flag was
> > +		 * already set by a different vCPU.
> > +		 */
> > +		if (vcpu->kvm->arch.apic_access_memslot_enabled &&
> > +		    static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
> > +			vcpu->kvm->arch.apic_access_memslot_inhibited = true;
> > +			kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
> 
> You are about to remove the KVM_REQ_UNBLOCK in other patch series.

No, KVM_REQ_UNHALT is being removed.  KVM_REQ_UNBLOCK needs to stay, although it
has a rather weird name, e.g. KVM_REQ_WORK would probably be better.

> How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> and having a special case in kvm_vcpu_update_apicv of 
> 
> if (apic_access_memslot_enabled == false && apic_access_memslot_allocated == true) {
> 	drop srcu lock

This was my initial thought as well, but the issue is that SRCU may or may not be
held, and so the unlock+lock would need to be conditional.  That's technically a
solvable problem, as it's possible to detect if SRCU is held, but I really don't
want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
screw up SRCU.

> 	free the memslot
> 	take srcu lock
> }
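
The SRCU concern discussed above can be illustrated with a small user-space
model. Everything here is a hypothetical sketch, not KVM code: struct
vcpu_model and its helpers are invented names, and srcu_depth is only a
counter standing in for "the read-side lock is held". The model assumes, as
the patch comment states, that updating memslots is not safe while the
read-side lock is held, so the deletion is deferred to a point where the lock
state is known.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one vCPU plus the VM-wide state it touches. */
struct vcpu_model {
	int srcu_depth;		/* > 0 means the read-side lock is held */
	bool inhibited;		/* models the sticky memslot inhibit */
	bool memslot_present;	/* models apic_access_memslot_enabled */
};

static void read_lock(struct vcpu_model *v)
{
	v->srcu_depth++;
}

static void read_unlock(struct vcpu_model *v)
{
	assert(v->srcu_depth > 0);
	v->srcu_depth--;
}

/* Memslot updates must run with the read-side lock dropped. */
static void delete_memslot(struct vcpu_model *v)
{
	assert(v->srcu_depth == 0);
	v->memslot_present = false;
}

/*
 * Models kvm_lapic_set_base(): may be reached with or without the lock
 * held, so it only records the inhibit and defers the deletion.
 */
static void set_base_x2apic(struct vcpu_model *v)
{
	v->inhibited = true;
}

/*
 * Models the check added to the vcpu_run() loop, where the lock is known
 * to be held exactly once, so an unconditional unlock+lock is safe.
 */
static void run_loop_iteration(struct vcpu_model *v)
{
	assert(v->srcu_depth == 1);

	if (v->inhibited && v->memslot_present) {
		read_unlock(v);
		delete_memslot(v);
		read_lock(v);
	}
}
```

The design point is that set_base_x2apic() never needs to know the lock state;
only the loop, which owns the lock, drops and retakes it.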
Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
Posted by Maxim Levitsky 3 years, 6 months ago
On Mon, 2022-09-26 at 17:00 +0000, Sean Christopherson wrote:
> On Fri, Sep 23, 2022, Maxim Levitsky wrote:
> > On Tue, 2022-09-20 at 23:31 +0000, Sean Christopherson wrote:
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 2c96c43c313a..6475c882b359 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
> > >  	 * AVIC is disabled because SEV doesn't support it.
> > >  	 */
> > >  	APICV_INHIBIT_REASON_SEV,
> > > +
> > > +	/*
> > > +	 * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> > > +	 * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> > > +	 * independent controls for AVIC vs. x2AVIC, and also because SVM
> > > +	 * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> > > +	 * x2AVIC.  Note, this isn't a "full" inhibit and is tracked separately.
> > > +	 * AVIC can still be activated, but KVM must not create SPTEs for the
> > > +	 * APIC base.  For simplicity, this is sticky.
> > > +	 */
> > > +	APICV_INHIBIT_REASON_X2APIC,
> > 
> > Hi Sean!
> > 
> > So assuming that I won't object to making it SVM specific (I still think
> > that VMX should also inhibit this memslot because this is closer to x86 spec,
> > but if you really want it this way, I won't fight over it):
> 
> Heh, I don't necessarily "want" it this way, it's more that I don't see a compelling
> reason to change KVM's behavior and risk silently causing a performance regression.
> If KVM didn't already have the "APIC base may have RAM semantics" quirk, and/or if
> this were the initial APICv implementation and thus no possible users, then I would
> probably also vote to give APICv the same treatment.
> 
> > I somewhat don't like this inhibit, because now it is used just to say
> > 'I am AVIC'.
> > 
> > What do you think if you just move the code that removes the memslot to SVM,
> > to avic_set_virtual_apic_mode?
> 
> Suffers the same SRCU issue (see below) :-/
> 
> Given the SRCU problem, I'd prefer to keep the management of the memslot in common
> code, even though I agree it's a bit silly.  And KVM_REQ_UNBLOCK is a perfect fit
> for dealing with the SRCU issue, i.e. handling this in AVIC code would require
> another hook on top of spreading the memslot management across x86 and SVM code.

OK, I am not going to argue about this. But what about at least not using an
inhibit bit for that, and instead using something else like a boolean, or maybe
really adding an 'I am AVIC' bit, or rather something like a
vcpu->arch.apicv_type enum?


Or the SVM code could just call a common function: put this logic in a helper
and call it from avic_set_virtual_apic_mode?


void kvm_disable_apicv_memslot(struct kvm_vcpu *vcpu)
{
	if (!vcpu->kvm->arch.apic_access_memslot_inhibited) {
		vcpu->kvm->arch.apic_access_memslot_inhibited = true;
		kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
	}
}
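
One difference between this helper and the v3 patch is worth making concrete.
The comment added to kvm_lapic_set_base() in the patch says not to check
apic_access_memslot_inhibited, so that a vCPU makes the request even if a
different vCPU already set the flag. The sketch below is a hypothetical
user-space model (struct vm_flags, struct vcpu_req and both functions are
invented names, and the apic_access_memslot_enabled check is left out) that
shows only where the two variants diverge.

```c
#include <assert.h>
#include <stdbool.h>

struct vm_flags {
	bool inhibited;	/* models apic_access_memslot_inhibited */
};

struct vcpu_req {
	bool request;	/* models a pending request on this vCPU */
};

/* Variant shaped like the helper above: skip if the flag is already set. */
static void disable_memslot_checked(struct vm_flags *vm, struct vcpu_req *vcpu)
{
	if (!vm->inhibited) {
		vm->inhibited = true;
		vcpu->request = true;
	}
}

/* Variant shaped like the patch: does not consult the inhibited flag. */
static void disable_memslot_unchecked(struct vm_flags *vm, struct vcpu_req *vcpu)
{
	vm->inhibited = true;
	vcpu->request = true;
}
```

In this model, when two vCPUs call the checked variant, only the first one ends
up with a request pending; with the unchecked variant both do.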

> 
> > > @@ -1169,10 +1180,11 @@ struct kvm_arch {
> > >  	struct kvm_apic_map __rcu *apic_map;
> > >  	atomic_t apic_map_dirty;
> > >  
> > > -	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> > > -	struct rw_semaphore apicv_update_lock;
> > > -
> > >  	bool apic_access_memslot_enabled;
> > > +	bool apic_access_memslot_inhibited;
> > 
> > So the apic_access_memslot_enabled currently tracks if the memslot is enabled.
> > As I see later in the patch when you free the memslot, you set it to false,
> > which means that if a vCPU is created after that (it can happen in theory),
> > the memslot will be created again :(
> > 
> > I say we need 'enabled', and 'allocated' booleans instead. Inhibit will set
> > enabled to false, and then on next vcpu run, that will free the memslot.
> > 
> > when enabled == false, the code needs to be changed to not allocate it again.
> 
> This should be handled already.  apic_access_memslot_enabled is toggled from
> true=>false if and only if apic_access_memslot_inhibited is set, and the "enabled"
> flag is protected by slots_lock.  Thus, newly created vCPUs are guaranteed to
> either see apic_access_memslot_enabled==true or apic_access_memslot_inhibited==true.
> 
>   int kvm_alloc_apic_access_page(struct kvm *kvm)
>   {
> 	struct page *page;
> 	void __user *hva;
> 	int ret = 0;
> 
> 	mutex_lock(&kvm->slots_lock);
> 	if (kvm->arch.apic_access_memslot_enabled ||
> 	    kvm->arch.apic_access_memslot_inhibited)  <=== prevents reallocation
> 		goto out;
> 
>   out:
> 	mutex_unlock(&kvm->slots_lock);
> 	return ret;
>   }

Ah, you added this in a previous patch which I didn't see; makes sense.

> 
> That could be made more obvious by adding a WARN in kvm_free_apic_access_page(), i.e.
Yep, a WARN_ON_ONCE, unless it's in a hot path, is almost always a good idea, so let's add it.
> 
>   void kvm_free_apic_access_page(struct kvm *kvm)
>   {
> 	WARN_ON_ONCE(!kvm->arch.apic_access_memslot_inhibited);
> 
> 	mutex_lock(&kvm->slots_lock);
> 
> 	if (kvm->arch.apic_access_memslot_enabled) {
> 		__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
> 		kvm->arch.apic_access_memslot_enabled = false;
> 	}
> 
> 	mutex_unlock(&kvm->slots_lock);
>   }
> 
> > > +
> > > +	/* Protects apicv_inhibit_reasons */
> > > +	struct rw_semaphore apicv_update_lock;
> > >  	unsigned long apicv_inhibit_reasons;
> > >  
> > >  	gpa_t wall_clock;
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 99994d2470a2..70f00eda75b2 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> > >  		}
> > >  	}
> > >  
> > > -	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> > > +	if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> > >  		kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> > >  
> > > +		/*
> > > +		 * Mark the APIC memslot as inhibited if x2APIC is enabled and
> > > +		 * the x2APIC inhibit is required.  The actual deletion of the
> > > +		 * memslot is handled by vcpu_run() as SRCU may or may not be
> > > +		 * held at this time, i.e. updating memslots isn't safe.  Don't
> > > +		 * check apic_access_memslot_inhibited, this vCPU needs to
> > > +		 * ensure the memslot is deleted before re-entering the guest,
> > > +		 * i.e. needs to make the request even if the inhibit flag was
> > > +		 * already set by a different vCPU.
> > > +		 */
> > > +		if (vcpu->kvm->arch.apic_access_memslot_enabled &&
> > > +		    static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
> > > +			vcpu->kvm->arch.apic_access_memslot_inhibited = true;
> > > +			kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
> > 
> > You are about to remove the KVM_REQ_UNBLOCK in other patch series.
> 
> No, KVM_REQ_UNHALT is being removed.  KVM_REQ_UNBLOCK needs to stay, although it
> has a rather weird name, e.g. KVM_REQ_WORK would probably be better.

Roger that!
And I guess let's rename it while we are at it.

> 
> > How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> > and having a special case in kvm_vcpu_update_apicv of 
> > 
> > if (apic_access_memslot_enabled == false && apic_access_memslot_allocated == true) {
> > 	drop srcu lock
> 
> This was my initial thought as well, but the issue is that SRCU may or may not be
> held, and so the unlock+lock would need to be conditional.  That's technically a
> solvable problem, as it's possible to detect if SRCU is held, but I really don't
> want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
> screw up SRCU.

Why though? KVM_REQ_APICV_UPDATE is, AFAIK, only handled in vcpu_enter_guest,
which drops the srcu lock a few lines afterwards. Therefore
kvm_vcpu_update_apicv is always called with the lock held, which means that it
can drop it for the duration of the slot update.

The original issue we had was that we tried to drop the srcu lock in 
'kvm_set_apicv_inhibit' which indeed is called from various places,
with, or without the lock held.

Moving the memslot disable code to kvm_vcpu_update_apicv would actually solve that,
but it was not possible because kvm_vcpu_update_apicv is called simultaneously on all vCPUs,
which created various races, including toggling the memslot twice.


So if possible please take another look at using KVM_REQ_APICV_UPDATE instead of KVM_REQ_UNBLOCK.

Best regards,
	Maxim Levitsky

> 
> > 	free the memslot
> > 	take srcu lock
> > }
Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
Posted by Sean Christopherson 3 years, 6 months ago
On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> On Mon, 2022-09-26 at 17:00 +0000, Sean Christopherson wrote:
> > Given the SRCU problem, I'd prefer to keep the management of the memslot in common
> > code, even though I agree it's a bit silly.  And KVM_REQ_UNBLOCK is a perfect fit
> > for dealing with the SRCU issue, i.e. handling this in AVIC code would require
> > another hook on top of spreading the memslot management across x86 and SVM code.
> 
> OK, I am not going to argue about this. But what about at least not using an inhibit
> bit for that but something else like a boolean, or maybe really add 'I am AVIC bit'
> or rather something like vcpu->arch.apicv_type enum?
> 
> Or we can make SVM code just call a common function - just put these in a
> function and call it from avic_set_virtual_apic_mode?

The issue is that kvm_vcpu_update_apicv() is called from kvm_lapic_set_base(),
which is the one that may or may not hold SRCU.

> > > You are about to remove the KVM_REQ_UNBLOCK in other patch series.
> > 
> > No, KVM_REQ_UNHALT is being removed.  KVM_REQ_UNBLOCK needs to stay, although it
> > has a rather weird name, e.g. KVM_REQ_WORK would probably be better.
> 
> Roger that!
> And I guess let's rename it while we are at it.

I'll prep a patch.

> > > How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> > > and having a special case in kvm_vcpu_update_apicv of 
> > > 
> > > if (apic_access_memslot_enabled == false && apic_access_memslot_allocated == true) {
> > > 	drop srcu lock
> > 
> > This was my initial thought as well, but the issue is that SRCU may or may not be
> > held, and so the unlock+lock would need to be conditional.  That's technically a
> > solvable problem, as it's possible to detect if SRCU is held, but I really don't
> > want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
> > screw up SRCU.
> 
> Why though? the KVM_REQ_APICV_UPDATE is only handled AFAIK in vcpu_enter_guest
> which drops the srcu lock few lines afterwards, and therefore the
> kvm_vcpu_update_apicv is always called with the lock held and it means that it
> can drop it for the duration of slot update.
> 
> The original issue we had was that we tried to drop the srcu lock in 
> 'kvm_set_apicv_inhibit' which indeed is called from various places,
> with, or without the lock held.
> 
> Moving the memslot disable code to kvm_vcpu_update_apicv would actually solve
> that, but it was not possible because kvm_vcpu_update_apicv is called
> simultaneously on all vCPUs, and created various races, including toggling
> the memslot twice.

As above, kvm_vcpu_update_apicv() can be called without SRCU held.  Oh, but that
was a recent addition, commit 8fc9c7a3079e ("KVM: x86: Deactivate APICv on vCPU
with APIC disabled").  I was wary of using KVM_REQ_APICV_UPDATE in kvm_lapic_set_base(),
e.g. in case there was some dependency on updating _immediately, but since that's
such a new addition I have no objection to switching to the request.

Similarly, is there a good reason for having nested_svm_vmexit() invoke
kvm_vcpu_update_apicv() directly?  I'm confused by the "so that other vCPUs can
start to benefit from it right away".  The nested inhibit is per-vCPU and so
should only affect the current vCPU, no?  I.e. for all intents and purposes, using
a request should be functionally equivalent.

	/*
	 * Un-inhibit the AVIC right away, so that other vCPUs can start
	 * to benefit from it right away.
	 */
	if (kvm_apicv_activated(vcpu->kvm))
		kvm_vcpu_update_apicv(vcpu);
Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
Posted by Maxim Levitsky 3 years, 6 months ago
On Wed, 2022-09-28 at 16:33 +0000, Sean Christopherson wrote:
> On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> > On Mon, 2022-09-26 at 17:00 +0000, Sean Christopherson wrote:
> > > Given the SRCU problem, I'd prefer to keep the management of the memslot in common
> > > code, even though I agree it's a bit silly.  And KVM_REQ_UNBLOCK is a perfect fit
> > > for dealing with the SRCU issue, i.e. handling this in AVIC code would require
> > > another hook on top of spreading the memslot management across x86 and SVM code.
> > 
> > OK, I am not going to argue about this. But what about at least not using an inhibit
> > bit for that but something else like a boolean, or maybe really add 'I am AVIC bit'
> > or rather something like vcpu->arch.apicv_type enum?
> > 
> > Or we can make SVM code just call a common function - just put these in a
> > function and call it from avic_set_virtual_apic_mode?
> 
> The issue is that kvm_vcpu_update_apicv() is called from kvm_lapic_set_base(),
> which is the one that may or may not hold SRCU.

Makes sense now.

> 
> > > > You are about to remove the KVM_REQ_UNBLOCK in other patch series.
> > > 
> > > No, KVM_REQ_UNHALT is being removed.  KVM_REQ_UNBLOCK needs to stay, although it
> > > has a rather weird name, e.g. KVM_REQ_WORK would probably be better.
> > 
> > Roger that!
> > And I guess let's rename it while we are at it.
> 
> I'll prep a patch.
> 
> > > > How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> > > > and having a special case in kvm_vcpu_update_apicv of 
> > > > 
> > > > if (apic_access_memslot_enabled == false && apic_access_memslot_allocated == true) {
> > > > 	drop srcu lock
> > > 
> > > This was my initial thought as well, but the issue is that SRCU may or may not be
> > > held, and so the unlock+lock would need to be conditional.  That's technically a
> > > solvable problem, as it's possible to detect if SRCU is held, but I really don't
> > > want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
> > > screw up SRCU.
> > 
> > Why though? the KVM_REQ_APICV_UPDATE is only handled AFAIK in vcpu_enter_guest
> > which drops the srcu lock few lines afterwards, and therefore the
> > kvm_vcpu_update_apicv is always called with the lock held and it means that it
> > can drop it for the duration of slot update.
> > 
> > The original issue we had was that we tried to drop the srcu lock in 
> > 'kvm_set_apicv_inhibit' which indeed is called from various places,
> > with, or without the lock held.
> > 
> > Moving the memslot disable code to kvm_vcpu_update_apicv would actually solve
> > that, but it was not possible because kvm_vcpu_update_apicv is called
> > simultaneously on all vCPUs, and created various races, including toggling
> > the memslot twice.
> 
> As above, kvm_vcpu_update_apicv() can be called without SRCU held.  Oh, but that
> was a recent addition, commit 8fc9c7a3079e ("KVM: x86: Deactivate APICv on vCPU
> with APIC disabled").  I was wary of using KVM_REQ_APICV_UPDATE in kvm_lapic_set_base(),
> e.g. in case there was some dependency on updating _immediately, but since that's
> such a new addition I have no objection to switching to the request.
> 
> Similarly, is there a good reason for having nested_svm_vmexit() invoke
> kvm_vcpu_update_apicv() directly?  I'm confused by the "so that other vCPUs can
> start to benefit from it right away".  The nested inhibit is per-vCPU and so
> should only affect the current vCPU, no?  I.e. for all intents and purposes, using
> a request should be functionally equivalent.

It is kind of the other way around:

The mere fact of switching to vmcb02 *inhibits* the AVIC on the current vCPU,
but the AVIC inhibit is there only to set the is_running bits in the physid table
and in the IOMMU, to prevent its *peers* from trying to send interrupts to it via AVIC.

That is the reason why APICv doesn't need it - posted interrupts still work
just fine when a vCPU doesn't use APICv, or uses a different posted interrupt vector
when it uses nested APICv.

So it makes sense to remove that inhibit as soon as possible, so that the peers
can stop getting 'unaccelerated IPI' vmexits for nothing.


However back to the discussion, I don't think this is a problem.

We can just call both the kvm_vcpu_update_apicv() and a new function that
does the memslot disable from KVM_REQ_APICV_UPDATE, then 
plain kvm_vcpu_update_apicv() won't need to drop the srcu lock.

It is pretty much the same as what you proposed; just instead of piggybacking on
KVM_REQ_UNBLOCK, I propose to piggyback on KVM_REQ_APICV_UPDATE.
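
To illustrate the idea, here is a minimal userspace sketch, assuming a single request handler that runs at a point where the SRCU read lock is known to be held. It is not the actual KVM implementation: `struct model_vcpu`, `MODEL_REQ_APICV_UPDATE`, the `model_*` functions and the counters are hypothetical, and the per-VM memslot flags are folded into the vCPU struct purely for brevity.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_REQ_APICV_UPDATE (1u << 0)	/* hypothetical request bit */

struct model_vcpu {
	unsigned int requests;
	int srcu_depth;		/* 1 while the modelled SRCU read lock is held */
	bool memslot_enabled;
	bool memslot_inhibited;
	int apicv_updates;
	int memslot_deletes;
};

/* Callable with or without the modelled SRCU lock held: only sets flags. */
void model_enable_x2apic(struct model_vcpu *v)
{
	if (v->memslot_enabled)
		v->memslot_inhibited = true;
	v->requests |= MODEL_REQ_APICV_UPDATE;
}

/* The plain per-vCPU update never touches the SRCU lock. */
static void model_update_apicv(struct model_vcpu *v)
{
	v->apicv_updates++;
}

/* Separate helper: the only place that drops and retakes the lock. */
static void model_disable_memslot(struct model_vcpu *v)
{
	if (!v->memslot_inhibited || !v->memslot_enabled)
		return;

	assert(v->srcu_depth == 1);	/* known to be held at this call site */
	v->srcu_depth--;		/* drop for the memslot update */
	v->memslot_enabled = false;
	v->memslot_deletes++;
	v->srcu_depth++;		/* retake before returning */
}

/* Models the run loop: requests are serviced with the lock held. */
void model_vcpu_run(struct model_vcpu *v)
{
	v->srcu_depth++;
	if (v->requests & MODEL_REQ_APICV_UPDATE) {
		v->requests &= ~MODEL_REQ_APICV_UPDATE;
		model_update_apicv(v);
		model_disable_memslot(v);
	}
	v->srcu_depth--;
}
```

In this model, model_enable_x2apic() leaves the memslot in place and merely queues the request; the deletion happens once, on the next model_vcpu_run(), and the unlock+lock never has to be conditional because the helper is reached from only one call site.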


Best regards,
	Maxim Levitsky


> 
> 	/*
> 	 * Un-inhibit the AVIC right away, so that other vCPUs can start
> 	 * to benefit from it right away.
> 	 */
> 	if (kvm_apicv_activated(vcpu->kvm))
> 		kvm_vcpu_update_apicv(vcpu);
>
Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
Posted by Sean Christopherson 3 years, 6 months ago
On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> On Wed, 2022-09-28 at 16:33 +0000, Sean Christopherson wrote:
> > Similarly, is there a good reason for having nested_svm_vmexit() invoke
> > kvm_vcpu_update_apicv() directly?  I'm confused by the "so that other vCPUs can
> > start to benefit from it right away".  The nested inhibit is per-vCPU and so
> > should only affect the current vCPU, no?  I.e. for all intents and purposes, using
> > a request should be functionally equivalent.
> 
> It is kind of the other way around:
> 
> The mere fact of switching to vmcb02 *inhibits* the AVIC on the current vCPU,
> but the AVIC inhibit is there only to set the is_running bits in the physid table
> and in the IOMMU, to prevent its *peers* from trying to send interrupts to it via AVIC.
> 
> That is the reason why APICv doesn't need it - posted interrupts still work
> just fine when a vCPU doesn't use APICv, or uses a different posted interrupt vector
> when it uses nested APICv.

Gotcha, the "other vCPUs" part is where I got confused.

> So it makes sense to remove that inhibit as soon as possible, so that the peers
> can stop getting 'unaccelerated IPI' vmexits for nothing.

But practically speaking, the delay between the nested VM-Exit and servicing the
request is minimal.  Might be a moot point if nested AVIC is supported, i.e. an
inline update may be "required" at that point.

Not a sticking point by any means, but if possible, it would be nice to have a
single call site for the per-vCPU APICv update.

> However back to the discussion, I don't think this is a problem.
> 
> We can just call both the kvm_vcpu_update_apicv() and a new function that
> does the memslot disable from KVM_REQ_APICV_UPDATE, then 
> plain kvm_vcpu_update_apicv() won't need to drop the srcu lock.
> 
> It is pretty much the same that you proposed, just instead of piggybacking on 
> KVM_REQ_UNBLOCK, I proposed to piggyback on KVM_REQ_APICV_UPDATE.

Yep, easy to do after converting the x2APIC toggling to use a request.

Thanks!