KVM: nVMX: Honor event priority for PI ack at VM-Enter

[PATCH 5/5] KVM: nVMX: Honor event priority when emulating PI delivery during VM-Enter

Posted by Sean Christopherson 1 year, 3 months ago

Move the handling of a nested posted interrupt notification that is
unblocked by nested VM-Enter (unblocks L1 IRQs when ack-on-exit is enabled
by L1) from VM-Enter emulation to vmx_check_nested_events().  To avoid a
pointless forced immediate exit, i.e. to not regress IRQ delivery latency
when a nested posted interrupt is pending at VM-Enter, block processing of
the notification IRQ if and only if KVM must block _all_ events.  Unlike
injected events, KVM doesn't need to actually enter L2 before updating the
vIRR and vmcs02.GUEST_INTR_STATUS, as the resulting L2 IRQ will be blocked
by hardware itself, until VM-Enter to L2 completes.

Note, very strictly speaking, moving the IRQ from L2's PIR to IRR before
entering L2 is still technically wrong.  But, practically speaking, only a
userspace that is deliberately checking KVM_STATE_NESTED_RUN_PENDING
against PIR and IRR can even notice; L2 will see architecturally correct
behavior, as KVM ensure the VM-Enter is finished before doing anything
that would effectively preempt the PIR=>IRR movement.

Reported-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 53 ++++++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0540faef0c85..0c6c0aeaddc2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3725,14 +3725,6 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
 	if (unlikely(status != NVMX_VMENTRY_SUCCESS))
 		goto vmentry_failed;
 
-	/* Emulate processing of posted interrupts on VM-Enter. */
-	if (nested_cpu_has_posted_intr(vmcs12) &&
-	    kvm_apic_has_interrupt(vcpu) == vmx->nested.posted_intr_nv) {
-		vmx->nested.pi_pending = true;
-		kvm_make_request(KVM_REQ_EVENT, vcpu);
-		kvm_apic_clear_irr(vcpu, vmx->nested.posted_intr_nv);
-	}
-
 	/* Hide L1D cache contents from the nested guest.  */
 	vmx->vcpu.arch.l1tf_flush_l1d = true;
 
@@ -4194,13 +4186,25 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	 */
 	bool block_nested_exceptions = vmx->nested.nested_run_pending;
 	/*
-	 * New events (not exceptions) are only recognized at instruction
+	 * Events that don't require injection, i.e. that are virtualized by
+	 * hardware, aren't blocked by a pending VM-Enter as KVM doesn't need
+	 * to regain control in order to deliver the event, and hardware will
+	 * handle event ordering, e.g. with respect to injected exceptions.
+	 *
+	 * But, new events (not exceptions) are only recognized at instruction
 	 * boundaries.  If an event needs reinjection, then KVM is handling a
-	 * VM-Exit that occurred _during_ instruction execution; new events are
-	 * blocked until the instruction completes.
+	 * VM-Exit that occurred _during_ instruction execution; new events,
+	 * irrespective of whether or not they're injected, are blocked until
+	 * the instruction completes.
+	 */
+	bool block_non_injected_events = kvm_event_needs_reinjection(vcpu);
+	/*
+	 * Inject events are blocked by nested VM-Enter, as KVM is responsible
+	 * for managing priority between concurrent events, i.e. KVM needs to
+	 * wait until after VM-Enter completes to deliver injected events.
 	 */
 	bool block_nested_events = block_nested_exceptions ||
-				   kvm_event_needs_reinjection(vcpu);
+				   block_non_injected_events;
 
 	if (lapic_in_kernel(vcpu) &&
 		test_bit(KVM_APIC_INIT, &apic->pending_events)) {
@@ -4312,18 +4316,26 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	if (kvm_cpu_has_interrupt(vcpu) && !vmx_interrupt_blocked(vcpu)) {
 		int irq;
 
-		if (block_nested_events)
-			return -EBUSY;
-		if (!nested_exit_on_intr(vcpu))
+		if (!nested_exit_on_intr(vcpu)) {
+			if (block_nested_events)
+				return -EBUSY;
+
 			goto no_vmexit;
+		}
 
 		if (!nested_exit_intr_ack_set(vcpu)) {
+			if (block_nested_events)
+				return -EBUSY;
+
 			nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
 			return 0;
 		}
 
 		irq = kvm_cpu_get_extint(vcpu);
 		if (irq != -1) {
+			if (block_nested_events)
+				return -EBUSY;
+
 			nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
 					  INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);
 			return 0;
@@ -4342,11 +4354,22 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 		 * and enabling posted interrupts requires ACK-on-exit.
 		 */
 		if (irq == vmx->nested.posted_intr_nv) {
+			/*
+			 * Nested posted interrupts are delivered via RVI, i.e.
+			 * aren't injected by KVM, and so can be queued even if
+			 * manual event injection is disallowed.
+			 */
+			if (block_non_injected_events)
+				return -EBUSY;
+
 			vmx->nested.pi_pending = true;
 			kvm_apic_clear_irr(vcpu, irq);
 			goto no_vmexit;
 		}
 
+		if (block_nested_events)
+			return -EBUSY;
+
 		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT,
 				  INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR | irq, 0);
 
-- 
2.47.0.163.g1226f6d8fa-goog

Re: [PATCH 5/5] KVM: nVMX: Honor event priority when emulating PI delivery during VM-Enter

Posted by Chao Gao 1 year, 2 months ago

On Fri, Nov 01, 2024 at 12:14:47PM -0700, Sean Christopherson wrote:
>Move the handling of a nested posted interrupt notification that is
>unblocked by nested VM-Enter (unblocks L1 IRQs when ack-on-exit is enabled
>by L1) from VM-Enter emulation to vmx_check_nested_events().  To avoid a
>pointless forced immediate exit, i.e. to not regress IRQ delivery latency
>when a nested posted interrupt is pending at VM-Enter, block processing of
>the notification IRQ if and only if KVM must block _all_ events.  Unlike
>injected events, KVM doesn't need to actually enter L2 before updating the
>vIRR and vmcs02.GUEST_INTR_STATUS, as the resulting L2 IRQ will be blocked
>by hardware itself, until VM-Enter to L2 completes.
>
>Note, very strictly speaking, moving the IRQ from L2's PIR to IRR before
>entering L2 is still technically wrong.  But, practically speaking, only a
>userspace that is deliberately checking KVM_STATE_NESTED_RUN_PENDING
>against PIR and IRR can even notice; L2 will see architecturally correct
>behavior, as KVM ensure the VM-Enter is finished before doing anything
>that would effectively preempt the PIR=>IRR movement.

In my understanding, L1 can notice some priority issue in some cases. e.g.,
L1 enables NMI window VM-exit and enters L2 with a nested posted interrupt
notification. Assuming L2 doesn't block NMIs, then NMI window VM-exit should
happen immediately after nested VM-enter even before the nested posted
interrupt processing.

Another case is the nested VM-enter may inject some events (i.e.,
vmcs12->vm_entry_intr_info_field has a valid event). Event injection has
higher priority over external interrupt VM-exit. The event injection may
encounter EPT_VIOLATION which needs to be reflected to L1. In this case,
L1 is supposed to observe the EPT VIOLATION before the nested posted interrupt
processing.

Re: [PATCH 5/5] KVM: nVMX: Honor event priority when emulating PI delivery during VM-Enter

Posted by Sean Christopherson 1 year, 2 months ago

On Tue, Nov 12, 2024, Chao Gao wrote:
> On Fri, Nov 01, 2024 at 12:14:47PM -0700, Sean Christopherson wrote:
> >Move the handling of a nested posted interrupt notification that is
> >unblocked by nested VM-Enter (unblocks L1 IRQs when ack-on-exit is enabled
> >by L1) from VM-Enter emulation to vmx_check_nested_events().  To avoid a
> >pointless forced immediate exit, i.e. to not regress IRQ delivery latency
> >when a nested posted interrupt is pending at VM-Enter, block processing of
> >the notification IRQ if and only if KVM must block _all_ events.  Unlike
> >injected events, KVM doesn't need to actually enter L2 before updating the
> >vIRR and vmcs02.GUEST_INTR_STATUS, as the resulting L2 IRQ will be blocked
> >by hardware itself, until VM-Enter to L2 completes.
> >
> >Note, very strictly speaking, moving the IRQ from L2's PIR to IRR before
> >entering L2 is still technically wrong.  But, practically speaking, only a
> >userspace that is deliberately checking KVM_STATE_NESTED_RUN_PENDING
> >against PIR and IRR can even notice; L2 will see architecturally correct
> >behavior, as KVM ensure the VM-Enter is finished before doing anything
> >that would effectively preempt the PIR=>IRR movement.
> 
> In my understanding, L1 can notice some priority issue in some cases. e.g.,
> L1 enables NMI window VM-exit and enters L2 with a nested posted interrupt
> notification. Assuming L2 doesn't block NMIs, then NMI window VM-exit should
> happen immediately after nested VM-enter even before the nested posted
> interrupt processing.
>
> Another case is the nested VM-enter may inject some events (i.e.,
> vmcs12->vm_entry_intr_info_field has a valid event). Event injection has
> higher priority over external interrupt VM-exit. The event injection may
> encounter EPT_VIOLATION which needs to be reflected to L1. In this case,
> L1 is supposed to observe the EPT VIOLATION before the nested posted interrupt
> processing.

Hmm, right, L1 could also observe the PIR=>IRR movement.  How about this?

  Note, very strictly speaking, moving the IRQ from L2's PIR to IRR before
  entering L2 is still technically wrong.  But, practically speaking, only
  an L1 hypervisor or an L0 userspace that is deliberately checking event
  priority against PIR=>IRR processing can even notice; L2 will see
  architecturally correct behavior, as KVM ensures the VM-Enter is finished
  before doing anything that would effectively preempt the PIR=>IRR movement.

[PATCH 1/5] KVM: nVMX: Explicitly update vPPR on successful nested VM-Enter
[PATCH 2/5] KVM: nVMX: Check for pending INIT/SIPI after entering non-root mode
[PATCH 3/5] KVM: nVMX: Drop manual vmcs01.GUEST_INTERRUPT_STATUS.RVI check at VM-Enter
[PATCH 4/5] KVM: nVMX: Use vmcs01's controls shadow to check for IRQ/NMI windows at VM-Enter
[PATCH 5/5] KVM: nVMX: Honor event priority when emulating PI delivery during VM-Enter