[PATCH v2 41/59] iommu/amd: KVM: SVM: Add IRTE metadata to affined vCPU's list if AVIC is inhibited

Sean Christopherson posted 59 patches 6 months, 3 weeks ago
There is a newer version of this series
[PATCH v2 41/59] iommu/amd: KVM: SVM: Add IRTE metadata to affined vCPU's list if AVIC is inhibited
Posted by Sean Christopherson 6 months, 3 weeks ago
If an IRQ can be posted to a vCPU, but AVIC is currently inhibited on the
vCPU, go through the dance of "affining" the IRTE to the vCPU, but leave
the actual IRTE in remapped mode.  KVM already handles the case where AVIC
is inhibited => uninhibited with posted IRQs (see avic_set_pi_irte_mode()),
but doesn't handle the scenario where a postable IRQ comes along while AVIC
is inhibited.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c   | 16 ++++++----------
 drivers/iommu/amd/iommu.c |  5 ++++-
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 16557328aa58..2e3a8fda0355 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -780,21 +780,17 @@ int avic_pi_update_irte(struct kvm_kernel_irqfd *irqfd, struct kvm *kvm,
 	 */
 	svm_ir_list_del(irqfd);
 
-	/**
-	 * Here, we setup with legacy mode in the following cases:
-	 * 1. When cannot target interrupt to a specific vcpu.
-	 * 2. Unsetting posted interrupt.
-	 * 3. APIC virtualization is disabled for the vcpu.
-	 * 4. IRQ has incompatible delivery mode (SMI, INIT, etc)
-	 */
-	if (vcpu && kvm_vcpu_apicv_active(vcpu)) {
+	if (vcpu) {
 		/*
-		 * Try to enable guest_mode in IRTE.
+		 * Try to enable guest_mode in IRTE, unless AVIC is inhibited,
+		 * in which case configure the IRTE for legacy mode, but track
+		 * the IRTE metadata so that it can be converted to guest mode
+		 * if AVIC is enabled/uninhibited in the future.
 		 */
 		struct amd_iommu_pi_data pi_data = {
 			.ga_tag = AVIC_GATAG(to_kvm_svm(kvm)->avic_vm_id,
 					     vcpu->vcpu_id),
-			.is_guest_mode = true,
+			.is_guest_mode = kvm_vcpu_apicv_active(vcpu),
 			.vapic_addr = avic_get_backing_page_address(to_svm(vcpu)),
 			.vector = vector,
 		};
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 718bd9604f71..becef69a306d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3939,7 +3939,10 @@ static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *info)
 		ir_data->ga_root_ptr = (pi_data->vapic_addr >> 12);
 		ir_data->ga_vector = pi_data->vector;
 		ir_data->ga_tag = pi_data->ga_tag;
-		ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
+		if (pi_data->is_guest_mode)
+			ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
+		else
+			ret = amd_iommu_deactivate_guest_mode(ir_data);
 	} else {
 		ret = amd_iommu_deactivate_guest_mode(ir_data);
 	}
-- 
2.49.0.1151.ga128411c76-goog
Re: [PATCH v2 41/59] iommu/amd: KVM: SVM: Add IRTE metadata to affined vCPU's list if AVIC is inhibited
Posted by Sairaj Kodilkar 6 months, 2 weeks ago

On 5/23/2025 6:29 AM, Sean Christopherson wrote:

> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 718bd9604f71..becef69a306d 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3939,7 +3939,10 @@ static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *info)
>   		ir_data->ga_root_ptr = (pi_data->vapic_addr >> 12);
>   		ir_data->ga_vector = pi_data->vector;
>   		ir_data->ga_tag = pi_data->ga_tag;
> -		ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
> +		if (pi_data->is_guest_mode)
> +			ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
> +		else
> +			ret = amd_iommu_deactivate_guest_mode(ir_data);

Hi Sean,
Why the extra nesting here ?
Its much more cleaner to do..

if (pi_data && pi_data->is_guest_mode) {
	ir_data->ga_root_ptr = (pi_data->vapic_addr >> 12);
    	ir_data->ga_vector = pi_data->vector;
    	ir_data->ga_tag = pi_data->ga_tag;
	ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
} else {
	ret = amd_iommu_deactivate_guest_mode(ir_data);
}

Thanks
Sairaj Kodilkar

>   	} else {
>   		ret = amd_iommu_deactivate_guest_mode(ir_data);
>   	}
Re: [PATCH v2 41/59] iommu/amd: KVM: SVM: Add IRTE metadata to affined vCPU's list if AVIC is inhibited
Posted by Sean Christopherson 6 months, 2 weeks ago
On Fri, May 30, 2025, Sairaj Kodilkar wrote:
> On 5/23/2025 6:29 AM, Sean Christopherson wrote:
> 
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index 718bd9604f71..becef69a306d 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -3939,7 +3939,10 @@ static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *info)
> >   		ir_data->ga_root_ptr = (pi_data->vapic_addr >> 12);
> >   		ir_data->ga_vector = pi_data->vector;
> >   		ir_data->ga_tag = pi_data->ga_tag;
> > -		ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
> > +		if (pi_data->is_guest_mode)
> > +			ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
> > +		else
> > +			ret = amd_iommu_deactivate_guest_mode(ir_data);
> 
> Hi Sean,
> Why the extra nesting here ?
> Its much more cleaner to do..
> 
> if (pi_data && pi_data->is_guest_mode) {
> 	ir_data->ga_root_ptr = (pi_data->vapic_addr >> 12);
>    	ir_data->ga_vector = pi_data->vector;
>    	ir_data->ga_tag = pi_data->ga_tag;
> 	ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
> } else {
> 	ret = amd_iommu_deactivate_guest_mode(ir_data);
> }

Because the intent of the change (and the long-term code) is to affine/bind the
vCPU to the IRTE metadata, while leaving the actual IRTE in remapped mode.  I.e.
connect the passed in pi_data (@info) to the the chip data:

	pi_data->ir_data = ir_data;

and set the GA root, vector and tag in the chip data.

		ir_data->ga_root_ptr = (pi_data->vapic_addr >> 12);
		ir_data->ga_vector = pi_data->vector;
		ir_data->ga_tag = pi_data->ga_tag;

That way if KVM enables AVIC, KVM can call amd_iommu_activate_guest_mode() to
switch the IRTE to vAPIC mode.

If KVM doesn't bind to the IRTE, KVM would need to track all host IRQs (Linux's
"virtual" IRQ numbers) that can be posted to the vCPU in order to active vAPIC
mode.  It would also require taking VM-wide locks in KVM in order to guarantee
accurate IRQ routing information.

FWIW, I don't love that KVM essentially backdoors into the AMD IOMMU via
amd_iommu_(de)activate_guest_mode(), but I also don't see a better alternative.
E.g. on Intel, KVM just leaves the IRTE in posted mode, and relies on the notification
vector IRQ to kick the vCPU into host mode so that KVM can manually process the
PIR.

But that trick doesn't work as well on AMD, because the "guest isn't running" IRQ
will hit whatever CPU is handling the IOMMU interrupts, not the CPU that's running
the vCPU.  I.e. it _could_ functionally be made to work, but it would likely yield
pretty poor performance (and would require a decent amount of new KVM code).