[PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails

Sean Christopherson posted 62 patches 4 months ago
[PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Sean Christopherson 4 months ago
WARN if unmapping a vLPI in kvm_vgic_v4_unset_forwarding() fails, as
failure means an IRTE has likely been left in a bad state, i.e. IRQs
won't go where they should.  WARNing in arch code will eventually allow
dropping all the plumbing a similar WARN in kvm_irq_routing_update(),
and thus avoid having to plumb back a return code just to WARN.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/vgic/vgic-v4.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 193946108192..86e54cefc237 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -546,6 +546,7 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
 		atomic_dec(&irq->target_vcpu->arch.vgic_cpu.vgic_v3.its_vpe.vlpi_count);
 		irq->hw = false;
 		ret = its_unmap_vlpi(host_irq);
+		WARN_ON_ONCE(ret);
 	}
 
 	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
-- 
2.50.0.rc1.591.g9c95f17f64-goog
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Marc Zyngier 4 months ago
On Wed, 11 Jun 2025 23:45:05 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> WARN if unmapping a vLPI in kvm_vgic_v4_unset_forwarding() fails, as
> failure means an IRTE has likely been left in a bad state, i.e. IRQs
> won't go where they should.

I have no idea what an IRTE is. But not having an VLPI mapping for an
interrupt at the point where we're tearing down the forwarding is
pretty benign. IRQs *still* go where they should, and we don't lose
anything.

What it may mean is that userspace and the guest may have done things
in the wrong order for KVM to accelerate interrupt delivery. That's
silly, but also completely harmless.

	M.


-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Sean Christopherson 4 months ago
On Thu, Jun 12, 2025, Marc Zyngier wrote:
> On Wed, 11 Jun 2025 23:45:05 +0100,
> Sean Christopherson <seanjc@google.com> wrote:
> > 
> > WARN if unmapping a vLPI in kvm_vgic_v4_unset_forwarding() fails, as
> > failure means an IRTE has likely been left in a bad state, i.e. IRQs
> > won't go where they should.
> 
> I have no idea what an IRTE is.

Sorry, x86 IOMMU terminology (Interrupt Remapping Table Entry).  I think the GIC
terminology would be ITS entry?  Or maybe ITS mapping?

> But not having an VLPI mapping for an interrupt at the point where we're
> tearing down the forwarding is pretty benign. IRQs *still* go where they
> should, and we don't lose anything.

This is the code I'm trying to refer to:

  static int its_vlpi_unmap(struct irq_data *d)
  {
	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
	u32 event = its_get_event_id(d);

	if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d))  <=== this shouldn't happen?
		return -EINVAL;

	/* Drop the virtual mapping */
	its_send_discard(its_dev, event);

	/* and restore the physical one */
	irqd_clr_forwarded_to_vcpu(d);
	its_send_mapti(its_dev, d->hwirq, event);
	lpi_update_config(d, 0xff, (lpi_prop_prio |
				    LPI_PROP_ENABLED |
				    LPI_PROP_GROUP1));

	/* Potentially unmap the VM from this ITS */
	its_unmap_vm(its_dev->its, its_dev->event_map.vm);

	/*
	 * Drop the refcount and make the device available again if
	 * this was the last VLPI.
	 */
	if (!--its_dev->event_map.nr_vlpis) {
		its_dev->event_map.vm = NULL;
		kfree(its_dev->event_map.vlpi_maps);
	}

	return 0;
  }

When called from kvm_vgic_v4_unset_forwarding()

	if (irq->hw) {
		atomic_dec(&irq->target_vcpu->arch.vgic_cpu.vgic_v3.its_vpe.vlpi_count);
		irq->hw = false;
		ret = its_unmap_vlpi(host_irq);
	}

IIUC, irq->hw is set if and only if KVM has configured the IRQ to be fowarded
directly to a vCPU.  Based on this comment in its_map_vlpi(): 

	/*
	 * The host will never see that interrupt firing again, so it
	 * is vital that we don't do any lazy masking.
	 */

and this code in its_vlpi_map():


		/* Drop the physical mapping */
		its_send_discard(its_dev, event);

my understanding is that the associated physical IRQ will not be delivered to the
host while the IRQ is being forwarded to a vCPU.

irq->hw should only become true for MSIs (I'm crossing my fingers that SGIs aren't
in play here) if its_map_vlpi() succeeds:

	ret = its_map_vlpi(virq, &map);
	if (ret)
		goto out_unlock_irq;

	irq->hw		= true;
	irq->host_irq	= virq;
	atomic_inc(&map.vpe->vlpi_count);

and so if its_unmap_vlpi() fails in kvm_vgic_v4_unset_forwarding(), then from KVM's
perspective, the worst case scenario is that an IRQ has been left in a forwarded
state, i.e. the physical mapping hasn't been reinstalled.

KVM already WARNs if kvm_vgic_v4_unset_forwarding() fails when KVM is reacting to
a routing change (this is the WARN I want to move into arch code so that
kvm_arch_update_irqfd_routing() doesn't plumb a pointless error code up the stack):

		if (irqfd->producer &&
		    kvm_arch_irqfd_route_changed(&old, &irqfd->irq_entry)) {
			int ret = kvm_arch_update_irqfd_routing(
					irqfd->kvm, irqfd->producer->irq,
					irqfd->gsi, 1);
			WARN_ON(ret);
		}

It's only the kvm_arch_irq_bypass_del_producer() case where KVM doesn't WARN.  If
that fails, then the IRQ has potentially been left in a forwarded state, despite
whatever driver "owns" the physical IRQ having removed its producer.  E.g. if VFIO
detaches its irqbypass producer and gives the device back to the host, then
whatever is using the device in the host won't receive IRQs as expected.

Looking at this again, its_free_ite() also WARNs on its_unmap_vlpi() failure, so
wouldn't it make sense to have its_unmap_vlpi() WARN if irq_set_vcpu_affinity()
fails?  The only possible failures are that the GIC doesn't have a v4 ITS (from
its_irq_set_vcpu_affinity()):

	/* Need a v4 ITS */
	if (!is_v4(its_dev->its))
		return -EINVAL;

	guard(raw_spinlock)(&its_dev->event_map.vlpi_lock);

	/* Unmap request? */
	if (!info)
		return its_vlpi_unmap(d);

or that KVM has gotten out of sync with the GIC/ITS (from its_vlpi_unmap()):

	if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d))
		return -EINVAL;

All of those failure scenario seem like warnable offences when KVM thinks it has
configured the IRQ to be forwarded to a vCPU.

E.g.

--
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 534049c7c94b..98630dae910d 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -758,7 +758,7 @@ static void its_free_ite(struct kvm *kvm, struct its_ite *ite)
        if (irq) {
                scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
                        if (irq->hw)
-                               WARN_ON(its_unmap_vlpi(ite->irq->host_irq));
+                               its_unmap_vlpi(ite->irq->host_irq);
 
                        irq->hw = false;
                }
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 193946108192..911170d4a9c8 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -545,10 +545,10 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
        if (irq->hw) {
                atomic_dec(&irq->target_vcpu->arch.vgic_cpu.vgic_v3.its_vpe.vlpi_count);
                irq->hw = false;
-               ret = its_unmap_vlpi(host_irq);
+               its_unmap_vlpi(host_irq);
        }
 
        raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
        vgic_put_irq(kvm, irq);
-       return ret;
+       return 0;
 }
diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
index 58c28895f8c4..8455b4a5fbb0 100644
--- a/drivers/irqchip/irq-gic-v4.c
+++ b/drivers/irqchip/irq-gic-v4.c
@@ -342,10 +342,10 @@ int its_get_vlpi(int irq, struct its_vlpi_map *map)
        return irq_set_vcpu_affinity(irq, &info);
 }
 
-int its_unmap_vlpi(int irq)
+void its_unmap_vlpi(int irq)
 {
        irq_clear_status_flags(irq, IRQ_DISABLE_UNLAZY);
-       return irq_set_vcpu_affinity(irq, NULL);
+       WARN_ON_ONCE(irq_set_vcpu_affinity(irq, NULL));
 }
 
 int its_prop_update_vlpi(int irq, u8 config, bool inv)
diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
index 7f1f11a5e4e4..0b0887099fd7 100644
--- a/include/linux/irqchip/arm-gic-v4.h
+++ b/include/linux/irqchip/arm-gic-v4.h
@@ -146,7 +146,7 @@ int its_commit_vpe(struct its_vpe *vpe);
 int its_invall_vpe(struct its_vpe *vpe);
 int its_map_vlpi(int irq, struct its_vlpi_map *map);
 int its_get_vlpi(int irq, struct its_vlpi_map *map);
-int its_unmap_vlpi(int irq);
+void its_unmap_vlpi(int irq);
 int its_prop_update_vlpi(int irq, u8 config, bool inv);
 int its_prop_update_vsgi(int irq, u8 priority, bool group);
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Oliver Upton 3 months, 4 weeks ago
On Thu, Jun 12, 2025 at 07:34:35AM -0700, Sean Christopherson wrote:
> On Thu, Jun 12, 2025, Marc Zyngier wrote:
> > On Wed, 11 Jun 2025 23:45:05 +0100,
> > Sean Christopherson <seanjc@google.com> wrote:
> > > 
> > > WARN if unmapping a vLPI in kvm_vgic_v4_unset_forwarding() fails, as
> > > failure means an IRTE has likely been left in a bad state, i.e. IRQs
> > > won't go where they should.
> > 
> > I have no idea what an IRTE is.
> 
> Sorry, x86 IOMMU terminology (Interrupt Remapping Table Entry).  I think the GIC
> terminology would be ITS entry?  Or maybe ITS mapping?

We tend to just call it a 'vLPI mapping', which under the hood implies
a couple other translations have been wired up as well (vPE + Device).

> > But not having an VLPI mapping for an interrupt at the point where we're
> > tearing down the forwarding is pretty benign. IRQs *still* go where they
> > should, and we don't lose anything.

The VM may not actually be getting torn down, though. The series of
fixes [*] we took for 6.16 addressed games that VMMs might be playing on
irqbypass for a live VM.

[*] https://lore.kernel.org/kvmarm/20250523194722.4066715-1-oliver.upton@linux.dev/

> All of those failure scenario seem like warnable offences when KVM thinks it has
> configured the IRQ to be forwarded to a vCPU.

I tend to agree here, especially considering how horribly fragile GICv4
has been in some systems. I know of a couple implementations where ITS
command failures and/or unmapped MSIs are fatal for the entire machine.
Debugging them has been a genuine pain in the ass.

WARN'ing when state tracking for vLPIs is out of whack would've made it
a little easier.

Thanks,
Oliver
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Sean Christopherson 3 months, 3 weeks ago
On Fri, Jun 13, 2025, Oliver Upton wrote:
> On Thu, Jun 12, 2025 at 07:34:35AM -0700, Sean Christopherson wrote:
> > On Thu, Jun 12, 2025, Marc Zyngier wrote:
> > > But not having an VLPI mapping for an interrupt at the point where we're
> > > tearing down the forwarding is pretty benign. IRQs *still* go where they
> > > should, and we don't lose anything.
> 
> The VM may not actually be getting torn down, though. The series of
> fixes [*] we took for 6.16 addressed games that VMMs might be playing on
> irqbypass for a live VM.
> 
> [*] https://lore.kernel.org/kvmarm/20250523194722.4066715-1-oliver.upton@linux.dev/
> 
> > All of those failure scenario seem like warnable offences when KVM thinks it has
> > configured the IRQ to be forwarded to a vCPU.
> 
> I tend to agree here, especially considering how horribly fragile GICv4
> has been in some systems. I know of a couple implementations where ITS
> command failures and/or unmapped MSIs are fatal for the entire machine.
> Debugging them has been a genuine pain in the ass.
> 
> WARN'ing when state tracking for vLPIs is out of whack would've made it
> a little easier.

Marc, does this look and read better?

I'd really, really like to get this sorted out asap, as it's the only thing
blocking the series, and I want to get the series into linux-next early next
week, before I go OOO for ~10 days.

--
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 12 Jun 2025 16:51:47 -0700
Subject: [PATCH] KVM: arm64: WARN if unmapping a vLPI fails in any path

When unmapping a vLPI, WARN if nullifying vCPU affinity fails, not just if
failure occurs when freeing an ITE.  If undoing vCPU affinity fails, then
odds are very good that vLPI state tracking has has gotten out of whack,
i.e. that KVM and the GIC disagree on the state of an IRQ/vLPI.  At best,
inconsistent state means there is a lurking bug/flaw somewhere.  At worst,
the inconsistency could eventually be fatal to the host, e.g. if an ITS
command fails because KVM's view of things doesn't match reality/hardware.

Note, only the call from kvm_arch_irq_bypass_del_producer() by way of
kvm_vgic_v4_unset_forwarding() doesn't already WARN.  Common KVM's
kvm_irq_routing_update() WARNs if kvm_arch_update_irqfd_routing() fails.
For that path, if its_unmap_vlpi() fails in kvm_vgic_v4_unset_forwarding(),
the only possible causes are that the GIC doesn't have a v4 ITS (from
its_irq_set_vcpu_affinity()):

        /* Need a v4 ITS */
        if (!is_v4(its_dev->its))
                return -EINVAL;

        guard(raw_spinlock)(&its_dev->event_map.vlpi_lock);

        /* Unmap request? */
        if (!info)
                return its_vlpi_unmap(d);

or that KVM has gotten out of sync with the GIC/ITS (from its_vlpi_unmap()):

        if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d))
                return -EINVAL;

All of the above failure scenarios are warnable offences, as they should
never occur absent a kernel/KVM bug.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/vgic/vgic-its.c     | 2 +-
 arch/arm64/kvm/vgic/vgic-v4.c      | 4 ++--
 drivers/irqchip/irq-gic-v4.c       | 4 ++--
 include/linux/irqchip/arm-gic-v4.h | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 534049c7c94b..98630dae910d 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -758,7 +758,7 @@ static void its_free_ite(struct kvm *kvm, struct its_ite *ite)
 	if (irq) {
 		scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
 			if (irq->hw)
-				WARN_ON(its_unmap_vlpi(ite->irq->host_irq));
+				its_unmap_vlpi(ite->irq->host_irq);
 
 			irq->hw = false;
 		}
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 193946108192..911170d4a9c8 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -545,10 +545,10 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
 	if (irq->hw) {
 		atomic_dec(&irq->target_vcpu->arch.vgic_cpu.vgic_v3.its_vpe.vlpi_count);
 		irq->hw = false;
-		ret = its_unmap_vlpi(host_irq);
+		its_unmap_vlpi(host_irq);
 	}
 
 	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
 	vgic_put_irq(kvm, irq);
-	return ret;
+	return 0;
 }
diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
index 58c28895f8c4..8455b4a5fbb0 100644
--- a/drivers/irqchip/irq-gic-v4.c
+++ b/drivers/irqchip/irq-gic-v4.c
@@ -342,10 +342,10 @@ int its_get_vlpi(int irq, struct its_vlpi_map *map)
 	return irq_set_vcpu_affinity(irq, &info);
 }
 
-int its_unmap_vlpi(int irq)
+void its_unmap_vlpi(int irq)
 {
 	irq_clear_status_flags(irq, IRQ_DISABLE_UNLAZY);
-	return irq_set_vcpu_affinity(irq, NULL);
+	WARN_ON_ONCE(irq_set_vcpu_affinity(irq, NULL));
 }
 
 int its_prop_update_vlpi(int irq, u8 config, bool inv)
diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
index 7f1f11a5e4e4..0b0887099fd7 100644
--- a/include/linux/irqchip/arm-gic-v4.h
+++ b/include/linux/irqchip/arm-gic-v4.h
@@ -146,7 +146,7 @@ int its_commit_vpe(struct its_vpe *vpe);
 int its_invall_vpe(struct its_vpe *vpe);
 int its_map_vlpi(int irq, struct its_vlpi_map *map);
 int its_get_vlpi(int irq, struct its_vlpi_map *map);
-int its_unmap_vlpi(int irq);
+void its_unmap_vlpi(int irq);
 int its_prop_update_vlpi(int irq, u8 config, bool inv);
 int its_prop_update_vsgi(int irq, u8 priority, bool group);
 

base-commit: 4fc39a165c70a49991b7cc29be3a19eddcd9e5b9
--
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Oliver Upton 3 months, 3 weeks ago
On Fri, Jun 20, 2025 at 10:22:32AM -0700, Sean Christopherson wrote:
> On Fri, Jun 13, 2025, Oliver Upton wrote:
> > On Thu, Jun 12, 2025 at 07:34:35AM -0700, Sean Christopherson wrote:
> > > On Thu, Jun 12, 2025, Marc Zyngier wrote:
> > > > But not having an VLPI mapping for an interrupt at the point where we're
> > > > tearing down the forwarding is pretty benign. IRQs *still* go where they
> > > > should, and we don't lose anything.
> > 
> > The VM may not actually be getting torn down, though. The series of
> > fixes [*] we took for 6.16 addressed games that VMMs might be playing on
> > irqbypass for a live VM.
> > 
> > [*] https://lore.kernel.org/kvmarm/20250523194722.4066715-1-oliver.upton@linux.dev/
> > 
> > > All of those failure scenario seem like warnable offences when KVM thinks it has
> > > configured the IRQ to be forwarded to a vCPU.
> > 
> > I tend to agree here, especially considering how horribly fragile GICv4
> > has been in some systems. I know of a couple implementations where ITS
> > command failures and/or unmapped MSIs are fatal for the entire machine.
> > Debugging them has been a genuine pain in the ass.
> > 
> > WARN'ing when state tracking for vLPIs is out of whack would've made it
> > a little easier.
> 
> Marc, does this look and read better?
> 
> I'd really, really like to get this sorted out asap, as it's the only thing
> blocking the series, and I want to get the series into linux-next early next
> week, before I go OOO for ~10 days.

Can you just send it out as a standalone patch? It's only tangientally
related to the truckload of x86 stuff that I'd rather not pull in the
event of conflict resolution.

Thanks,
Oliver

> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Thu, 12 Jun 2025 16:51:47 -0700
> Subject: [PATCH] KVM: arm64: WARN if unmapping a vLPI fails in any path
> 
> When unmapping a vLPI, WARN if nullifying vCPU affinity fails, not just if
> failure occurs when freeing an ITE.  If undoing vCPU affinity fails, then
> odds are very good that vLPI state tracking has has gotten out of whack,
> i.e. that KVM and the GIC disagree on the state of an IRQ/vLPI.  At best,
> inconsistent state means there is a lurking bug/flaw somewhere.  At worst,
> the inconsistency could eventually be fatal to the host, e.g. if an ITS
> command fails because KVM's view of things doesn't match reality/hardware.
> 
> Note, only the call from kvm_arch_irq_bypass_del_producer() by way of
> kvm_vgic_v4_unset_forwarding() doesn't already WARN.  Common KVM's
> kvm_irq_routing_update() WARNs if kvm_arch_update_irqfd_routing() fails.
> For that path, if its_unmap_vlpi() fails in kvm_vgic_v4_unset_forwarding(),
> the only possible causes are that the GIC doesn't have a v4 ITS (from
> its_irq_set_vcpu_affinity()):
> 
>         /* Need a v4 ITS */
>         if (!is_v4(its_dev->its))
>                 return -EINVAL;
> 
>         guard(raw_spinlock)(&its_dev->event_map.vlpi_lock);
> 
>         /* Unmap request? */
>         if (!info)
>                 return its_vlpi_unmap(d);
> 
> or that KVM has gotten out of sync with the GIC/ITS (from its_vlpi_unmap()):
> 
>         if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d))
>                 return -EINVAL;
> 
> All of the above failure scenarios are warnable offences, as they should
> never occur absent a kernel/KVM bug.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/arm64/kvm/vgic/vgic-its.c     | 2 +-
>  arch/arm64/kvm/vgic/vgic-v4.c      | 4 ++--
>  drivers/irqchip/irq-gic-v4.c       | 4 ++--
>  include/linux/irqchip/arm-gic-v4.h | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index 534049c7c94b..98630dae910d 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -758,7 +758,7 @@ static void its_free_ite(struct kvm *kvm, struct its_ite *ite)
>  	if (irq) {
>  		scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
>  			if (irq->hw)
> -				WARN_ON(its_unmap_vlpi(ite->irq->host_irq));
> +				its_unmap_vlpi(ite->irq->host_irq);
>  
>  			irq->hw = false;
>  		}
> diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
> index 193946108192..911170d4a9c8 100644
> --- a/arch/arm64/kvm/vgic/vgic-v4.c
> +++ b/arch/arm64/kvm/vgic/vgic-v4.c
> @@ -545,10 +545,10 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
>  	if (irq->hw) {
>  		atomic_dec(&irq->target_vcpu->arch.vgic_cpu.vgic_v3.its_vpe.vlpi_count);
>  		irq->hw = false;
> -		ret = its_unmap_vlpi(host_irq);
> +		its_unmap_vlpi(host_irq);
>  	}
>  
>  	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
>  	vgic_put_irq(kvm, irq);
> -	return ret;
> +	return 0;
>  }
> diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
> index 58c28895f8c4..8455b4a5fbb0 100644
> --- a/drivers/irqchip/irq-gic-v4.c
> +++ b/drivers/irqchip/irq-gic-v4.c
> @@ -342,10 +342,10 @@ int its_get_vlpi(int irq, struct its_vlpi_map *map)
>  	return irq_set_vcpu_affinity(irq, &info);
>  }
>  
> -int its_unmap_vlpi(int irq)
> +void its_unmap_vlpi(int irq)
>  {
>  	irq_clear_status_flags(irq, IRQ_DISABLE_UNLAZY);
> -	return irq_set_vcpu_affinity(irq, NULL);
> +	WARN_ON_ONCE(irq_set_vcpu_affinity(irq, NULL));
>  }
>  
>  int its_prop_update_vlpi(int irq, u8 config, bool inv)
> diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
> index 7f1f11a5e4e4..0b0887099fd7 100644
> --- a/include/linux/irqchip/arm-gic-v4.h
> +++ b/include/linux/irqchip/arm-gic-v4.h
> @@ -146,7 +146,7 @@ int its_commit_vpe(struct its_vpe *vpe);
>  int its_invall_vpe(struct its_vpe *vpe);
>  int its_map_vlpi(int irq, struct its_vlpi_map *map);
>  int its_get_vlpi(int irq, struct its_vlpi_map *map);
> -int its_unmap_vlpi(int irq);
> +void its_unmap_vlpi(int irq);
>  int its_prop_update_vlpi(int irq, u8 config, bool inv);
>  int its_prop_update_vsgi(int irq, u8 priority, bool group);
>  
> 
> base-commit: 4fc39a165c70a49991b7cc29be3a19eddcd9e5b9
> --
>
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Sean Christopherson 3 months, 3 weeks ago
On Fri, Jun 20, 2025, Oliver Upton wrote:
> On Fri, Jun 20, 2025 at 10:22:32AM -0700, Sean Christopherson wrote:
> > On Fri, Jun 13, 2025, Oliver Upton wrote:
> > > On Thu, Jun 12, 2025 at 07:34:35AM -0700, Sean Christopherson wrote:
> > > > On Thu, Jun 12, 2025, Marc Zyngier wrote:
> > > > > But not having an VLPI mapping for an interrupt at the point where we're
> > > > > tearing down the forwarding is pretty benign. IRQs *still* go where they
> > > > > should, and we don't lose anything.
> > > 
> > > The VM may not actually be getting torn down, though. The series of
> > > fixes [*] we took for 6.16 addressed games that VMMs might be playing on
> > > irqbypass for a live VM.
> > > 
> > > [*] https://lore.kernel.org/kvmarm/20250523194722.4066715-1-oliver.upton@linux.dev/
> > > 
> > > > All of those failure scenario seem like warnable offences when KVM thinks it has
> > > > configured the IRQ to be forwarded to a vCPU.
> > > 
> > > I tend to agree here, especially considering how horribly fragile GICv4
> > > has been in some systems. I know of a couple implementations where ITS
> > > command failures and/or unmapped MSIs are fatal for the entire machine.
> > > Debugging them has been a genuine pain in the ass.
> > > 
> > > WARN'ing when state tracking for vLPIs is out of whack would've made it
> > > a little easier.
> > 
> > Marc, does this look and read better?
> > 
> > I'd really, really like to get this sorted out asap, as it's the only thing
> > blocking the series, and I want to get the series into linux-next early next
> > week, before I go OOO for ~10 days.
> 
> Can you just send it out as a standalone patch? It's only tangientally
> related to the truckload of x86 stuff

The issue is that "KVM: Don't WARN if updating IRQ bypass route fails" directly
depends on both this patch and decent chunk of the x86 crud.  I could probably
trim some of the x86 crud by reshuffling patches around, but I can't get rid of
it entirely.

> that I'd rather not pull in the event of conflict resolution.

LOL, why not?  :-)

If I post it as a standalone patch, could you/Marc put it into a stable topic
branch based on kvm/master? (kvm/master now has patch 1, yay!)  Then I can create
a topic branch for this mountain of stuff based on the arm64 topic branch.
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Oliver Upton 3 months, 3 weeks ago
On Fri, Jun 20, 2025 at 12:04:19PM -0700, Sean Christopherson wrote:
> On Fri, Jun 20, 2025, Oliver Upton wrote:
> > Can you just send it out as a standalone patch? It's only tangientally
> > related to the truckload of x86 stuff
> 
> The issue is that "KVM: Don't WARN if updating IRQ bypass route fails" directly
> depends on both this patch

Eh? At worst we break bisection for the warn, which I'm sure we manage
to commit similar high crimes on the regular.

> If I post it as a standalone patch, could you/Marc put it into a stable topic
> branch based on kvm/master? (kvm/master now has patch 1, yay!)  Then I can create
> a topic branch for this mountain of stuff based on the arm64 topic branch.

Ok, how about making the arm64 piece patch 1 in your series and you take
the whole pile. If we need it, I'll bug you for a ref that only has the
first change.

That ok?

Thanks,
Oliver
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Sean Christopherson 3 months, 3 weeks ago
On Fri, Jun 20, 2025, Oliver Upton wrote:
> On Fri, Jun 20, 2025 at 12:04:19PM -0700, Sean Christopherson wrote:
> > If I post it as a standalone patch, could you/Marc put it into a stable topic
> > branch based on kvm/master? (kvm/master now has patch 1, yay!)  Then I can create
> > a topic branch for this mountain of stuff based on the arm64 topic branch.
> 
> Ok, how about making the arm64 piece patch 1 in your series and you take
> the whole pile. If we need it, I'll bug you for a ref that only has the
> first change.

Any preference as to whether I formally post the last version, or if I apply it
directly from this thread?

> That ok?

Ya, works for me.  What's the going bribe rate for an ack these days?  :-D
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by Oliver Upton 3 months, 3 weeks ago
On Fri, Jun 20, 2025 at 01:31:55PM -0700, Sean Christopherson wrote:
> On Fri, Jun 20, 2025, Oliver Upton wrote:
> > On Fri, Jun 20, 2025 at 12:04:19PM -0700, Sean Christopherson wrote:
> > > If I post it as a standalone patch, could you/Marc put it into a stable topic
> > > branch based on kvm/master? (kvm/master now has patch 1, yay!)  Then I can create
> > > a topic branch for this mountain of stuff based on the arm64 topic branch.
> > 
> > Ok, how about making the arm64 piece patch 1 in your series and you take
> > the whole pile. If we need it, I'll bug you for a ref that only has the
> > first change.
> 
> Any preference as to whether I formally post the last version, or if I apply it
> directly from this thread?

Since I have this email on my phone I'm always in favor of not getting
spammed :) This thread has all the discourse too, so likely a better
artifact.

> > That ok?
> 
> Ya, works for me.  What's the going bribe rate for an ack these days?  :-D

Usually about the cost of a pint ;-) For the arm64 bits:

Acked-by: Oliver Upton <oliver.upton@linux.dev>

-- 
Thanks,
Oliver
Re: [PATCH v3 02/62] KVM: arm64: WARN if unmapping vLPI fails
Posted by David Woodhouse 3 months, 3 weeks ago
On Fri, 2025-06-20 at 10:22 -0700, Sean Christopherson wrote:
> On Fri, Jun 13, 2025, Oliver Upton wrote:
> > On Thu, Jun 12, 2025 at 07:34:35AM -0700, Sean Christopherson wrote:
> > > On Thu, Jun 12, 2025, Marc Zyngier wrote:
> > > > But not having an VLPI mapping for an interrupt at the point where we're
> > > > tearing down the forwarding is pretty benign. IRQs *still* go where they
> > > > should, and we don't lose anything.
> > 
> > The VM may not actually be getting torn down, though. The series of
> > fixes [*] we took for 6.16 addressed games that VMMs might be playing on
> > irqbypass for a live VM.
> > 
> > [*] https://lore.kernel.org/kvmarm/20250523194722.4066715-1-oliver.upton@linux.dev/
> > 
> > > All of those failure scenario seem like warnable offences when KVM thinks it has
> > > configured the IRQ to be forwarded to a vCPU.
> > 
> > I tend to agree here, especially considering how horribly fragile GICv4
> > has been in some systems. I know of a couple implementations where ITS
> > command failures and/or unmapped MSIs are fatal for the entire machine.
> > Debugging them has been a genuine pain in the ass.
> > 
> > WARN'ing when state tracking for vLPIs is out of whack would've made it
> > a little easier.
> 
> Marc, does this look and read better?
> 
> I'd really, really like to get this sorted out asap, as it's the only thing
> blocking the series, and I want to get the series into linux-next early next
> week, before I go OOO for ~10 days.
> 
> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Thu, 12 Jun 2025 16:51:47 -0700
> Subject: [PATCH] KVM: arm64: WARN if unmapping a vLPI fails in any path
> 
> When unmapping a vLPI, WARN if nullifying vCPU affinity fails, not just if
> failure occurs when freeing an ITE.  If undoing vCPU affinity fails, then
> odds are very good that vLPI state tracking has has gotten out of whack,
> i.e. that KVM and the GIC disagree on the state of an IRQ/vLPI.  At best,
> inconsistent state means there is a lurking bug/flaw somewhere.  At worst,
> the inconsistency could eventually be fatal to the host, e.g. if an ITS
> command fails because KVM's view of things doesn't match reality/hardware.

Btw, we finally figured out the reason some machines were just going
dark on kexec, with the new kernel being corrupted. It turns out the
GIC is still scribbling on the vLPI Pending Table even after it isn't
the vLPI Pending Table any more, and is now part of the new kernel's
text.

In my queue I have a patch to call unmap_all_vpes() ∀ kvm on kexec,
which makes the problem go away.