[PATCH] iommu/vt-d: Wire up irq_ack() to irq_move_irq() for posted MSIs

Sean Christopherson posted 1 patch 10 months, 3 weeks ago
drivers/iommu/intel/irq_remapping.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
Posted by Sean Christopherson 10 months, 3 weeks ago
Set the posted MSI irq_chip's irq_ack() hook to irq_move_irq() instead of
a dummy/empty callback so that posted MSIs process pending changes to the
IRQ's SMP affinity.  Failure to honor a pending set-affinity results in
userspace being unable to change the effective affinity of the IRQ, as
IRQD_SETAFFINITY_PENDING is never cleared and so irq_set_affinity_locked()
always defers moving the IRQ.
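
The deferral described above can be pictured with a small userspace toy
model. This is only a sketch under stated assumptions, not kernel code:
struct toy_irq and every toy_*() function are hypothetical names invented
for illustration, the bool "pending" stands in for
IRQD_SETAFFINITY_PENDING, and the real irq_set_affinity_locked() and
irq_move_irq() do considerably more than this.

```c
#include <stdbool.h>

/* Toy stand-in for per-IRQ state; NOT the kernel's struct irq_data. */
struct toy_irq {
	int  effective_cpu;	/* CPU the IRQ is currently delivered to */
	int  pending_cpu;	/* target stashed by a deferred update */
	bool pending;		/* models IRQD_SETAFFINITY_PENDING */
};

/*
 * Models the behavior the changelog attributes to
 * irq_set_affinity_locked(): apply the change immediately only if that is
 * possible right now AND no earlier change is still pending, otherwise
 * stash the target and defer.
 */
void toy_set_affinity(struct toy_irq *irq, int cpu, bool can_apply_now)
{
	if (can_apply_now && !irq->pending) {
		irq->effective_cpu = cpu;
		return;
	}
	irq->pending_cpu = cpu;
	irq->pending = true;
}

/* Old behavior: the ack hook is empty, so a pending change is never seen. */
void toy_ack_dummy(struct toy_irq *irq)
{
	(void)irq;
}

/* New behavior: the ack hook applies a pending change and clears the flag. */
void toy_ack_move(struct toy_irq *irq)
{
	if (!irq->pending)
		return;
	irq->effective_cpu = irq->pending_cpu;
	irq->pending = false;
}

/* Replays three affinity updates with an interrupt after the 2nd and 3rd. */
int toy_run(void (*ack)(struct toy_irq *))
{
	struct toy_irq irq = { .effective_cpu = 0 };

	toy_set_affinity(&irq, 1, true);	/* applied immediately */
	toy_set_affinity(&irq, 2, false);	/* cannot apply now: deferred */
	ack(&irq);				/* interrupt arrives */
	toy_set_affinity(&irq, 3, true);	/* deferred if still pending */
	ack(&irq);				/* interrupt arrives */

	return irq.effective_cpu;
}
```

In this model toy_run(toy_ack_dummy) returns 1, because the pending flag
set by the second update is never cleared and every later update is
deferred as well, while toy_run(toy_ack_move) returns 3.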

The issue is most easily reproducible by setting /proc/irq/xx/smp_affinity
multiple times in quick succession, as only the first update is likely to
be handled in process context.
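
A rough sketch of such a reproducer is shown below. It is not part of the
patch. The helper names (write_mask, read_line, hammer_affinity) are
hypothetical, the caller supplies the path, e.g. /proc/irq/xx/smp_affinity
with xx replaced by the IRQ under test, and writing to that file normally
requires root. Comparing against /proc/irq/xx/effective_affinity assumes
the running kernel exposes that file.

```c
#include <stdio.h>
#include <string.h>

/* Write one hex CPU mask (e.g. "1", "2", "4") to path; 0 on success. */
int write_mask(const char *path, const char *mask)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(mask, f) >= 0;
	/* stdio buffers the data, so a rejected write shows up at fclose() */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the first line of path into buf, without the newline; 0 on success. */
int read_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Write the n masks back to back, "rounds" times over, with no delay
 * between writes, so that later updates are likely to be deferred.
 */
int hammer_affinity(const char *path, const char *const *masks,
		    size_t n, int rounds)
{
	for (int r = 0; r < rounds; r++)
		for (size_t i = 0; i < n; i++)
			if (write_mask(path, masks[i]) != 0)
				return -1;
	return 0;
}
```

After hammer_affinity() returns, reading the effective affinity with
read_line() once the device has raised a few interrupts should show
whether the last mask written ever took effect.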

Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs")
Cc: Robert Lippert <rlippert@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Wentao Yang <wentaoyang@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/iommu/intel/irq_remapping.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index ad795c772f21..333536c5259c 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1278,43 +1278,44 @@ static struct irq_chip intel_ir_chip = {
 };
 
 /*
- * With posted MSIs, all vectors are multiplexed into a single notification
- * vector. Devices MSIs are then dispatched in a demux loop where
- * EOIs can be coalesced as well.
+ * With posted MSIs, the MSI vectors are multiplexed into a single notification
+ * vector, and only the notification vector is sent to the APIC IRR.  Device
+ * MSIs are then dispatched in a demux loop that harvests the MSIs from the
+ * CPU's Posted Interrupt Request bitmap.  I.e. Posted MSIs never get sent to
+ * the APIC IRR, and thus do not need an EOI.  The notification handler instead
+ * performs a single EOI after processing the PIR.
  *
- * "INTEL-IR-POST" IRQ chip does not do EOI on ACK, thus the dummy irq_ack()
- * function. Instead EOI is performed by the posted interrupt notification
- * handler.
+ * Note!  Pending SMP/CPU affinity changes, which are per MSI, must still be
+ * honored, only the APIC EOI is omitted.
  *
  * For the example below, 3 MSIs are coalesced into one CPU notification. Only
- * one apic_eoi() is needed.
+ * one apic_eoi() is needed, but each MSI needs to process pending changes to
+ * its CPU affinity.
  *
  * __sysvec_posted_msi_notification()
  *	irq_enter();
  *		handle_edge_irq()
  *			irq_chip_ack_parent()
- *				dummy(); // No EOI
+ *				irq_move_irq(); // No EOI
  *			handle_irq_event()
  *				driver_handler()
  *		handle_edge_irq()
  *			irq_chip_ack_parent()
- *				dummy(); // No EOI
+ *				irq_move_irq(); // No EOI
  *			handle_irq_event()
  *				driver_handler()
  *		handle_edge_irq()
  *			irq_chip_ack_parent()
- *				dummy(); // No EOI
+ *				irq_move_irq(); // No EOI
  *			handle_irq_event()
  *				driver_handler()
  *	apic_eoi()
  *	irq_exit()
+ *
  */
-
-static void dummy_ack(struct irq_data *d) { }
-
 static struct irq_chip intel_ir_chip_post_msi = {
 	.name			= "INTEL-IR-POST",
-	.irq_ack		= dummy_ack,
+	.irq_ack		= irq_move_irq,
 	.irq_set_affinity	= intel_ir_set_affinity,
 	.irq_compose_msi_msg	= intel_ir_compose_msi_msg,
 	.irq_set_vcpu_affinity	= intel_ir_set_vcpu_affinity,

base-commit: d07de43e3f05576fd275c8c82e413d91932119a5
-- 
2.49.0.395.g12beb8f557-goog
Re: [PATCH] iommu/vt-d: Wire up irq_ack() to irq_move_irq() for posted MSIs
Posted by Thomas Gleixner 10 months, 3 weeks ago
On Fri, Mar 21 2025 at 12:42, Sean Christopherson wrote:
> Set the posted MSI irq_chip's irq_ack() hook to irq_move_irq() instead of
> a dummy/empty callback so that posted MSIs process pending changes to the
> IRQ's SMP affinity.  Failure to honor a pending set-affinity results in
> userspace being unable to change the effective affinity of the IRQ, as
> IRQD_SETAFFINITY_PENDING is never cleared and so irq_set_affinity_locked()
> always defers moving the IRQ.
>
> The issue is most easily reproducible by setting /proc/irq/xx/smp_affinity
> multiple times in quick succession, as only the first update is likely to
> be handled in process context.
>
> Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs")
> Cc: Robert Lippert <rlippert@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Reported-by: Wentao Yang <wentaoyang@google.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>