[PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag

Marc Zyngier posted 1 patch 8 months, 3 weeks ago
There is a newer version of this series
drivers/irqchip/irq-msi-lib.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
[PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Marc Zyngier 8 months, 3 weeks ago
For systems that implement interrupt masking at the interrupt
controller level, the MSI library offers MSI_FLAG_PCI_MSI_MASK_PARENT.
It indicates that it isn't enough to only unmask the interrupt at the PCI
device level, but that the interrupt controller must also be involved.

However, the way this is currently done is less than optimal, as the
masking/unmasking is done on both side, always. It would be far cheaper
to unmask both at the start of times, and then only deal with the
interrupt controller mask, which is likely to be cheaper than a round-trip
to the endpoint.

Implement this by patching up the irq_chip structure associated with
the MSIs to perform the full unmask on .irq_enable(), and the full mask
on .irq_shutdown(). This asymmetry allows the preservation of the
"lazy disable" feature, which relies on the top-level irq_chip not
implementing the .irq_disable() callback. Yes, this is a terrible hack.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-msi-lib.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/irqchip/irq-msi-lib.c b/drivers/irqchip/irq-msi-lib.c
index 246c30205af40..8c62034ab8d92 100644
--- a/drivers/irqchip/irq-msi-lib.c
+++ b/drivers/irqchip/irq-msi-lib.c
@@ -112,6 +112,21 @@ bool msi_lib_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 	 */
 	if (!chip->irq_set_affinity && !(info->flags & MSI_FLAG_NO_AFFINITY))
 		chip->irq_set_affinity = msi_domain_set_affinity;
+
+	/*
+	 * If the parent domain insists on being in charge of masking, obey
+	 * blindly. The default mask/unmask become the shutdown/enable
+	 * callbacks, ensuring that we correctly start/stop the interrupt.
+ 	 * We make a point in not using the irq_disable() in order to
+	 * preserve the "lazy disable" behaviour.
+	 */
+	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
+		chip->irq_shutdown	= chip->irq_mask;
+		chip->irq_enable	= chip->irq_unmask;
+		chip->irq_mask		= irq_chip_mask_parent;
+		chip->irq_unmask	= irq_chip_unmask_parent;
+	}
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(msi_lib_init_dev_msi_info);
-- 
2.39.2
Re: [PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Thomas Gleixner 8 months, 3 weeks ago
On Sat, May 17 2025 at 11:30, Marc Zyngier wrote:
> +	/*
> +	 * If the parent domain insists on being in charge of masking, obey
> +	 * blindly. The default mask/unmask become the shutdown/enable
> +	 * callbacks, ensuring that we correctly start/stop the interrupt.
> + 	 * We make a point in not using the irq_disable() in order to
> +	 * preserve the "lazy disable" behaviour.
> +	 */
> +	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
> +		chip->irq_shutdown	= chip->irq_mask;
> +		chip->irq_enable	= chip->irq_unmask;

This is only correct, when the chip does not have dedicated
irq_shutdown/enable callbacks. And I really hate the asymmetry of this.

> +		chip->irq_mask		= irq_chip_mask_parent;
> +		chip->irq_unmask	= irq_chip_unmask_parent;
> +	}

I'm still trying to understand, what's the actual problem is you are
trying to solve.

MSIs are edge type interrupts, so the interrupt handling hotpath usually
does not mask at all. The only time masking happens is when it's lazy
disabled or during affinity changes, which is not the end of the world.

Thanks,

        tglx
Re: [PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Marc Zyngier 8 months, 2 weeks ago
On Sat, 17 May 2025 20:59:10 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Sat, May 17 2025 at 11:30, Marc Zyngier wrote:
> > +	/*
> > +	 * If the parent domain insists on being in charge of masking, obey
> > +	 * blindly. The default mask/unmask become the shutdown/enable
> > +	 * callbacks, ensuring that we correctly start/stop the interrupt.
> > + 	 * We make a point in not using the irq_disable() in order to
> > +	 * preserve the "lazy disable" behaviour.
> > +	 */
> > +	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
> > +		chip->irq_shutdown	= chip->irq_mask;
> > +		chip->irq_enable	= chip->irq_unmask;
> 
> This is only correct, when the chip does not have dedicated
> irq_shutdown/enable callbacks.

The chip structure provided by the PCI MSI code doesn't provide such
callback, meaning that they are unused for the whole hierarchy.

> And I really hate the asymmetry of this.

So do I, but that's how the lazy disable thing currently works. Drop
the bizarre asymmetry on irq_disable, and we can make this nicely
symmetric as well.

> 
> > +		chip->irq_mask		= irq_chip_mask_parent;
> > +		chip->irq_unmask	= irq_chip_unmask_parent;
> > +	}
> 
> I'm still trying to understand, what's the actual problem is you are
> trying to solve.

I'm trying to remove some overhead from machines that don't need to
suffer from this nonsense double masking. Specially in VMs when
masking/unmasking requires *two* extremely costly exits (write +
synchronising read-back). This change reduces the overhead
significantly by only masking where it actually matters.

> MSIs are edge type interrupts, so the interrupt handling hotpath usually
> does not mask at all. The only time masking happens is when it's lazy
> disabled or during affinity changes, which is not the end of the world.

And that's part of the problem. The lazy disable ends up being way
more costly than it should when the interrupt fires during the
"disabled but not quite" phase, and in turn makes the critical section
delineated by disable_irq()/enable_irq() more expensive.

So while, as you put it, it's "not the end of the world", this seems
to me like a valuable optimisation.

Another possible improvement would be to teach the PCI code it can
still rely on masking even when the endpoint is not capable of masking
individual MSIs.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Thomas Gleixner 7 months, 1 week ago
On Fri, May 23 2025 at 10:06, Marc Zyngier wrote:
> On Sat, 17 May 2025 20:59:10 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> 
>> On Sat, May 17 2025 at 11:30, Marc Zyngier wrote:
>> > +	/*
>> > +	 * If the parent domain insists on being in charge of masking, obey
>> > +	 * blindly. The default mask/unmask become the shutdown/enable
>> > +	 * callbacks, ensuring that we correctly start/stop the interrupt.
>> > + 	 * We make a point in not using the irq_disable() in order to
>> > +	 * preserve the "lazy disable" behaviour.
>> > +	 */
>> > +	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
>> > +		chip->irq_shutdown	= chip->irq_mask;
>> > +		chip->irq_enable	= chip->irq_unmask;
>> 
>> This is only correct, when the chip does not have dedicated
>> irq_shutdown/enable callbacks.
>
> The chip structure provided by the PCI MSI code doesn't provide such
> callback, meaning that they are unused for the whole hierarchy.

Fair enough, but it still stinks.

>> And I really hate the asymmetry of this.
>
> So do I, but that's how the lazy disable thing currently works. Drop
> the bizarre asymmetry on irq_disable, and we can make this nicely
> symmetric as well.

Well, it's not that bizarre and it has a massive performance win if the
thing does not need to go out to the hardware in some scenarios. Don't
ask about the main use case. Mentioning it is probably considered a
violation of the United Nations Convention Against Torture (UNCAT).

>> > +		chip->irq_mask		= irq_chip_mask_parent;
>> > +		chip->irq_unmask	= irq_chip_unmask_parent;
>> > +	}
>> 
>> I'm still trying to understand, what's the actual problem is you are
>> trying to solve.
>
> I'm trying to remove some overhead from machines that don't need to
> suffer from this nonsense double masking. Specially in VMs when
> masking/unmasking requires *two* extremely costly exits (write +
> synchronising read-back). This change reduces the overhead
> significantly by only masking where it actually matters.
>
>> MSIs are edge type interrupts, so the interrupt handling hotpath usually
>> does not mask at all. The only time masking happens is when it's lazy
>> disabled or during affinity changes, which is not the end of the world.
>
> And that's part of the problem. The lazy disable ends up being way
> more costly than it should when the interrupt fires during the
> "disabled but not quite" phase, and in turn makes the critical section
> delineated by disable_irq()/enable_irq() more expensive.
>
> So while, as you put it, it's "not the end of the world", this seems
> to me like a valuable optimisation.

I understand, but this needs more thoughts. Doing this wholesale for all
potential PCI/MSI parent domains which require MASK_PARTN makes me more
than nervous.

> Another possible improvement would be to teach the PCI code it can
> still rely on masking even when the endpoint is not capable of masking
> individual MSIs.

Well, it relies on that today already if the underlying parent domain is
capable of masking. If not, it hopes that nothing bad happens, which is
the only option we have :(

It get's worse when the device does not support masking _and_ the parent
domain does not provide immutable MSI messages because then the MSI
message write becomes a horrorshow. For illustration see the mess in
arch/x86/kernel/apic/msi.c::msi_set_affinity(), which is a violation of
above mentioned convention as well. Despite the fact that this has been
known for decades, RISC-V went ahead and replicated that trainwreck in
the IMSIC IP block. Oh well....

I sat down and stared at it in the few moments where the heat wave did
not completely shutdown my brain. As usual this ended in a larger
cleanup and overhaul... At the end I went and created a new pair of chip
callbacks and the corresponding logic around it. A preview of the whole
pile is at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git irq/msi

Thanks,

        tglx
[patch 0/2] PCI/MSI: Avoid PCI level masking during normal operation if requested
Posted by Thomas Gleixner 5 months, 1 week ago
This is a follow up to Marc's attempt on this:

     https://lore.kernel.org/lkml/20250517103011.2573288-1-maz@kernel.org

Now that the PCI/MSI side has irq_startup/shutdown() callbacks, which do
the [un]masking at the PCI level, let the MSI parent domains which insist
on being in charge of masking do so for normal operations.

That avoids going out to the PCI endpoint in the case that an interrupt has
to be masked on arrival of an interrupt in software (lazy) disabled state.

That's achieved by overwriting the irq_[un]mask() callbacks in the irq/MSI
library.

As a consequence the conditional mask/unmask logic in the regular
irq_[un]mask() callbacks of the PCI/MSI domain is not longer required.

Thanks,

	tglx
---
 irqchip/irq-msi-lib.c |   14 ++++++++++++++
 pci/msi/irqdomain.c   |   20 --------------------
 2 files changed, 14 insertions(+), 20 deletions(-)
Re: [patch 0/2] PCI/MSI: Avoid PCI level masking during normal operation if requested
Posted by Marc Zyngier 5 months ago
On Wed, 03 Sep 2025 15:04:44 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> This is a follow up to Marc's attempt on this:
> 
>      https://lore.kernel.org/lkml/20250517103011.2573288-1-maz@kernel.org
> 
> Now that the PCI/MSI side has irq_startup/shutdown() callbacks, which do
> the [un]masking at the PCI level, let the MSI parent domains which insist
> on being in charge of masking do so for normal operations.
> 
> That avoids going out to the PCI endpoint in the case that an interrupt has
> to be masked on arrival of an interrupt in software (lazy) disabled state.
> 
> That's achieved by overwriting the irq_[un]mask() callbacks in the irq/MSI
> library.
> 
> As a consequence the conditional mask/unmask logic in the regular
> irq_[un]mask() callbacks of the PCI/MSI domain is not longer required.

I took this for a quick ride on some of my least favourite machines,
and nothing caught fire.

FWIW:

Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>

Thanks for having re-spun it,

	M.

-- 
Without deviation from the norm, progress is not possible.
[patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Thomas Gleixner 5 months, 1 week ago
From: Marc Zyngier <maz@kernel.org>

For systems that implement interrupt masking at the interrupt controller
level, the MSI library offers MSI_FLAG_PCI_MSI_MASK_PARENT.  It indicates
that it isn't enough to only unmask the interrupt at the PCI device level,
but that the interrupt controller must also be involved.

However, the way this is currently done is less than optimal, as the
masking/unmasking is done on both sides, always. It would be far cheaper to
unmask both at the start of times, and then only deal with the interrupt
controller mask, which is cheaper than a round-trip to the PCI endpoint.

Now that the PCI/MSI layer implements irq_startup() and irq_shutdown()
callbacks, which [un]mask at the PCI level and honor the request to
[un]mask the parent, this can be trivially done.

Overwrite the irq_mask/unmask() callbacks of the device domain interrupt
chip with irq_[un]mask_parent() when the parent domain asks for it.

[ tglx: Adopted to the PCI/MSI changes ]

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/irqchip/irq-msi-lib.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/drivers/irqchip/irq-msi-lib.c
+++ b/drivers/irqchip/irq-msi-lib.c
@@ -112,6 +112,20 @@ bool msi_lib_init_dev_msi_info(struct de
 	 */
 	if (!chip->irq_set_affinity && !(info->flags & MSI_FLAG_NO_AFFINITY))
 		chip->irq_set_affinity = msi_domain_set_affinity;
+
+	/*
+	 * If the parent domain insists on being in charge of masking, obey
+	 * blindly. The interrupt is un-masked at the PCI level on startup
+	 * and masked on shutdown to prevent rogue interrupts after the
+	 * driver freed the interrupt. Not masking it at the PCI level
+	 * speeds up operation for disable/enable_irq() as it avoids
+	 * getting all the way out to the PCI device.
+	 */
+	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
+		chip->irq_mask		= irq_chip_mask_parent;
+		chip->irq_unmask	= irq_chip_unmask_parent;
+	}
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(msi_lib_init_dev_msi_info);
Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Luigi Rizzo 1 month, 2 weeks ago
There are platforms (including some ARM SoC) where the MSIx
writes are a performance killer, because they are exceedingly
serializing on the PCIe root port.

These platforms are the key motivation for Global Software
Interrupt Moderation (GSIM) which relies on actually masking
device interrupts so the MSIx writes are not generated.
https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/

Overriding mask/unmask with irq_chip_mask_parent() makes software
moderation ineffective. GSIM works great on ARM platforms before
this patch, but becomes ineffective afterwards, e.g. on linux 6.18.

The round trip through the PCI endpoint for mask_irq(), caused by the
readback to make sure the PCI write has been sent, is almost always
(or really always) unnecessary.  Masking is inherently racy; waiting
that the PCIe write has arrived at the device won't guarantee that an
interrupt has arrived in the meantime, so there is really no benefit
in the readback (which, for instance, can be conditionally removed with
code like the one below).

I measured the cost of pci_irq_mask_msix() and it goes from 1000-1500ns
with the readl(), down to 40-50ns without it.

Once we remove the costly readback, is there any remaining reason
to overwrite [un]mask_irq() with irq_chip_[un]mask_parent() ?
 
cheers
luigi

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -17,6 +17,8 @@

 int pci_msi_enable = 1;

+DEFINE_PER_CPU(bool, pci_msix_fast_mask);
+
 /**
  * pci_msi_supported - check whether MSI may be enabled on a device
  * @dev: pointer to the pci_dev data structure of MSI device function
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -40,10 +40,14 @@ static inline void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
                writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }

+DECLARE_PER_CPU(bool, pci_msix_fast_mask);
 static inline void pci_msix_mask(struct msi_desc *desc)
 {
        desc->pci.msix_ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
        pci_msix_write_vector_ctrl(desc, desc->pci.msix_ctrl);
+       /* There are only a few cases when we really need the read back. */
+       if (__this_cpu_read(pci_msix_fast_mask))
+               return;
        /* Flush write to device */
        readl(desc->pci.mask_base);
 }

--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -420,13 +420,20 @@ static inline void mask_ack_irq(struct irq_desc *desc)
        }
 }

+/* defined in driver/pci/msi/msi.c */
+DECLARE_PER_CPU(bool, pci_msix_fast_mask);
+
 void mask_irq(struct irq_desc *desc)
 {
        if (irqd_irq_masked(&desc->irq_data))
                return;

        if (desc->irq_data.chip->irq_mask) {
+                if (IS_ENABLED(CONFIG_PCI_MSI))
+                       __this_cpu_write(pci_msix_fast_mask, irqd_has_set(&desc->irq_data, IRQD_IRQ_MODERATED));
                desc->irq_data.chip->irq_mask(&desc->irq_data);
+               if (IS_ENABLED(CONFIG_PCI_MSI))
+                       __this_cpu_write(pci_msix_fast_mask, false);
                irq_state_set_masked(desc);
        }
 }
Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Marc Zyngier 1 month, 2 weeks ago
On Sat, 20 Dec 2025 19:31:19 +0000,
Luigi Rizzo <lrizzo@google.com> wrote:
> 
> There are platforms (including some ARM SoC) where the MSIx
> writes are a performance killer, because they are exceedingly
> serializing on the PCIe root port.
> 
> These platforms are the key motivation for Global Software
> Interrupt Moderation (GSIM) which relies on actually masking
> device interrupts so the MSIx writes are not generated.
> https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/
> 
> Overriding mask/unmask with irq_chip_mask_parent() makes software
> moderation ineffective. GSIM works great on ARM platforms before
> this patch, but becomes ineffective afterwards, e.g. on linux 6.18.

You do realise that "ARM platforms" means nothing at all, right? What
you actually mean is "the ARM machines I have access to exhibit some
platform-specific behaviour that may or may not be a general
behaviour".

Your particular circumstances are not in any way something you can
generalise, unless you demonstrate this is caused by an architectural
requirement rather than an implementation defect.

> The round trip through the PCI endpoint for mask_irq(), caused by the
> readback to make sure the PCI write has been sent, is almost always
> (or really always) unnecessary.  Masking is inherently racy; waiting
> that the PCIe write has arrived at the device won't guarantee that an
> interrupt has arrived in the meantime, so there is really no benefit
> in the readback (which, for instance, can be conditionally removed with
> code like the one below).
> 
> I measured the cost of pci_irq_mask_msix() and it goes from 1000-1500ns
> with the readl(), down to 40-50ns without it.
> 
> Once we remove the costly readback, is there any remaining reason
> to overwrite [un]mask_irq() with irq_chip_[un]mask_parent() ?

So you are effectively not masking at all and just rely on hope
instead. I have the utmost confidence in this sort of stuff. Totally.

What you missing is that hitting the config space is causing pretty
high overhead in KVM guests, where the accesses (write and read to the
MSI masks) are trapped all the way to userspace (and back into VFIO),
while the masking at the ITS level is much cheaper.

Masking at the ITS level (and only there) also means that the VM can
be migrated without having to worry about the PBA in each device,
because the pending state is already part of the VM's memory, nicely
tucked away in the RD tables.

Finally, it aligns PCI devices with non-PCI device behaviour,
something that is highly desirable.

For me, that totally beats your interrupt mitigation thing.

Thanks,

	M.

-- 
Jazz isn't dead. It just smells funny.
Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Luigi Rizzo 1 month, 2 weeks ago
On Sun, Dec 21, 2025 at 12:55 PM Marc Zyngier <maz@kernel.org> wrote:
>
> On Sat, 20 Dec 2025 19:31:19 +0000,
> Luigi Rizzo <lrizzo@google.com> wrote:
> >
> > There are platforms (including some ARM SoC) where the MSIx
> > writes are a performance killer, because they are exceedingly
> > serializing on the PCIe root port.
> >
> > These platforms are the key motivation for Global Software
> > Interrupt Moderation (GSIM) which relies on actually masking
> > device interrupts so the MSIx writes are not generated.
> > https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/
> >
> > Overriding mask/unmask with irq_chip_mask_parent() makes software
> > moderation ineffective. GSIM works great on ARM platforms before
> > this patch, but becomes ineffective afterwards, e.g. on linux 6.18.
>
> You do realise that "ARM platforms" means nothing at all, right? What
> you actually mean is "the ARM machines I have access to exhibit some
> platform-specific behaviour that may or may not be a general
> behaviour".
>
> Your particular circumstances are not in any way something you can
> generalise, unless you demonstrate this is caused by an architectural
> requirement rather than an implementation defect.

You are right, I should have been more precise and clarified "some arm machines
I have access to". Note though that the problem addressed by
https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/
is not for one broken snowflake. It affects multiple SoC families from
all vendors
(Intel, AMD, ARM), and is not new at all. Back in 2020 when Eric
Dumazet and I developed
napi_defer_hard_irqs to address this very problem on a specific platform (x86).
And sure, there are platforms that tolerate 30M intrs/s without a sweat.

Anyways. Systems are what they are, some have suboptimal
implementations which make certain operations more expensive
than they could be. We can just say "tough luck" and write them off
as broken, or try to mitigate the problem, and I am just exploring
how we can do the latter without harming common cases.

> > The round trip through the PCI endpoint for mask_irq(), caused by the
> > readback to make sure the PCI write has been sent, is almost always
> > (or really always) unnecessary.  Masking is inherently racy; waiting
> > that the PCIe write has arrived at the device won't guarantee that an
> > interrupt has arrived in the meantime, so there is really no benefit
> > in the readback (which, for instance, can be conditionally removed with
> > code like the one below).
> >
> > I measured the cost of pci_irq_mask_msix() and it goes from 1000-1500ns
> > with the readl(), down to 40-50ns without it.
> >
> > Once we remove the costly readback, is there any remaining reason
> > to overwrite [un]mask_irq() with irq_chip_[un]mask_parent() ?
>
> So you are effectively not masking at all and just rely on hope
> instead. I have the utmost confidence in this sort of stuff. Totally.

I don't understand the above comment.
Masking happens as a result of the PCIe write,
which will eventually reach the device. The presence of the
readback does nothing to accelerate the landing of the write.

If the expectation was "after the readl() there are no interrupts",
that is incorrect, because one may have been generated before
the mask landed, be in flight in the interrupt controller,
and fire after the readl() completes.

> What you missing is that hitting the config space is causing pretty
> high overhead in KVM guests, where the accesses (write and read to the
> MSI masks) are trapped all the way to userspace (and back into VFIO),
> while the masking at the ITS level is much cheaper.
>
> Masking at the ITS level (and only there) also means that the VM can
> be migrated without having to worry about the PBA in each device,
> because the pending state is already part of the VM's memory, nicely
> tucked away in the RD tables.

Good point about the guest. Regardless, without actually masking the
PCIe interrupts, on certain platforms performance really collapses
so one may have to choose the lesser evil.

My goal is to see if there is a way to optimize where it makes sense:
- first, how often do we actually call mask_irq() outside moderation ?
  it seems to happen only does it when the interrupt migrates to a
  different CPU, and handle_fasteoi_irq() for IRQS_ONESHOT
- second, how expensive is it to do the pci mask?
  You  make a good point about the guest, but for the host case,
  I still think we are paying the readback cost unnecessarily.
- third, how expensive is it to NOT do the mask?
  For this I did indicate data for large servers of all kinds that suffer.
  https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/

Given all the above, I think there is a case for having this optimization
(ignore device mask, just rely on interrupt controller)
configurable at least as a CONFIG option.

>
> Finally, it aligns PCI devices with non-PCI device behaviour,
> something that is highly desirable.

Sure, generally this is a good thing; except that what I was
pointing out is a peculiar PCI flaw on many platforms which,
unfortunate as it may be, requires a specific mitigation.

cheers
luigi

>
> For me, that totally beats your interrupt mitigation thing.
>
> Thanks,
>
>         M.
>
> --
> Jazz isn't dead. It just smells funny.
Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Marc Zyngier 1 month, 2 weeks ago
On Sun, 21 Dec 2025 12:41:37 +0000,
Luigi Rizzo <lrizzo@google.com> wrote:
> 
> On Sun, Dec 21, 2025 at 12:55 PM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Sat, 20 Dec 2025 19:31:19 +0000,
> > Luigi Rizzo <lrizzo@google.com> wrote:
> > >
> > > There are platforms (including some ARM SoC) where the MSIx
> > > writes are a performance killer, because they are exceedingly
> > > serializing on the PCIe root port.
> > >
> > > These platforms are the key motivation for Global Software
> > > Interrupt Moderation (GSIM) which relies on actually masking
> > > device interrupts so the MSIx writes are not generated.
> > > https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/
> > >
> > > Overriding mask/unmask with irq_chip_mask_parent() makes software
> > > moderation ineffective. GSIM works great on ARM platforms before
> > > this patch, but becomes ineffective afterwards, e.g. on linux 6.18.
> >
> > You do realise that "ARM platforms" means nothing at all, right? What
> > you actually mean is "the ARM machines I have access to exhibit some
> > platform-specific behaviour that may or may not be a general
> > behaviour".
> >
> > Your particular circumstances are not in any way something you can
> > generalise, unless you demonstrate this is caused by an architectural
> > requirement rather than an implementation defect.
> 
> You are right, I should have been more precise and clarified "some arm machines
> I have access to". Note though that the problem addressed by
> https://lore.kernel.org/all/20251217112128.1401896-1-lrizzo@google.com/
> is not for one broken snowflake. It affects multiple SoC families from
> all vendors
> (Intel, AMD, ARM), and is not new at all. Back in 2020 when Eric
> Dumazet and I developed
> napi_defer_hard_irqs to address this very problem on a specific platform (x86).
> And sure, there are platforms that tolerate 30M intrs/s without a sweat.
> 
> Anyways. Systems are what they are, some have suboptimal
> implementations which make certain operations more expensive
> than they could be. We can just say "tough luck" and write them off
> as broken, or try to mitigate the problem, and I am just exploring
> how we can do the latter without harming common cases.
> 
> > > The round trip through the PCI endpoint for mask_irq(), caused by the
> > > readback to make sure the PCI write has been sent, is almost always
> > > (or really always) unnecessary.  Masking is inherently racy; waiting
> > > that the PCIe write has arrived at the device won't guarantee that an
> > > interrupt has arrived in the meantime, so there is really no benefit
> > > in the readback (which, for instance, can be conditionally removed with
> > > code like the one below).
> > >
> > > I measured the cost of pci_irq_mask_msix() and it goes from 1000-1500ns
> > > with the readl(), down to 40-50ns without it.
> > >
> > > Once we remove the costly readback, is there any remaining reason
> > > to overwrite [un]mask_irq() with irq_chip_[un]mask_parent() ?
> >
> > So you are effectively not masking at all and just rely on hope
> > instead. I have the utmost confidence in this sort of stuff. Totally.
> 
> I don't understand the above comment.
> Masking happens as a result of the PCIe write,
> which will eventually reach the device. The presence of the
> readback does nothing to accelerate the landing of the write.

It doesn't accelerate it. It *guarantees* that the write is observed
and has taken effect. It acts as a completion barrier. Without it, the
write can be buffered at an arbitrary location in the interconnect, or
stored in the device but not acted upon.

What you have here is the equivalent of throwing a message in a bottle
at sea, and expecting a guaranteed reply.

> 
> If the expectation was "after the readl() there are no interrupts",
> that is incorrect, because one may have been generated before
> the mask landed, be in flight in the interrupt controller,
> and fire after the readl() completes.

What is guaranteed is that there is no *new* interrupt. Without read,
there is no requirement that the mask ever takes effect.

	M.

-- 
Without deviation from the norm, progress is not possible.
Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Thomas Gleixner 4 weeks, 1 day ago
On Mon, Dec 22 2025 at 16:16, Marc Zyngier wrote:
>> > > Once we remove the costly readback, is there any remaining reason
>> > > to overwrite [un]mask_irq() with irq_chip_[un]mask_parent() ?
>> >
>> > So you are effectively not masking at all and just rely on hope
>> > instead. I have the utmost confidence in this sort of stuff. Totally.
>> 
>> I don't understand the above comment.
>> Masking happens as a result of the PCIe write,
>> which will eventually reach the device. The presence of the
>> readback does nothing to accelerate the landing of the write.
>
> It doesn't accelerate it. It *guarantees* that the write is observed
> and has taken effect. It acts as a completion barrier. Without it, the
> write can be buffered at an arbitrary location in the interconnect, or
> stored in the device but not acted upon.
>
> What you have here is the equivalent of throwing a message in a bottle
> at sea, and expecting a guaranteed reply.

  https://xkcd.com/3150/
Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Luigi Rizzo 4 weeks, 1 day ago
On Thu, Jan 8, 2026 at 10:32 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Mon, Dec 22 2025 at 16:16, Marc Zyngier wrote:
> >> > > Once we remove the costly readback, is there any remaining reason
> >> > > to overwrite [un]mask_irq() with irq_chip_[un]mask_parent() ?
> >> >
> >> > So you are effectively not masking at all and just rely on hope
> >> > instead. I have the utmost confidence in this sort of stuff. Totally.
> >>
> >> I don't understand the above comment.
> >> Masking happens as a result of the PCIe write,
> >> which will eventually reach the device. The presence of the
> >> readback does nothing to accelerate the landing of the write.
> >
> > It doesn't accelerate it. It *guarantees* that the write is observed
> > and has taken effect. It acts as a completion barrier. Without it, the
> > write can be buffered at an arbitrary location in the interconnect, or
> > stored in the device but not acted upon.
> >
> > What you have here is the equivalent of throwing a message in a bottle
> > at sea, and expecting a guaranteed reply.
>
>   https://xkcd.com/3150/


The irony is that the change I was commenting about
completely removes masking at the PCI level (ie not even the bottle).

Anyways, coming back to my secondary point,
which is what does the readback give us, I want to repeat my arguments:

- knowing that the mask() has landed on the device does not guarantee
  that there was not a previously generated MSIx write in transit.
  So even with the readback, the interrupt subsystem must be
  prepared to handle the pending interrupt (which of course it is).

- if the concern is that the mask() write may never reach the
  device, there is no reason to expect that the unmask()
  will be treated differently. Yet, the unmask() has no readback

- unless I am missing some cases, outside of moderation
  the mask() is only ever called on interrupt
  shutdown or on affinity migration. so the frequency should
  be negligible and the benefits of the optimization probably
  hardly measurable (which is also the reason I don't
  particularly care about removing the writeback, I only
  wanted to clarify what it gives)

cheers
luigi
Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Thomas Gleixner 4 weeks, 1 day ago
On Thu, Jan 08 2026 at 22:55, Luigi Rizzo wrote:

> On Thu, Jan 8, 2026 at 10:32 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> On Mon, Dec 22 2025 at 16:16, Marc Zyngier wrote:
>> >> > > Once we remove the costly readback, is there any remaining reason
>> >> > > to overwrite [un]mask_irq() with irq_chip_[un]mask_parent() ?
>> >> >
>> >> > So you are effectively not masking at all and just rely on hope
>> >> > instead. I have the utmost confidence in this sort of stuff. Totally.
>> >>
>> >> I don't understand the above comment.
>> >> Masking happens as a result of the PCIe write,
>> >> which will eventually reach the device. The presence of the
>> >> readback does nothing to accelerate the landing of the write.
>> >
>> > It doesn't accelerate it. It *guarantees* that the write is observed
>> > and has taken effect. It acts as a completion barrier. Without it, the
>> > write can be buffered at an arbitrary location in the interconnect, or
>> > stored in the device but not acted upon.
>> >
>> > What you have here is the equivalent of throwing a message in a bottle
>> > at sea, and expecting a guaranteed reply.
>>
>>   https://xkcd.com/3150/
>
>
> The irony is that the change I was commenting about
> completely removes masking at the PCI level (ie not even the bottle).
>
> Anyways, coming back to my secondary point,
> which is what does the readback give us, I want to repeat my arguments:
>
> - knowing that the mask() has landed on the device does not guarantee
>   that there was not a previously generated MSIx write in transit.
>   So even with the readback, the interrupt subsystem must be
>   prepared to handle the pending interrupt (which of course it is).

That's not the point. The point is that _after_ the mask has reached the
device it is guaranteed that no further interrupts are generated until
unmask() is invoked. That's what is required e.g. for writing the MSI
message to the device.

> - if the concern is that the mask() write may never reach the
>   device, there is no reason to expect that the unmask()
>   will be treated differently. Yet, the unmask() has no readback

It will reach the device at some random point in time. The queued writes
are not lost. They are drained at some point and unmask() does not
require an guarantee like mask() does.

> - unless I am missing some cases, outside of moderation
>   the mask() is only ever called on interrupt
>   shutdown or on affinity migration.

Correct.

Thanks,

        tglx
Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Luigi Rizzo 4 weeks, 1 day ago
On Fri, Jan 9, 2026 at 1:20 PM Thomas Gleixner <tglx@kernel.org> wrote:
>
> On Thu, Jan 08 2026 at 22:55, Luigi Rizzo wrote:
>
> > ...
> > Anyways, coming back to my secondary point,
> > which is what does the readback give us, I want to repeat my arguments:
> >
> > - knowing that the mask() has landed on the device does not guarantee
> >   that there was not a previously generated MSIx write in transit.
> >   So even with the readback, the interrupt subsystem must be
> >   prepared to handle the pending interrupt (which of course it is).
>
> That's not the point. The point is that _after_ the mask has reached the
> device it is guaranteed that no further interrupts are generated until
> unmask() is invoked. That's what is required e.g. for writing the MSI
> message to the device.

I keep hearing about this guarantee, which is surely true, but I argue that
it is useless both during normal activity (when interrupts may be
occasionally masked during affinity changes) and at shutdown
(when there is a lot more additional state cleanup done on
the device, which insures that there are no more interrupts,
save the one(s) happening between the mask() write and readback,
which we have no way to block, anyways (and may fire on the CPU at
some unspecified time, even after the readback, because that does not
control the pipeline from the device to the interrupt controller etc.),
and that the kernel stack is well equipped to handle and ignore.

But maybe I am missing some context, so can you explain what do you mean by
"required e.g. for writing the MSI message to the device" ?
(both parts, what is "writing a MSI message to the device" and
what makes it a requirement to have interrupts masked).

The only, weak, justification that Gemini gives for masking is that
"if we leave interrupts unmasked before doing a function level reset, the
device might send a malformed TLP", but if that is the case (asynchronous
reset breaking an in flight TLP??), that's a hardware error that is trivially
detected and recovered from.

>
> > - if the concern is that the mask() write may never reach the
> >   device, there is no reason to expect that the unmask()
> >   will be treated differently. Yet, the unmask() has no readback
>
> It will reach the device at some random point in time. The queued writes
> are not lost. They are drained at some point

Exactly my point, queued writes are not lost and will eventually land.

> and unmask() does not
> require an guarantee like mask() does.

Again, I have seen no explanation so far on why the mask requires a guarantee,
and as I have tried to explain, the unavoidable stray interrupts
generated before the readback
are as bad (or actually as harmless) as the ones that we avoid with
the readback.

cheers
luigi
Re: [patch 1/2] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by Thomas Gleixner 4 weeks ago
On Fri, Jan 09 2026 at 14:00, Luigi Rizzo wrote:
> On Fri, Jan 9, 2026 at 1:20 PM Thomas Gleixner <tglx@kernel.org> wrote:
>> That's not the point. The point is that _after_ the mask has reached the
>> device it is guaranteed that no further interrupts are generated until
>> unmask() is invoked. That's what is required e.g. for writing the MSI
>> message to the device.

Actually I have to correct myself. We stopped doing the readback on
mask() in pci_write_msg_msix() because the writes are guaranteed to be
ordered, so the device will see them in correct order:

         PCI_MSIX_ENTRY_VECTOR_CTRL  // mask
         PCI_MSIX_ENTRY_LOWER_ADDR   // MSI message address low
         PCI_MSIX_ENTRY_UPPER_ADDR   // MSI message address high
         PCI_MSIX_ENTRY_DATA         // MSI message data
         PCI_MSIX_ENTRY_VECTOR_CTRL  // unmask

which in turn satisfies the specification requirement that the MSIX
entry has to be masked in order to change the message:

  "If software changes the Address or Data value of an entry while the
   entry is unmasked, the result is undefined."

So yes, for the affinity change case, the read back is not required and
not done anymore. The read back was there in the initial implementation
and it turned out to be for paranoia sake.

It's still required to mask, but that's achieved through ordering.

> I keep hearing about this guarantee, which is surely true, but I argue that
> it is useless both during normal activity (when interrupts may be
> occasionally masked during affinity changes) and at shutdown
> (when there is a lot more additional state cleanup done on
> the device, which insures that there are no more interrupts,
> save the one(s) happening between the mask() write and readback,
> which we have no way to block, anyways (and may fire on the CPU at
> some unspecified time, even after the readback, because that does not
> control the pipeline from the device to the interrupt controller etc.),
> and that the kernel stack is well equipped to handle and ignore.

There is a difference and it has nothing to do with other device
cleanups at all. That's a pure interrupt management problem:

free_irq()
   1)   mask()
   2)   synchronize_irq()
   3)   invalidate vector

So if #1 does not guarantee that the interrupt is masked, then #2 is not
guaranteed to see a pending interrupt and #3 can invalidate the vector
before the interrupt becomes pending and raised in a CPU, which then
results in a spurious interrupt warning or in the worst case an IOMMU
exception. While that's "handled" by the kernel somewhat gracefully,
it's not ignored because it's invalid state.

The readback after mask() guarantees that an interrupt which was raised
_before_ the mask write reached the device has arrived at its
destination and is therefore detectable as pending or in progress
_before_ the vector is invalidated. That's simply because the interrupt
message preceeds the read reply and the CPU is stalled until the read
comes back.

> The only, weak, justification that Gemini gives for masking is that

Groan....

> Again, I have seen no explanation so far on why the mask requires a
> guarantee, and as I have tried to explain, the unavoidable stray
> interrupts generated before the readback are as bad (or actually as
> harmless) as the ones that we avoid with the readback.

Stray interrupts are only harmless when the interrupt is in a functional
state, but _not_ when the required resources are not longer available.

So instead of this unholy per CPU state and penalizing the VIRT case,
which Marc is concerned off, something like the uncompiled below should
just work. Then you can use these new callbacks for your mitigation muck
and everything else is unaffected.

Though this mitigation stuff will have serious implications when running
in a VM because the MMIO write will be intercepted, but that's a
different story.

What might be worth to investigate is whether the irq_mask() callback
for MSI-X can use the lazy mechanism (without readback). But that's an
orthogonal issue and needs some thoughts and testing.

Thanks,

        tglx
---
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -162,6 +162,11 @@ static void pci_irq_mask_msix(struct irq
 	pci_msix_mask(irq_data_get_msi_desc(data));
 }
 
+static void pci_irq_mask_msix_lazy(struct irq_data *data)
+{
+	pci_msix_mask_lazy(irq_data_get_msi_desc(data));
+}
+
 static void pci_irq_unmask_msix(struct irq_data *data)
 {
 	pci_msix_unmask(irq_data_get_msi_desc(data));
@@ -182,7 +187,9 @@ static const struct msi_domain_template
 		.irq_startup		= pci_irq_startup_msix,
 		.irq_shutdown		= pci_irq_shutdown_msix,
 		.irq_mask		= pci_irq_mask_msix,
+		.irq_mask_dev		= pci_irq_mask_msix_lazy,
 		.irq_unmask		= pci_irq_unmask_msix,
+		.irq_unmask_dev		= pci_irq_unmask_msix,
 		.irq_write_msi_msg	= pci_msi_domain_write_msg,
 		.flags			= IRQCHIP_ONESHOT_SAFE,
 	},
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -40,10 +40,15 @@ static inline void pci_msix_write_vector
 		writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }
 
-static inline void pci_msix_mask(struct msi_desc *desc)
+static inline void pci_msix_mask_lazy(struct msi_desc *desc)
 {
 	desc->pci.msix_ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
 	pci_msix_write_vector_ctrl(desc, desc->pci.msix_ctrl);
+}
+
+static inline void pci_msix_mask(struct msi_desc *desc)
+{
+	pci_msix_mask_lazy(desc);
 	/* Flush write to device */
 	readl(desc->pci.mask_base);
 }
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -451,7 +451,11 @@ static inline irq_hw_number_t irqd_to_hw
  * @irq_ack:		start of a new interrupt
  * @irq_mask:		mask an interrupt source
  * @irq_mask_ack:	ack and mask an interrupt source
+ * @irq_mask_dev:	Optional callback to mask at the device level for
+ *			interrupt moderation purposes. Only valid for the outermost
+ *			interrupt chip in a hierarchy.
  * @irq_unmask:		unmask an interrupt source
+ * @irq_unmask_dev:	Counterpart to @irq_mask_dev (required when @irq_mask_dev is not NULL)
  * @irq_eoi:		end of interrupt
  * @irq_set_affinity:	Set the CPU affinity on SMP machines. If the force
  *			argument is true, it tells the driver to
@@ -499,7 +503,9 @@ struct irq_chip {
 	void		(*irq_ack)(struct irq_data *data);
 	void		(*irq_mask)(struct irq_data *data);
 	void		(*irq_mask_ack)(struct irq_data *data);
+	void		(*irq_mask_dev)(struct irq_data *data);
 	void		(*irq_unmask)(struct irq_data *data);
+	void		(*irq_unmask_dev)(struct irq_data *data);
 	void		(*irq_eoi)(struct irq_data *data);
 
 	int		(*irq_set_affinity)(struct irq_data *data, const struct cpumask *dest, bool force);
[tip: irq/drivers] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag
Posted by tip-bot2 for Marc Zyngier 5 months ago
The following commit has been merged into the irq/drivers branch of tip:

Commit-ID:     f09c1d63e895e1b45248a75656a41df2e8102874
Gitweb:        https://git.kernel.org/tip/f09c1d63e895e1b45248a75656a41df2e8102874
Author:        Marc Zyngier <maz@kernel.org>
AuthorDate:    Wed, 03 Sep 2025 16:04:46 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 09 Sep 2025 14:44:30 +02:00

irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag

For systems that implement interrupt masking at the interrupt controller
level, the MSI library offers MSI_FLAG_PCI_MSI_MASK_PARENT.  It indicates
that it isn't enough to only unmask the interrupt at the PCI device level,
but that the interrupt controller must also be involved.

However, the way this is currently done is less than optimal, as the
masking/unmasking is done on both sides, always. It would be far cheaper to
unmask both at the start of times, and then only deal with the interrupt
controller mask, which is cheaper than a round-trip to the PCI endpoint.

Now that the PCI/MSI layer implements irq_startup() and irq_shutdown()
callbacks, which [un]mask at the PCI level and honor the request to
[un]mask the parent, this can be trivially done.

Overwrite the irq_mask/unmask() callbacks of the device domain interrupt
chip with irq_[un]mask_parent() when the parent domain asks for it.

[ tglx: Adopted to the PCI/MSI changes ]

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250903135433.380783272@linutronix.de

---
 drivers/irqchip/irq-msi-lib.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/irqchip/irq-msi-lib.c b/drivers/irqchip/irq-msi-lib.c
index 9089440..d5eefc3 100644
--- a/drivers/irqchip/irq-msi-lib.c
+++ b/drivers/irqchip/irq-msi-lib.c
@@ -112,6 +112,20 @@ bool msi_lib_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 	 */
 	if (!chip->irq_set_affinity && !(info->flags & MSI_FLAG_NO_AFFINITY))
 		chip->irq_set_affinity = msi_domain_set_affinity;
+
+	/*
+	 * If the parent domain insists on being in charge of masking, obey
+	 * blindly. The interrupt is un-masked at the PCI level on startup
+	 * and masked on shutdown to prevent rogue interrupts after the
+	 * driver freed the interrupt. Not masking it at the PCI level
+	 * speeds up operation for disable/enable_irq() as it avoids
+	 * getting all the way out to the PCI device.
+	 */
+	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
+		chip->irq_mask		= irq_chip_mask_parent;
+		chip->irq_unmask	= irq_chip_unmask_parent;
+	}
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(msi_lib_init_dev_msi_info);
[patch 2/2] PCI/MSI: Remove the conditional parent [un]mask logic
Posted by Thomas Gleixner 5 months, 1 week ago
Now that msi_lib_init_dev_msi_info() overwrites the irq_[un]mask()
callbacks when the MSI_FLAG_PCI_MSI_MASK_PARENT flag is set by the parent
domain, the conditional [un]mask logic is obsolete.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/msi/irqdomain.c |   20 --------------------
 1 file changed, 20 deletions(-)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -170,22 +170,6 @@ static unsigned int cond_startup_parent(
 	return 0;
 }
 
-static __always_inline void cond_mask_parent(struct irq_data *data)
-{
-	struct msi_domain_info *info = data->domain->host_data;
-
-	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
-		irq_chip_mask_parent(data);
-}
-
-static __always_inline void cond_unmask_parent(struct irq_data *data)
-{
-	struct msi_domain_info *info = data->domain->host_data;
-
-	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
-		irq_chip_unmask_parent(data);
-}
-
 static void pci_irq_shutdown_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
@@ -208,14 +192,12 @@ static void pci_irq_mask_msi(struct irq_
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
 
 	pci_msi_mask(desc, BIT(data->irq - desc->irq));
-	cond_mask_parent(data);
 }
 
 static void pci_irq_unmask_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
 
-	cond_unmask_parent(data);
 	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
 }
 
@@ -268,12 +250,10 @@ static unsigned int pci_irq_startup_msix
 static void pci_irq_mask_msix(struct irq_data *data)
 {
 	pci_msix_mask(irq_data_get_msi_desc(data));
-	cond_mask_parent(data);
 }
 
 static void pci_irq_unmask_msix(struct irq_data *data)
 {
-	cond_unmask_parent(data);
 	pci_msix_unmask(irq_data_get_msi_desc(data));
 }
Re: [patch 2/2] PCI/MSI: Remove the conditional parent [un]mask logic
Posted by Bjorn Helgaas 5 months ago
On Wed, Sep 03, 2025 at 04:04:48PM +0200, Thomas Gleixner wrote:
> Now that msi_lib_init_dev_msi_info() overwrites the irq_[un]mask()
> callbacks when the MSI_FLAG_PCI_MSI_MASK_PARENT flag is set by the parent
> domain, the conditional [un]mask logic is obsolete.
> 
> Remove it.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  drivers/pci/msi/irqdomain.c |   20 --------------------
>  1 file changed, 20 deletions(-)
> 
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -170,22 +170,6 @@ static unsigned int cond_startup_parent(
>  	return 0;
>  }
>  
> -static __always_inline void cond_mask_parent(struct irq_data *data)
> -{
> -	struct msi_domain_info *info = data->domain->host_data;
> -
> -	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> -		irq_chip_mask_parent(data);
> -}
> -
> -static __always_inline void cond_unmask_parent(struct irq_data *data)
> -{
> -	struct msi_domain_info *info = data->domain->host_data;
> -
> -	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> -		irq_chip_unmask_parent(data);
> -}
> -
>  static void pci_irq_shutdown_msi(struct irq_data *data)
>  {
>  	struct msi_desc *desc = irq_data_get_msi_desc(data);
> @@ -208,14 +192,12 @@ static void pci_irq_mask_msi(struct irq_
>  	struct msi_desc *desc = irq_data_get_msi_desc(data);
>  
>  	pci_msi_mask(desc, BIT(data->irq - desc->irq));
> -	cond_mask_parent(data);
>  }
>  
>  static void pci_irq_unmask_msi(struct irq_data *data)
>  {
>  	struct msi_desc *desc = irq_data_get_msi_desc(data);
>  
> -	cond_unmask_parent(data);
>  	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
>  }
>  
> @@ -268,12 +250,10 @@ static unsigned int pci_irq_startup_msix
>  static void pci_irq_mask_msix(struct irq_data *data)
>  {
>  	pci_msix_mask(irq_data_get_msi_desc(data));
> -	cond_mask_parent(data);
>  }
>  
>  static void pci_irq_unmask_msix(struct irq_data *data)
>  {
> -	cond_unmask_parent(data);
>  	pci_msix_unmask(irq_data_get_msi_desc(data));
>  }
>  
>
[tip: irq/drivers] PCI/MSI: Remove the conditional parent [un]mask logic
Posted by tip-bot2 for Thomas Gleixner 5 months ago
The following commit has been merged into the irq/drivers branch of tip:

Commit-ID:     ba9d484ed3578705fcd24795b800e8e4364afb8c
Gitweb:        https://git.kernel.org/tip/ba9d484ed3578705fcd24795b800e8e4364afb8c
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Wed, 03 Sep 2025 16:04:48 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 09 Sep 2025 14:44:30 +02:00

PCI/MSI: Remove the conditional parent [un]mask logic

Now that msi_lib_init_dev_msi_info() overwrites the irq_[un]mask()
callbacks when the MSI_FLAG_PCI_MSI_MASK_PARENT flag is set by the parent
domain, the conditional [un]mask logic is obsolete.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20250903135433.444329373@linutronix.de

---
 drivers/pci/msi/irqdomain.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index b11b7f6..dfb61f1 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -170,22 +170,6 @@ static unsigned int cond_startup_parent(struct irq_data *data)
 	return 0;
 }
 
-static __always_inline void cond_mask_parent(struct irq_data *data)
-{
-	struct msi_domain_info *info = data->domain->host_data;
-
-	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
-		irq_chip_mask_parent(data);
-}
-
-static __always_inline void cond_unmask_parent(struct irq_data *data)
-{
-	struct msi_domain_info *info = data->domain->host_data;
-
-	if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
-		irq_chip_unmask_parent(data);
-}
-
 static void pci_irq_shutdown_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
@@ -208,14 +192,12 @@ static void pci_irq_mask_msi(struct irq_data *data)
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
 
 	pci_msi_mask(desc, BIT(data->irq - desc->irq));
-	cond_mask_parent(data);
 }
 
 static void pci_irq_unmask_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
 
-	cond_unmask_parent(data);
 	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
 }
 
@@ -268,12 +250,10 @@ static unsigned int pci_irq_startup_msix(struct irq_data *data)
 static void pci_irq_mask_msix(struct irq_data *data)
 {
 	pci_msix_mask(irq_data_get_msi_desc(data));
-	cond_mask_parent(data);
 }
 
 static void pci_irq_unmask_msix(struct irq_data *data)
 {
-	cond_unmask_parent(data);
 	pci_msix_unmask(irq_data_get_msi_desc(data));
 }