[PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback

Marc Zyngier posted 4 patches 7 months, 1 week ago
There is a newer version of this series
[PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
Posted by Marc Zyngier 7 months, 1 week ago
We currently nuke the structure representing an endpoint device
translating via an ITS on freeing the last LPI allocated for it.

That's an unfortunate state of affair, as it is pretty common for
a driver to allocate a single MSI, do something clever, teardown
this MSI, and reallocate a whole bunch of them. The nvme driver
does exactly that, amongst others.

What happens in that case is that the core code is buggy enough
to issue another .msi_prepare() call, even if it shouldn't.
This luckily cancels the above behaviour and hides the problem.

In order to fix the core code, let's start by implementing the new
.msi_teardown() callback. Nothing calls it yet, so a side effect
is that the its_dev structure will not be freed and that the DID
will stay mapped. Not a big deal, and this will be solved in the
following patch.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-gic-v3-its-msi-parent.c | 10 ++++
 drivers/irqchip/irq-gic-v3-its.c            | 56 +++++++++++++--------
 2 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
index bdb04c8081480..76b94f55b00e2 100644
--- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
@@ -159,6 +159,14 @@ static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
 					  dev, nvec, info);
 }
 
+static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
+{
+	struct msi_domain_info *msi_info;
+
+	msi_info = msi_get_domain_info(domain->parent);
+	msi_info->ops->msi_teardown(domain->parent, info);
+}
+
 static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 				  struct irq_domain *real_parent, struct msi_domain_info *info)
 {
@@ -182,6 +190,7 @@ static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		 * %MSI_MAX_INDEX.
 		 */
 		info->ops->msi_prepare = its_pci_msi_prepare;
+		info->ops->msi_teardown = its_msi_teardown;
 		break;
 	case DOMAIN_BUS_DEVICE_MSI:
 	case DOMAIN_BUS_WIRED_TO_MSI:
@@ -190,6 +199,7 @@ static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		 * size is also known at domain creation time.
 		 */
 		info->ops->msi_prepare = its_pmsi_prepare;
+		info->ops->msi_teardown = its_msi_teardown;
 		break;
 	default:
 		/* Confused. How did the lib return true? */
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0115ad6c82593..3472b97477104 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3620,8 +3620,43 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 	return err;
 }
 
+static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
+{
+	struct msi_domain_info *msi_info;
+	struct its_device *its_dev;
+	struct its_node *its;
+	u32 dev_id;
+
+	dev_id = info->scratchpad[0].ul;
+
+	msi_info = msi_get_domain_info(domain);
+	its = msi_info->data;
+
+	guard(mutex)(&its->dev_alloc_lock);
+
+	its_dev = its_find_device(its, dev_id);
+
+	/* If the device is shared, keep everything around */
+	if (its_dev->shared)
+		return;
+
+	/* LPIs should have been already unmapped at this stage */
+	if (WARN_ON_ONCE(!bitmap_empty(its_dev->event_map.lpi_map,
+				       its_dev->event_map.nr_lpis)))
+		return;
+
+	its_lpi_free(its_dev->event_map.lpi_map,
+		     its_dev->event_map.lpi_base,
+		     its_dev->event_map.nr_lpis);
+
+	/* Unmap device/itt, and get rid of the tracking */
+	its_send_mapd(its_dev, 0);
+	its_free_device(its_dev);
+}
+
 static struct msi_domain_ops its_msi_domain_ops = {
 	.msi_prepare	= its_msi_prepare,
+	.msi_teardown	= its_msi_teardown,
 };
 
 static int its_irq_gic_domain_alloc(struct irq_domain *domain,
@@ -3722,7 +3757,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 {
 	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
 	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
-	struct its_node *its = its_dev->its;
 	int i;
 
 	bitmap_release_region(its_dev->event_map.lpi_map,
@@ -3736,26 +3770,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 		irq_domain_reset_irq_data(data);
 	}
 
-	mutex_lock(&its->dev_alloc_lock);
-
-	/*
-	 * If all interrupts have been freed, start mopping the
-	 * floor. This is conditioned on the device not being shared.
-	 */
-	if (!its_dev->shared &&
-	    bitmap_empty(its_dev->event_map.lpi_map,
-			 its_dev->event_map.nr_lpis)) {
-		its_lpi_free(its_dev->event_map.lpi_map,
-			     its_dev->event_map.lpi_base,
-			     its_dev->event_map.nr_lpis);
-
-		/* Unmap device/itt */
-		its_send_mapd(its_dev, 0);
-		its_free_device(its_dev);
-	}
-
-	mutex_unlock(&its->dev_alloc_lock);
-
 	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
 }
 
-- 
2.39.2
Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
Posted by Lorenzo Pieralisi 7 months, 1 week ago
On Sun, May 11, 2025 at 05:35:18PM +0100, Marc Zyngier wrote:
> We currently nuke the structure representing an endpoint device
> translating via an ITS on freeing the last LPI allocated for it.
> 
> That's an unfortunate state of affair, as it is pretty common for
> a driver to allocate a single MSI, do something clever, teardown
> this MSI, and reallocate a whole bunch of them. The nvme driver
> does exactly that, amongst others.
> 
> What happens in that case is that the core code is buggy enough
> to issue another .msi_prepare() call, even if it shouldn't.
> This luckily cancels the above behaviour and hides the problem.
> 
> In order to fix the core code, let's start by implementing the new
> .msi_teardown() callback. Nothing calls it yet, so a side effect
> is that the its_dev structure will not be freed and that the DID
> will stay mapped. Not a big deal, and this will be solved in the
> following patch.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  drivers/irqchip/irq-gic-v3-its-msi-parent.c | 10 ++++
>  drivers/irqchip/irq-gic-v3-its.c            | 56 +++++++++++++--------
>  2 files changed, 45 insertions(+), 21 deletions(-)

First off, thanks a lot for putting this together, it makes an awful
lot of sense to me.

> index 0115ad6c82593..3472b97477104 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3620,8 +3620,43 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>  	return err;
>  }
>  
> +static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
> +{
> +	struct msi_domain_info *msi_info;
> +	struct its_device *its_dev;
> +	struct its_node *its;
> +	u32 dev_id;
> +
> +	dev_id = info->scratchpad[0].ul;

I have just managed to get to a keyboard :), I don't think the dev_id
makes it to this point, we overwrite it with the its_dev pointer in
its_msi_prepare() (could use second scratchpad for the pointer maybe ?).

I was bitten by this while removing the old IWB code into the new one
(unrelated to this code but that's how I noticed scratchpad is a union).

Ignore me if I am mistaken, just reading the code, have not tested it
(but I am about to do it for v5).

Thanks,
Lorenzo

> +
> +	msi_info = msi_get_domain_info(domain);
> +	its = msi_info->data;
> +
> +	guard(mutex)(&its->dev_alloc_lock);
> +
> +	its_dev = its_find_device(its, dev_id);
> +
> +	/* If the device is shared, keep everything around */
> +	if (its_dev->shared)
> +		return;
> +
> +	/* LPIs should have been already unmapped at this stage */
> +	if (WARN_ON_ONCE(!bitmap_empty(its_dev->event_map.lpi_map,
> +				       its_dev->event_map.nr_lpis)))
> +		return;
> +
> +	its_lpi_free(its_dev->event_map.lpi_map,
> +		     its_dev->event_map.lpi_base,
> +		     its_dev->event_map.nr_lpis);
> +
> +	/* Unmap device/itt, and get rid of the tracking */
> +	its_send_mapd(its_dev, 0);
> +	its_free_device(its_dev);
> +}
> +
>  static struct msi_domain_ops its_msi_domain_ops = {
>  	.msi_prepare	= its_msi_prepare,
> +	.msi_teardown	= its_msi_teardown,
>  };
>  
>  static int its_irq_gic_domain_alloc(struct irq_domain *domain,
> @@ -3722,7 +3757,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
>  {
>  	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
>  	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
> -	struct its_node *its = its_dev->its;
>  	int i;
>  
>  	bitmap_release_region(its_dev->event_map.lpi_map,
> @@ -3736,26 +3770,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
>  		irq_domain_reset_irq_data(data);
>  	}
>  
> -	mutex_lock(&its->dev_alloc_lock);
> -
> -	/*
> -	 * If all interrupts have been freed, start mopping the
> -	 * floor. This is conditioned on the device not being shared.
> -	 */
> -	if (!its_dev->shared &&
> -	    bitmap_empty(its_dev->event_map.lpi_map,
> -			 its_dev->event_map.nr_lpis)) {
> -		its_lpi_free(its_dev->event_map.lpi_map,
> -			     its_dev->event_map.lpi_base,
> -			     its_dev->event_map.nr_lpis);
> -
> -		/* Unmap device/itt */
> -		its_send_mapd(its_dev, 0);
> -		its_free_device(its_dev);
> -	}
> -
> -	mutex_unlock(&its->dev_alloc_lock);
> -
>  	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
>  }
>  
> -- 
> 2.39.2
>
Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
Posted by Marc Zyngier 7 months, 1 week ago
On Mon, 12 May 2025 17:30:58 +0100,
Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> 
> On Sun, May 11, 2025 at 05:35:18PM +0100, Marc Zyngier wrote:
> > We currently nuke the structure representing an endpoint device
> > translating via an ITS on freeing the last LPI allocated for it.
> > 
> > That's an unfortunate state of affair, as it is pretty common for
> > a driver to allocate a single MSI, do something clever, teardown
> > this MSI, and reallocate a whole bunch of them. The nvme driver
> > does exactly that, amongst others.
> > 
> > What happens in that case is that the core code is buggy enough
> > to issue another .msi_prepare() call, even if it shouldn't.
> > This luckily cancels the above behaviour and hides the problem.
> > 
> > In order to fix the core code, let's start by implementing the new
> > .msi_teardown() callback. Nothing calls it yet, so a side effect
> > is that the its_dev structure will not be freed and that the DID
> > will stay mapped. Not a big deal, and this will be solved in the
> > following patch.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  drivers/irqchip/irq-gic-v3-its-msi-parent.c | 10 ++++
> >  drivers/irqchip/irq-gic-v3-its.c            | 56 +++++++++++++--------
> >  2 files changed, 45 insertions(+), 21 deletions(-)
> 
> First off, thanks a lot for putting this together, it makes an awful
> lot of sense to me.
> 
> > index 0115ad6c82593..3472b97477104 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -3620,8 +3620,43 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
> >  	return err;
> >  }
> >  
> > +static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
> > +{
> > +	struct msi_domain_info *msi_info;
> > +	struct its_device *its_dev;
> > +	struct its_node *its;
> > +	u32 dev_id;
> > +
> > +	dev_id = info->scratchpad[0].ul;
> 
> I have just managed to get to a keyboard :), I don't think the dev_id
> makes it to this point, we overwrite it with the its_dev pointer in
> its_msi_prepare() (could use second scratchpad for the pointer maybe ?).
> 
> I was bitten by this while removing the old IWB code into the new one
> (unrelated to this code but that's how I noticed scratchpad is a union).

Gah, this is missing the fixup I had on top and that I didn't squash:

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index d8c4d3b8256f3..7e0e7f0160936 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3622,19 +3622,9 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 
 static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
 {
-	struct msi_domain_info *msi_info;
-	struct its_device *its_dev;
-	struct its_node *its;
-	u32 dev_id;
-
-	dev_id = info->scratchpad[0].ul;
-
-	msi_info = msi_get_domain_info(domain);
-	its = msi_info->data;
-
-	guard(mutex)(&its->dev_alloc_lock);
+	struct its_device *its_dev = info->scratchpad[0].ptr;
 
-	its_dev = its_find_device(its, dev_id);
+	guard(mutex)(&its_dev->its->dev_alloc_lock);
 
 	/* If the device is shared, keep everything around */
 	if (its_dev->shared)


So no need for another scratchpad entry -- the device structure has
everything we need. I'll repost things with tglx's comments addressed
and this fix correctly squashed in.

Sorry for the noise,

	M.

-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
Posted by Thomas Gleixner 7 months, 1 week ago
On Sun, May 11 2025 at 17:35, Marc Zyngier wrote:
> We currently nuke the structure representing an endpoint device

How is we? We means nothing as you know :)

> translating via an ITS on freeing the last LPI allocated for it.
>
> That's an unfortunate state of affair, as it is pretty common for
> a driver to allocate a single MSI, do something clever, teardown
> this MSI, and reallocate a whole bunch of them. The nvme driver
> does exactly that, amongst others.
>
> What happens in that case is that the core code is buggy enough
> to issue another .msi_prepare() call, even if it shouldn't.
> This luckily cancels the above behaviour and hides the problem.
>
> In order to fix the core code, let's start by implementing the new

s/let's//

I really have to understand why everyone is so fond of "let's". It only
makes limited sense when the patch is proposed, but in a change log it
does not make sense at all.

> .msi_teardown() callback. Nothing calls it yet, so a side effect
> is that the its_dev structure will not be freed and that the DID
> will stay mapped. Not a big deal, and this will be solved in the
> following patch.

Now I see why you added this incomprehensible condition into the core
code. Bah.

Why don't you add this callback once you changed the prepare muck, which
introduces info::alloc_data?

Thanks,

        tglx
Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
Posted by Marc Zyngier 7 months, 1 week ago
On Mon, 12 May 2025 15:34:59 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Sun, May 11 2025 at 17:35, Marc Zyngier wrote:
> > We currently nuke the structure representing an endpoint device
> 
> How is we? We means nothing as you know :)

We means a lot to *me*.

> 
> > translating via an ITS on freeing the last LPI allocated for it.
> >
> > That's an unfortunate state of affair, as it is pretty common for
> > a driver to allocate a single MSI, do something clever, teardown
> > this MSI, and reallocate a whole bunch of them. The nvme driver
> > does exactly that, amongst others.
> >
> > What happens in that case is that the core code is buggy enough
> > to issue another .msi_prepare() call, even if it shouldn't.
> > This luckily cancels the above behaviour and hides the problem.
> >
> > In order to fix the core code, let's start by implementing the new
> 
> s/let's//
> 
> I really have to understand why everyone is so fond of "let's". It only
> makes limited sense when the patch is proposed, but in a change log it
> does not make sense at all.
> 
> > .msi_teardown() callback. Nothing calls it yet, so a side effect
> > is that the its_dev structure will not be freed and that the DID
> > will stay mapped. Not a big deal, and this will be solved in the
> > following patch.
> 
> Now I see why you added this incomprehensible condition into the core
> code. Bah.
> 
> Why don't you add this callback once you changed the prepare muck, which
> introduces info::alloc_data?

Changing prepare first breaks nvme and igbxe, amongst other things,
for the reasons explained just above, so you need to implement
teardown first, plug the MSI controller into it, and only then you can
switch prepare.

What I can do is:

(1) introduce teardown without the call site in msi_remove_device_irq_domain()

(2) add its implementation in the ITS driver

(3) move the prepare crap

(4) add the teardown call to msi_remove_device_irq_domain(), without
    the check on info->alloc_data

With that order, everything will keep working, with a temporary leak
of ITTs in the ITS driver between patches (2) and (4).

Is that something you can live with?

	M.

-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
Posted by Thomas Gleixner 7 months, 1 week ago
On Mon, May 12 2025 at 17:09, Marc Zyngier wrote:
> On Mon, 12 May 2025 15:34:59 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Changing prepare first breaks nvme and igbxe, amongst other things,
> for the reasons explained just above, so you need to implement
> teardown first, plug the MSI controller into it, and only then you can
> switch prepare.
>
> What I can do is:
>
> (1) introduce teardown without the call site in msi_remove_device_irq_domain()
>
> (2) add its implementation in the ITS driver
>
> (3) move the prepare crap
>
> (4) add the teardown call to msi_remove_device_irq_domain(), without
>     the check on info->alloc_data
>
> With that order, everything will keep working, with a temporary leak
> of ITTs in the ITS driver between patches (2) and (4).
>
> Is that something you can live with?

Sounds good.