[PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage

Reinette Chatre posted 8 patches 2 years, 10 months ago
There is a newer version of this series
Posted by Reinette Chatre 2 years, 10 months ago
Interrupt context is statically allocated at the time interrupts
are allocated. Following allocation, the context is managed by
directly accessing the elements of the array using the vector
as the index. The storage is released when interrupts are disabled.

It is possible to dynamically allocate a single MSI-X index
after MSI-X has been enabled. Dynamic storage for interrupt context
is needed to support this. Replace the interrupt context array with an
xarray (similar to what the core uses as the store for MSI descriptors)
that can support dynamic expansion while maintaining the
custom of using the vector as the index.

Use the new data storage to allocate all elements once and free all
elements together. Dynamic usage follows.

Create helpers with the understanding that, after initial MSI-X
enabling, it is only possible to allocate a single MSI-X index at
a time. The only time multiple MSI-X vectors are allocated is during
initial MSI-X enabling, where failure results in no allocations.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC V1:
- Let vfio_irq_ctx_alloc_single() return pointer to allocated
  context. (Alex)
- Transition INTx allocation to simplified vfio_irq_ctx_alloc_single().
- Improve accuracy of changelog.

 drivers/vfio/pci/vfio_pci_core.c  |  1 +
 drivers/vfio/pci/vfio_pci_intrs.c | 77 ++++++++++++++++++++-----------
 include/linux/vfio_pci_core.h     |  2 +-
 3 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a5ab416cf476..ae0e161c7fc9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2102,6 +2102,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->vma_list);
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
+	xa_init(&vdev->ctx);
 
 	return 0;
 }
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index ece371ebea00..00119641aa19 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -52,25 +52,60 @@ static
 struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
 					  unsigned long index)
 {
-	if (index >= vdev->num_ctx)
-		return NULL;
-	return &vdev->ctx[index];
+	return xa_load(&vdev->ctx, index);
 }
 
 static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
 {
-	kfree(vdev->ctx);
+	struct vfio_pci_irq_ctx *ctx;
+	unsigned long index;
+
+	xa_for_each(&vdev->ctx, index, ctx) {
+		xa_erase(&vdev->ctx, index);
+		kfree(ctx);
+	}
+}
+
+static struct vfio_pci_irq_ctx *
+vfio_irq_ctx_alloc_single(struct vfio_pci_core_device *vdev,
+			  unsigned long index)
+{
+	struct vfio_pci_irq_ctx *ctx;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
+	if (!ctx)
+		return NULL;
+
+	ret = xa_insert(&vdev->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
+	if (ret) {
+		kfree(ctx);
+		return NULL;
+	}
+
+	return ctx;
 }
 
+/* Only called during initial interrupt enabling. Never after.  */
 static int vfio_irq_ctx_alloc_num(struct vfio_pci_core_device *vdev,
 				  unsigned long num)
 {
-	vdev->ctx = kcalloc(num, sizeof(struct vfio_pci_irq_ctx),
-			    GFP_KERNEL_ACCOUNT);
-	if (!vdev->ctx)
-		return -ENOMEM;
+	struct vfio_pci_irq_ctx *ctx;
+	unsigned long index;
+
+	WARN_ON(!xa_empty(&vdev->ctx));
+
+	for (index = 0; index < num; index++) {
+		ctx = vfio_irq_ctx_alloc_single(vdev, index);
+		if (!ctx)
+			goto err;
+	}
 
 	return 0;
+
+err:
+	vfio_irq_ctx_free_all(vdev);
+	return -ENOMEM;
 }
 
 /*
@@ -218,7 +253,6 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 {
 	struct vfio_pci_irq_ctx *ctx;
-	int ret;
 
 	if (!is_irq_none(vdev))
 		return -EINVAL;
@@ -226,15 +260,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 	if (!vdev->pdev->irq)
 		return -ENODEV;
 
-	ret = vfio_irq_ctx_alloc_num(vdev, 1);
-	if (ret)
-		return ret;
-
-	ctx = vfio_irq_ctx_get(vdev, 0);
-	if (!ctx) {
-		vfio_irq_ctx_free_all(vdev);
-		return -EINVAL;
-	}
+	ctx = vfio_irq_ctx_alloc_single(vdev, 0);
+	if (!ctx)
+		return -ENOMEM;
 
 	vdev->num_ctx = 1;
 
@@ -486,16 +514,13 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_irq_ctx *ctx;
-	unsigned int i;
+	unsigned long i;
 	u16 cmd;
 
-	for (i = 0; i < vdev->num_ctx; i++) {
-		ctx = vfio_irq_ctx_get(vdev, i);
-		if (ctx) {
-			vfio_virqfd_disable(&ctx->unmask);
-			vfio_virqfd_disable(&ctx->mask);
-			vfio_msi_set_vector_signal(vdev, i, -1, msix);
-		}
+	xa_for_each(&vdev->ctx, i, ctx) {
+		vfio_virqfd_disable(&ctx->unmask);
+		vfio_virqfd_disable(&ctx->mask);
+		vfio_msi_set_vector_signal(vdev, i, -1, msix);
 	}
 
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 367fd79226a3..61d7873a3973 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -59,7 +59,7 @@ struct vfio_pci_core_device {
 	struct perm_bits	*msi_perm;
 	spinlock_t		irqlock;
 	struct mutex		igate;
-	struct vfio_pci_irq_ctx	*ctx;
+	struct xarray		ctx;
 	int			num_ctx;
 	int			irq_type;
 	int			num_regions;
-- 
2.34.1
RE: [PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage
Posted by Liu, Jing2 2 years, 10 months ago
Hi Reinette,

> From: Chatre, Reinette <reinette.chatre@intel.com>
> Subject: [PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage
> 
> Interrupt context is statically allocated at the time interrupts are
> allocated. Following allocation, the context is managed by directly
> accessing the elements of the array using the vector as the index. The
> storage is released when interrupts are disabled.
> 

Looking at the dynamic allocation change for the time after MSI-X is
enabled in vfio_msi_set_vector_signal(), do we need to consider changing
the allocation of context/interrupt in vfio_msi_enable() for MSI-X in the
same way, so that it only allocates vectors with a valid signal fd when
dynamic MSI-X is supported?

Because with the dynamic logic, I think that when enabling MSI-X not all
vectors from zero to nvec should be allocated; they could instead be
allocated only when they are actually used, that is, when their signal fd
is set.

Not sure if this is the right place for this comment, since
vfio_msi_enable() sees no change in this series. 😊


Thanks,
Jing



Re: [PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage
Posted by Reinette Chatre 2 years, 10 months ago
Hi Jing,

On 4/7/2023 12:21 AM, Liu, Jing2 wrote:
> Hi Reinette,
> 
>> From: Chatre, Reinette <reinette.chatre@intel.com>
>> Subject: [PATCH V2 4/8] vfio/pci: Use xarray for interrupt context storage
>>
>> Interrupt context is statically allocated at the time interrupts are
>> allocated. Following allocation, the context is managed by directly
>> accessing the elements of the array using the vector as the index. The
>> storage is released when interrupts are disabled.
>>
> 
> Looking at the dynamic allocation change for the time after MSI-X is
> enabled in vfio_msi_set_vector_signal(), do we need to consider changing
> the allocation of context/interrupt in vfio_msi_enable() for MSI-X in the
> same way, so that it only allocates vectors with a valid signal fd when
> dynamic MSI-X is supported?
> 
> Because with the dynamic logic, I think that when enabling MSI-X not all
> vectors from zero to nvec should be allocated; they could instead be
> allocated only when they are actually used, that is, when their signal fd
> is set.
> 
> Not sure if this is the right place for this comment, since
> vfio_msi_enable() sees no change in this series. 😊

This is a good question and from what I understand it could be done either way.

vfio_msi_enable()->pci_alloc_irq_vectors() would always be required because
pci_alloc_irq_vectors() enables MSI-X in addition to allocating the vectors.
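
For reference, a minimal sketch of what I mean (simplified, not the
actual vfio code; the bounds here are illustrative):

	/*
	 * A single pci_alloc_irq_vectors() call both enables MSI/MSI-X
	 * and allocates the vectors; a negative return means nothing
	 * was enabled or allocated.
	 */
	nvec = pci_alloc_irq_vectors(pdev, 1, nvec,
				     msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI);
	if (nvec < 0)
		return nvec;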

I expect it to be efficient to allocate a range of vectors using
pci_alloc_irq_vectors() at the same time as MSI-X enabling, in
anticipation of their subsequent activation, since the vfio and MSI locks
then need to be taken only once.

As you indicate, it is also possible to only allocate one vector during
MSI-X enabling using pci_alloc_irq_vectors(), leaving the other
allocations to vfio_msi_set_vector_signal(). Not a major issue, but this
would require some additional special cases within vfio_msi_enable(),
while the current solution of allocating all vectors using
pci_alloc_irq_vectors() works for all interrupt types.
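
As a rough sketch of that alternative (hypothetical, assuming the dynamic
MSI-X interfaces; names and error handling simplified):

	struct msi_map map;
	int ret;

	/* Enable MSI-X with only vector 0 allocated up front. */
	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSIX);
	if (ret < 0)
		return ret;

	/* Later, when a signal fd is set for 'vector': */
	map = pci_msix_alloc_irq_at(pdev, vector, NULL);
	if (map.index < 0)
		return map.index;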

Which would be more efficient depends on the use cases. The
current solution may be considered less efficient if the user enables MSI-X
by providing a range of vectors without triggers(*). From what I can tell
this is not a possible use case when using Qemu. Qemu enables MSI-X by
calling VFIO_DEVICE_SET_IRQS for vector 0 with a trigger. Making a change
to only allocate vector 0 using pci_alloc_irq_vectors() and the rest using
vfio_msi_set_vector_signal() would thus have no impact on Qemu. Both
implementations behave the same for Qemu.
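
For illustration, a userspace sketch of that style of enabling (not
Qemu's actual code, just the VFIO_DEVICE_SET_IRQS usage I am referring
to):

	/* Set an eventfd trigger on MSI-X vector 0 only. */
	struct {
		struct vfio_irq_set set;
		__s32 fd;
	} trigger = {
		.set = {
			.argsz	= sizeof(trigger),
			.flags	= VFIO_IRQ_SET_DATA_EVENTFD |
				  VFIO_IRQ_SET_ACTION_TRIGGER,
			.index	= VFIO_PCI_MSIX_IRQ_INDEX,
			.start	= 0,
			.count	= 1,
		},
		.fd	= event_fd,
	};

	ioctl(device_fd, VFIO_DEVICE_SET_IRQS, &trigger.set);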

Considering the efficiency of allocating multiple vectors together,
which works for all interrupt types (MSI, non-dynamic MSI-X, and dynamic
MSI-X) without any impact on Qemu, I lean towards keeping the current
implementation.

Reinette

(*) Whether it is less efficient is debatable, considering that
it may be desirable to use allocated interrupts as a cache:
https://lore.kernel.org/lkml/20230404122444.59e36a99.alex.williamson@redhat.com/