During migration restore, vfio_enable_vectors() is called to re-enable
MSI-X interrupts for assigned devices. It passes the range from 0 to
nr_vectors to the kernel to enable MSI-X and the vectors unmasked in
the guest. While MSI-X is being enabled, the kernel allocates all the
vectors within that range, as requested by the ioctl().

When dynamic MSI-X allocation is supported, we only want the vectors
unmasked by the guest to be allocated and enabled. Therefore, QEMU can
first set vector 0 to enable MSI-X, after which vectors can be
allocated as needed.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
---
hw/vfio/pci.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8c485636445c..43ffacd5b36a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -375,6 +375,38 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
     int ret = 0, i, argsz;
     int32_t *fds;

+    /*
+     * If dynamic MSI-X allocation is supported, the vectors to be allocated
+     * and enabled can be scattered. Before kernel enabling MSI-X, setting
+     * nr_vectors causes all these vectors being allocated on host.
+     *
+     * To keep allocation as needed, first setup vector 0 with an invalid
+     * fd to make MSI-X enabled, then enable vectors by setting all so that
+     * kernel allocates and enables interrupts only when enabled in guest.
+     */
+    if (msix && !(vdev->msix->irq_info_flags & VFIO_IRQ_INFO_NORESIZE)) {
+        argsz = sizeof(*irq_set) + sizeof(*fds);
+
+        irq_set = g_malloc0(argsz);
+        irq_set->argsz = argsz;
+        irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+                         VFIO_IRQ_SET_ACTION_TRIGGER;
+        irq_set->index = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
+                         VFIO_PCI_MSI_IRQ_INDEX;
+        irq_set->start = 0;
+        irq_set->count = 1;
+        fds = (int32_t *)&irq_set->data;
+        fds[0] = -1;
+
+        ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+        g_free(irq_set);
+
+        if (ret) {
+            return ret;
+        }
+    }
+
     argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));

     irq_set = g_malloc0(argsz);
--
2.27.0
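
The patch keys its new branch off the absence of VFIO_IRQ_INFO_NORESIZE in
the MSI-X IRQ info flags. As a rough illustration of that check outside
QEMU, the sketch below queries the flags with VFIO_DEVICE_GET_IRQ_INFO and
tests the bit. The function names and the bare device_fd parameter are
assumptions made for this example; they are not part of the patch.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: returns 1 when the NORESIZE flag is absent, which
 * the patch treats as "dynamic MSI-X allocation is supported".
 */
int msix_dynamic_alloc_supported(uint32_t irq_info_flags)
{
    return !(irq_info_flags & VFIO_IRQ_INFO_NORESIZE);
}

/*
 * Hypothetical helper: read the MSI-X IRQ info flags from a VFIO device fd.
 * Returns 0 and fills *flags on success, -1 if the ioctl fails.
 */
int query_msix_irq_flags(int device_fd, uint32_t *flags)
{
    struct vfio_irq_info info;

    memset(&info, 0, sizeof(info));
    info.argsz = sizeof(info);
    info.index = VFIO_PCI_MSIX_IRQ_INDEX;

    if (ioctl(device_fd, VFIO_DEVICE_GET_IRQ_INFO, &info) < 0) {
        return -1;
    }

    *flags = info.flags;
    return 0;
}
```

In QEMU the flags are cached in vdev->msix->irq_info_flags, so no extra
ioctl is needed at the point where the patch tests them.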
On Thu, 27 Jul 2023 03:24:10 -0400
Jing Liu <jing2.liu@intel.com> wrote:
> During migration restoring, vfio_enable_vectors() is called to restore
> enabling MSI-X interrupts for assigned devices. It sets the range from 0
> to nr_vectors to kernel to enable MSI-X and the vectors unmasked in
> guest. During the MSI-X enabling, all the vectors within the range are
> allocated according to the ioctl().
>
> When dynamic MSI-X allocation is supported, we only want the guest
> unmasked vectors being allocated and enabled. Therefore, Qemu can first
> set vector 0 to enable MSI-X and after that, all the vectors can be
> allocated in need.
>
> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> ---
> hw/vfio/pci.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8c485636445c..43ffacd5b36a 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -375,6 +375,38 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> int ret = 0, i, argsz;
> int32_t *fds;
>
> + /*
> + * If dynamic MSI-X allocation is supported, the vectors to be allocated
> + * and enabled can be scattered. Before kernel enabling MSI-X, setting
> + * nr_vectors causes all these vectors being allocated on host.
s/being/to be/
> + *
> + * To keep allocation as needed, first setup vector 0 with an invalid
> + * fd to make MSI-X enabled, then enable vectors by setting all so that
> + * kernel allocates and enables interrupts only when enabled in guest.
> + */
> + if (msix && !(vdev->msix->irq_info_flags & VFIO_IRQ_INFO_NORESIZE)) {
!vdev->msix->noresize again seems cleaner.
> + argsz = sizeof(*irq_set) + sizeof(*fds);
> +
> + irq_set = g_malloc0(argsz);
> + irq_set->argsz = argsz;
> + irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> + VFIO_IRQ_SET_ACTION_TRIGGER;
> + irq_set->index = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
> + VFIO_PCI_MSI_IRQ_INDEX;
Why are we testing msix again within a branch that requires msix?
> + irq_set->start = 0;
> + irq_set->count = 1;
> + fds = (int32_t *)&irq_set->data;
> + fds[0] = -1;
> +
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +
> + g_free(irq_set);
> +
> + if (ret) {
> + return ret;
> + }
> + }
So your goal here is simply to get the kernel to call vfio_msi_enable()
with nvec = 1 to get MSI-X enabled on the device, which then allows the
kernel to use the dynamic expansion when we call SET_IRQS again with a
potentially sparse set of eventfds to vector mappings. This seems very
similar to the nr_vectors == 0 branch of vfio_msix_enable() where it
uses a do_use and release call to accomplish getting MSI-X enabled. We
should consolidate, probably by pulling this out into a function since
it seems cleaner to use the fd = -1 trick than to setup userspace
triggering and immediately release. Thanks,
Alex
> +
> argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
>
> irq_set = g_malloc0(argsz);
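
To make the "potentially sparse set of eventfds to vector mappings"
concrete, here is a minimal sketch of how such a SET_IRQS request could be
laid out: one entry per vector, with -1 in every slot the guest has not
unmasked. The function name and its parameters are hypothetical and only
illustrate the layout; QEMU's vfio_enable_vectors() builds the array from
its own per-vector state.

```c
#include <stdint.h>
#include <stdlib.h>
#include <linux/vfio.h>

/*
 * Illustrative only: build one VFIO_DEVICE_SET_IRQS request covering
 * vectors [0, nr_vectors). eventfds[i] is the eventfd for vector i, or -1
 * when that vector should stay without a trigger. The caller frees the
 * returned buffer.
 */
struct vfio_irq_set *msix_sparse_request(const int *eventfds,
                                         uint32_t nr_vectors)
{
    size_t argsz = sizeof(struct vfio_irq_set) +
                   nr_vectors * sizeof(int32_t);
    struct vfio_irq_set *irq_set = calloc(1, argsz);
    int32_t *fds;
    uint32_t i;

    if (!irq_set) {
        return NULL;
    }

    irq_set->argsz = (uint32_t)argsz;
    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
    irq_set->start = 0;
    irq_set->count = nr_vectors;

    /* The eventfd array follows the fixed header, one int32_t per vector. */
    fds = (int32_t *)irq_set->data;
    for (i = 0; i < nr_vectors; i++) {
        fds[i] = eventfds[i];
    }

    return irq_set;
}
```

Per the discussion above, once MSI-X is already enabled the kernel is
expected to allocate interrupts only for the entries that carry a valid
eventfd.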
Hi Alex,
> On July 28, 2023 1:25 AM, Alex Williamson <alex.williamson@redhat.com> wrote:
>
> On Thu, 27 Jul 2023 03:24:10 -0400
> Jing Liu <jing2.liu@intel.com> wrote:
>
> > During migration restoring, vfio_enable_vectors() is called to restore
> > enabling MSI-X interrupts for assigned devices. It sets the range from
> > 0 to nr_vectors to kernel to enable MSI-X and the vectors unmasked in
> > guest. During the MSI-X enabling, all the vectors within the range are
> > allocated according to the ioctl().
> >
> > When dynamic MSI-X allocation is supported, we only want the guest
> > unmasked vectors being allocated and enabled. Therefore, Qemu can
> > first set vector 0 to enable MSI-X and after that, all the vectors can
> > be allocated in need.
> >
> > Signed-off-by: Jing Liu <jing2.liu@intel.com>
> > ---
> > hw/vfio/pci.c | 32 ++++++++++++++++++++++++++++++++
> > 1 file changed, 32 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 8c485636445c..43ffacd5b36a 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -375,6 +375,38 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> > int ret = 0, i, argsz;
> > int32_t *fds;
> >
> > + /*
> > + * If dynamic MSI-X allocation is supported, the vectors to be allocated
> > + * and enabled can be scattered. Before kernel enabling MSI-X, setting
> > + * nr_vectors causes all these vectors being allocated on host.
>
> s/being/to be/
Will change.
>
> > + *
> > + * To keep allocation as needed, first setup vector 0 with an invalid
> > + * fd to make MSI-X enabled, then enable vectors by setting all so that
> > + * kernel allocates and enables interrupts only when enabled in guest.
> > + */
> > + if (msix && !(vdev->msix->irq_info_flags & VFIO_IRQ_INFO_NORESIZE)) {
>
> !vdev->msix->noresize again seems cleaner.
Sure, will change.
>
> > + argsz = sizeof(*irq_set) + sizeof(*fds);
> > +
> > + irq_set = g_malloc0(argsz);
> > + irq_set->argsz = argsz;
> > + irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > + VFIO_IRQ_SET_ACTION_TRIGGER;
> > + irq_set->index = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
> > + VFIO_PCI_MSI_IRQ_INDEX;
>
> Why are we testing msix again within a branch that requires msix?
Ah, yes. Will remove the test.
>
> > + irq_set->start = 0;
> > + irq_set->count = 1;
> > + fds = (int32_t *)&irq_set->data;
> > + fds[0] = -1;
> > +
> > + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> > +
> > + g_free(irq_set);
> > +
> > + if (ret) {
> > + return ret;
> > + }
> > + }
>
> So your goal here is simply to get the kernel to call vfio_msi_enable() with nvec
> = 1 to get MSI-X enabled on the device, which then allows the kernel to use the
> dynamic expansion when we call SET_IRQS again with a potentially sparse set of
> eventfds to vector mappings.
Yes, that is the only way I could think of to get MSI-X enabled first. The one
concern is that when the kernel calls vfio_msi_enable() with nvec = 1, it
allocates one interrupt along with enabling MSI-X, which cannot be avoided.
Therefore, if we set vector 0, for example, the irq for vector 0 will be
allocated in the kernel. If vector 0 is later unmasked in the guest, we enable
it as normal; but if vector 0 stays masked in the guest, we leave an allocated
(though not enabled) irq there until MSI-X is disabled.
I'm not sure if this is okay, but I cannot think of a cleaner way.
I also wonder whether that case can happen at all, or whether vector 0 is
always enabled?
> This seems very similar to the nr_vectors == 0
> branch of vfio_msix_enable() where it uses a do_use and release call to
> accomplish getting MSI-X enabled.
They are similar. Using do_use to set up userspace triggering also leaves one
allocated irq in the kernel. And my understanding is that the release call that
follows won't actually release it if it is a userspace trigger.
static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
{
    /*
     * There are still old guests that mask and unmask vectors on every
     * interrupt. If we're using QEMU bypass with a KVM irqfd, leave all of
     * the KVM setup in place, simply switch VFIO to use the non-bypass
     * eventfd. We'll then fire the interrupt through QEMU and the MSI-X
     * core will mask the interrupt and set pending bits, allowing it to
     * be re-asserted on unmask. Nothing to do if already using QEMU mode.
     */
    ...
}
> We should consolidate, probably by pulling
> this out into a function since it seems cleaner to use the fd = -1 trick than to
> setup userspace triggering and immediately release. Thanks,
Oh, yes, I agree that the fd = -1 trick is cleaner, and with it we don't need
to depend on the maskable bit in QEMU. Following your suggestion, I will create
a function, e.g. vfio_enable_msix_no_vec(vdev), which only sets vector 0 with
fd = -1 in the kernel and returns the result.
Thanks,
Jing
>
> Alex
>
> > +
> > argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
> >
> > irq_set = g_malloc0(argsz);
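
For reference, the helper agreed on in this thread could look roughly like
the sketch below. It is written against a bare device fd instead of
VFIOPCIDevice so that it stands alone; the names enable_msix_no_vec() and
msix_vec0_request() and their signatures are assumptions for illustration,
not the code of any later revision of the patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: build a SET_IRQS request that sets MSI-X vector 0
 * with an invalid eventfd (-1). The caller frees the returned buffer.
 */
struct vfio_irq_set *msix_vec0_request(void)
{
    size_t argsz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
    struct vfio_irq_set *irq_set = calloc(1, argsz);
    int32_t fd = -1;

    if (!irq_set) {
        return NULL;
    }

    irq_set->argsz = (uint32_t)argsz;
    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; /* branch is MSI-X only */
    irq_set->start = 0;
    irq_set->count = 1;
    memcpy(irq_set->data, &fd, sizeof(fd));

    return irq_set;
}

/*
 * Hypothetical stand-in for the proposed vfio_enable_msix_no_vec(vdev):
 * issue the vector 0 request and return the ioctl result.
 */
int enable_msix_no_vec(int device_fd)
{
    struct vfio_irq_set *irq_set = msix_vec0_request();
    int ret;

    if (!irq_set) {
        return -1;
    }

    ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
    free(irq_set);

    return ret;
}
```

Both the dynamic-allocation branch of vfio_enable_vectors() and the
nr_vectors == 0 branch of vfio_msix_enable() could then call a single
helper of this shape, which is the consolidation suggested in the review.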