[PATCH] migration: ensure APIC is loaded prior to VFIO PCI devices

Yanfei Xu posted 1 patch 2 months, 4 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20250818131127.1021648-1-yanfei.xu@bytedance.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
hw/intc/apic_common.c       | 1 +
include/migration/vmstate.h | 1 +
2 files changed, 2 insertions(+)
[PATCH] migration: ensure APIC is loaded prior to VFIO PCI devices
Posted by Yanfei Xu 2 months, 4 weeks ago
The load procedure of VFIO PCI devices involves setting up IRT
for each VFIO PCI devices. This requires determining whether an
interrupt is single-destination interrupt to decide between
Posted Interrupt(PI) or remapping mode for the IRTE. However,
determining this may require accessing the VM's APIC registers.

For example:
ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs)
  ...
    kvm_arch_irq_bypass_add_producer
      kvm_x86_call(pi_update_irte)
        vmx_pi_update_irte
          kvm_intr_is_single_vcpu

If the LAPIC has not been loaded yet, interrupts will use remapping
mode. To prevent the fallback of interrupt mode, keep APIC is always
loaded prior to VFIO PCI devices.

Signed-off-by: Yicong Shen <shenyicong.1023@bytedance.com>
Signed-off-by: Yanfei Xu <yanfei.xu@bytedance.com>
---
 hw/intc/apic_common.c       | 1 +
 include/migration/vmstate.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 37a7a7019d..394fe02013 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -379,6 +379,7 @@ static const VMStateDescription vmstate_apic_common = {
     .pre_load = apic_pre_load,
     .pre_save = apic_dispatch_pre_save,
     .post_load = apic_dispatch_post_load,
+    .priority = MIG_PRI_APIC,
     .fields = (const VMStateField[]) {
         VMSTATE_UINT32(apicbase, APICCommonState),
         VMSTATE_UINT8(id, APICCommonState),
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 1ff7bd9ac4..22e988c5db 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -163,6 +163,7 @@ typedef enum {
     MIG_PRI_IOMMU,              /* Must happen before PCI devices */
     MIG_PRI_PCI_BUS,            /* Must happen before IOMMU */
     MIG_PRI_VIRTIO_MEM,         /* Must happen before IOMMU */
+    MIG_PRI_APIC,               /* Must happen before PCI devices */
     MIG_PRI_GICV3_ITS,          /* Must happen before PCI devices */
     MIG_PRI_GICV3,              /* Must happen before the ITS */
     MIG_PRI_MAX,
-- 
2.20.1
Re: [PATCH] migration: ensure APIC is loaded prior to VFIO PCI devices
Posted by Peter Xu 1 month, 2 weeks ago
On Mon, Aug 18, 2025 at 09:11:27PM +0800, Yanfei Xu wrote:
> The load procedure of VFIO PCI devices involves setting up IRT
> for each VFIO PCI devices. This requires determining whether an
> interrupt is single-destination interrupt to decide between
> Posted Interrupt(PI) or remapping mode for the IRTE. However,
> determining this may require accessing the VM's APIC registers.
> 
> For example:
> ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs)
>   ...
>     kvm_arch_irq_bypass_add_producer
>       kvm_x86_call(pi_update_irte)
>         vmx_pi_update_irte
>           kvm_intr_is_single_vcpu
> 
> If the LAPIC has not been loaded yet, interrupts will use remapping
> mode. To prevent the fallback of interrupt mode, keep APIC is always
> loaded prior to VFIO PCI devices.
> 
> Signed-off-by: Yicong Shen <shenyicong.1023@bytedance.com>
> Signed-off-by: Yanfei Xu <yanfei.xu@bytedance.com>
> ---
>  hw/intc/apic_common.c       | 1 +
>  include/migration/vmstate.h | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index 37a7a7019d..394fe02013 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -379,6 +379,7 @@ static const VMStateDescription vmstate_apic_common = {
>      .pre_load = apic_pre_load,
>      .pre_save = apic_dispatch_pre_save,
>      .post_load = apic_dispatch_post_load,
> +    .priority = MIG_PRI_APIC,
>      .fields = (const VMStateField[]) {
>          VMSTATE_UINT32(apicbase, APICCommonState),
>          VMSTATE_UINT8(id, APICCommonState),
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 1ff7bd9ac4..22e988c5db 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -163,6 +163,7 @@ typedef enum {
>      MIG_PRI_IOMMU,              /* Must happen before PCI devices */
>      MIG_PRI_PCI_BUS,            /* Must happen before IOMMU */
>      MIG_PRI_VIRTIO_MEM,         /* Must happen before IOMMU */
> +    MIG_PRI_APIC,               /* Must happen before PCI devices */
>      MIG_PRI_GICV3_ITS,          /* Must happen before PCI devices */
>      MIG_PRI_GICV3,              /* Must happen before the ITS */
>      MIG_PRI_MAX,
> -- 
> 2.20.1
> 

+Cedric, +Alex

queued.

-- 
Peter Xu
Re: [PATCH] migration: ensure APIC is loaded prior to VFIO PCI devices
Posted by Cédric Le Goater 1 month, 2 weeks ago
On 9/29/25 17:51, Peter Xu wrote:
> On Mon, Aug 18, 2025 at 09:11:27PM +0800, Yanfei Xu wrote:
>> The load procedure of VFIO PCI devices involves setting up IRT
>> for each VFIO PCI devices. This requires determining whether an
>> interrupt is single-destination interrupt to decide between
>> Posted Interrupt(PI) or remapping mode for the IRTE. However,
>> determining this may require accessing the VM's APIC registers.
>>
>> For example:
>> ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs)
>>    ...
>>      kvm_arch_irq_bypass_add_producer
>>        kvm_x86_call(pi_update_irte)
>>          vmx_pi_update_irte
>>            kvm_intr_is_single_vcpu
>>
>> If the LAPIC has not been loaded yet, interrupts will use remapping
>> mode. To prevent the fallback of interrupt mode, keep APIC is always
>> loaded prior to VFIO PCI devices.
>>
>> Signed-off-by: Yicong Shen <shenyicong.1023@bytedance.com>
>> Signed-off-by: Yanfei Xu <yanfei.xu@bytedance.com>
>> ---
>>   hw/intc/apic_common.c       | 1 +
>>   include/migration/vmstate.h | 1 +
>>   2 files changed, 2 insertions(+)
>>
>> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
>> index 37a7a7019d..394fe02013 100644
>> --- a/hw/intc/apic_common.c
>> +++ b/hw/intc/apic_common.c
>> @@ -379,6 +379,7 @@ static const VMStateDescription vmstate_apic_common = {
>>       .pre_load = apic_pre_load,
>>       .pre_save = apic_dispatch_pre_save,
>>       .post_load = apic_dispatch_post_load,
>> +    .priority = MIG_PRI_APIC,
>>       .fields = (const VMStateField[]) {
>>           VMSTATE_UINT32(apicbase, APICCommonState),
>>           VMSTATE_UINT8(id, APICCommonState),
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index 1ff7bd9ac4..22e988c5db 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -163,6 +163,7 @@ typedef enum {
>>       MIG_PRI_IOMMU,              /* Must happen before PCI devices */
>>       MIG_PRI_PCI_BUS,            /* Must happen before IOMMU */
>>       MIG_PRI_VIRTIO_MEM,         /* Must happen before IOMMU */
>> +    MIG_PRI_APIC,               /* Must happen before PCI devices */
>>       MIG_PRI_GICV3_ITS,          /* Must happen before PCI devices */
>>       MIG_PRI_GICV3,              /* Must happen before the ITS */
>>       MIG_PRI_MAX,
>> -- 
>> 2.20.1
>>
> 
> +Cedric, +Alex
> 
> queued.
> 

Perhaps we could group the interrupt controller priorities
under a common MIG_PRI_INTC ? PPC is very much the same,
although we managed to order restore from the machine load
handler IIRC.

Anyhow, LGTM.

Thanks,

C.
Re: [PATCH] migration: ensure APIC is loaded prior to VFIO PCI devices
Posted by Peter Xu 1 month, 2 weeks ago
On Tue, Sep 30, 2025 at 10:27:34AM +0200, Cédric Le Goater wrote:
> On 9/29/25 17:51, Peter Xu wrote:
> > On Mon, Aug 18, 2025 at 09:11:27PM +0800, Yanfei Xu wrote:
> > > The load procedure of VFIO PCI devices involves setting up IRT
> > > for each VFIO PCI devices. This requires determining whether an
> > > interrupt is single-destination interrupt to decide between
> > > Posted Interrupt(PI) or remapping mode for the IRTE. However,
> > > determining this may require accessing the VM's APIC registers.
> > > 
> > > For example:
> > > ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs)
> > >    ...
> > >      kvm_arch_irq_bypass_add_producer
> > >        kvm_x86_call(pi_update_irte)
> > >          vmx_pi_update_irte
> > >            kvm_intr_is_single_vcpu
> > > 
> > > If the LAPIC has not been loaded yet, interrupts will use remapping
> > > mode. To prevent the fallback of interrupt mode, keep APIC is always
> > > loaded prior to VFIO PCI devices.
> > > 
> > > Signed-off-by: Yicong Shen <shenyicong.1023@bytedance.com>
> > > Signed-off-by: Yanfei Xu <yanfei.xu@bytedance.com>
> > > ---
> > >   hw/intc/apic_common.c       | 1 +
> > >   include/migration/vmstate.h | 1 +
> > >   2 files changed, 2 insertions(+)
> > > 
> > > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > > index 37a7a7019d..394fe02013 100644
> > > --- a/hw/intc/apic_common.c
> > > +++ b/hw/intc/apic_common.c
> > > @@ -379,6 +379,7 @@ static const VMStateDescription vmstate_apic_common = {
> > >       .pre_load = apic_pre_load,
> > >       .pre_save = apic_dispatch_pre_save,
> > >       .post_load = apic_dispatch_post_load,
> > > +    .priority = MIG_PRI_APIC,
> > >       .fields = (const VMStateField[]) {
> > >           VMSTATE_UINT32(apicbase, APICCommonState),
> > >           VMSTATE_UINT8(id, APICCommonState),
> > > diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> > > index 1ff7bd9ac4..22e988c5db 100644
> > > --- a/include/migration/vmstate.h
> > > +++ b/include/migration/vmstate.h
> > > @@ -163,6 +163,7 @@ typedef enum {
> > >       MIG_PRI_IOMMU,              /* Must happen before PCI devices */
> > >       MIG_PRI_PCI_BUS,            /* Must happen before IOMMU */
> > >       MIG_PRI_VIRTIO_MEM,         /* Must happen before IOMMU */
> > > +    MIG_PRI_APIC,               /* Must happen before PCI devices */
> > >       MIG_PRI_GICV3_ITS,          /* Must happen before PCI devices */
> > >       MIG_PRI_GICV3,              /* Must happen before the ITS */
> > >       MIG_PRI_MAX,
> > > -- 
> > > 2.20.1
> > > 
> > 
> > +Cedric, +Alex
> > 
> > queued.
> > 
> 
> Perhaps we could group the interrupt controller priorities
> under a common MIG_PRI_INTC ? PPC is very much the same,
> although we managed to order restore from the machine load
> handler IIRC.
> 
> Anyhow, LGTM.

Agreed. That, when introduced, can also take ARM's into account (that may
need a MIG_PRI_INTC_HIGH for ITS).

Thanks,

-- 
Peter Xu


Re: [PATCH] migration: ensure APIC is loaded prior to VFIO PCI devices
Posted by Peter Xu 2 months, 4 weeks ago
On Mon, Aug 18, 2025 at 09:11:27PM +0800, Yanfei Xu wrote:
> The load procedure of VFIO PCI devices involves setting up IRT
> for each VFIO PCI devices. This requires determining whether an
> interrupt is single-destination interrupt to decide between
> Posted Interrupt(PI) or remapping mode for the IRTE. However,
> determining this may require accessing the VM's APIC registers.
> 
> For example:
> ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs)
>   ...
>     kvm_arch_irq_bypass_add_producer
>       kvm_x86_call(pi_update_irte)
>         vmx_pi_update_irte
>           kvm_intr_is_single_vcpu
> 
> If the LAPIC has not been loaded yet, interrupts will use remapping
> mode. To prevent the fallback of interrupt mode, keep APIC is always
> loaded prior to VFIO PCI devices.
> 
> Signed-off-by: Yicong Shen <shenyicong.1023@bytedance.com>
> Signed-off-by: Yanfei Xu <yanfei.xu@bytedance.com>
> ---
>  hw/intc/apic_common.c       | 1 +
>  include/migration/vmstate.h | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index 37a7a7019d..394fe02013 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -379,6 +379,7 @@ static const VMStateDescription vmstate_apic_common = {
>      .pre_load = apic_pre_load,
>      .pre_save = apic_dispatch_pre_save,
>      .post_load = apic_dispatch_post_load,
> +    .priority = MIG_PRI_APIC,
>      .fields = (const VMStateField[]) {
>          VMSTATE_UINT32(apicbase, APICCommonState),
>          VMSTATE_UINT8(id, APICCommonState),
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 1ff7bd9ac4..22e988c5db 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -163,6 +163,7 @@ typedef enum {
>      MIG_PRI_IOMMU,              /* Must happen before PCI devices */
>      MIG_PRI_PCI_BUS,            /* Must happen before IOMMU */
>      MIG_PRI_VIRTIO_MEM,         /* Must happen before IOMMU */
> +    MIG_PRI_APIC,               /* Must happen before PCI devices */

As the list grows, maybe it's a good idea we start to document exactly the
reasons inline into the enum.  Can sure be done on top.  This patch alone
looks like stable material..

Reviewed-by: Peter Xu <peterx@redhat.com>

>      MIG_PRI_GICV3_ITS,          /* Must happen before PCI devices */
>      MIG_PRI_GICV3,              /* Must happen before the ITS */
>      MIG_PRI_MAX,
> -- 
> 2.20.1
> 

-- 
Peter Xu