[PATCH v2 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration

Kunkun Jiang posted 2 patches 1 month, 1 week ago
Maintainers: Alex Williamson <alex.williamson@redhat.com>

[PATCH v2 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration

Posted by Kunkun Jiang 1 month, 1 week ago
We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
vfio_pci_write_config to improve IO performance.
The MemoryRegions of destination VM will not be expanded
successful in live migration, because their addresses have
been updated in vmstate_load_state (vfio_pci_load_config).

So iterate BARs in vfio_pci_write_config and try to update
sub-page BARs.

Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Reported-by: Qixin Gan <ganqixin@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 hw/vfio/pci.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e1ea1d8a23..43c7e93153 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2453,7 +2453,12 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
 {
     VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
     PCIDevice *pdev = &vdev->pdev;
-    int ret;
+    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
+    int bar, ret;
+
+    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
+        old_addr[bar] = pdev->io_regions[bar].addr;
+    }
 
     ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
     if (ret) {
@@ -2463,6 +2468,14 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
     vfio_pci_write_config(pdev, PCI_COMMAND,
                           pci_get_word(pdev->config + PCI_COMMAND), 2);
 
+    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
+        if (old_addr[bar] != pdev->io_regions[bar].addr &&
+            vdev->bars[bar].region.size > 0 &&
+            vdev->bars[bar].region.size < qemu_real_host_page_size) {
+            vfio_sub_page_bar_update_mapping(pdev, bar);
+        }
+    }
+
     if (msi_enabled(pdev)) {
         vfio_msi_enable(vdev);
     } else if (msix_enabled(pdev)) {
-- 
2.23.0


Re: [PATCH v2 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration

Posted by Eric Auger 2 days, 8 hours ago
Hi Kunkun,

On 9/14/21 3:53 AM, Kunkun Jiang wrote:
> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
> vfio_pci_write_config to improve IO performance.
s/to vfio_pci_write_config/ in vfio_pci_write_config()
> The MemoryRegions of destination VM will not be expanded
> successful in live migration, because their addresses have
s/will not be expanded successful in live migration/are not expanded
anymore after live migration (?) Is that the correct symptom?
> been updated in vmstate_load_state (vfio_pci_load_config).
>
> So iterate BARs in vfio_pci_write_config and try to update
> sub-page BARs.
>
> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
is it an actual fix or an optimization?
> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
> Reported-by: Qixin Gan <ganqixin@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
>  hw/vfio/pci.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e1ea1d8a23..43c7e93153 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2453,7 +2453,12 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>  {
>      VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>      PCIDevice *pdev = &vdev->pdev;
> -    int ret;
> +    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
> +    int bar, ret;
> +
> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> +        old_addr[bar] = pdev->io_regions[bar].addr;
> +    }
what are those values before the vmstate_load_state ie. can't you do the
vfio_sub_page_bar_update_mapping() unconditionnaly on old_addr[bar] !=
pdev->io_regions[bar].addr
>  
>      ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
>      if (ret) {
> @@ -2463,6 +2468,14 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>      vfio_pci_write_config(pdev, PCI_COMMAND,
>                            pci_get_word(pdev->config + PCI_COMMAND), 2);
>  
> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> +        if (old_addr[bar] != pdev->io_regions[bar].addr &&
> +            vdev->bars[bar].region.size > 0 &&
> +            vdev->bars[bar].region.size < qemu_real_host_page_size) {
> +            vfio_sub_page_bar_update_mapping(pdev, bar);
> +        }
> +    }
> +
>      if (msi_enabled(pdev)) {
>          vfio_msi_enable(vdev);
>      } else if (msix_enabled(pdev)) {
Thanks

Eric


Re: [PATCH v2 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration

Posted by Kunkun Jiang 1 day, 14 hours ago
Hi Eric,

On 2021/10/22 0:15, Eric Auger wrote:
> Hi Kunkun,
>
> On 9/14/21 3:53 AM, Kunkun Jiang wrote:
>> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
>> vfio_pci_write_config to improve IO performance.
> s/to vfio_pci_write_config/ in vfio_pci_write_config()
Thank you for your review. I will correct it in v3.
>> The MemoryRegions of destination VM will not be expanded
>> successful in live migration, because their addresses have
> s/will not be expanded successful in live migration/are not expanded
> anymore after live migration (?) Is that the correct symptom?
You are right. They are not expanded anymore after live migration,
not expanded unsuccessfully. I used the wrong words.
>> been updated in vmstate_load_state (vfio_pci_load_config).
>>
>> So iterate BARs in vfio_pci_write_config and try to update
>> sub-page BARs.
>>
>> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
> is it an actual fix or an optimization?
I recently realized that this is actually an optimization.

The VF driver in VM use the assembly language instructions,
which can operate two registers simultaneously, like stp, ldp.
These instructions would result in non-ISV data abort, which
cannot be handled well at the moment.

If the driver doesn't use such instructions,  not expanding
only affects performance.

I will add these to the commit message in the next version.
>> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
>> Reported-by: Qixin Gan <ganqixin@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>   hw/vfio/pci.c | 15 ++++++++++++++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index e1ea1d8a23..43c7e93153 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2453,7 +2453,12 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>>   {
>>       VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>       PCIDevice *pdev = &vdev->pdev;
>> -    int ret;
>> +    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>> +    int bar, ret;
>> +
>> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> +        old_addr[bar] = pdev->io_regions[bar].addr;
>> +    }
> what are those values before the vmstate_load_state ie. can't you do the
Are you referring to pdev->io_regions[bar].addr ? All of the bars addr is
PCI_BAR_UNMAPPED (~(pcibus_t)0) before the vmstate_load_state.
> vfio_sub_page_bar_update_mapping() unconditionnaly on old_addr[bar] !=
> pdev->io_regions[bar].addr
The size of Bar is a power of 2. The Bar, whose size is greater than host
page size, doesn't need to be expanded.

Can you explain more? May be I misunderstood you.

Thanks,
Kunkun Jiang
>>   
>>       ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
>>       if (ret) {
>> @@ -2463,6 +2468,14 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>>       vfio_pci_write_config(pdev, PCI_COMMAND,
>>                             pci_get_word(pdev->config + PCI_COMMAND), 2);
>>   
>> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> +        if (old_addr[bar] != pdev->io_regions[bar].addr &&
>> +            vdev->bars[bar].region.size > 0 &&
>> +            vdev->bars[bar].region.size < qemu_real_host_page_size) {
>> +            vfio_sub_page_bar_update_mapping(pdev, bar);
>> +        }
>> +    }
>> +
>>       if (msi_enabled(pdev)) {
>>           vfio_msi_enable(vdev);
>>       } else if (msix_enabled(pdev)) {
> Thanks
>
> Eric
>
> .

Re: [PATCH v2 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration

Posted by Eric Auger 10 hours ago
Hi Kunkun,

On 10/22/21 12:01 PM, Kunkun Jiang wrote:
> Hi Eric,
>
> On 2021/10/22 0:15, Eric Auger wrote:
>> Hi Kunkun,
>>
>> On 9/14/21 3:53 AM, Kunkun Jiang wrote:
>>> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
>>> vfio_pci_write_config to improve IO performance.
>> s/to vfio_pci_write_config/ in vfio_pci_write_config()
> Thank you for your review. I will correct it in v3.
>>> The MemoryRegions of destination VM will not be expanded
>>> successful in live migration, because their addresses have
>> s/will not be expanded successful in live migration/are not expanded
>> anymore after live migration (?) Is that the correct symptom?
> You are right. They are not expanded anymore after live migration,
> not expanded unsuccessfully. I used the wrong words.
>>> been updated in vmstate_load_state (vfio_pci_load_config).
>>>
>>> So iterate BARs in vfio_pci_write_config and try to update
>>> sub-page BARs.
>>>
>>> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI
>>> devices)
>> is it an actual fix or an optimization?
> I recently realized that this is actually an optimization.
>
> The VF driver in VM use the assembly language instructions,
> which can operate two registers simultaneously, like stp, ldp.
> These instructions would result in non-ISV data abort, which
> cannot be handled well at the moment.
>
> If the driver doesn't use such instructions,  not expanding
> only affects performance.
>
> I will add these to the commit message in the next version.
>>> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
>>> Reported-by: Qixin Gan <ganqixin@huawei.com>
>>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>>> ---
>>>   hw/vfio/pci.c | 15 ++++++++++++++-
>>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index e1ea1d8a23..43c7e93153 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -2453,7 +2453,12 @@ static int vfio_pci_load_config(VFIODevice
>>> *vbasedev, QEMUFile *f)
>>>   {
>>>       VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice,
>>> vbasedev);
>>>       PCIDevice *pdev = &vdev->pdev;
>>> -    int ret;
>>> +    pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>>> +    int bar, ret;
>>> +
>>> +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>>> +        old_addr[bar] = pdev->io_regions[bar].addr;
>>> +    }
>> what are those values before the vmstate_load_state ie. can't you do the
> Are you referring to pdev->io_regions[bar].addr ? All of the bars addr is
> PCI_BAR_UNMAPPED (~(pcibus_t)0) before the vmstate_load_state.
OK that was my assumption. What I meant is in that case you always have

old_addr[bar] != pdev->io_regions[bar].addr, right? In the positive this check is not needed and you don't need old_addr at all.
In the original code this was needed since one wanted to call 
vfio_sub_page_bar_update_mapping() only for the bar base address that were changed during the 
vfio_pci_write_config.

Thanks

Eric

>> vfio_sub_page_bar_update_mapping() unconditionnaly on old_addr[bar] !=
>> pdev->io_regions[bar].addr
> The size of Bar is a power of 2. The Bar, whose size is greater than host
> page size, doesn't need to be expanded.
>
> Can you explain more? May be I misunderstood you.
>
> Thanks,
> Kunkun Jiang
>>>         ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
>>>       if (ret) {
>>> @@ -2463,6 +2468,14 @@ static int vfio_pci_load_config(VFIODevice
>>> *vbasedev, QEMUFile *f)
>>>       vfio_pci_write_config(pdev, PCI_COMMAND,
>>>                             pci_get_word(pdev->config +
>>> PCI_COMMAND), 2);
>>>   +    for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>>> +        if (old_addr[bar] != pdev->io_regions[bar].addr &&
>>> +            vdev->bars[bar].region.size > 0 &&
>>> +            vdev->bars[bar].region.size < qemu_real_host_page_size) {
>>> +            vfio_sub_page_bar_update_mapping(pdev, bar);
>>> +        }
>>> +    }
>>> +
>>>       if (msi_enabled(pdev)) {
>>>           vfio_msi_enable(vdev);
>>>       } else if (msix_enabled(pdev)) {
>> Thanks
>>
>> Eric
>>
>> .
>