[PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV

Lukasz Maniak posted 12 patches 3 years, 10 months ago
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Igor Mammedov <imammedo@redhat.com>, Ani Sinha <ani@anisinha.ca>, Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>, "Philippe Mathieu-Daudé" <f4bug@amsat.org>, Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
There is a newer version of this series
[PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV
Posted by Lukasz Maniak 3 years, 10 months ago
From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>

SR-IOV support for PCI devices is a new, still-experimental feature,
with the NVMe device as its only working example.

This patch is an attempt to fix a double-free problem when an
SR-IOV-capable NVMe device is hot-unplugged. The problem and the
reproduction steps can be found in this thread:

https://patchew.org/QEMU/20220217174504.1051716-1-lukasz.maniak@linux.intel.com/20220217174504.1051716-14-lukasz.maniak@linux.intel.com/

Details of the proposed solution are, for convenience, included below.

1) The current SR-IOV implementation assumes it’s the Physical Function
   (PF) that creates and deletes its Virtual Functions (VFs).
2) It’s a design decision (for the NVMe device at least) for the VFs to
   be of the same class as the PF. Effectively, they share the
   dc->hotpluggable value.
3) When a VF is created, it’s added as a child node to the PF’s PCI bus
   slot.
4) Monitor/device_del triggers the ACPI mechanism. The implementation is
   not aware of SR-IOV and ejects the PF’s PCI slot, directly unrealizing
   all hot-pluggable (!acpi_pcihp_pc_no_hotplug) child nodes.
5) The VFs are unrealized directly, which conflicts with (1). The SR-IOV
   structures are not updated, so when it’s the PF’s turn to be
   unrealized, it works on stale pointers to already-deleted VFs.
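
To make the five steps above concrete, here is a minimal, self-contained
C sketch. It is a toy model, not QEMU code: ToyDev, toy_unrealize(),
toy_no_hotplug(), toy_eject_slot() and toy_vf_teardowns() are
hypothetical names, and the bridge part of the real predicate is left
out. The vf_aware flag stands in for a pci_is_vf(dev) check in the
predicate, and a teardown count above one stands in for the double free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in QEMU. */
enum { MAX_VFS = 4 };

typedef struct ToyDev {
    bool is_vf;                  /* stands in for pci_is_vf(dev) */
    bool hotpluggable;           /* stands in for dc->hotpluggable */
    bool realized;
    int unrealize_count;         /* > 1 models the double free */
    struct ToyDev *vfs[MAX_VFS]; /* PF's bookkeeping of the VFs it created */
    size_t num_vfs;
} ToyDev;

/* Step (1): a PF tears down the VFs it created, then itself. */
static void toy_unrealize(ToyDev *dev)
{
    for (size_t i = 0; i < dev->num_vfs; i++) {
        dev->vfs[i]->unrealize_count++;
        dev->vfs[i]->realized = false;
    }
    dev->num_vfs = 0;
    dev->unrealize_count++;
    dev->realized = false;
}

/* Simplified predicate: true means "do not eject this device directly". */
static bool toy_no_hotplug(const ToyDev *dev, bool vf_aware)
{
    return !dev->hotpluggable || (vf_aware && dev->is_vf);
}

/* Step (4): eject every child of the slot that the predicate allows. */
static void toy_eject_slot(ToyDev **children, size_t n, bool vf_aware)
{
    for (size_t i = 0; i < n; i++) {
        if (!toy_no_hotplug(children[i], vf_aware)) {
            toy_unrealize(children[i]);
        }
    }
}

/* Builds one PF with one VF in the same slot, ejects the slot, and
 * returns how many times the VF was torn down. */
static int toy_vf_teardowns(bool vf_aware)
{
    ToyDev vf = { .is_vf = true, .hotpluggable = true, .realized = true };
    ToyDev pf = { .hotpluggable = true, .realized = true };
    ToyDev *slot[] = { &vf, &pf }; /* step (3): VF is a child of PF's slot */

    pf.vfs[0] = &vf;
    pf.num_vfs = 1;

    toy_eject_slot(slot, 2, vf_aware);
    assert(!pf.realized && !vf.realized);
    return vf.unrealize_count;
}
```

With vf_aware set to false, toy_vf_teardowns() returns 2: the VF is torn
down once by the slot eject and again by the PF, which still holds a
pointer to it. With vf_aware set to true it returns 1, because the VF is
skipped by the eject loop and removed only through the PF.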

Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
---
 hw/acpi/pcihp.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 6351bd3424d..248839e1110 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -192,8 +192,12 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, PCIDevice *dev)
      * ACPI doesn't allow hotplug of bridge devices.  Don't allow
      * hot-unplug of bridge devices unless they were added by hotplug
      * (and so, not described by acpi).
+     *
+     * Don't allow hot-unplug of SR-IOV Virtual Functions, as they
+     * will be removed implicitly, when Physical Function is unplugged.
      */
-    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable;
+    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable ||
+           pci_is_vf(dev);
 }
 
 static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slots)
-- 
2.25.1


Re: [PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV
Posted by Michael S. Tsirkin 3 years, 9 months ago
On Fri, Mar 18, 2022 at 08:18:19PM +0100, Lukasz Maniak wrote:
> From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> 
> SR-IOV support for PCI devices is a new, still-experimental feature,
> with the NVMe device as its only working example.
> 
> This patch is an attempt to fix a double-free problem when an
> SR-IOV-capable NVMe device is hot-unplugged. The problem and the
> reproduction steps can be found in this thread:
> 
> https://patchew.org/QEMU/20220217174504.1051716-1-lukasz.maniak@linux.intel.com/20220217174504.1051716-14-lukasz.maniak@linux.intel.com/
> 
> Details of the proposed solution are, for convenience, included below.
> 
> 1) The current SR-IOV implementation assumes it’s the Physical Function
>    (PF) that creates and deletes its Virtual Functions (VFs).
> 2) It’s a design decision (for the NVMe device at least) for the VFs to
>    be of the same class as the PF. Effectively, they share the
>    dc->hotpluggable value.
> 3) When a VF is created, it’s added as a child node to the PF’s PCI bus
>    slot.
> 4) Monitor/device_del triggers the ACPI mechanism. The implementation is
>    not aware of SR-IOV and ejects the PF’s PCI slot, directly unrealizing
>    all hot-pluggable (!acpi_pcihp_pc_no_hotplug) child nodes.
> 5) The VFs are unrealized directly, which conflicts with (1). The SR-IOV
>    structures are not updated, so when it’s the PF’s turn to be
>    unrealized, it works on stale pointers to already-deleted VFs.
> 
> Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

Feel free to include this when merging the rest of the patchset.

> ---
>  hw/acpi/pcihp.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 6351bd3424d..248839e1110 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -192,8 +192,12 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, PCIDevice *dev)
>       * ACPI doesn't allow hotplug of bridge devices.  Don't allow
>       * hot-unplug of bridge devices unless they were added by hotplug
>       * (and so, not described by acpi).
> +     *
> +     * Don't allow hot-unplug of SR-IOV Virtual Functions, as they
> +     * will be removed implicitly, when Physical Function is unplugged.
>       */
> -    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable;
> +    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable ||
> +           pci_is_vf(dev);
>  }
>  
>  static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slots)
> -- 
> 2.25.1


Re: [PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV
Posted by Igor Mammedov 3 years, 10 months ago
On Fri, 18 Mar 2022 20:18:19 +0100
Lukasz Maniak <lukasz.maniak@linux.intel.com> wrote:

> From: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> 
> SR-IOV support for PCI devices is a new, still-experimental feature,
> with the NVMe device as its only working example.
> 
> This patch is an attempt to fix a double-free problem when an
> SR-IOV-capable NVMe device is hot-unplugged. The problem and the
> reproduction steps can be found in this thread:
> 
> https://patchew.org/QEMU/20220217174504.1051716-1-lukasz.maniak@linux.intel.com/20220217174504.1051716-14-lukasz.maniak@linux.intel.com/

Please include that in the patch description.

> Details of the proposed solution are, for convenience, included below.
> 
> 1) The current SR-IOV implementation assumes it’s the Physical Function
>    (PF) that creates and deletes its Virtual Functions (VFs).
> 2) It’s a design decision (for the NVMe device at least) for the VFs to
>    be of the same class as the PF. Effectively, they share the
>    dc->hotpluggable value.
> 3) When a VF is created, it’s added as a child node to the PF’s PCI bus
>    slot.
> 4) Monitor/device_del triggers the ACPI mechanism. The implementation is
>    not aware of SR-IOV and ejects the PF’s PCI slot, directly unrealizing
>    all hot-pluggable (!acpi_pcihp_pc_no_hotplug) child nodes.
> 5) The VFs are unrealized directly, which conflicts with (1). The SR-IOV
>    structures are not updated, so when it’s the PF’s turn to be
>    unrealized, it works on stale pointers to already-deleted VFs.

It's unclear what's being hotplugged and unplugged. It would be better
if you included the QEMU CLI and the relevant QMP/monitor commands to
reproduce it.

> 
> Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com>
> ---
>  hw/acpi/pcihp.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 6351bd3424d..248839e1110 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -192,8 +192,12 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, PCIDevice *dev)
>       * ACPI doesn't allow hotplug of bridge devices.  Don't allow
>       * hot-unplug of bridge devices unless they were added by hotplug
>       * (and so, not described by acpi).
> +     *
> +     * Don't allow hot-unplug of SR-IOV Virtual Functions, as they
> +     * will be removed implicitly, when Physical Function is unplugged.
>       */
> -    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable;
> +    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable ||
> +           pci_is_vf(dev);
>  }
>  
>  static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slots)


Re: [PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV
Posted by Łukasz Gieryk 3 years, 10 months ago
On Thu, Mar 31, 2022 at 02:38:41PM +0200, Igor Mammedov wrote:
> It's unclear what's being hotplugged and unplugged. It would be better
> if you included the QEMU CLI and the relevant QMP/monitor commands to
> reproduce it.

QEMU CLI:
---------
-device pcie-root-port,slot=0,id=rp0
-device nvme-subsys,id=subsys0
-device nvme,id=nvme0,bus=rp0,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,sriov_vq_flexible=2,sriov_vi_flexible=1

Guest OS:
---------
sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0
echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
sleep 1
echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
sleep 2
echo 01:00.1 > /sys/bus/pci/drivers/nvme/bind

QEMU monitor:
-------------
device_del nvme0


Re: [PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV
Posted by Lukasz Maniak 3 years, 9 months ago
On Mon, Apr 04, 2022 at 11:41:46AM +0200, Łukasz Gieryk wrote:
> On Thu, Mar 31, 2022 at 02:38:41PM +0200, Igor Mammedov wrote:
> > It's unclear what's being hotplugged and unplugged. It would be better
> > if you included the QEMU CLI and the relevant QMP/monitor commands to
> > reproduce it.
> 
> QEMU CLI:
> ---------
> -device pcie-root-port,slot=0,id=rp0
> -device nvme-subsys,id=subsys0
> -device nvme,id=nvme0,bus=rp0,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,sriov_vq_flexible=2,sriov_vi_flexible=1
> 
> Guest OS:
> ---------
> sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
> sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0
> echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset
> sleep 1
> echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
> nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
> nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
> nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
> sleep 2
> echo 01:00.1 > /sys/bus/pci/drivers/nvme/bind
> 
> QEMU monitor:
> -------------
> device_del nvme0
>

Hi Igor,

Do you need any more details on this?

Best regards,
Lukasz