[PATCH 00/14] virtio-net: add support for SR-IOV emulation

Akihiko Odaki posted 14 patches 12 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20231202-sriov-v1-0-32b3570f7bd6@daynix.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Eduardo Habkost <eduardo@habkost.net>, Akihiko Odaki <akihiko.odaki@daynix.com>, Sriram Yagnaraman <sriram.yagnaraman@est.tech>, Jason Wang <jasowang@redhat.com>, Keith Busch <kbusch@kernel.org>, Klaus Jensen <its@irrelevant.dk>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>
There is a newer version of this series
docs/pcie_sriov.txt            |   2 +-
include/hw/pci/pci_device.h    |  21 +++++
include/hw/pci/pcie_sriov.h    |  13 ++-
include/hw/qdev-core.h         |  61 +++++-------
include/hw/virtio/virtio-net.h |   3 +-
include/hw/virtio/virtio-pci.h |   2 +
include/monitor/qdev.h         |   2 +
hw/core/qdev.c                 |  19 ----
hw/net/igb.c                   |   2 +-
hw/net/virtio-net.c            |  24 +----
hw/nvme/ctrl.c                 |   2 +-
hw/pci/msix.c                  |   8 +-
hw/pci/pci.c                   |  61 +++++++++++-
hw/pci/pcie_sriov.c            |  71 +++++++++++---
hw/vfio/pci.c                  |   3 +-
hw/virtio/virtio-net-pci.c     |  15 +++
hw/virtio/virtio-pci.c         | 208 +++++++++++++++++++++++++++++++++++++++--
system/qdev-monitor.c          |  49 +++++++---
18 files changed, 442 insertions(+), 124 deletions(-)
[PATCH 00/14] virtio-net: add support for SR-IOV emulation
Posted by Akihiko Odaki 12 months ago
Introduction
------------

This series is based on the RFC series submitted by Yui Washizu[1].
See also [2] for the context.

This series enables SR-IOV emulation for virtio-net. It is useful
to test SR-IOV support on the guest, or to expose several vDPA devices in a
VM. vDPA devices can also provide an L2 switching feature for offloading,
though allowing the guest to configure such a feature is out of scope.

The new SR-IOV emulation code for virtio-net actually resides in
virtio-pci since it is specific to PCI. Although it is written in a way
agnostic to the virtio device type, it is restricted to virtio-net because
it has not been validated with other device types.

User Interface
--------------

A user can configure an SR-IOV-capable virtio-net device by adding
virtio-net-pci functions to a bus. Below is a command line example:
  -netdev user,id=n -netdev user,id=o -netdev user,id=p -netdev user,id=q
  -device virtio-net-pci,addr=0x0.0x3,netdev=q,sriov-pf=f
  -device virtio-net-pci,addr=0x0.0x2,netdev=p,sriov-pf=f
  -device virtio-net-pci,addr=0x0.0x1,netdev=o,sriov-pf=f
  -device virtio-net-pci,addr=0x0.0x0,netdev=n,id=f

The VFs specify the paired PF with the "sriov-pf" property. The PF must be
added after all VFs. It is the user's responsibility to ensure that the VFs
have function numbers larger than that of the PF, and that the function
numbers have a consistent stride.
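
The layout constraint above can be sketched as a small check. This is only
an illustration: vf_layout_is_valid() is a hypothetical helper that does not
exist in the series or in QEMU, and it assumes the VF function numbers are
passed in ascending order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the series): returns true if every VF
 * function number is larger than the PF's and the VFs are evenly spaced.
 * vf_fns is expected to be sorted in ascending order.
 */
static bool vf_layout_is_valid(unsigned pf_fn, const unsigned *vf_fns,
                               size_t n)
{
    assert(vf_fns != NULL || n == 0);

    if (n == 0) {
        return true;                    /* no VFs, nothing to check */
    }
    if (vf_fns[0] <= pf_fn) {
        return false;                   /* first VF must follow the PF */
    }

    unsigned stride = 0;
    for (size_t i = 1; i < n; i++) {
        if (vf_fns[i] <= vf_fns[i - 1]) {
            return false;               /* unsorted or duplicated */
        }
        unsigned step = vf_fns[i] - vf_fns[i - 1];
        if (i == 1) {
            stride = step;              /* the first gap defines the stride */
        } else if (step != stride) {
            return false;               /* inconsistent stride */
        }
    }
    return true;
}
```

With the command line above, the PF is function 0 and the VFs are functions
1, 2 and 3, so the check passes with a stride of 1. A layout such as 1, 2, 4
would be rejected.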

Implementation Challenge
------------------------

The major problem with SR-IOV emulation is that it allows the guest to
realize and unrealize VFs at runtime, which means we cannot realize VFs at
initialization time and keep them. In this series, virtio-pci realizes VFs
at initialization time, but instead of keeping them, it extracts the VF
configurations that are necessary to initialize the PF and the device
options that will be used to realize VFs later, and then unrealizes them.
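
The realize, extract, unrealize flow described above might look roughly like
the following. This is a simplified sketch under stated assumptions, not the
code in the series: VFInstance, VFTemplate, vf_probe() and vf_enable() are
all hypothetical names, and the "configuration" is reduced to an option
string and a function number.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a realized VF device instance. */
typedef struct VFInstance {
    char opts[64];      /* device options, e.g. "netdev=o" */
    unsigned fn;        /* PCI function number */
} VFInstance;

/* Hypothetical record kept by the PF after the temporary VF is gone. */
typedef struct VFTemplate {
    char opts[64];
    unsigned fn;
} VFTemplate;

static int live_instances;  /* number of VF instances currently realized */

static VFInstance *vf_realize(const char *opts, unsigned fn)
{
    VFInstance *vf = calloc(1, sizeof(*vf));
    assert(vf != NULL);
    snprintf(vf->opts, sizeof(vf->opts), "%s", opts);
    vf->fn = fn;
    live_instances++;
    return vf;
}

static void vf_unrealize(VFInstance *vf)
{
    live_instances--;
    free(vf);
}

/* Initialization time: realize, copy what is needed, then unrealize. */
static void vf_probe(const char *opts, unsigned fn, VFTemplate *out)
{
    VFInstance *vf = vf_realize(opts, fn);
    snprintf(out->opts, sizeof(out->opts), "%s", vf->opts);
    out->fn = vf->fn;
    vf_unrealize(vf);
}

/* Runtime: the guest enables VFs, so realize a fresh instance. */
static VFInstance *vf_enable(const VFTemplate *tmpl)
{
    return vf_realize(tmpl->opts, tmpl->fn);
}
```

The point of the sketch is that no VF instance outlives vf_probe(); only the
template does, and it carries enough information to recreate the VF each
time the guest enables it.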

Retrieving Device Options
-------------------------

Usually device options are applied with property setters, and applied
options are bound to a particular device instance. This is problematic for
SR-IOV emulation because it recreates device instances at runtime. The
earlier RFC series[1] had no configurability because of this.

Looking at the code, I found there are currently two methods to retrieve
device options at initialization time, but both of them have downsides.

Existing Approach: DeviceState::opts
------------------------------------

One of them is to read DeviceState::opts, which holds options except
"id", "bus", and "driver". However, this member of DeviceState is only used
by vfio to know whether the "rombar" option of pci-device is set, and vfio
shouldn't do that in my opinion. DeviceState::opts is untyped, and it is
the responsibility of pci-device to type the "rombar" property, but vfio
reads the untyped value in an intrusive way. There will be no user of
DeviceState::opts if I eliminate this hacky usage, and keeping it only for
SR-IOV emulation of virtio-net is too much. As such, I determined
DeviceState::opts should be gone.

Existing Approach: DeviceListener::hide_device()
------------------------------------------------

The other method is to use the DeviceListener::hide_device() callback. The
callback receives device options and decides *not* to realize the device
when a device is being added. virtio-net uses it to _hide_ the primary
device.

A downside of this approach is that it needs explicit registration.
The virtio-net failover implementation only registers a DeviceListener
after a virtio-net device is added, so it simply *ignores* the primary
device if it is added before the virtio-net device. It would be better to
at least generate an error message in such a situation.

Another problem of DeviceListener::hide_device() is that it is called for
all devices. For virtio-net failover, the primary device should be a
pci-device. For the SR-IOV emulation, the VF should be a virtio-pci.

Proposal: DeviceClass::hide()
-----------------------------

In this series, I propose DeviceClass::hide() as an alternative to
DeviceListener::hide_device(). A device that can be hidden implements this
function to decide whether it should be hidden. It requires no
registration, and is encapsulated in specific devices.
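
To illustrate the difference from a listener, here is a minimal sketch of a
class-level hide callback. It is not the interface added by the series: the
DeviceOpts and DeviceClassSketch types, the callback signature and
device_should_hide() are assumptions made for illustration only. The
"failover_pair_id" option is the one the existing failover feature uses to
mark the primary device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced view of the options of one -device. */
typedef struct DeviceOpts {
    const char *driver;
    const char *failover_pair_id;   /* NULL when the option is not given */
} DeviceOpts;

/* Hypothetical class record; the real DeviceClass has many more members. */
typedef struct DeviceClassSketch {
    const char *name;
    bool (*hide)(const DeviceOpts *opts);   /* NULL: never hidden */
} DeviceClassSketch;

/* Only the class that knows about failover decides whether to hide. */
static bool pci_failover_hide(const DeviceOpts *opts)
{
    return opts->failover_pair_id != NULL;
}

static const DeviceClassSketch classes[] = {
    { "vfio-pci", pci_failover_hide },
    { "e1000",    NULL },
};

/* Called for a device being added; no prior registration is involved. */
static bool device_should_hide(const DeviceOpts *opts)
{
    for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
        if (strcmp(classes[i].name, opts->driver) == 0) {
            return classes[i].hide != NULL && classes[i].hide(opts);
        }
    }
    return false;
}
```

Because the callback lives in the class of the device being added, the
decision does not depend on whether another device has been created first,
and classes that never need hiding are not consulted at all.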

Summary
-------

Patch 1 changes the definition of the "rombar" property of pci-device to
eliminate the DeviceState::opts access in vfio. It will be used later to
generate an error if rombar is requested for an SR-IOV VF.
Patch 2 removes DeviceState::opts.
Patch 3 adds DeviceClass::hide().
Patches 4 and 5 use DeviceClass::hide() to implement virtio-net failover.
Patch 6 removes DeviceListener::hide_device().
Patches 7 to 11 make trivial changes for SR-IOV emulation.
Patch 12 changes the common SR-IOV emulation code to accept device options.
Patch 13 adds the SR-IOV emulation code to virtio-pci.
Patch 14 enables the SR-IOV emulation code for virtio-net.

[1] https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/
[2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4bc5@daynix.com/

Co-developed-by: Yui Washizu <yui.washidu@gmail.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
Akihiko Odaki (14):
      vfio: Avoid inspecting option QDict for rombar
      hw/qdev: Remove opts member
      qdev: Add DeviceClass::hide()
      hw/pci: Add pci-failover
      virtio-net: Implement pci-failover
      qdev: Remove DeviceListener::hide_device()
      hw/pci: Add hide()
      qdev: Add qdev_device_new_from_qdict()
      hw/pci: Do not add ROM BAR for SR-IOV VF
      msix: Call pcie_sriov_vf_register_bar() for SR-IOV VF
      pcie_sriov: Release VFs failed to realize
      pcie_sriov: Allow to specify VF device options
      virtio-pci: add SR-IOV capability
      virtio-net: Add SR-IOV capability

---
base-commit: 4705fc0c8511d073bee4751c3c974aab2b10a970
change-id: 20231202-sriov-9402fb262be8

Best regards,
-- 
Akihiko Odaki <akihiko.odaki@daynix.com>
Re: [PATCH 00/14] virtio-net: add support for SR-IOV emulation
Posted by Akihiko Odaki 12 months ago
On 2023/12/02 17:00, Akihiko Odaki wrote:
> Introduction
> ------------
> 
> This series is based on the RFC series submitted by Yui Washizu[1].
> See also [2] for the context.
> 
> This series enables SR-IOV emulation for virtio-net. It is useful
> to test SR-IOV support on the guest, or to expose several vDPA devices in a
> VM. vDPA devices can also provide L2 switching feature for offloading
> though it is out of scope to allow the guest to configure such a feature.
> 
> The new code of SR-IOV emulation for virtio-net actually resides in
> virtio-pci since it's specific to PCI. Although it is written in a way
> agnostic to the virtio device type, it is restricted for virtio-net because
> of lack of validation.

I forgot to prefix this as RFC. It is the first version of the series
and I'm open to design changes.
Re: [PATCH 00/14] virtio-net: add support for SR-IOV emulation
Posted by Yui Washizu 11 months, 4 weeks ago
On 2023/12/02 17:08, Akihiko Odaki wrote:
> On 2023/12/02 17:00, Akihiko Odaki wrote:
>> Introduction
>> ------------
>>
>> This series is based on the RFC series submitted by Yui Washizu[1].
>> See also [2] for the context.
>>
>> This series enables SR-IOV emulation for virtio-net. It is useful
>> to test SR-IOV support on the guest, or to expose several vDPA 
>> devices in a
>> VM. vDPA devices can also provide L2 switching feature for offloading
>> though it is out of scope to allow the guest to configure such a 
>> feature.
>>
>> The new code of SR-IOV emulation for virtio-net actually resides in
>> virtio-pci since it's specific to PCI. Although it is written in a way
>> agnostic to the virtio device type, it is restricted for virtio-net 
>> because
>> of lack of validation.
>
> I forgot to prefix this as RFC. It is the first version of the series 
> and I'm open for design changes.


Thank you. I'll proceed with building and reviewing the patch content.