[PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation

Akihiko Odaki posted 12 patches 5 months, 1 week ago
Failed in applying to current master (apply log)
There is a newer version of this series
docs/pcie_sriov.txt         |   8 +-
include/hw/pci/pci.h        |   2 +-
include/hw/pci/pci_device.h |  13 +-
include/hw/pci/pcie_sriov.h |  25 ++-
include/hw/qdev-core.h      |   4 -
hw/core/qdev.c              |   1 -
hw/net/igb.c                |   3 +-
hw/nvme/ctrl.c              |   3 +-
hw/pci/pci.c                |  98 +++++++-----
hw/pci/pci_host.c           |   4 +-
hw/pci/pcie.c               |   4 +-
hw/pci/pcie_sriov.c         | 360 +++++++++++++++++++++++++++++++++-----------
hw/vfio/pci.c               |   3 +-
hw/virtio/virtio-net-pci.c  |   1 +
hw/virtio/virtio-pci.c      |   7 +
system/qdev-monitor.c       |  12 +-
16 files changed, 395 insertions(+), 153 deletions(-)
[PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
Posted by Akihiko Odaki 5 months, 1 week ago
Introduction
------------

This series is based on the RFC series submitted by Yui Washizu[1].
See also [2] for the context.

This series enables SR-IOV emulation for virtio-net. It is useful
to test SR-IOV support on the guest, or to expose several vDPA devices
in a VM. vDPA devices can also provide L2 switching feature for
offloading though it is out of scope to allow the guest to configure
such a feature.

The PF side code resides in virtio-pci. The VF side code resides in
the PCI common infrastructure, but it is restricted to work only for
virtio-net-pci because of lack of validation.

User Interface
--------------

A user can configure a SR-IOV capable virtio-net device by adding
virtio-net-pci functions to a bus. Below is a command line example:
  -netdev user,id=n -netdev user,id=o
  -netdev user,id=p -netdev user,id=q
  -device pcie-root-port,id=b
  -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
  -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f

The VFs specify the paired PF with "sriov-pf" property. The PF must be
added after all VFs. It is user's responsibility to ensure that VFs have
function numbers larger than one of the PF, and the function numbers
have a consistent stride.

Keeping VF instances
--------------------

A problem with SR-IOV emulation is that it needs to hotplug the VFs as
the guest requests. Previously, this behavior was implemented by
realizing and unrealizing VFs at runtime. However, this strategy does
not work well for the proposed virtio-net emulation; in this proposal,
device options passed in the command line must be maintained as VFs
are hotplugged, but they are consumed when the machine starts and not
available after that, which makes realizing VFs at runtime impossible.

As an strategy alternative to runtime realization/unrealization, this
series proposes to reuse the code to power down PCI Express devices.
When a PCI Express device is powered down, it will be hidden from the
guest but will be kept realized. This effectively implements the
behavior we need for the SR-IOV emulation.

Summary
-------

Patch [1, 5] refactors the PCI infrastructure code.
Patch [6, 10] adds user-created SR-IOV VF infrastructure.
Patch 11 makes virtio-pci work as SR-IOV PF for user-created VFs.
Patch 12 allows user to create SR-IOV VFs with virtio-net-pci.

[1] https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/
[2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4bc5@daynix.com/

Co-developed-by: Yui Washizu <yui.washidu@gmail.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
Changes in v2:
- Changed to keep VF instances.
- Link to v1: https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7bd6@daynix.com

---
Akihiko Odaki (12):
      hw/pci: Initialize PCI multifunction after realization
      hw/pci: Determine if rombar is explicitly enabled
      hw/pci: Do not add ROM BAR for SR-IOV VF
      vfio: Avoid inspecting option QDict for rombar
      hw/qdev: Remove opts member
      pcie_sriov: Reuse SR-IOV VF device instances
      pcie_sriov: Release VFs failed to realize
      pcie_sriov: Ensure PF and VF are mutually exclusive
      pcie_sriov: Check PCI Express for SR-IOV PF
      pcie_sriov: Allow user to create SR-IOV device
      virtio-pci: Implement SR-IOV PF
      virtio-net: Implement SR-IOV VF

 docs/pcie_sriov.txt         |   8 +-
 include/hw/pci/pci.h        |   2 +-
 include/hw/pci/pci_device.h |  13 +-
 include/hw/pci/pcie_sriov.h |  25 ++-
 include/hw/qdev-core.h      |   4 -
 hw/core/qdev.c              |   1 -
 hw/net/igb.c                |   3 +-
 hw/nvme/ctrl.c              |   3 +-
 hw/pci/pci.c                |  98 +++++++-----
 hw/pci/pci_host.c           |   4 +-
 hw/pci/pcie.c               |   4 +-
 hw/pci/pcie_sriov.c         | 360 +++++++++++++++++++++++++++++++++-----------
 hw/vfio/pci.c               |   3 +-
 hw/virtio/virtio-net-pci.c  |   1 +
 hw/virtio/virtio-pci.c      |   7 +
 system/qdev-monitor.c       |  12 +-
 16 files changed, 395 insertions(+), 153 deletions(-)
---
base-commit: 4705fc0c8511d073bee4751c3c974aab2b10a970
change-id: 20231202-sriov-9402fb262be8

Best regards,
-- 
Akihiko Odaki <akihiko.odaki@daynix.com>
Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
Posted by Jason Wang 5 months, 1 week ago
On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>
> Introduction
> ------------
>
> This series is based on the RFC series submitted by Yui Washizu[1].
> See also [2] for the context.
>
> This series enables SR-IOV emulation for virtio-net. It is useful
> to test SR-IOV support on the guest, or to expose several vDPA devices
> in a VM. vDPA devices can also provide L2 switching feature for
> offloading though it is out of scope to allow the guest to configure
> such a feature.
>
> The PF side code resides in virtio-pci. The VF side code resides in
> the PCI common infrastructure, but it is restricted to work only for
> virtio-net-pci because of lack of validation.
>
> User Interface
> --------------
>
> A user can configure a SR-IOV capable virtio-net device by adding
> virtio-net-pci functions to a bus. Below is a command line example:
>   -netdev user,id=n -netdev user,id=o
>   -netdev user,id=p -netdev user,id=q
>   -device pcie-root-port,id=b
>   -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>   -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>   -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>   -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>
> The VFs specify the paired PF with "sriov-pf" property. The PF must be
> added after all VFs. It is user's responsibility to ensure that VFs have
> function numbers larger than one of the PF, and the function numbers
> have a consistent stride.

This seems not user friendly. Any reason we can't just allow user to
specify the stride here?

Btw, I vaguely remember qemu allows the params to be accepted as a
list. If this is true, we can accept a list of netdev here?

>
> Keeping VF instances
> --------------------
>
> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
> the guest requests. Previously, this behavior was implemented by
> realizing and unrealizing VFs at runtime. However, this strategy does
> not work well for the proposed virtio-net emulation; in this proposal,
> device options passed in the command line must be maintained as VFs
> are hotplugged, but they are consumed when the machine starts and not
> available after that, which makes realizing VFs at runtime impossible.

Could we store the device options in the PF?

Thanks

>
> As an strategy alternative to runtime realization/unrealization, this
> series proposes to reuse the code to power down PCI Express devices.
> When a PCI Express device is powered down, it will be hidden from the
> guest but will be kept realized. This effectively implements the
> behavior we need for the SR-IOV emulation.
>
> Summary
> -------
>
> Patch [1, 5] refactors the PCI infrastructure code.
> Patch [6, 10] adds user-created SR-IOV VF infrastructure.
> Patch 11 makes virtio-pci work as SR-IOV PF for user-created VFs.
> Patch 12 allows user to create SR-IOV VFs with virtio-net-pci.
>
> [1] https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/
> [2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4bc5@daynix.com/
>
> Co-developed-by: Yui Washizu <yui.washidu@gmail.com>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
> Changes in v2:
> - Changed to keep VF instances.
> - Link to v1: https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7bd6@daynix.com
>
> ---
> Akihiko Odaki (12):
>       hw/pci: Initialize PCI multifunction after realization
>       hw/pci: Determine if rombar is explicitly enabled
>       hw/pci: Do not add ROM BAR for SR-IOV VF
>       vfio: Avoid inspecting option QDict for rombar
>       hw/qdev: Remove opts member
>       pcie_sriov: Reuse SR-IOV VF device instances
>       pcie_sriov: Release VFs failed to realize
>       pcie_sriov: Ensure PF and VF are mutually exclusive
>       pcie_sriov: Check PCI Express for SR-IOV PF
>       pcie_sriov: Allow user to create SR-IOV device
>       virtio-pci: Implement SR-IOV PF
>       virtio-net: Implement SR-IOV VF
>
>  docs/pcie_sriov.txt         |   8 +-
>  include/hw/pci/pci.h        |   2 +-
>  include/hw/pci/pci_device.h |  13 +-
>  include/hw/pci/pcie_sriov.h |  25 ++-
>  include/hw/qdev-core.h      |   4 -
>  hw/core/qdev.c              |   1 -
>  hw/net/igb.c                |   3 +-
>  hw/nvme/ctrl.c              |   3 +-
>  hw/pci/pci.c                |  98 +++++++-----
>  hw/pci/pci_host.c           |   4 +-
>  hw/pci/pcie.c               |   4 +-
>  hw/pci/pcie_sriov.c         | 360 +++++++++++++++++++++++++++++++++-----------
>  hw/vfio/pci.c               |   3 +-
>  hw/virtio/virtio-net-pci.c  |   1 +
>  hw/virtio/virtio-pci.c      |   7 +
>  system/qdev-monitor.c       |  12 +-
>  16 files changed, 395 insertions(+), 153 deletions(-)
> ---
> base-commit: 4705fc0c8511d073bee4751c3c974aab2b10a970
> change-id: 20231202-sriov-9402fb262be8
>
> Best regards,
> --
> Akihiko Odaki <akihiko.odaki@daynix.com>
>
Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
Posted by Akihiko Odaki 5 months, 1 week ago
On 2023/12/11 11:52, Jason Wang wrote:
> On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> Introduction
>> ------------
>>
>> This series is based on the RFC series submitted by Yui Washizu[1].
>> See also [2] for the context.
>>
>> This series enables SR-IOV emulation for virtio-net. It is useful
>> to test SR-IOV support on the guest, or to expose several vDPA devices
>> in a VM. vDPA devices can also provide L2 switching feature for
>> offloading though it is out of scope to allow the guest to configure
>> such a feature.
>>
>> The PF side code resides in virtio-pci. The VF side code resides in
>> the PCI common infrastructure, but it is restricted to work only for
>> virtio-net-pci because of lack of validation.
>>
>> User Interface
>> --------------
>>
>> A user can configure a SR-IOV capable virtio-net device by adding
>> virtio-net-pci functions to a bus. Below is a command line example:
>>    -netdev user,id=n -netdev user,id=o
>>    -netdev user,id=p -netdev user,id=q
>>    -device pcie-root-port,id=b
>>    -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>>    -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>>    -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>>    -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>>
>> The VFs specify the paired PF with "sriov-pf" property. The PF must be
>> added after all VFs. It is user's responsibility to ensure that VFs have
>> function numbers larger than one of the PF, and the function numbers
>> have a consistent stride.
> 
> This seems not user friendly. Any reason we can't just allow user to
> specify the stride here?

It should be possible to assign addr automatically without requiring 
user to specify the stride. I'll try that in the next version.

> 
> Btw, I vaguely remember qemu allows the params to be accepted as a
> list. If this is true, we can accept a list of netdev here?

Yes, rocker does that. But the problem is not just about getting 
parameters needed for VFs, which I forgot to mention in the cover letter 
and will explain below.

> 
>>
>> Keeping VF instances
>> --------------------
>>
>> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
>> the guest requests. Previously, this behavior was implemented by
>> realizing and unrealizing VFs at runtime. However, this strategy does
>> not work well for the proposed virtio-net emulation; in this proposal,
>> device options passed in the command line must be maintained as VFs
>> are hotplugged, but they are consumed when the machine starts and not
>> available after that, which makes realizing VFs at runtime impossible.
> 
> Could we store the device options in the PF?

I wrote it's to store the device options, but the problem is actually 
more about realizing VFs at runtime instead of at the initialization time.

Realizing VFs at runtime have two major problems. One is that it delays 
the validations of options; invalid options will be noticed when the 
guest requests to realize VFs. netdevs also warn that they are not used 
at initialization time, not knowing that they will be used by VFs later. 
References to other QEMU objects in the option may also die before VFs 
are realized.

The other problem is that QEMU cannot interact with the unrealized VFs. 
For example, if you type "device_add virtio-net-pci,id=vf,sriov-pf=pf" 
in HMP, you will expect "device_del vf" works, but it's hard to 
implement such behaviors with unrealized VFs.

I was first going to compromise and allow such quirky behaviors, but I 
realized such a compromise is unnecessary if we reuse the PCI power down 
logic so I wrote v2.

Regards,
Akihiko Odaki

Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
Posted by Jason Wang 5 months, 1 week ago
On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>
> On 2023/12/11 11:52, Jason Wang wrote:
> > On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >>
> >> Introduction
> >> ------------
> >>
> >> This series is based on the RFC series submitted by Yui Washizu[1].
> >> See also [2] for the context.
> >>
> >> This series enables SR-IOV emulation for virtio-net. It is useful
> >> to test SR-IOV support on the guest, or to expose several vDPA devices
> >> in a VM. vDPA devices can also provide L2 switching feature for
> >> offloading though it is out of scope to allow the guest to configure
> >> such a feature.
> >>
> >> The PF side code resides in virtio-pci. The VF side code resides in
> >> the PCI common infrastructure, but it is restricted to work only for
> >> virtio-net-pci because of lack of validation.
> >>
> >> User Interface
> >> --------------
> >>
> >> A user can configure a SR-IOV capable virtio-net device by adding
> >> virtio-net-pci functions to a bus. Below is a command line example:
> >>    -netdev user,id=n -netdev user,id=o
> >>    -netdev user,id=p -netdev user,id=q
> >>    -device pcie-root-port,id=b
> >>    -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
> >>    -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
> >>    -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
> >>    -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
> >>
> >> The VFs specify the paired PF with "sriov-pf" property. The PF must be
> >> added after all VFs. It is user's responsibility to ensure that VFs have
> >> function numbers larger than one of the PF, and the function numbers
> >> have a consistent stride.
> >
> > This seems not user friendly. Any reason we can't just allow user to
> > specify the stride here?
>
> It should be possible to assign addr automatically without requiring
> user to specify the stride. I'll try that in the next version.
>
> >
> > Btw, I vaguely remember qemu allows the params to be accepted as a
> > list. If this is true, we can accept a list of netdev here?
>
> Yes, rocker does that. But the problem is not just about getting
> parameters needed for VFs, which I forgot to mention in the cover letter
> and will explain below.
>
> >
> >>
> >> Keeping VF instances
> >> --------------------
> >>
> >> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
> >> the guest requests. Previously, this behavior was implemented by
> >> realizing and unrealizing VFs at runtime. However, this strategy does
> >> not work well for the proposed virtio-net emulation; in this proposal,
> >> device options passed in the command line must be maintained as VFs
> >> are hotplugged, but they are consumed when the machine starts and not
> >> available after that, which makes realizing VFs at runtime impossible.
> >
> > Could we store the device options in the PF?
>
> I wrote it's to store the device options, but the problem is actually
> more about realizing VFs at runtime instead of at the initialization time.
>
> Realizing VFs at runtime have two major problems. One is that it delays
> the validations of options; invalid options will be noticed when the
> guest requests to realize VFs.

If PCI spec allows the failure when creating VF, then it should not be
a problem.

> netdevs also warn that they are not used
> at initialization time, not knowing that they will be used by VFs later.

We could invent things to calm down this false positive.

> References to other QEMU objects in the option may also die before VFs
> are realized.

Is there any other thing than netdev we need to consider?

>
> The other problem is that QEMU cannot interact with the unrealized VFs.
> For example, if you type "device_add virtio-net-pci,id=vf,sriov-pf=pf"
> in HMP, you will expect "device_del vf" works, but it's hard to
> implement such behaviors with unrealized VFs.

I think hotplug can only be done at PF level if we do that.

>
> I was first going to compromise and allow such quirky behaviors, but I
> realized such a compromise is unnecessary if we reuse the PCI power down
> logic so I wrote v2.

Haven't checked the code, but anything related to the PM here?

Thanks

>
> Regards,
> Akihiko Odaki
>
Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
Posted by Akihiko Odaki 5 months, 1 week ago
On 2023/12/11 16:26, Jason Wang wrote:
> On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> On 2023/12/11 11:52, Jason Wang wrote:
>>> On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>
>>>> Introduction
>>>> ------------
>>>>
>>>> This series is based on the RFC series submitted by Yui Washizu[1].
>>>> See also [2] for the context.
>>>>
>>>> This series enables SR-IOV emulation for virtio-net. It is useful
>>>> to test SR-IOV support on the guest, or to expose several vDPA devices
>>>> in a VM. vDPA devices can also provide L2 switching feature for
>>>> offloading though it is out of scope to allow the guest to configure
>>>> such a feature.
>>>>
>>>> The PF side code resides in virtio-pci. The VF side code resides in
>>>> the PCI common infrastructure, but it is restricted to work only for
>>>> virtio-net-pci because of lack of validation.
>>>>
>>>> User Interface
>>>> --------------
>>>>
>>>> A user can configure a SR-IOV capable virtio-net device by adding
>>>> virtio-net-pci functions to a bus. Below is a command line example:
>>>>     -netdev user,id=n -netdev user,id=o
>>>>     -netdev user,id=p -netdev user,id=q
>>>>     -device pcie-root-port,id=b
>>>>     -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>>>>     -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>>>>     -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>>>>     -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>>>>
>>>> The VFs specify the paired PF with "sriov-pf" property. The PF must be
>>>> added after all VFs. It is user's responsibility to ensure that VFs have
>>>> function numbers larger than one of the PF, and the function numbers
>>>> have a consistent stride.
>>>
>>> This seems not user friendly. Any reason we can't just allow user to
>>> specify the stride here?
>>
>> It should be possible to assign addr automatically without requiring
>> user to specify the stride. I'll try that in the next version.
>>
>>>
>>> Btw, I vaguely remember qemu allows the params to be accepted as a
>>> list. If this is true, we can accept a list of netdev here?
>>
>> Yes, rocker does that. But the problem is not just about getting
>> parameters needed for VFs, which I forgot to mention in the cover letter
>> and will explain below.
>>
>>>
>>>>
>>>> Keeping VF instances
>>>> --------------------
>>>>
>>>> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
>>>> the guest requests. Previously, this behavior was implemented by
>>>> realizing and unrealizing VFs at runtime. However, this strategy does
>>>> not work well for the proposed virtio-net emulation; in this proposal,
>>>> device options passed in the command line must be maintained as VFs
>>>> are hotplugged, but they are consumed when the machine starts and not
>>>> available after that, which makes realizing VFs at runtime impossible.
>>>
>>> Could we store the device options in the PF?
>>
>> I wrote it's to store the device options, but the problem is actually
>> more about realizing VFs at runtime instead of at the initialization time.
>>
>> Realizing VFs at runtime have two major problems. One is that it delays
>> the validations of options; invalid options will be noticed when the
>> guest requests to realize VFs.
> 
> If PCI spec allows the failure when creating VF, then it should not be
> a problem.

I doubt the spec cares such a failure at all. VF enablement should 
always work for a real hardware. It's neither user-friendly to tell 
configuration errors at runtime.

> 
>> netdevs also warn that they are not used
>> at initialization time, not knowing that they will be used by VFs later.
> 
> We could invent things to calm down this false positive.
> 
>> References to other QEMU objects in the option may also die before VFs
>> are realized.
> 
> Is there any other thing than netdev we need to consider?

You will also want to set a distinct mac for each VF. Other properties 
does not matter much in my opinion.

> 
>>
>> The other problem is that QEMU cannot interact with the unrealized VFs.
>> For example, if you type "device_add virtio-net-pci,id=vf,sriov-pf=pf"
>> in HMP, you will expect "device_del vf" works, but it's hard to
>> implement such behaviors with unrealized VFs.
> 
> I think hotplug can only be done at PF level if we do that.

Assuming you mean to let netdev and mac accept arrays, yes.

> 
>>
>> I was first going to compromise and allow such quirky behaviors, but I
>> realized such a compromise is unnecessary if we reuse the PCI power down
>> logic so I wrote v2.
> 
> Haven't checked the code, but anything related to the PM here?

You mean power management? We don't have to care about PCI power down 
for VFs because powering down a SR-IOV PCI device will reset it and thus 
disable its VFs.

Regards,
Akihiko Odaki

Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
Posted by Jason Wang 5 months, 1 week ago
On Mon, Dec 11, 2023 at 4:29 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>
> On 2023/12/11 16:26, Jason Wang wrote:
> > On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >>
> >> On 2023/12/11 11:52, Jason Wang wrote:
> >>> On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> >>>>
> >>>> Introduction
> >>>> ------------
> >>>>
> >>>> This series is based on the RFC series submitted by Yui Washizu[1].
> >>>> See also [2] for the context.
> >>>>
> >>>> This series enables SR-IOV emulation for virtio-net. It is useful
> >>>> to test SR-IOV support on the guest, or to expose several vDPA devices
> >>>> in a VM. vDPA devices can also provide L2 switching feature for
> >>>> offloading though it is out of scope to allow the guest to configure
> >>>> such a feature.
> >>>>
> >>>> The PF side code resides in virtio-pci. The VF side code resides in
> >>>> the PCI common infrastructure, but it is restricted to work only for
> >>>> virtio-net-pci because of lack of validation.
> >>>>
> >>>> User Interface
> >>>> --------------
> >>>>
> >>>> A user can configure a SR-IOV capable virtio-net device by adding
> >>>> virtio-net-pci functions to a bus. Below is a command line example:
> >>>>     -netdev user,id=n -netdev user,id=o
> >>>>     -netdev user,id=p -netdev user,id=q
> >>>>     -device pcie-root-port,id=b
> >>>>     -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
> >>>>     -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
> >>>>     -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
> >>>>     -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
> >>>>
> >>>> The VFs specify the paired PF with "sriov-pf" property. The PF must be
> >>>> added after all VFs. It is user's responsibility to ensure that VFs have
> >>>> function numbers larger than one of the PF, and the function numbers
> >>>> have a consistent stride.
> >>>
> >>> This seems not user friendly. Any reason we can't just allow user to
> >>> specify the stride here?
> >>
> >> It should be possible to assign addr automatically without requiring
> >> user to specify the stride. I'll try that in the next version.
> >>
> >>>
> >>> Btw, I vaguely remember qemu allows the params to be accepted as a
> >>> list. If this is true, we can accept a list of netdev here?
> >>
> >> Yes, rocker does that. But the problem is not just about getting
> >> parameters needed for VFs, which I forgot to mention in the cover letter
> >> and will explain below.
> >>
> >>>
> >>>>
> >>>> Keeping VF instances
> >>>> --------------------
> >>>>
> >>>> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
> >>>> the guest requests. Previously, this behavior was implemented by
> >>>> realizing and unrealizing VFs at runtime. However, this strategy does
> >>>> not work well for the proposed virtio-net emulation; in this proposal,
> >>>> device options passed in the command line must be maintained as VFs
> >>>> are hotplugged, but they are consumed when the machine starts and not
> >>>> available after that, which makes realizing VFs at runtime impossible.
> >>>
> >>> Could we store the device options in the PF?
> >>
> >> I wrote it's to store the device options, but the problem is actually
> >> more about realizing VFs at runtime instead of at the initialization time.
> >>
> >> Realizing VFs at runtime have two major problems. One is that it delays
> >> the validations of options; invalid options will be noticed when the
> >> guest requests to realize VFs.
> >
> > If PCI spec allows the failure when creating VF, then it should not be
> > a problem.
>
> I doubt the spec cares such a failure at all. VF enablement should
> always work for a real hardware. It's neither user-friendly to tell
> configuration errors at runtime.

I'm not sure which options we should care about? Did you mean netdev
options or the virtio-net specific ones?

If VF stick to the same options as PF (except for the SRIOV), it
should be validated during the PF initialization.

>
> >
> >> netdevs also warn that they are not used
> >> at initialization time, not knowing that they will be used by VFs later.
> >
> > We could invent things to calm down this false positive.
> >
> >> References to other QEMU objects in the option may also die before VFs
> >> are realized.
> >
> > Is there any other thing than netdev we need to consider?
>
> You will also want to set a distinct mac for each VF. Other properties
> does not matter much in my opinion.

Qemu doesn't check mac duplication now. So it's up to the mgmt layer.

>
> >
> >>
> >> The other problem is that QEMU cannot interact with the unrealized VFs.
> >> For example, if you type "device_add virtio-net-pci,id=vf,sriov-pf=pf"
> >> in HMP, you will expect "device_del vf" works, but it's hard to
> >> implement such behaviors with unrealized VFs.
> >
> > I think hotplug can only be done at PF level if we do that.
>
> Assuming you mean to let netdev and mac accept arrays, yes.
>
> >
> >>
> >> I was first going to compromise and allow such quirky behaviors, but I
> >> realized such a compromise is unnecessary if we reuse the PCI power down
> >> logic so I wrote v2.
> >
> > Haven't checked the code, but anything related to the PM here?
>
> You mean power management? We don't have to care about PCI power down
> for VFs because powering down a SR-IOV PCI device will reset it and thus
> disable its VFs.

Ok.

Thanks

>
> Regards,
> Akihiko Odaki
>
Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
Posted by Akihiko Odaki 5 months, 1 week ago
On 2023/12/12 13:12, Jason Wang wrote:
> On Mon, Dec 11, 2023 at 4:29 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>
>> On 2023/12/11 16:26, Jason Wang wrote:
>>> On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>
>>>> On 2023/12/11 11:52, Jason Wang wrote:
>>>>> On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>>>>>>
>>>>>> Introduction
>>>>>> ------------
>>>>>>
>>>>>> This series is based on the RFC series submitted by Yui Washizu[1].
>>>>>> See also [2] for the context.
>>>>>>
>>>>>> This series enables SR-IOV emulation for virtio-net. It is useful
>>>>>> to test SR-IOV support on the guest, or to expose several vDPA devices
>>>>>> in a VM. vDPA devices can also provide L2 switching feature for
>>>>>> offloading though it is out of scope to allow the guest to configure
>>>>>> such a feature.
>>>>>>
>>>>>> The PF side code resides in virtio-pci. The VF side code resides in
>>>>>> the PCI common infrastructure, but it is restricted to work only for
>>>>>> virtio-net-pci because of lack of validation.
>>>>>>
>>>>>> User Interface
>>>>>> --------------
>>>>>>
>>>>>> A user can configure a SR-IOV capable virtio-net device by adding
>>>>>> virtio-net-pci functions to a bus. Below is a command line example:
>>>>>>      -netdev user,id=n -netdev user,id=o
>>>>>>      -netdev user,id=p -netdev user,id=q
>>>>>>      -device pcie-root-port,id=b
>>>>>>      -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>>>>>>      -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>>>>>>      -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>>>>>>      -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>>>>>>
>>>>>> The VFs specify the paired PF with "sriov-pf" property. The PF must be
>>>>>> added after all VFs. It is user's responsibility to ensure that VFs have
>>>>>> function numbers larger than one of the PF, and the function numbers
>>>>>> have a consistent stride.
>>>>>
>>>>> This seems not user friendly. Any reason we can't just allow user to
>>>>> specify the stride here?
>>>>
>>>> It should be possible to assign addr automatically without requiring
>>>> user to specify the stride. I'll try that in the next version.
>>>>
>>>>>
>>>>> Btw, I vaguely remember qemu allows the params to be accepted as a
>>>>> list. If this is true, we can accept a list of netdev here?
>>>>
>>>> Yes, rocker does that. But the problem is not just about getting
>>>> parameters needed for VFs, which I forgot to mention in the cover letter
>>>> and will explain below.
>>>>
>>>>>
>>>>>>
>>>>>> Keeping VF instances
>>>>>> --------------------
>>>>>>
>>>>>> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
>>>>>> the guest requests. Previously, this behavior was implemented by
>>>>>> realizing and unrealizing VFs at runtime. However, this strategy does
>>>>>> not work well for the proposed virtio-net emulation; in this proposal,
>>>>>> device options passed in the command line must be maintained as VFs
>>>>>> are hotplugged, but they are consumed when the machine starts and not
>>>>>> available after that, which makes realizing VFs at runtime impossible.
>>>>>
>>>>> Could we store the device options in the PF?
>>>>
>>>> I wrote it's to store the device options, but the problem is actually
>>>> more about realizing VFs at runtime instead of at the initialization time.
>>>>
>>>> Realizing VFs at runtime have two major problems. One is that it delays
>>>> the validations of options; invalid options will be noticed when the
>>>> guest requests to realize VFs.
>>>
>>> If PCI spec allows the failure when creating VF, then it should not be
>>> a problem.
>>
>> I doubt the spec cares such a failure at all. VF enablement should
>> always work for a real hardware. It's neither user-friendly to tell
>> configuration errors at runtime.
> 
> I'm not sure which options we should care about? Did you mean netdev
> options or the virtio-net specific ones?
> 
> If VF stick to the same options as PF (except for the SRIOV), it
> should be validated during the PF initialization.

I'm aware that it's necessary to validate netdev options and PCI 
function numbers (a.k.a. addr/devfn). I'm not sure if the other options 
may result in an invalid VF configuration.

That said, I think it's better to let the VF realization code validate 
the configuration at PF realization - it's less error-prone and 
potentially requires less code. It also benefits existing SR-IOV devices 
(igb and nvme) so I'm going to push that change forward whether it will 
be needed for virtio-net SR-IOV emulation.

Assuming the change to realize VFs early is going to happen for igb and 
nvme, most of the changes *only* needed by virtio-net SR-IOV emulation 
is done by:
patch 10 "pcie_sriov: Allow user to create SR-IOV device".

> 
>>
>>>
>>>> netdevs also warn that they are not used
>>>> at initialization time, not knowing that they will be used by VFs later.
>>>
>>> We could invent things to calm down this false positive.
>>>
>>>> References to other QEMU objects in the option may also die before VFs
>>>> are realized.
>>>
>>> Is there any other thing than netdev we need to consider?
>>
>> You will also want to set a distinct mac for each VF. Other properties
>> does not matter much in my opinion.
> 
> Qemu doesn't check mac duplication now. So it's up to the mgmt layer.

Right. mac is not important; it's just nice to have.

Regards,
Akihiko Odaki

Re: [PATCH RFC v2 00/12] virtio-net: add support for SR-IOV emulation
Posted by Yui Washizu 5 months ago
On 2023/12/10 13:05, Akihiko Odaki wrote:
> Introduction
> ------------
>
> This series is based on the RFC series submitted by Yui Washizu[1].
> See also [2] for the context.
>
> This series enables SR-IOV emulation for virtio-net. It is useful
> to test SR-IOV support on the guest, or to expose several vDPA devices
> in a VM. vDPA devices can also provide L2 switching feature for
> offloading though it is out of scope to allow the guest to configure
> such a feature.
>
> The PF side code resides in virtio-pci. The VF side code resides in
> the PCI common infrastructure, but it is restricted to work only for
> virtio-net-pci because of lack of validation.
>
> User Interface
> --------------
>
> A user can configure a SR-IOV capable virtio-net device by adding
> virtio-net-pci functions to a bus. Below is a command line example:
>    -netdev user,id=n -netdev user,id=o
>    -netdev user,id=p -netdev user,id=q
>    -device pcie-root-port,id=b
>    -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
>    -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
>    -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
>    -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
>
> The VFs specify the paired PF with "sriov-pf" property. The PF must be
> added after all VFs. It is user's responsibility to ensure that VFs have
> function numbers larger than one of the PF, and the function numbers
> have a consistent stride.


I attempted to create a VF on a VM using your patches and
tested the creation of one VF on the VM.
I initiated QEMU with the following commands:
-netdev tap,id=n,vhost=on -netdev tap,id=o,vhost=on
-device pcie-root-port,id=b
-device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
-device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f

However, when creating just one VF,
QEMU crashed at hw/pci/pci.c:1443.
---
uint32_t vf_num = (d->devfn - (pf->devfn + vf_offset)) / vf_stride;
---
The error output was:
'CPU 1/KVM trap divide error ip:... sp:... error:0 in kvm-qemu.'

Interestingly, when creating two VFs, QEMU did not crash.

>
> Keeping VF instances
> --------------------
>
> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
> the guest requests. Previously, this behavior was implemented by
> realizing and unrealizing VFs at runtime. However, this strategy does
> not work well for the proposed virtio-net emulation; in this proposal,
> device options passed in the command line must be maintained as VFs
> are hotplugged, but they are consumed when the machine starts and not
> available after that, which makes realizing VFs at runtime impossible.
>
> As an strategy alternative to runtime realization/unrealization, this
> series proposes to reuse the code to power down PCI Express devices.
> When a PCI Express device is powered down, it will be hidden from the
> guest but will be kept realized. This effectively implements the
> behavior we need for the SR-IOV emulation.
>
> Summary
> -------
>
> Patch [1, 5] refactors the PCI infrastructure code.
> Patch [6, 10] adds user-created SR-IOV VF infrastructure.
> Patch 11 makes virtio-pci work as SR-IOV PF for user-created VFs.
> Patch 12 allows user to create SR-IOV VFs with virtio-net-pci.
>
> [1] https://patchew.org/QEMU/1689731808-3009-1-git-send-email-yui.washidu@gmail.com/
> [2] https://lore.kernel.org/all/5d46f455-f530-4e5e-9ae7-13a2297d4bc5@daynix.com/
>
> Co-developed-by: Yui Washizu <yui.washidu@gmail.com>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
> Changes in v2:
> - Changed to keep VF instances.
> - Link to v1: https://lore.kernel.org/r/20231202-sriov-v1-0-32b3570f7bd6@daynix.com
>
> ---
> Akihiko Odaki (12):
>        hw/pci: Initialize PCI multifunction after realization
>        hw/pci: Determine if rombar is explicitly enabled
>        hw/pci: Do not add ROM BAR for SR-IOV VF
>        vfio: Avoid inspecting option QDict for rombar
>        hw/qdev: Remove opts member
>        pcie_sriov: Reuse SR-IOV VF device instances
>        pcie_sriov: Release VFs failed to realize
>        pcie_sriov: Ensure PF and VF are mutually exclusive
>        pcie_sriov: Check PCI Express for SR-IOV PF
>        pcie_sriov: Allow user to create SR-IOV device
>        virtio-pci: Implement SR-IOV PF
>        virtio-net: Implement SR-IOV VF
>
>   docs/pcie_sriov.txt         |   8 +-
>   include/hw/pci/pci.h        |   2 +-
>   include/hw/pci/pci_device.h |  13 +-
>   include/hw/pci/pcie_sriov.h |  25 ++-
>   include/hw/qdev-core.h      |   4 -
>   hw/core/qdev.c              |   1 -
>   hw/net/igb.c                |   3 +-
>   hw/nvme/ctrl.c              |   3 +-
>   hw/pci/pci.c                |  98 +++++++-----
>   hw/pci/pci_host.c           |   4 +-
>   hw/pci/pcie.c               |   4 +-
>   hw/pci/pcie_sriov.c         | 360 +++++++++++++++++++++++++++++++++-----------
>   hw/vfio/pci.c               |   3 +-
>   hw/virtio/virtio-net-pci.c  |   1 +
>   hw/virtio/virtio-pci.c      |   7 +
>   system/qdev-monitor.c       |  12 +-
>   16 files changed, 395 insertions(+), 153 deletions(-)
> ---
> base-commit: 4705fc0c8511d073bee4751c3c974aab2b10a970
> change-id: 20231202-sriov-9402fb262be8
>
> Best regards,