[Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators

Tiwei Bie posted 6 patches 6 years, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180319071537.28649-1-tiwei.bie@intel.com
Test checkpatch passed
Test docker-build@min-glib passed
Test docker-mingw@fedora passed
Test docker-quick@centos6 passed
Test s390x passed
Makefile.target                 |   4 +
docs/interop/vhost-user.txt     |  57 +++++++++
hw/scsi/vhost-user-scsi.c       |   6 +-
hw/vfio/common.c                |  97 +++++++++++++++-
hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
hw/virtio/virtio-pci.c          |  48 ++++++++
hw/virtio/virtio-pci.h          |   5 +
hw/virtio/virtio.c              |  39 +++++++
include/hw/vfio/vfio-common.h   |  11 +-
include/hw/virtio/vhost-user.h  |  34 ++++++
include/hw/virtio/virtio-scsi.h |   6 +-
include/hw/virtio/virtio.h      |   5 +
include/qemu/osdep.h            |   1 +
net/vhost-user.c                |  30 ++---
scripts/create_config           |   3 +
15 files changed, 561 insertions(+), 33 deletions(-)
create mode 100644 include/hw/virtio/vhost-user.h
[Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
Posted by Tiwei Bie 6 years, 1 month ago
This patch set adds some small extensions to the vhost-user protocol
to support VFIO based accelerators, making it possible to get
performance similar to VFIO based PCI passthru while keeping
the virtio device emulation in QEMU.

How does an accelerator accelerate vhost (data path)
====================================================

Any virtio ring compatible device can potentially be used as a
vhost data path accelerator. The accelerator can be set up based
on the information (e.g. memory table, features, ring info, etc.)
available on the vhost backend, and it can then directly use the
virtio ring provided by the virtio driver in the VM. The virtio
driver in the VM can therefore exchange e.g. network packets with
the accelerator directly via the virtio ring, which is what lets
the accelerator speed up the vhost data path. We call this vDPA:
vhost Data Path Acceleration.
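
As a rough illustration of the idea (a hypothetical sketch, not code
from this series), a vhost-user backend that owns such a device could
program the hardware directly from the state it already receives over
the vhost protocol; every vdpa_* name below is made up for illustration:

/* Hypothetical sketch: the vdpa_hw_*() helpers stand in for the
 * accelerator's driver and are illustrative only.
 */
#include <stdint.h>

struct vdpa_vring {
    uint64_t desc;   /* guest address of the descriptor table */
    uint64_t avail;  /* guest address of the available ring   */
    uint64_t used;   /* guest address of the used ring        */
    uint16_t num;    /* ring size                             */
};

struct vdpa_dev;     /* opaque accelerator handle (illustrative) */
int vdpa_hw_set_features(struct vdpa_dev *d, uint64_t features);
int vdpa_hw_set_mem_table(struct vdpa_dev *d, const void *regions, int nregions);
int vdpa_hw_set_vring(struct vdpa_dev *d, int qid, const struct vdpa_vring *vr);
int vdpa_hw_start(struct vdpa_dev *d);

/*
 * Called once the backend has collected the negotiated features, the
 * memory table and the ring addresses from QEMU (SET_FEATURES,
 * SET_MEM_TABLE, SET_VRING_ADDR, ...). After this the device works on
 * the guest's virtio rings directly; no host data-path thread is needed.
 */
static int vdpa_attach(struct vdpa_dev *d, uint64_t features,
                       const void *mem_regions, int nregions,
                       const struct vdpa_vring *vrings, int nvrings)
{
    int i;

    if (vdpa_hw_set_features(d, features) < 0 ||
        vdpa_hw_set_mem_table(d, mem_regions, nregions) < 0) {
        return -1;
    }
    for (i = 0; i < nvrings; i++) {
        if (vdpa_hw_set_vring(d, i, &vrings[i]) < 0) {
            return -1;
        }
    }
    return vdpa_hw_start(d);
}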

Note: although the accelerator can talk to the virtio driver in
the VM directly via the virtio ring, the control path events
(e.g. device start/stop) in the VM will still be trapped and handled
by QEMU, which will deliver such events to the vhost backend via
the standard vhost protocol.

The link below is an example showing how to set up such an
environment using nested VMs. In this case, the virtio device in
the outer VM is the accelerator, and it is used to accelerate the
virtio device in the inner VM. In reality, a virtio ring compatible
hardware device would be used as the accelerator.

http://dpdk.org/ml/archives/dev/2017-December/085044.html

The above example doesn't require any changes to QEMU, but it has
lower performance than the traditional VFIO based PCI passthru.
That is the problem this patch set wants to solve.

The performance issue of vDPA/vhost-user and solutions
======================================================

For the vhost-user backend, the critical issue in vDPA is that the
data path performance is relatively low and some host threads are
needed for the data path, because the mechanisms needed for the
following are missing:

1) the guest driver notifying the device directly;
2) the device interrupting the guest directly.

So this patch set makes some small extensions to the vhost-user
protocol to make both of them possible. It leverages the same
mechanisms (e.g. EPT and posted interrupts on the Intel platform)
as PCI passthru.
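
Conceptually, the first item means mapping a device-provided doorbell
(notify) area straight into the guest-visible notify region of the
virtio device, so a kick from the guest reaches the accelerator without
a VM exit, and the second means handing the device's per-queue interrupt
fd to KVM so it can be injected into the guest directly. Below is a
minimal QEMU-flavoured sketch of the mapping half, assuming the backend
has already passed a file descriptor plus offset/size for its notify
area (the helper name and its parameters are assumptions, not code from
this series):

/* Sketch only: map a backend-provided notify area into a virtio device's
 * notify MemoryRegion so guest doorbell writes go to the accelerator
 * directly instead of trapping into QEMU. 'notify_mr' is assumed to be
 * the device's existing notify MemoryRegion; fd/offset/size are assumed
 * to come from the backend over the slave channel.
 */
#include "qemu/osdep.h"
#include "exec/memory.h"
#include <sys/mman.h>

static int map_notify_area(MemoryRegion *notify_mr, MemoryRegion *sub,
                           Object *owner, int fd, off_t offset, size_t size,
                           hwaddr queue_notify_off)
{
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, offset);
    if (addr == MAP_FAILED) {
        return -1;
    }
    /* Expose the mmap'ed doorbell page as a RAM-device subregion; guest
     * writes to it are then handled by hardware, not by QEMU's MMIO path. */
    memory_region_init_ram_device_ptr(sub, owner, "vhost-notify", size, addr);
    memory_region_add_subregion_overlap(notify_mr, queue_notify_off, sub, 1);
    return 0;
}

The interrupt half would, in a similar spirit, bind the backend's
per-queue interrupt fd to the queue's MSI-X route through KVM's irqfd
mechanism instead of signalling an eventfd handled by QEMU.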

A new protocol feature bit is added to negotiate accelerator
support. Two new slave message types are added to control the
notify region and queue interrupt passthru for each queue. From
the vhost-user protocol design point of view this is very flexible:
passthru can be enabled/disabled for each queue individually, and
each queue can be accelerated by a different device. More design
and implementation details can be found in the last patch.
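
To illustrate the per-queue granularity (the actual message names and
wire layout are defined in the last patch; the names below are
placeholders only), each such slave message essentially carries a queue
index plus a description of the area or interrupt being passed through,
with the file descriptor itself travelling as ancillary data:

/* Placeholder layout, not the wire format from the patch: a per-queue
 * slave request describing a region inside an fd passed via SCM_RIGHTS.
 * size == 0 could mean "disable passthru for this queue".
 */
#include <stdint.h>

struct vring_area_msg {
    uint64_t queue_idx;  /* which virtqueue this request applies to */
    uint64_t size;       /* size of the area inside the passed fd   */
    uint64_t offset;     /* offset of the area inside the passed fd */
};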

Difference between vDPA and PCI passthru
========================================

The key difference between PCI passthru and vDPA is that in vDPA
only the data path of the device (e.g. DMA ring, notify region and
queue interrupt) is passed through to the VM, while the device
control path (e.g. PCI configuration space and MMIO regions) is
still defined and emulated by QEMU.

The benefits of keeping virtio device emulation in QEMU compared
with virtio device PCI passthru include (but are not limited to):

- a consistent device interface for the guest OS in the VM;
- maximum flexibility in the hardware (i.e. accelerator) design;
- leveraging the existing virtio live-migration framework.

Why extend vhost-user for vDPA
==============================

We have already implemented various virtual switches (e.g. OVS-DPDK)
based on vhost-user for VMs in the cloud. They are pure software
running on CPU cores. When accelerators become available for such
NFVi applications, it is ideal if the applications can keep using
the original interface (i.e. the vhost-user netdev) with QEMU, while
the infrastructure decides when and how to switch between CPU and
accelerators behind that interface. The switching between CPU and
accelerators can then be done flexibly and quickly inside the
applications.

More details about this can be found in Cunming's discussion on
the RFC patch set.

Update notes
============

The IOMMU feature bit check is removed in this version, because:

The IOMMU feature is negotiable. When an accelerator that doesn't
support virtual IOMMU is used, its driver simply won't provide this
feature bit when the vhost library queries its features; if it does
support virtual IOMMU, its driver can provide the bit. So it's not
reasonable to add this limitation in this patch set.
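
In other words, the masking naturally happens on the backend side. A
hedged sketch of what that could look like (accel_supports_viommu() and
backend_get_features() are assumed names, not part of this series):

#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_IOMMU_PLATFORM 33   /* feature bit from the virtio spec */

/* Hypothetical helper: ask the accelerator driver whether it can handle
 * IOTLB translation for a virtual IOMMU. */
bool accel_supports_viommu(const void *accel);

/* Features reported when the vhost library queries the backend: the IOMMU
 * bit is simply not advertised when the accelerator can't support it, so
 * QEMU doesn't need any extra check. */
static uint64_t backend_get_features(const void *accel, uint64_t hw_features)
{
    if (!accel_supports_viommu(accel)) {
        hw_features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
    }
    return hw_features;
}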

The previous links:
RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html

v1 -> v2:
- Add some explanations in the commit log about why vhost-user is extended (Paolo);
- Bug fix in slave_read() according to Stefan's fix in DPDK;
- Remove IOMMU feature check and related commit log;
- Some minor refinements;
- Rebase to the latest QEMU;

RFC -> v1:
- Add some details about how vDPA works in cover letter (Alexey)
- Add some details about the OVS offload use-case in cover letter (Jason)
- Move PCI specific stuff out of vhost-user (Jason)
- Handle the virtual IOMMU case (Jason)
- Move VFIO group management code into vfio/common.c (Alex)
- Various refinements;
(approximately sorted by comment posting time)

Tiwei Bie (6):
  vhost-user: support receiving file descriptors in slave_read
  vhost-user: introduce shared vhost-user state
  virtio: support adding sub-regions for notify region
  vfio: support getting VFIOGroup from groupfd
  vfio: remove DPRINTF() definition from vfio-common.h
  vhost-user: add VFIO based accelerators support

 Makefile.target                 |   4 +
 docs/interop/vhost-user.txt     |  57 +++++++++
 hw/scsi/vhost-user-scsi.c       |   6 +-
 hw/vfio/common.c                |  97 +++++++++++++++-
 hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
 hw/virtio/virtio-pci.c          |  48 ++++++++
 hw/virtio/virtio-pci.h          |   5 +
 hw/virtio/virtio.c              |  39 +++++++
 include/hw/vfio/vfio-common.h   |  11 +-
 include/hw/virtio/vhost-user.h  |  34 ++++++
 include/hw/virtio/virtio-scsi.h |   6 +-
 include/hw/virtio/virtio.h      |   5 +
 include/qemu/osdep.h            |   1 +
 net/vhost-user.c                |  30 ++---
 scripts/create_config           |   3 +
 15 files changed, 561 insertions(+), 33 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user.h

-- 
2.11.0


Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
Posted by Michael S. Tsirkin 6 years, 1 month ago
On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
> This patch set does some small extensions to vhost-user protocol
> to support VFIO based accelerators, and makes it possible to get
> the similar performance of VFIO based PCI passthru while keeping
> the virtio device emulation in QEMU.

I love your patches!
Yet there are some things to improve.
Posting comments separately as individual messages.


> How does accelerator accelerate vhost (data path)
> =================================================
> 
> Any virtio ring compatible devices potentially can be used as the
> vhost data path accelerators. We can setup the accelerator based
> on the informations (e.g. memory table, features, ring info, etc)
> available on the vhost backend. And accelerator will be able to use
> the virtio ring provided by the virtio driver in the VM directly.
> So the virtio driver in the VM can exchange e.g. network packets
> with the accelerator directly via the virtio ring. That is to say,
> we will be able to use the accelerator to accelerate the vhost
> data path. We call it vDPA: vhost Data Path Acceleration.
> 
> Notice: Although the accelerator can talk with the virtio driver
> in the VM via the virtio ring directly. The control path events
> (e.g. device start/stop) in the VM will still be trapped and handled
> by QEMU, and QEMU will deliver such events to the vhost backend
> via standard vhost protocol.
> 
> Below link is an example showing how to setup a such environment
> via nested VM. In this case, the virtio device in the outer VM is
> the accelerator. It will be used to accelerate the virtio device
> in the inner VM. In reality, we could use virtio ring compatible
> hardware device as the accelerators.
> 
> http://dpdk.org/ml/archives/dev/2017-December/085044.html
> 
> In above example, it doesn't require any changes to QEMU, but
> it has lower performance compared with the traditional VFIO
> based PCI passthru. And that's the problem this patch set wants
> to solve.
> 
> The performance issue of vDPA/vhost-user and solutions
> ======================================================
> 
> For vhost-user backend, the critical issue in vDPA is that the
> data path performance is relatively low and some host threads are
> needed for the data path, because some necessary mechanisms are
> missing to support:
> 
> 1) guest driver notifies the device directly;
> 2) device interrupts the guest directly;
> 
> So this patch set does some small extensions to the vhost-user
> protocol to make both of them possible. It leverages the same
> mechanisms (e.g. EPT and Posted-Interrupt on Intel platform) as
> the PCI passthru.
> 
> A new protocol feature bit is added to negotiate the accelerator
> feature support. Two new slave message types are added to control
> the notify region and queue interrupt passthru for each queue.
> From the view of vhost-user protocol design, it's very flexible.
> The passthru can be enabled/disabled for each queue individually,
> and it's possible to accelerate each queue by different devices.
> More design and implementation details can be found from the last
> patch.
> 
> Difference between vDPA and PCI passthru
> ========================================
> 
> The key difference between PCI passthru and vDPA is that, in vDPA
> only the data path of the device (e.g. DMA ring, notify region and
> queue interrupt) is pass-throughed to the VM, the device control
> path (e.g. PCI configuration space and MMIO regions) is still
> defined and emulated by QEMU.
> 
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device PCI passthru include (but not limit to):
> 
> - consistent device interface for guest OS in the VM;
> - max flexibility on the hardware (i.e. the accelerators) design;
> - leveraging the existing virtio live-migration framework;
> 
> Why extend vhost-user for vDPA
> ==============================
> 
> We have already implemented various virtual switches (e.g. OVS-DPDK)
> based on vhost-user for VMs in the Cloud. They are purely software
> running on CPU cores. When we have accelerators for such NFVi applications,
> it's ideal if the applications could keep using the original interface
> (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> when and how to switch between CPU and accelerators within the interface.
> And the switching (i.e. switch between CPU and accelerators) can be done
> flexibly and quickly inside the applications.
> 
> More details about this can be found from the Cunming's discussions on
> the RFC patch set.
> 
> Update notes
> ============
> 
> IOMMU feature bit check is removed in this version, because:
> 
> The IOMMU feature is negotiable, when an accelerator is used and
> it doesn't support virtual IOMMU, its driver just won't provide
> this feature bit when vhost library querying its features. And if
> it supports the virtual IOMMU, its driver can provide this feature
> bit. It's not reasonable to add this limitation in this patch set.
> 
> The previous links:
> RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> 
> v1 -> v2:
> - Add some explanations about why extend vhost-user in commit log (Paolo);
> - Bug fix in slave_read() according to Stefan's fix in DPDK;
> - Remove IOMMU feature check and related commit log;
> - Some minor refinements;
> - Rebase to the latest QEMU;
> 
> RFC -> v1:
> - Add some details about how vDPA works in cover letter (Alexey)
> - Add some details about the OVS offload use-case in cover letter (Jason)
> - Move PCI specific stuffs out of vhost-user (Jason)
> - Handle the virtual IOMMU case (Jason)
> - Move VFIO group management code into vfio/common.c (Alex)
> - Various refinements;
> (approximately sorted by comment posting time)
> 
> Tiwei Bie (6):
>   vhost-user: support receiving file descriptors in slave_read
>   vhost-user: introduce shared vhost-user state
>   virtio: support adding sub-regions for notify region
>   vfio: support getting VFIOGroup from groupfd
>   vfio: remove DPRINTF() definition from vfio-common.h
>   vhost-user: add VFIO based accelerators support
> 
>  Makefile.target                 |   4 +
>  docs/interop/vhost-user.txt     |  57 +++++++++
>  hw/scsi/vhost-user-scsi.c       |   6 +-
>  hw/vfio/common.c                |  97 +++++++++++++++-
>  hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
>  hw/virtio/virtio-pci.c          |  48 ++++++++
>  hw/virtio/virtio-pci.h          |   5 +
>  hw/virtio/virtio.c              |  39 +++++++
>  include/hw/vfio/vfio-common.h   |  11 +-
>  include/hw/virtio/vhost-user.h  |  34 ++++++
>  include/hw/virtio/virtio-scsi.h |   6 +-
>  include/hw/virtio/virtio.h      |   5 +
>  include/qemu/osdep.h            |   1 +
>  net/vhost-user.c                |  30 ++---
>  scripts/create_config           |   3 +
>  15 files changed, 561 insertions(+), 33 deletions(-)
>  create mode 100644 include/hw/virtio/vhost-user.h
> 
> -- 
> 2.11.0

Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
Posted by Tiwei Bie 6 years, 1 month ago
On Thu, Mar 22, 2018 at 04:55:39PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
> > This patch set does some small extensions to vhost-user protocol
> > to support VFIO based accelerators, and makes it possible to get
> > the similar performance of VFIO based PCI passthru while keeping
> > the virtio device emulation in QEMU.
> 
> I love your patches!
> Yet there are some things to improve.
> Posting comments separately as individual messages.
> 

Thank you so much! :-)

It may take me some time to address all your comments.
They're really helpful! I'll try to address and reply
to these comments in the next few days. Thanks again!
I do appreciate it!

Best regards,
Tiwei Bie

> 
> > How does accelerator accelerate vhost (data path)
> > =================================================
> > 
> > Any virtio ring compatible devices potentially can be used as the
> > vhost data path accelerators. We can setup the accelerator based
> > on the informations (e.g. memory table, features, ring info, etc)
> > available on the vhost backend. And accelerator will be able to use
> > the virtio ring provided by the virtio driver in the VM directly.
> > So the virtio driver in the VM can exchange e.g. network packets
> > with the accelerator directly via the virtio ring. That is to say,
> > we will be able to use the accelerator to accelerate the vhost
> > data path. We call it vDPA: vhost Data Path Acceleration.
> > 
> > Notice: Although the accelerator can talk with the virtio driver
> > in the VM via the virtio ring directly. The control path events
> > (e.g. device start/stop) in the VM will still be trapped and handled
> > by QEMU, and QEMU will deliver such events to the vhost backend
> > via standard vhost protocol.
> > 
> > Below link is an example showing how to setup a such environment
> > via nested VM. In this case, the virtio device in the outer VM is
> > the accelerator. It will be used to accelerate the virtio device
> > in the inner VM. In reality, we could use virtio ring compatible
> > hardware device as the accelerators.
> > 
> > http://dpdk.org/ml/archives/dev/2017-December/085044.html
> > 
> > In above example, it doesn't require any changes to QEMU, but
> > it has lower performance compared with the traditional VFIO
> > based PCI passthru. And that's the problem this patch set wants
> > to solve.
> > 
> > The performance issue of vDPA/vhost-user and solutions
> > ======================================================
> > 
> > For vhost-user backend, the critical issue in vDPA is that the
> > data path performance is relatively low and some host threads are
> > needed for the data path, because some necessary mechanisms are
> > missing to support:
> > 
> > 1) guest driver notifies the device directly;
> > 2) device interrupts the guest directly;
> > 
> > So this patch set does some small extensions to the vhost-user
> > protocol to make both of them possible. It leverages the same
> > mechanisms (e.g. EPT and Posted-Interrupt on Intel platform) as
> > the PCI passthru.
> > 
> > A new protocol feature bit is added to negotiate the accelerator
> > feature support. Two new slave message types are added to control
> > the notify region and queue interrupt passthru for each queue.
> > From the view of vhost-user protocol design, it's very flexible.
> > The passthru can be enabled/disabled for each queue individually,
> > and it's possible to accelerate each queue by different devices.
> > More design and implementation details can be found from the last
> > patch.
> > 
> > Difference between vDPA and PCI passthru
> > ========================================
> > 
> > The key difference between PCI passthru and vDPA is that, in vDPA
> > only the data path of the device (e.g. DMA ring, notify region and
> > queue interrupt) is pass-throughed to the VM, the device control
> > path (e.g. PCI configuration space and MMIO regions) is still
> > defined and emulated by QEMU.
> > 
> > The benefits of keeping virtio device emulation in QEMU compared
> > with virtio device PCI passthru include (but not limit to):
> > 
> > - consistent device interface for guest OS in the VM;
> > - max flexibility on the hardware (i.e. the accelerators) design;
> > - leveraging the existing virtio live-migration framework;
> > 
> > Why extend vhost-user for vDPA
> > ==============================
> > 
> > We have already implemented various virtual switches (e.g. OVS-DPDK)
> > based on vhost-user for VMs in the Cloud. They are purely software
> > running on CPU cores. When we have accelerators for such NFVi applications,
> > it's ideal if the applications could keep using the original interface
> > (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> > when and how to switch between CPU and accelerators within the interface.
> > And the switching (i.e. switch between CPU and accelerators) can be done
> > flexibly and quickly inside the applications.
> > 
> > More details about this can be found from the Cunming's discussions on
> > the RFC patch set.
> > 
> > Update notes
> > ============
> > 
> > IOMMU feature bit check is removed in this version, because:
> > 
> > The IOMMU feature is negotiable, when an accelerator is used and
> > it doesn't support virtual IOMMU, its driver just won't provide
> > this feature bit when vhost library querying its features. And if
> > it supports the virtual IOMMU, its driver can provide this feature
> > bit. It's not reasonable to add this limitation in this patch set.
> > 
> > The previous links:
> > RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> > v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> > 
> > v1 -> v2:
> > - Add some explanations about why extend vhost-user in commit log (Paolo);
> > - Bug fix in slave_read() according to Stefan's fix in DPDK;
> > - Remove IOMMU feature check and related commit log;
> > - Some minor refinements;
> > - Rebase to the latest QEMU;
> > 
> > RFC -> v1:
> > - Add some details about how vDPA works in cover letter (Alexey)
> > - Add some details about the OVS offload use-case in cover letter (Jason)
> > - Move PCI specific stuffs out of vhost-user (Jason)
> > - Handle the virtual IOMMU case (Jason)
> > - Move VFIO group management code into vfio/common.c (Alex)
> > - Various refinements;
> > (approximately sorted by comment posting time)
> > 
> > Tiwei Bie (6):
> >   vhost-user: support receiving file descriptors in slave_read
> >   vhost-user: introduce shared vhost-user state
> >   virtio: support adding sub-regions for notify region
> >   vfio: support getting VFIOGroup from groupfd
> >   vfio: remove DPRINTF() definition from vfio-common.h
> >   vhost-user: add VFIO based accelerators support
> > 
> >  Makefile.target                 |   4 +
> >  docs/interop/vhost-user.txt     |  57 +++++++++
> >  hw/scsi/vhost-user-scsi.c       |   6 +-
> >  hw/vfio/common.c                |  97 +++++++++++++++-
> >  hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
> >  hw/virtio/virtio-pci.c          |  48 ++++++++
> >  hw/virtio/virtio-pci.h          |   5 +
> >  hw/virtio/virtio.c              |  39 +++++++
> >  include/hw/vfio/vfio-common.h   |  11 +-
> >  include/hw/virtio/vhost-user.h  |  34 ++++++
> >  include/hw/virtio/virtio-scsi.h |   6 +-
> >  include/hw/virtio/virtio.h      |   5 +
> >  include/qemu/osdep.h            |   1 +
> >  net/vhost-user.c                |  30 ++---
> >  scripts/create_config           |   3 +
> >  15 files changed, 561 insertions(+), 33 deletions(-)
> >  create mode 100644 include/hw/virtio/vhost-user.h
> > 
> > -- 
> > 2.11.0

Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
Posted by Michael S. Tsirkin 6 years, 1 month ago
On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
> This patch set does some small extensions to vhost-user protocol
> to support VFIO based accelerators, and makes it possible to get
> the similar performance of VFIO based PCI passthru while keeping
> the virtio device emulation in QEMU.
> 
> How does accelerator accelerate vhost (data path)
> =================================================
> 
> Any virtio ring compatible devices potentially can be used as the
> vhost data path accelerators. We can setup the accelerator based
> on the informations (e.g. memory table, features, ring info, etc)
> available on the vhost backend. And accelerator will be able to use
> the virtio ring provided by the virtio driver in the VM directly.
> So the virtio driver in the VM can exchange e.g. network packets
> with the accelerator directly via the virtio ring. That is to say,
> we will be able to use the accelerator to accelerate the vhost
> data path. We call it vDPA: vhost Data Path Acceleration.
> 
> Notice: Although the accelerator can talk with the virtio driver
> in the VM via the virtio ring directly. The control path events
> (e.g. device start/stop) in the VM will still be trapped and handled
> by QEMU, and QEMU will deliver such events to the vhost backend
> via standard vhost protocol.
> 
> Below link is an example showing how to setup a such environment
> via nested VM. In this case, the virtio device in the outer VM is
> the accelerator. It will be used to accelerate the virtio device
> in the inner VM. In reality, we could use virtio ring compatible
> hardware device as the accelerators.
> 
> http://dpdk.org/ml/archives/dev/2017-December/085044.html

I understand that it might be challenging due to
the tight coupling with VFIO. Still - isn't there
a way to make it easier to set up a testing rig?

In particular, can we avoid the DPDK requirement for testing?



> In above example, it doesn't require any changes to QEMU, but
> it has lower performance compared with the traditional VFIO
> based PCI passthru. And that's the problem this patch set wants
> to solve.
> 
> The performance issue of vDPA/vhost-user and solutions
> ======================================================
> 
> For vhost-user backend, the critical issue in vDPA is that the
> data path performance is relatively low and some host threads are
> needed for the data path, because some necessary mechanisms are
> missing to support:
> 
> 1) guest driver notifies the device directly;
> 2) device interrupts the guest directly;
> 
> So this patch set does some small extensions to the vhost-user
> protocol to make both of them possible. It leverages the same
> mechanisms (e.g. EPT and Posted-Interrupt on Intel platform) as
> the PCI passthru.

Not all platforms support posted interrupts, and EPT isn't
required for MMIO to be mapped to devices.

It probably makes sense to separate the more portable
host notification offload from the less portable
guest notification offload.



> A new protocol feature bit is added to negotiate the accelerator
> feature support. Two new slave message types are added to control
> the notify region and queue interrupt passthru for each queue.
> From the view of vhost-user protocol design, it's very flexible.
> The passthru can be enabled/disabled for each queue individually,
> and it's possible to accelerate each queue by different devices.
> More design and implementation details can be found from the last
> patch.
> 
> Difference between vDPA and PCI passthru
> ========================================
> 
> The key difference between PCI passthru and vDPA is that, in vDPA
> only the data path of the device (e.g. DMA ring, notify region and
> queue interrupt) is pass-throughed to the VM, the device control
> path (e.g. PCI configuration space and MMIO regions) is still
> defined and emulated by QEMU.
> 
> The benefits of keeping virtio device emulation in QEMU compared
> with virtio device PCI passthru include (but not limit to):
> 
> - consistent device interface for guest OS in the VM;
> - max flexibility on the hardware (i.e. the accelerators) design;
> - leveraging the existing virtio live-migration framework;
> 
> Why extend vhost-user for vDPA
> ==============================
> 
> We have already implemented various virtual switches (e.g. OVS-DPDK)
> based on vhost-user for VMs in the Cloud. They are purely software
> running on CPU cores. When we have accelerators for such NFVi applications,
> it's ideal if the applications could keep using the original interface
> (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> when and how to switch between CPU and accelerators within the interface.
> And the switching (i.e. switch between CPU and accelerators) can be done
> flexibly and quickly inside the applications.
> 
> More details about this can be found from the Cunming's discussions on
> the RFC patch set.
> 
> Update notes
> ============
> 
> IOMMU feature bit check is removed in this version, because:
> 
> The IOMMU feature is negotiable, when an accelerator is used and
> it doesn't support virtual IOMMU, its driver just won't provide
> this feature bit when vhost library querying its features. And if
> it supports the virtual IOMMU, its driver can provide this feature
> bit. It's not reasonable to add this limitation in this patch set.

Fair enough. Still:
Can hardware on intel platforms actually support IOTLB requests?
Don't you need to add support for vIOMMU shadowing instead?


> The previous links:
> RFC: http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg04844.html
> v1:  http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html
> 
> v1 -> v2:
> - Add some explanations about why extend vhost-user in commit log (Paolo);
> - Bug fix in slave_read() according to Stefan's fix in DPDK;
> - Remove IOMMU feature check and related commit log;
> - Some minor refinements;
> - Rebase to the latest QEMU;
> 
> RFC -> v1:
> - Add some details about how vDPA works in cover letter (Alexey)
> - Add some details about the OVS offload use-case in cover letter (Jason)
> - Move PCI specific stuffs out of vhost-user (Jason)
> - Handle the virtual IOMMU case (Jason)
> - Move VFIO group management code into vfio/common.c (Alex)
> - Various refinements;
> (approximately sorted by comment posting time)
> 
> Tiwei Bie (6):
>   vhost-user: support receiving file descriptors in slave_read
>   vhost-user: introduce shared vhost-user state
>   virtio: support adding sub-regions for notify region
>   vfio: support getting VFIOGroup from groupfd
>   vfio: remove DPRINTF() definition from vfio-common.h
>   vhost-user: add VFIO based accelerators support
> 
>  Makefile.target                 |   4 +
>  docs/interop/vhost-user.txt     |  57 +++++++++
>  hw/scsi/vhost-user-scsi.c       |   6 +-
>  hw/vfio/common.c                |  97 +++++++++++++++-
>  hw/virtio/vhost-user.c          | 248 +++++++++++++++++++++++++++++++++++++++-
>  hw/virtio/virtio-pci.c          |  48 ++++++++
>  hw/virtio/virtio-pci.h          |   5 +
>  hw/virtio/virtio.c              |  39 +++++++
>  include/hw/vfio/vfio-common.h   |  11 +-
>  include/hw/virtio/vhost-user.h  |  34 ++++++
>  include/hw/virtio/virtio-scsi.h |   6 +-
>  include/hw/virtio/virtio.h      |   5 +
>  include/qemu/osdep.h            |   1 +
>  net/vhost-user.c                |  30 ++---
>  scripts/create_config           |   3 +
>  15 files changed, 561 insertions(+), 33 deletions(-)
>  create mode 100644 include/hw/virtio/vhost-user.h
> 
> -- 
> 2.11.0

Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
Posted by Tiwei Bie 6 years, 1 month ago
On Thu, Mar 22, 2018 at 06:40:18PM +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 03:15:31PM +0800, Tiwei Bie wrote:
[...]
> > 
> > Below link is an example showing how to setup a such environment
> > via nested VM. In this case, the virtio device in the outer VM is
> > the accelerator. It will be used to accelerate the virtio device
> > in the inner VM. In reality, we could use virtio ring compatible
> > hardware device as the accelerators.
> > 
> > http://dpdk.org/ml/archives/dev/2017-December/085044.html
> 
> I understand that it might be challenging due to
> the tight coupling with VFIO. Still - isn't there
> a way do make it easier to set a testing rig up?
> 
> In particular can we avoid the dpdk requirement for testing?
> 

If we want to try vDPA (e.g. use one virtio device to accelerate
another virtio device of a VM), I think we need DPDK. Otherwise
we would need to write a VFIO based userspace virtio driver and
find another vhost-user backend.

> 
> 
> > In above example, it doesn't require any changes to QEMU, but
> > it has lower performance compared with the traditional VFIO
> > based PCI passthru. And that's the problem this patch set wants
> > to solve.
> > 
> > The performance issue of vDPA/vhost-user and solutions
> > ======================================================
> > 
> > For vhost-user backend, the critical issue in vDPA is that the
> > data path performance is relatively low and some host threads are
> > needed for the data path, because some necessary mechanisms are
> > missing to support:
> > 
> > 1) guest driver notifies the device directly;
> > 2) device interrupts the guest directly;
> > 
> > So this patch set does some small extensions to the vhost-user
> > protocol to make both of them possible. It leverages the same
> > mechanisms (e.g. EPT and Posted-Interrupt on Intel platform) as
> > the PCI passthru.
> 
> Not all platforms support posted interrupts, and EPT isn't
> required for MMIO to be mapped to devices.
> 
> It probably makes sense to separate the more portable
> host notification offload from the less portable
> guest notification offload.
> 

Makes sense. I'll split the two types of offloads. Thanks for
the suggestion!

> 
> 
> > A new protocol feature bit is added to negotiate the accelerator
> > feature support. Two new slave message types are added to control
> > the notify region and queue interrupt passthru for each queue.
> > From the view of vhost-user protocol design, it's very flexible.
> > The passthru can be enabled/disabled for each queue individually,
> > and it's possible to accelerate each queue by different devices.
> > More design and implementation details can be found from the last
> > patch.
> > 
> > Difference between vDPA and PCI passthru
> > ========================================
> > 
> > The key difference between PCI passthru and vDPA is that, in vDPA
> > only the data path of the device (e.g. DMA ring, notify region and
> > queue interrupt) is pass-throughed to the VM, the device control
> > path (e.g. PCI configuration space and MMIO regions) is still
> > defined and emulated by QEMU.
> > 
> > The benefits of keeping virtio device emulation in QEMU compared
> > with virtio device PCI passthru include (but not limit to):
> > 
> > - consistent device interface for guest OS in the VM;
> > - max flexibility on the hardware (i.e. the accelerators) design;
> > - leveraging the existing virtio live-migration framework;
> > 
> > Why extend vhost-user for vDPA
> > ==============================
> > 
> > We have already implemented various virtual switches (e.g. OVS-DPDK)
> > based on vhost-user for VMs in the Cloud. They are purely software
> > running on CPU cores. When we have accelerators for such NFVi applications,
> > it's ideal if the applications could keep using the original interface
> > (i.e. vhost-user netdev) with QEMU, and infrastructure is able to decide
> > when and how to switch between CPU and accelerators within the interface.
> > And the switching (i.e. switch between CPU and accelerators) can be done
> > flexibly and quickly inside the applications.
> > 
> > More details about this can be found from the Cunming's discussions on
> > the RFC patch set.
> > 
> > Update notes
> > ============
> > 
> > IOMMU feature bit check is removed in this version, because:
> > 
> > The IOMMU feature is negotiable, when an accelerator is used and
> > it doesn't support virtual IOMMU, its driver just won't provide
> > this feature bit when vhost library querying its features. And if
> > it supports the virtual IOMMU, its driver can provide this feature
> > bit. It's not reasonable to add this limitation in this patch set.
> 
> Fair enough. Still:
> Can hardware on intel platforms actually support IOTLB requests?
> Don't you need to add support for vIOMMU shadowing instead?
> 

For the hardware I have, I guess they can't for now.

Best regards,
Tiwei Bie

Re: [Qemu-devel] [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
Posted by Michael S. Tsirkin 6 years, 1 month ago
On Wed, Mar 28, 2018 at 08:24:07PM +0800, Tiwei Bie wrote:
> > > Update notes
> > > ============
> > > 
> > > IOMMU feature bit check is removed in this version, because:
> > > 
> > > The IOMMU feature is negotiable, when an accelerator is used and
> > > it doesn't support virtual IOMMU, its driver just won't provide
> > > this feature bit when vhost library querying its features. And if
> > > it supports the virtual IOMMU, its driver can provide this feature
> > > bit. It's not reasonable to add this limitation in this patch set.
> > 
> > Fair enough. Still:
> > Can hardware on intel platforms actually support IOTLB requests?
> > Don't you need to add support for vIOMMU shadowing instead?
> > 
> 
> For the hardware I have, I guess they can't for now.

So VFIO in QEMU has support for vIOMMU shadowing.
Can you use that somehow?

Ability to run dpdk within guest seems important.

-- 
MST

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
Posted by Tiwei Bie 6 years, 1 month ago
On Wed, Mar 28, 2018 at 06:33:01PM +0300, Michael S. Tsirkin wrote:
> On Wed, Mar 28, 2018 at 08:24:07PM +0800, Tiwei Bie wrote:
> > > > Update notes
> > > > ============
> > > > 
> > > > IOMMU feature bit check is removed in this version, because:
> > > > 
> > > > The IOMMU feature is negotiable, when an accelerator is used and
> > > > it doesn't support virtual IOMMU, its driver just won't provide
> > > > this feature bit when vhost library querying its features. And if
> > > > it supports the virtual IOMMU, its driver can provide this feature
> > > > bit. It's not reasonable to add this limitation in this patch set.
> > > 
> > > Fair enough. Still:
> > > Can hardware on intel platforms actually support IOTLB requests?
> > > Don't you need to add support for vIOMMU shadowing instead?
> > > 
> > 
> > For the hardware I have, I guess they can't for now.
> 
> So VFIO in QEMU has support for vIOMMU shadowing.
> Can you use that somehow?

Yeah, I guess we can use it in some way. Actually, supporting
vIOMMU is quite an interesting feature. It would provide better
security, and in the hardware backend case there would be no
performance penalty with static mappings once the backend has got
all the mappings. I think it could be done as separate work. Based
on your previous suggestion in this thread, I have split the guest
notification offload and the host notification offload (I'll send
the new version very soon), and I plan to let this patch set focus
just on fixing the most critical performance issue - the host
notification offload. With this fix, using a hardware backend with
vhost-user could get a very big performance boost and become much
more practical. So maybe we can focus on fixing this critical
performance issue first. What do you think?

> 
> Ability to run dpdk within guest seems important.

I think vIOMMU isn't a must for running DPDK in the guest. For
Linux guests we also have igb_uio and uio_pci_generic to run DPDK,
and for FreeBSD guests we have nic_uio. They don't need vIOMMU,
and they can offer the best performance.

Best regards,
Tiwei Bie

> 
> -- 
> MST
> 

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 0/6] Extend vhost-user to support VFIO based accelerators
Posted by Michael S. Tsirkin 6 years, 1 month ago
On Thu, Mar 29, 2018 at 11:33:29AM +0800, Tiwei Bie wrote:
> On Wed, Mar 28, 2018 at 06:33:01PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Mar 28, 2018 at 08:24:07PM +0800, Tiwei Bie wrote:
> > > > > Update notes
> > > > > ============
> > > > > 
> > > > > IOMMU feature bit check is removed in this version, because:
> > > > > 
> > > > > The IOMMU feature is negotiable, when an accelerator is used and
> > > > > it doesn't support virtual IOMMU, its driver just won't provide
> > > > > this feature bit when vhost library querying its features. And if
> > > > > it supports the virtual IOMMU, its driver can provide this feature
> > > > > bit. It's not reasonable to add this limitation in this patch set.
> > > > 
> > > > Fair enough. Still:
> > > > Can hardware on intel platforms actually support IOTLB requests?
> > > > Don't you need to add support for vIOMMU shadowing instead?
> > > > 
> > > 
> > > For the hardware I have, I guess they can't for now.
> > 
> > So VFIO in QEMU has support for vIOMMU shadowing.
> > Can you use that somehow?
> 
> Yeah, I guess we can use it in some way. Actually supporting
> vIOMMU is a quite interesting feature. It would provide
> better security, and for the hardware backend case there
> would be no performance penalty with static mapping after
> the backend got all the mappings. I think it could be done
> as another work. Based on your previous suggestion in this
> thread, I have split the guest notification offload and host
> notification offload (I'll send the new version very soon).
> And I plan to let this patch set just focus on fixing the
> most critical performance issue - the host notification offload.
> With this fix, using hardware backend in vhost-user could get
> a very big performance boost and become much more practicable.
> So maybe we can focus on fixing this critical performance issue
> first. How do you think?

I think correctness and security come before performance.
vIOMMU falls under security.

> > 
> > Ability to run dpdk within guest seems important.
> 
> I think vIOMMU isn't a must to run DPDK in guest.

Oh yes it is.

> For Linux
> guest we also have igb_uio and uio_pci_generic to run DPDK,
> for FreeBSD guest we have nic_uio.

These hacks offer no protection from a buggy userspace corrupting guest
kernel memory. Given DPDK is routinely linked into closed source
applications, this is not a configuration anyone can support.


> They don't need vIOMMU,
> and they could offer the best performance.
> 
> Best regards,
> Tiwei Bie
> 
> > 
> > -- 
> > MST
> > 