[PATCH v5 0/5] virtio-pci: enable blk and scsi multi-queue by default
Posted by Stefan Hajnoczi 3 years, 10 months ago
v4:
 * Sorry for the long delay. I considered replacing this series with a simpler
   approach. Real hardware ships with a fixed number of queues (e.g. 128). The
   equivalent can be done in QEMU too. That way we don't need to magically size
   num_queues. In the end I decided against this approach because the Linux
   virtio_blk.ko and virtio_scsi.ko guest drivers unconditionally initialized
   all available queues until recently (they were written with
   num_queues=num_vcpus in mind). It doesn't make sense for a 1 CPU guest to
   bring up 128 virtqueues (waste of resources and possibly weird performance
   effects with blk-mq).
 * Honor maximum number of MSI-X vectors and virtqueues [Daniel Berrange]
 * Update commit descriptions to mention maximum MSI-X vector and virtqueue
   caps [Raphael]
v3:
 * Introduce virtio_pci_optimal_num_queues() helper to enforce VIRTIO_QUEUE_MAX
   in one place (a rough sketch of the idea appears below, after this changelog)
 * Use VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
 * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
v3:
 * Add new performance results that demonstrate the scalability
 * Mention that this is PCI-specific [Cornelia]
v2:
 * Let the virtio-DEVICE-pci device select num-queues because the optimal
   multi-queue configuration may differ between virtio-pci, virtio-mmio, and
   virtio-ccw [Cornelia]
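
As background, here is a rough sketch of the queue-sizing idea behind the
virtio_pci_optimal_num_queues() helper mentioned above.  It is illustrative
only, not the code from this series: VQ_MAX stands in for VIRTIO_QUEUE_MAX,
and the argument names and exact clamping order are assumptions.

static unsigned int optimal_num_queues_sketch(unsigned int num_vcpus,
                                              unsigned int fixed_vqs,
                                              unsigned int msix_vectors)
{
    enum { VQ_MAX = 1024 };         /* stand-in for VIRTIO_QUEUE_MAX */
    unsigned int n = num_vcpus;     /* one request virtqueue per vCPU */

    /* Never exceed the transport's virtqueue limit; fixed virtqueues
     * (e.g. the virtio-scsi control and event queues) count against it. */
    if (n > VQ_MAX - fixed_vqs) {
        n = VQ_MAX - fixed_vqs;
    }

    /* Leave MSI-X vectors for the fixed virtqueues and the config vector
     * so that every request queue can still get its own vector. */
    if (msix_vectors > fixed_vqs + 1 && n > msix_vectors - fixed_vqs - 1) {
        n = msix_vectors - fixed_vqs - 1;
    }

    return n ? n : 1;               /* always at least one request queue */
}

Under these assumptions a 4 vCPU guest with the two fixed virtio-scsi
virtqueues gets 4 request queues as long as at least 7 MSI-X vectors are
available, rather than some large fixed number like 128.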

Enabling multi-queue on virtio-pci storage devices improves performance on SMP
guests because the completion interrupt is handled on the vCPU that submitted
the I/O request.  This avoids IPIs inside the guest.

Note that performance is unchanged in these cases:
1. Uniprocessor guests.  They don't have IPIs.
2. Application threads that happen to be scheduled on the sole vCPU that
   handles completion interrupts.  (This is one reason why benchmark results
   can vary noticeably between runs.)
3. Applications that the user has pinned to the vCPU that handles completion
   interrupts.

Set the number of queues to the number of vCPUs by default on virtio-blk and
virtio-scsi PCI devices.  Older machine types continue to default to 1 queue
for live migration compatibility.
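
To illustrate the machine-type point, here is a sketch of the general
compat-property mechanism, not the actual hw/core/machine.c hunk from this
series; the driver and property names below are assumptions.  An older
machine type pins the old default so that existing guests and migration
streams keep seeing a single queue:

struct compat_prop {
    const char *driver;
    const char *property;
    const char *value;
};

/* Force the old default (1 queue) for devices on older machine types.
 * In QEMU proper this would be GlobalProperty entries in a hw_compat_*
 * table; the stand-in struct just keeps the sketch self-contained. */
static const struct compat_prop old_machine_defaults[] = {
    { "virtio-blk-device",  "num-queues", "1" },
    { "virtio-scsi-device", "num_queues", "1" },
    { "vhost-user-blk",     "num-queues", "1" },
};

Regardless of machine type, users can still set the queue count explicitly
via the device's queue-count property (num-queues or num_queues, depending
on the device).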

Random read performance:
      IOPS
q=1    78k
q=32  104k  +33%

Boot time:
      Duration
q=1        51s
q=32     1m41s  +98%

Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks

Previously measured results on a 4 vCPU guest were also positive but showed a
smaller 1-4% performance improvement.  They are no longer valid because
significant event loop optimizations have been merged.

Stefan Hajnoczi (5):
  virtio-pci: add virtio_pci_optimal_num_queues() helper
  virtio-scsi: introduce a constant for fixed virtqueues
  virtio-scsi: default num_queues to -smp N
  virtio-blk: default num_queues to -smp N
  vhost-user-blk: default num_queues to -smp N

 hw/virtio/virtio-pci.h             |  9 +++++++++
 include/hw/virtio/vhost-user-blk.h |  2 ++
 include/hw/virtio/virtio-blk.h     |  2 ++
 include/hw/virtio/virtio-scsi.h    |  5 +++++
 hw/block/vhost-user-blk.c          |  6 +++++-
 hw/block/virtio-blk.c              |  6 +++++-
 hw/core/machine.c                  |  5 +++++
 hw/scsi/vhost-scsi.c               |  3 ++-
 hw/scsi/vhost-user-scsi.c          |  5 +++--
 hw/scsi/virtio-scsi.c              | 13 ++++++++----
 hw/virtio/vhost-scsi-pci.c         |  9 +++++++--
 hw/virtio/vhost-user-blk-pci.c     |  4 ++++
 hw/virtio/vhost-user-scsi-pci.c    |  9 +++++++--
 hw/virtio/virtio-blk-pci.c         |  7 ++++++-
 hw/virtio/virtio-pci.c             | 32 ++++++++++++++++++++++++++++++
 hw/virtio/virtio-scsi-pci.c        |  9 +++++++--
 16 files changed, 110 insertions(+), 16 deletions(-)

-- 
2.26.2

Re: [PATCH v5 0/5] virtio-pci: enable blk and scsi multi-queue by default
Posted by Michael S. Tsirkin 3 years, 9 months ago
On Mon, Jul 06, 2020 at 02:56:45PM +0100, Stefan Hajnoczi wrote:
> [...]

I'm guessing this should be deferred to the next release as
it (narrowly) missed the freeze window. Does this make sense to you?


Re: [PATCH v5 0/5] virtio-pci: enable blk and scsi multi-queue by default
Posted by Stefan Hajnoczi 3 years, 9 months ago
On Wed, Jul 08, 2020 at 06:59:41AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 06, 2020 at 02:56:45PM +0100, Stefan Hajnoczi wrote:
> > [...]
> 
> I'm guessing this should be deferred to the next release as
> it (narrowly) missed the freeze window. Does this make sense to you?

Yes, that is fine. Thanks!

Stefan