[for-6.1 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

Greg Kurz posted 4 patches 6 days, 16 hours ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20210407143501.244343-1-groug@kaod.org
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, Max Reitz <mreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>
hw/block/dataplane/virtio-blk.c | 36 +++++++++++++++++++--
hw/scsi/virtio-scsi-dataplane.c | 56 ++++++++++++++++++++++-----------
2 files changed, 72 insertions(+), 20 deletions(-)

[for-6.1 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

Posted by Greg Kurz 6 days, 16 hours ago
Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
a serious slow down may be observed on setups with a big enough number
of vCPUs.

Exemple with a pseries guest on a bi-POWER9 socket system (128 HW threads):

              virtio-scsi      virtio-blk

1		0m20.922s	0m21.346s
2		0m21.230s	0m20.350s
4		0m21.761s	0m20.997s
8		0m22.770s	0m20.051s
16		0m22.038s	0m19.994s
32		0m22.928s	0m20.803s
64		0m26.583s	0m22.953s
128		0m41.273s	0m32.333s
256		2m4.727s 	1m16.924s
384		6m5.563s 	3m26.186s

Both perf and gprof indicate that QEMU is hogging CPUs when setting up
the ioeventfds:

 67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
  9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
  8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
=>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
=>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
  0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single

address_space_update_ioeventfds() is called when committing an MR
transaction, i.e. for each ioeventfd with the current code base,
and it internally loops on all ioventfds:

static void address_space_update_ioeventfds(AddressSpace *as)
{
[...]
    FOR_EACH_FLAT_RANGE(fr, view) {
        for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {

This means that the setup of ioeventfds for these devices has
quadratic time complexity.

This series simply changes the device models to extend the transaction
to all virtqueueues, like already done in the past in the generic
code with 710fccf80d78 ("virtio: improve virtio devices initialization
time").

Only virtio-scsi and virtio-blk are covered here, but a similar change
might also be beneficial to other device types such as host-scsi-pci,
vhost-user-scsi-pci and vhost-user-blk-pci.

              virtio-scsi      virtio-blk

1		0m21.271s	0m22.076s
2		0m20.912s	0m19.716s
4		0m20.508s	0m19.310s
8		0m21.374s	0m20.273s
16		0m21.559s	0m21.374s
32		0m22.532s	0m21.271s
64		0m26.550s	0m22.007s
128		0m29.115s	0m27.446s
256		0m44.752s	0m41.004s
384		1m2.884s	0m58.023s

This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
which reported the issue for virtio-scsi-pci.

Changes since RFC:

As suggested by Stefan, splimplify the code by directly beginning and
committing the memory transaction from the device model, without all
the virtio specific proxying code and no changes needed in the memory
subsystem.

Greg Kurz (4):
  virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
  virtio-blk: Configure all host notifiers in a single MR transaction
  virtio-scsi: Set host notifiers and callbacks separately
  virtio-scsi: Configure all host notifiers in a single MR transaction

 hw/block/dataplane/virtio-blk.c | 36 +++++++++++++++++++--
 hw/scsi/virtio-scsi-dataplane.c | 56 ++++++++++++++++++++++-----------
 2 files changed, 72 insertions(+), 20 deletions(-)

-- 
2.26.3