[PATCH v5 00/12] blkio: add libblkio BlockDriver

Stefan Hajnoczi posted 12 patches 1 year, 7 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20220927193431.22302-1-stefanha@redhat.com
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>, John Snow <jsnow@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>, Eric Blake <eblake@redhat.com>, "Philippe Mathieu-Daudé" <f4bug@amsat.org>, "Denis V. Lunev" <den@openvz.org>, Wen Congyang <wencongyang2@huawei.com>, Xie Changlong <xiechanglong.d@gmail.com>, "Richard W.M. Jones" <rjones@redhat.com>, Jeff Cody <codyprime@gmail.com>, "Michael S. Tsirkin" <mst@redhat.com>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Yanan Wang <wangyanan55@huawei.com>, Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Markus Armbruster <armbru@redhat.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com>
There is a newer version of this series
MAINTAINERS                                 |    7 +
meson_options.txt                           |    2 +
qapi/block-core.json                        |   53 +-
meson.build                                 |    9 +
include/block/block-common.h                |    9 +
include/block/block-global-state.h          |   10 +-
include/block/block_int-common.h            |   15 +-
include/exec/cpu-common.h                   |    1 +
include/hw/virtio/virtio-blk.h              |    2 +
include/qemu/coroutine.h                    |   15 +-
include/sysemu/block-backend-global-state.h |    4 +-
include/sysemu/block-ram-registrar.h        |   37 +
block.c                                     |   14 +
block/blkio.c                               | 1017 +++++++++++++++++++
block/blkverify.c                           |    4 +-
block/block-backend.c                       |    8 +-
block/block-ram-registrar.c                 |   54 +
block/crypto.c                              |    4 +-
block/file-posix.c                          |    1 -
block/gluster.c                             |    1 -
block/io.c                                  |  101 +-
block/mirror.c                              |    2 +
block/nbd.c                                 |    1 -
block/nvme.c                                |   20 +-
block/parallels.c                           |    1 -
block/qcow.c                                |    2 -
block/qed.c                                 |    1 -
block/raw-format.c                          |    2 +
block/replication.c                         |    1 -
block/ssh.c                                 |    1 -
block/vhdx.c                                |    1 -
hw/block/virtio-blk.c                       |   39 +-
hw/core/numa.c                              |   17 +
qemu-img.c                                  |    6 +-
softmmu/physmem.c                           |    5 +
stubs/physmem.c                             |   13 +
tests/qtest/modules-test.c                  |    3 +
util/qemu-coroutine-lock.c                  |    9 +-
util/vfio-helpers.c                         |    5 +-
block/meson.build                           |    2 +
scripts/meson-buildoptions.sh               |    3 +
stubs/meson.build                           |    1 +
42 files changed, 1412 insertions(+), 91 deletions(-)
create mode 100644 include/sysemu/block-ram-registrar.h
create mode 100644 block/blkio.c
create mode 100644 block/block-ram-registrar.c
create mode 100644 stubs/physmem.c
[PATCH v5 00/12] blkio: add libblkio BlockDriver
Posted by Stefan Hajnoczi 1 year, 7 months ago
v5:
- Drop "RFC" since libblkio 1.0 has been released and the library API is stable
- Disable BDRV_REQ_REGISTERED_BUF if we run out of blkio_mem_regions. The
  bounce buffer slow path is taken when there are not enough blkio_mem_regions
  to cover guest RAM. [Hanna & David Hildenbrand]
- Call ram_block_discard_disable() when mem-region-pinned property is true or
  absent [David Hildenbrand]
- Use a bounce buffer pool instead of allocating/freeing a buffer for each
  request. This reduces the number of blkio_mem_regions required for bounce
  buffers to 1 and avoids frequent blkio_mem_region_map/unmap() calls.
- Switch to .bdrv_co_*() instead of .bdrv_aio_*(). Needed for the bounce buffer
  pool's CoQueue.
v4:
- Patch 1:
  - Add virtio-blk-vhost-user driver [Kevin]
  - Drop .bdrv_parse_filename() and .bdrv_needs_filename for virtio-blk-vhost-vdpa [Stefano]
  - Add copyright and license header [Hanna]
  - Drop .bdrv_parse_filename() in favor of --blockdev or json: [Hanna]
  - Clarify that "filename" is always non-NULL for io_uring [Hanna]
  - Check that virtio-blk-vhost-vdpa "path" option is non-NULL [Hanna]
  - Fix virtio-blk-vhost-vdpa cache.direct=off logic [Hanna]
  - Use macros for driver names [Hanna]
  - Assert that the driver name is valid [Hanna]
  - Update "readonly" property name to "read-only" [Hanna]
  - Call blkio_detach_aio_context() in blkio_close() [Hanna]
  - Avoid uint32_t * to int * casts in blkio_refresh_limits() [Hanna]
  - Remove write zeroes and discard from the todo list [Hanna]
  - Use PRIu32 instead of %d for uint32_t [Hanna]
  - Fix error messages with buf-alignment instead of optimal-io-size [Hanna]
  - Call map/unmap APIs since libblkio alloc/free APIs no longer do that
  - Update QAPI schema QEMU version to 7.2
- Patch 5:
  - Expand the BDRV_REQ_REGISTERED_BUF flag passthrough and drop assert(!flags)
    in drivers [Hanna]
- Patch 7:
  - Fix BLK->BDRV typo [Hanna]
  - Make BlockRAMRegistrar handle failure [Hanna]
- Patch 8:
  - Replace memory_region_get_fd() approach with qemu_ram_get_fd()
- Patch 10:
  - Use (void)ret; to discard unused return value [Hanna]
  - libblkio's blkio_unmap_mem_region() API no longer has a return value
  - Check for registered bufs that cross RAMBlocks [Hanna]
- Patch 11:
  - Handle bdrv_register_buf() errors [Hanna]
v3:
- Add virtio-blk-vhost-vdpa for vdpa-blk devices including VDUSE
- Add discard and write zeroes support
- Rebase and adopt latest libblkio APIs
v2:
- Add BDRV_REQ_REGISTERED_BUF to bs.supported_write_flags [Stefano]
- Use new blkioq_get_num_completions() API
- Implement .bdrv_refresh_limits()

This patch series adds a QEMU BlockDriver for libblkio
(https://gitlab.com/libblkio/libblkio/), a library for high-performance block
device I/O. This work was presented at KVM Forum 2022 and slides are available
here:
https://static.sched.com/hosted_files/kvmforum2022/8c/libblkio-kvm-forum-2022.pdf

The second patch adds the core BlockDriver and most of the libblkio API usage.
Three libblkio drivers are included:
- io_uring
- virtio-blk-vhost-user
- virtio-blk-vhost-vdpa

The remainder of the patch series reworks the existing QEMU bdrv_register_buf()
API so virtio-blk emulation efficiently map guest RAM for libblkio - some
libblkio drivers require that I/O buffer memory is pre-registered (think VFIO,
vhost, etc).

Vladimir requested performance results that show the effect of the
BDRV_REQ_REGISTERED_BUF flag. I ran the patches against qemu-storage-daemon's
vhost-user-blk export with iodepth=1 bs=512 to see the per-request overhead due
to bounce buffer allocation/mapping:

Name                                   IOPS   Error
bounce-buf                          4373.81 ± 0.01%
registered-buf                     13062.80 ± 0.67%

The BDRV_REQ_REGISTERED_BUF optimization version is about 3x faster.

See the BlockDriver struct in block/blkio.c for a list of APIs that still need
to be implemented. The core functionality is covered.

Regarding the design: each libblkio driver is a separately named BlockDriver.
That means there is an "io_uring" BlockDriver and not a generic "libblkio"
BlockDriver. This way QAPI and open parameters are type-safe and mandatory
parameters can be checked by QEMU.

Stefan Hajnoczi (12):
  coroutine: add flag to re-queue at front of CoQueue
  blkio: add libblkio block driver
  numa: call ->ram_block_removed() in ram_block_notifer_remove()
  block: pass size to bdrv_unregister_buf()
  block: use BdrvRequestFlags type for supported flag fields
  block: add BDRV_REQ_REGISTERED_BUF request flag
  block: return errors from bdrv_register_buf()
  block: add BlockRAMRegistrar
  exec/cpu-common: add qemu_ram_get_fd()
  stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd()
  blkio: implement BDRV_REQ_REGISTERED_BUF optimization
  virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

 MAINTAINERS                                 |    7 +
 meson_options.txt                           |    2 +
 qapi/block-core.json                        |   53 +-
 meson.build                                 |    9 +
 include/block/block-common.h                |    9 +
 include/block/block-global-state.h          |   10 +-
 include/block/block_int-common.h            |   15 +-
 include/exec/cpu-common.h                   |    1 +
 include/hw/virtio/virtio-blk.h              |    2 +
 include/qemu/coroutine.h                    |   15 +-
 include/sysemu/block-backend-global-state.h |    4 +-
 include/sysemu/block-ram-registrar.h        |   37 +
 block.c                                     |   14 +
 block/blkio.c                               | 1017 +++++++++++++++++++
 block/blkverify.c                           |    4 +-
 block/block-backend.c                       |    8 +-
 block/block-ram-registrar.c                 |   54 +
 block/crypto.c                              |    4 +-
 block/file-posix.c                          |    1 -
 block/gluster.c                             |    1 -
 block/io.c                                  |  101 +-
 block/mirror.c                              |    2 +
 block/nbd.c                                 |    1 -
 block/nvme.c                                |   20 +-
 block/parallels.c                           |    1 -
 block/qcow.c                                |    2 -
 block/qed.c                                 |    1 -
 block/raw-format.c                          |    2 +
 block/replication.c                         |    1 -
 block/ssh.c                                 |    1 -
 block/vhdx.c                                |    1 -
 hw/block/virtio-blk.c                       |   39 +-
 hw/core/numa.c                              |   17 +
 qemu-img.c                                  |    6 +-
 softmmu/physmem.c                           |    5 +
 stubs/physmem.c                             |   13 +
 tests/qtest/modules-test.c                  |    3 +
 util/qemu-coroutine-lock.c                  |    9 +-
 util/vfio-helpers.c                         |    5 +-
 block/meson.build                           |    2 +
 scripts/meson-buildoptions.sh               |    3 +
 stubs/meson.build                           |    1 +
 42 files changed, 1412 insertions(+), 91 deletions(-)
 create mode 100644 include/sysemu/block-ram-registrar.h
 create mode 100644 block/blkio.c
 create mode 100644 block/block-ram-registrar.c
 create mode 100644 stubs/physmem.c

-- 
2.37.3


Re: [PATCH v5 00/12] blkio: add libblkio BlockDriver
Posted by Stefano Garzarella 1 year, 7 months ago
Hi Stefan,
I tested this series with the vDPA block simulator in Linux v6.0.
It worked well, but I had a segfault when blkio_map_mem_region() fails.

In my case, it failed because I forgot to increase the `memlock` limit 
with `ulimit -l unlimited` since the simulator still requires locking 
all the VM memory.

QEMU failed with this messages:
$ qemu-system-x86_64 ... \
  -blockdev node-name=drive_src1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on
qemu-system-x86_64: -device virtio-blk-pci,id=src1,bootindex=2,drive=drive_src1,iothread=iothread0: Failed to add blkio mem region 0x7f60bfe00000 with size 536870912: Bad address (os error 14)
qemu-system-x86_64: -device virtio-blk-pci,id=src1,bootindex=2,drive=drive_src1,iothread=iothread0: Failed to add blkio mem region 0x7f60be400000 with size 16777216: Bad address (os error 14)
[1]    20803 segmentation fault

IIUC this could be related to a double call to 
ram_block_notifier_remove() in the error path of ram_block_added() 
(block/block-ram-registrar.c) callback.

Maybe we should call blk_register_buf() only if r->ok is true.

The stack trace is the following:
#0  0x00005641a8b097dd in ram_block_notifier_remove (n=n@entry=0x5641ab8354d8) at ../hw/core/numa.c:850
#1  0x00005641a89c4701 in ram_block_added
     (n=0x5641ab8354d8, host=<optimized out>, size=<optimized out>, max_size=<optimized out>)
    at ../block/block-ram-registrar.c:20
#2  0x00005641a8b083af in ram_block_notify_add_single (rb=0x5641ab3d2810, opaque=0x5641ab8354d8)
    at ../hw/core/numa.c:820
#3  0x00005641a8b92d8d in qemu_ram_foreach_block
    (func=func@entry=0x5641a8b08370 <ram_block_notify_add_single>, opaque=opaque@entry=0x5641ab8354d8)
    at ../softmmu/physmem.c:3571
#4  0x00005641a8b097af in ram_block_notifier_add (n=n@entry=0x5641ab8354d8) at ../hw/core/numa.c:844
#5  0x00005641a89c474f in blk_ram_registrar_init (r=r@entry=0x5641ab8354d0, blk=<optimized out>)
    at ../block/block-ram-registrar.c:46
#6  0x00005641a8affe88 in virtio_blk_device_realize (dev=0x5641ab835230, errp=0x7ffc72657190)
    at ../hw/block/virtio-blk.c:1239
#7  0x00005641a8b4b716 in virtio_device_realize (dev=0x5641ab835230, errp=0x7ffc726571f0)
    at ../hw/virtio/virtio.c:3684
#8  0x00005641a8c2049b in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc72657380)
    at ../hw/core/qdev.c:553
#9  0x00005641a8c24738 in property_set_bool
    (obj=0x5641ab835230, v=<optimized out>, name=<optimized out>, opaque=0x5641aa518820, errp=0x7ffc72657380)
    at ../qom/object.c:2273
#10 0x00005641a8c27683 in object_property_set
   
     (obj=obj@entry=0x5641ab835230, name=name@entry=0x5641a8e9719a "realized", v=v@entry=0x5641ab83f5b0, errp=errp@entry=0x7ffc72657380) at ../qom/object.c:1408
#11 0x00005641a8c2a97f in object_property_set_qobject
    (obj=obj@entry=0x5641ab835230, name=name@entry=0x5641a8e9719a "realized", value=value@entry=0x5641ab83f4f0, errp=errp@entry=0x7ffc72657380) at ../qom/qom-qobject.c:28
#12 0x00005641a8c27c84 in object_property_set_bool
    (obj=0x5641ab835230, name=0x5641a8e9719a "realized", value=<optimized out>, errp=0x7ffc72657380)
    at ../qom/object.c:1477
#13 0x00005641a8929b74 in pci_qdev_realize (qdev=<optimized out>, errp=<optimized out>) at ../hw/pci/pci.c:2218
#14 0x00005641a8c2049b in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc726575b0)
    at ../hw/core/qdev.c:553
#15 0x00005641a8c24738 in property_set_bool
    (obj=0x5641ab82ce70, v=<optimized out>, name=<optimized out>, opaque=0x5641aa518820, errp=0x7ffc726575b0)
    at ../qom/object.c:2273
#16 0x00005641a8c27683 in object_property_set

Thanks,
Stefano

On Tue, Sep 27, 2022 at 9:34 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> v5:
> - Drop "RFC" since libblkio 1.0 has been released and the library API is stable
> - Disable BDRV_REQ_REGISTERED_BUF if we run out of blkio_mem_regions. The
>   bounce buffer slow path is taken when there are not enough blkio_mem_regions
>   to cover guest RAM. [Hanna & David Hildenbrand]
> - Call ram_block_discard_disable() when mem-region-pinned property is true or
>   absent [David Hildenbrand]
> - Use a bounce buffer pool instead of allocating/freeing a buffer for each
>   request. This reduces the number of blkio_mem_regions required for bounce
>   buffers to 1 and avoids frequent blkio_mem_region_map/unmap() calls.
> - Switch to .bdrv_co_*() instead of .bdrv_aio_*(). Needed for the bounce buffer
>   pool's CoQueue.
> v4:
> - Patch 1:
>   - Add virtio-blk-vhost-user driver [Kevin]
>   - Drop .bdrv_parse_filename() and .bdrv_needs_filename for virtio-blk-vhost-vdpa [Stefano]
>   - Add copyright and license header [Hanna]
>   - Drop .bdrv_parse_filename() in favor of --blockdev or json: [Hanna]
>   - Clarify that "filename" is always non-NULL for io_uring [Hanna]
>   - Check that virtio-blk-vhost-vdpa "path" option is non-NULL [Hanna]
>   - Fix virtio-blk-vhost-vdpa cache.direct=off logic [Hanna]
>   - Use macros for driver names [Hanna]
>   - Assert that the driver name is valid [Hanna]
>   - Update "readonly" property name to "read-only" [Hanna]
>   - Call blkio_detach_aio_context() in blkio_close() [Hanna]
>   - Avoid uint32_t * to int * casts in blkio_refresh_limits() [Hanna]
>   - Remove write zeroes and discard from the todo list [Hanna]
>   - Use PRIu32 instead of %d for uint32_t [Hanna]
>   - Fix error messages with buf-alignment instead of optimal-io-size [Hanna]
>   - Call map/unmap APIs since libblkio alloc/free APIs no longer do that
>   - Update QAPI schema QEMU version to 7.2
> - Patch 5:
>   - Expand the BDRV_REQ_REGISTERED_BUF flag passthrough and drop assert(!flags)
>     in drivers [Hanna]
> - Patch 7:
>   - Fix BLK->BDRV typo [Hanna]
>   - Make BlockRAMRegistrar handle failure [Hanna]
> - Patch 8:
>   - Replace memory_region_get_fd() approach with qemu_ram_get_fd()
> - Patch 10:
>   - Use (void)ret; to discard unused return value [Hanna]
>   - libblkio's blkio_unmap_mem_region() API no longer has a return value
>   - Check for registered bufs that cross RAMBlocks [Hanna]
> - Patch 11:
>   - Handle bdrv_register_buf() errors [Hanna]
> v3:
> - Add virtio-blk-vhost-vdpa for vdpa-blk devices including VDUSE
> - Add discard and write zeroes support
> - Rebase and adopt latest libblkio APIs
> v2:
> - Add BDRV_REQ_REGISTERED_BUF to bs.supported_write_flags [Stefano]
> - Use new blkioq_get_num_completions() API
> - Implement .bdrv_refresh_limits()
>
> This patch series adds a QEMU BlockDriver for libblkio
> (https://gitlab.com/libblkio/libblkio/), a library for high-performance block
> device I/O. This work was presented at KVM Forum 2022 and slides are available
> here:
> https://static.sched.com/hosted_files/kvmforum2022/8c/libblkio-kvm-forum-2022.pdf
>
> The second patch adds the core BlockDriver and most of the libblkio API usage.
> Three libblkio drivers are included:
> - io_uring
> - virtio-blk-vhost-user
> - virtio-blk-vhost-vdpa
>
> The remainder of the patch series reworks the existing QEMU bdrv_register_buf()
> API so virtio-blk emulation efficiently map guest RAM for libblkio - some
> libblkio drivers require that I/O buffer memory is pre-registered (think VFIO,
> vhost, etc).
>
> Vladimir requested performance results that show the effect of the
> BDRV_REQ_REGISTERED_BUF flag. I ran the patches against qemu-storage-daemon's
> vhost-user-blk export with iodepth=1 bs=512 to see the per-request overhead due
> to bounce buffer allocation/mapping:
>
> Name                                   IOPS   Error
> bounce-buf                          4373.81 ± 0.01%
> registered-buf                     13062.80 ± 0.67%
>
> The BDRV_REQ_REGISTERED_BUF optimization version is about 3x faster.
>
> See the BlockDriver struct in block/blkio.c for a list of APIs that still need
> to be implemented. The core functionality is covered.
>
> Regarding the design: each libblkio driver is a separately named BlockDriver.
> That means there is an "io_uring" BlockDriver and not a generic "libblkio"
> BlockDriver. This way QAPI and open parameters are type-safe and mandatory
> parameters can be checked by QEMU.
>
> Stefan Hajnoczi (12):
>   coroutine: add flag to re-queue at front of CoQueue
>   blkio: add libblkio block driver
>   numa: call ->ram_block_removed() in ram_block_notifer_remove()
>   block: pass size to bdrv_unregister_buf()
>   block: use BdrvRequestFlags type for supported flag fields
>   block: add BDRV_REQ_REGISTERED_BUF request flag
>   block: return errors from bdrv_register_buf()
>   block: add BlockRAMRegistrar
>   exec/cpu-common: add qemu_ram_get_fd()
>   stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd()
>   blkio: implement BDRV_REQ_REGISTERED_BUF optimization
>   virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint
>
>  MAINTAINERS                                 |    7 +
>  meson_options.txt                           |    2 +
>  qapi/block-core.json                        |   53 +-
>  meson.build                                 |    9 +
>  include/block/block-common.h                |    9 +
>  include/block/block-global-state.h          |   10 +-
>  include/block/block_int-common.h            |   15 +-
>  include/exec/cpu-common.h                   |    1 +
>  include/hw/virtio/virtio-blk.h              |    2 +
>  include/qemu/coroutine.h                    |   15 +-
>  include/sysemu/block-backend-global-state.h |    4 +-
>  include/sysemu/block-ram-registrar.h        |   37 +
>  block.c                                     |   14 +
>  block/blkio.c                               | 1017 +++++++++++++++++++
>  block/blkverify.c                           |    4 +-
>  block/block-backend.c                       |    8 +-
>  block/block-ram-registrar.c                 |   54 +
>  block/crypto.c                              |    4 +-
>  block/file-posix.c                          |    1 -
>  block/gluster.c                             |    1 -
>  block/io.c                                  |  101 +-
>  block/mirror.c                              |    2 +
>  block/nbd.c                                 |    1 -
>  block/nvme.c                                |   20 +-
>  block/parallels.c                           |    1 -
>  block/qcow.c                                |    2 -
>  block/qed.c                                 |    1 -
>  block/raw-format.c                          |    2 +
>  block/replication.c                         |    1 -
>  block/ssh.c                                 |    1 -
>  block/vhdx.c                                |    1 -
>  hw/block/virtio-blk.c                       |   39 +-
>  hw/core/numa.c                              |   17 +
>  qemu-img.c                                  |    6 +-
>  softmmu/physmem.c                           |    5 +
>  stubs/physmem.c                             |   13 +
>  tests/qtest/modules-test.c                  |    3 +
>  util/qemu-coroutine-lock.c                  |    9 +-
>  util/vfio-helpers.c                         |    5 +-
>  block/meson.build                           |    2 +
>  scripts/meson-buildoptions.sh               |    3 +
>  stubs/meson.build                           |    1 +
>  42 files changed, 1412 insertions(+), 91 deletions(-)
>  create mode 100644 include/sysemu/block-ram-registrar.h
>  create mode 100644 block/blkio.c
>  create mode 100644 block/block-ram-registrar.c
>  create mode 100644 stubs/physmem.c
>
> --
> 2.37.3
>
Re: [PATCH v5 00/12] blkio: add libblkio BlockDriver
Posted by Stefan Hajnoczi 1 year, 7 months ago
On Thu, Oct 06, 2022 at 02:18:35PM +0200, Stefano Garzarella wrote:
> Hi Stefan,
> I tested this series with the vDPA block simulator in Linux v6.0.
> It worked well, but I had a segfault when blkio_map_mem_region() fails.
> 
> In my case, it failed because I forgot to increase the `memlock` limit 
> with `ulimit -l unlimited` since the simulator still requires locking 
> all the VM memory.
> 
> QEMU failed with this messages:
> $ qemu-system-x86_64 ... \
>   -blockdev node-name=drive_src1,driver=virtio-blk-vhost-vdpa,path=/dev/vhost-vdpa-0,cache.direct=on
> qemu-system-x86_64: -device virtio-blk-pci,id=src1,bootindex=2,drive=drive_src1,iothread=iothread0: Failed to add blkio mem region 0x7f60bfe00000 with size 536870912: Bad address (os error 14)
> qemu-system-x86_64: -device virtio-blk-pci,id=src1,bootindex=2,drive=drive_src1,iothread=iothread0: Failed to add blkio mem region 0x7f60be400000 with size 16777216: Bad address (os error 14)
> [1]    20803 segmentation fault
> 
> IIUC this could be related to a double call to 
> ram_block_notifier_remove() in the error path of ram_block_added() 
> (block/block-ram-registrar.c) callback.
> 
> Maybe we should call blk_register_buf() only if r->ok is true.

Thanks for pointing this out!

The notifiers don't use QLIST_FOREACH_SAFE() so calling
ram_block_notifier_remove() from within .ram_block_added() is incorrect.

Stefan

> 
> The stack trace is the following:
> #0  0x00005641a8b097dd in ram_block_notifier_remove (n=n@entry=0x5641ab8354d8) at ../hw/core/numa.c:850
> #1  0x00005641a89c4701 in ram_block_added
>      (n=0x5641ab8354d8, host=<optimized out>, size=<optimized out>, max_size=<optimized out>)
>     at ../block/block-ram-registrar.c:20
> #2  0x00005641a8b083af in ram_block_notify_add_single (rb=0x5641ab3d2810, opaque=0x5641ab8354d8)
>     at ../hw/core/numa.c:820
> #3  0x00005641a8b92d8d in qemu_ram_foreach_block
>     (func=func@entry=0x5641a8b08370 <ram_block_notify_add_single>, opaque=opaque@entry=0x5641ab8354d8)
>     at ../softmmu/physmem.c:3571
> #4  0x00005641a8b097af in ram_block_notifier_add (n=n@entry=0x5641ab8354d8) at ../hw/core/numa.c:844
> #5  0x00005641a89c474f in blk_ram_registrar_init (r=r@entry=0x5641ab8354d0, blk=<optimized out>)
>     at ../block/block-ram-registrar.c:46
> #6  0x00005641a8affe88 in virtio_blk_device_realize (dev=0x5641ab835230, errp=0x7ffc72657190)
>     at ../hw/block/virtio-blk.c:1239
> #7  0x00005641a8b4b716 in virtio_device_realize (dev=0x5641ab835230, errp=0x7ffc726571f0)
>     at ../hw/virtio/virtio.c:3684
> #8  0x00005641a8c2049b in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc72657380)
>     at ../hw/core/qdev.c:553
> #9  0x00005641a8c24738 in property_set_bool
>     (obj=0x5641ab835230, v=<optimized out>, name=<optimized out>, opaque=0x5641aa518820, errp=0x7ffc72657380)
>     at ../qom/object.c:2273
> #10 0x00005641a8c27683 in object_property_set
>    
>      (obj=obj@entry=0x5641ab835230, name=name@entry=0x5641a8e9719a "realized", v=v@entry=0x5641ab83f5b0, errp=errp@entry=0x7ffc72657380) at ../qom/object.c:1408
> #11 0x00005641a8c2a97f in object_property_set_qobject
>     (obj=obj@entry=0x5641ab835230, name=name@entry=0x5641a8e9719a "realized", value=value@entry=0x5641ab83f4f0, errp=errp@entry=0x7ffc72657380) at ../qom/qom-qobject.c:28
> #12 0x00005641a8c27c84 in object_property_set_bool
>     (obj=0x5641ab835230, name=0x5641a8e9719a "realized", value=<optimized out>, errp=0x7ffc72657380)
>     at ../qom/object.c:1477
> #13 0x00005641a8929b74 in pci_qdev_realize (qdev=<optimized out>, errp=<optimized out>) at ../hw/pci/pci.c:2218
> #14 0x00005641a8c2049b in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffc726575b0)
>     at ../hw/core/qdev.c:553
> #15 0x00005641a8c24738 in property_set_bool
>     (obj=0x5641ab82ce70, v=<optimized out>, name=<optimized out>, opaque=0x5641aa518820, errp=0x7ffc726575b0)
>     at ../qom/object.c:2273
> #16 0x00005641a8c27683 in object_property_set
> 
> Thanks,
> Stefano
> 
> On Tue, Sep 27, 2022 at 9:34 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > v5:
> > - Drop "RFC" since libblkio 1.0 has been released and the library API is stable
> > - Disable BDRV_REQ_REGISTERED_BUF if we run out of blkio_mem_regions. The
> >   bounce buffer slow path is taken when there are not enough blkio_mem_regions
> >   to cover guest RAM. [Hanna & David Hildenbrand]
> > - Call ram_block_discard_disable() when mem-region-pinned property is true or
> >   absent [David Hildenbrand]
> > - Use a bounce buffer pool instead of allocating/freeing a buffer for each
> >   request. This reduces the number of blkio_mem_regions required for bounce
> >   buffers to 1 and avoids frequent blkio_mem_region_map/unmap() calls.
> > - Switch to .bdrv_co_*() instead of .bdrv_aio_*(). Needed for the bounce buffer
> >   pool's CoQueue.
> > v4:
> > - Patch 1:
> >   - Add virtio-blk-vhost-user driver [Kevin]
> >   - Drop .bdrv_parse_filename() and .bdrv_needs_filename for virtio-blk-vhost-vdpa [Stefano]
> >   - Add copyright and license header [Hanna]
> >   - Drop .bdrv_parse_filename() in favor of --blockdev or json: [Hanna]
> >   - Clarify that "filename" is always non-NULL for io_uring [Hanna]
> >   - Check that virtio-blk-vhost-vdpa "path" option is non-NULL [Hanna]
> >   - Fix virtio-blk-vhost-vdpa cache.direct=off logic [Hanna]
> >   - Use macros for driver names [Hanna]
> >   - Assert that the driver name is valid [Hanna]
> >   - Update "readonly" property name to "read-only" [Hanna]
> >   - Call blkio_detach_aio_context() in blkio_close() [Hanna]
> >   - Avoid uint32_t * to int * casts in blkio_refresh_limits() [Hanna]
> >   - Remove write zeroes and discard from the todo list [Hanna]
> >   - Use PRIu32 instead of %d for uint32_t [Hanna]
> >   - Fix error messages with buf-alignment instead of optimal-io-size [Hanna]
> >   - Call map/unmap APIs since libblkio alloc/free APIs no longer do that
> >   - Update QAPI schema QEMU version to 7.2
> > - Patch 5:
> >   - Expand the BDRV_REQ_REGISTERED_BUF flag passthrough and drop assert(!flags)
> >     in drivers [Hanna]
> > - Patch 7:
> >   - Fix BLK->BDRV typo [Hanna]
> >   - Make BlockRAMRegistrar handle failure [Hanna]
> > - Patch 8:
> >   - Replace memory_region_get_fd() approach with qemu_ram_get_fd()
> > - Patch 10:
> >   - Use (void)ret; to discard unused return value [Hanna]
> >   - libblkio's blkio_unmap_mem_region() API no longer has a return value
> >   - Check for registered bufs that cross RAMBlocks [Hanna]
> > - Patch 11:
> >   - Handle bdrv_register_buf() errors [Hanna]
> > v3:
> > - Add virtio-blk-vhost-vdpa for vdpa-blk devices including VDUSE
> > - Add discard and write zeroes support
> > - Rebase and adopt latest libblkio APIs
> > v2:
> > - Add BDRV_REQ_REGISTERED_BUF to bs.supported_write_flags [Stefano]
> > - Use new blkioq_get_num_completions() API
> > - Implement .bdrv_refresh_limits()
> >
> > This patch series adds a QEMU BlockDriver for libblkio
> > (https://gitlab.com/libblkio/libblkio/), a library for high-performance block
> > device I/O. This work was presented at KVM Forum 2022 and slides are available
> > here:
> > https://static.sched.com/hosted_files/kvmforum2022/8c/libblkio-kvm-forum-2022.pdf
> >
> > The second patch adds the core BlockDriver and most of the libblkio API usage.
> > Three libblkio drivers are included:
> > - io_uring
> > - virtio-blk-vhost-user
> > - virtio-blk-vhost-vdpa
> >
> > The remainder of the patch series reworks the existing QEMU bdrv_register_buf()
> > API so virtio-blk emulation efficiently map guest RAM for libblkio - some
> > libblkio drivers require that I/O buffer memory is pre-registered (think VFIO,
> > vhost, etc).
> >
> > Vladimir requested performance results that show the effect of the
> > BDRV_REQ_REGISTERED_BUF flag. I ran the patches against qemu-storage-daemon's
> > vhost-user-blk export with iodepth=1 bs=512 to see the per-request overhead due
> > to bounce buffer allocation/mapping:
> >
> > Name                                   IOPS   Error
> > bounce-buf                          4373.81 ± 0.01%
> > registered-buf                     13062.80 ± 0.67%
> >
> > The BDRV_REQ_REGISTERED_BUF optimization version is about 3x faster.
> >
> > See the BlockDriver struct in block/blkio.c for a list of APIs that still need
> > to be implemented. The core functionality is covered.
> >
> > Regarding the design: each libblkio driver is a separately named BlockDriver.
> > That means there is an "io_uring" BlockDriver and not a generic "libblkio"
> > BlockDriver. This way QAPI and open parameters are type-safe and mandatory
> > parameters can be checked by QEMU.
> >
> > Stefan Hajnoczi (12):
> >   coroutine: add flag to re-queue at front of CoQueue
> >   blkio: add libblkio block driver
> >   numa: call ->ram_block_removed() in ram_block_notifer_remove()
> >   block: pass size to bdrv_unregister_buf()
> >   block: use BdrvRequestFlags type for supported flag fields
> >   block: add BDRV_REQ_REGISTERED_BUF request flag
> >   block: return errors from bdrv_register_buf()
> >   block: add BlockRAMRegistrar
> >   exec/cpu-common: add qemu_ram_get_fd()
> >   stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd()
> >   blkio: implement BDRV_REQ_REGISTERED_BUF optimization
> >   virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint
> >
> >  MAINTAINERS                                 |    7 +
> >  meson_options.txt                           |    2 +
> >  qapi/block-core.json                        |   53 +-
> >  meson.build                                 |    9 +
> >  include/block/block-common.h                |    9 +
> >  include/block/block-global-state.h          |   10 +-
> >  include/block/block_int-common.h            |   15 +-
> >  include/exec/cpu-common.h                   |    1 +
> >  include/hw/virtio/virtio-blk.h              |    2 +
> >  include/qemu/coroutine.h                    |   15 +-
> >  include/sysemu/block-backend-global-state.h |    4 +-
> >  include/sysemu/block-ram-registrar.h        |   37 +
> >  block.c                                     |   14 +
> >  block/blkio.c                               | 1017 +++++++++++++++++++
> >  block/blkverify.c                           |    4 +-
> >  block/block-backend.c                       |    8 +-
> >  block/block-ram-registrar.c                 |   54 +
> >  block/crypto.c                              |    4 +-
> >  block/file-posix.c                          |    1 -
> >  block/gluster.c                             |    1 -
> >  block/io.c                                  |  101 +-
> >  block/mirror.c                              |    2 +
> >  block/nbd.c                                 |    1 -
> >  block/nvme.c                                |   20 +-
> >  block/parallels.c                           |    1 -
> >  block/qcow.c                                |    2 -
> >  block/qed.c                                 |    1 -
> >  block/raw-format.c                          |    2 +
> >  block/replication.c                         |    1 -
> >  block/ssh.c                                 |    1 -
> >  block/vhdx.c                                |    1 -
> >  hw/block/virtio-blk.c                       |   39 +-
> >  hw/core/numa.c                              |   17 +
> >  qemu-img.c                                  |    6 +-
> >  softmmu/physmem.c                           |    5 +
> >  stubs/physmem.c                             |   13 +
> >  tests/qtest/modules-test.c                  |    3 +
> >  util/qemu-coroutine-lock.c                  |    9 +-
> >  util/vfio-helpers.c                         |    5 +-
> >  block/meson.build                           |    2 +
> >  scripts/meson-buildoptions.sh               |    3 +
> >  stubs/meson.build                           |    1 +
> >  42 files changed, 1412 insertions(+), 91 deletions(-)
> >  create mode 100644 include/sysemu/block-ram-registrar.h
> >  create mode 100644 block/blkio.c
> >  create mode 100644 block/block-ram-registrar.c
> >  create mode 100644 stubs/physmem.c
> >
> > --
> > 2.37.3
> >
>