[PATCH v4 00/15] io_uring: add Linux io_uring AIO engine

Stefan Hajnoczi posted 15 patches 4 years, 3 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20200114105921.131880-1-stefanha@redhat.com
Maintainers: Julia Suvorova <jusual@redhat.com>, Fam Zheng <fam@euphon.net>, Markus Armbruster <armbru@redhat.com>, Eric Blake <eblake@redhat.com>, Aarushi Mehta <mehta.aaru20@gmail.com>, Stefan Hajnoczi <stefanha@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Max Reitz <mreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>
There is a newer version of this series
[PATCH v4 00/15] io_uring: add Linux io_uring AIO engine
Posted by Stefan Hajnoczi 4 years, 3 months ago
v13:
 * Drop unnecessary changes in Patch 8 [Stefano]

v12:
 * Reword BlockdevAioOptions QAPI schema commit description [Markus]
 * Increase QAPI "Since: 4.2" to "Since: 5.0"
 * Explain rationale for io_uring stubs in commit description [Kevin]
 * Tried to use file.aio=io_uring instead of BDRV_O_IO_URING but it's really
   hard to make qemu-iotests work.  Tests build blkdebug: and other graphs so
   the syntax for io_uring is dependent on the test case.  I scrapped this
   approach and went back to a global flag.

v11:
 * Drop fd registration because it breaks QEMU's file locking and will need to
   be resolved in a separate patch series
 * Drop line-wrapping changes that accidentally broke several qemu-iotests

v10:
 * Dropped kernel submission queue polling because it requires root and has
   additional limitations.  It should be benchmarked and considered for
   inclusion later, maybe even together with kernel side changes.
 * Add io_uring_register_files() return value to trace_luring_fd_register()
 * Fix indentation in luring_fd_unregister()
 * Set s->fd_reg.fd_array to NULL after g_free() to avoid dangling pointers
 * Simplify fd registration code
 * Add luring_fd_unregister() and call it from file-posix.c to prevent
   fd leaks
 * Add trace_luring_fd_unregister() trace event
 * Add missing space to qemu-img command-line documentation
 * Update MAINTAINERS file [Julia]
 * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
 * Define ioq_submit() before callers so the prototype isn't necessary [Julia]
 * Declare variables at the beginning of the block in luring_init() [Julia]

This patch series is based on Aarushi Mehta's v9 patch series written for
Google Summer of Code 2019:

  https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg00179.html

It adds a new AIO engine that uses the new Linux io_uring API.  This is the
successor to Linux AIO with a number of improvements:
1. Both O_DIRECT and buffered I/O work
2. fdatasync(2) is supported (no need for a separate thread pool!)
3. True async behavior so the syscall doesn't block (Linux AIO got there to some degree...)
4. Advanced performance optimizations are available (file registration, memory
   buffer registration, completion polling, submission polling).

Since Aarushi has been busy, I have taken up this patch series.  Booting a
guest works with -drive aio=io_uring and -drive aio=io_uring,cache=none with a
raw file on XFS.

I currently recommend using -drive aio=io_uring only with host block devices
(like NVMe devices).  As of Linux v5.4-rc1 I still hit kernel bugs when using
image files on ext4 or XFS.
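
For readers unfamiliar with how an aio= option string can end up as a global
open flag (the approach the v12 notes describe going back to), here is a
minimal sketch.  It is not the code from this series: the sketch_* names and
the flag values are hypothetical and chosen only for illustration, and only
the mode strings "threads", "native" and "io_uring" correspond to -drive aio=
values.

```c
#include <string.h>

/* Hypothetical flag bits for illustration; not QEMU's real values. */
#define SKETCH_O_NATIVE_AIO (1 << 0)
#define SKETCH_O_IO_URING   (1 << 1)

/*
 * Map an aio= mode string onto open flags.
 * Returns 0 on success, -1 if the mode is not recognised.
 * On failure *flags is left untouched.
 */
int sketch_parse_aio(const char *mode, int *flags)
{
    if (strcmp(mode, "threads") == 0) {
        /* Thread-pool engine is the default: no flag needed. */
    } else if (strcmp(mode, "native") == 0) {
        *flags |= SKETCH_O_NATIVE_AIO;
    } else if (strcmp(mode, "io_uring") == 0) {
        *flags |= SKETCH_O_IO_URING;
    } else {
        return -1;
    }
    return 0;
}
```

A caller would start with flags = 0, pass the user's mode string, and report
an error on a -1 return.  A real build would presumably accept "io_uring" only
when io_uring support was detected at configure time; that check is left out
of this sketch.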

Aarushi Mehta (15):
  configure: permit use of io_uring
  qapi/block-core: add option for io_uring
  block/block: add BDRV flag for io_uring
  block/io_uring: implements interfaces for io_uring
  stubs: add stubs for io_uring interface
  util/async: add aio interfaces for io_uring
  blockdev: adds bdrv_parse_aio to use io_uring
  block/file-posix.c: extend to use io_uring
  block: add trace events for io_uring
  block/io_uring: adds userspace completion polling
  qemu-io: adds option to use aio engine
  qemu-img: adds option to use aio engine for benchmarking
  qemu-nbd: adds option for aio engines
  tests/qemu-iotests: enable testing with aio options
  tests/qemu-iotests: use AIOMODE with various tests

 MAINTAINERS                   |   9 +
 block.c                       |  22 ++
 block/Makefile.objs           |   3 +
 block/file-posix.c            |  85 +++++--
 block/io_uring.c              | 433 ++++++++++++++++++++++++++++++++++
 block/trace-events            |  12 +
 blockdev.c                    |  12 +-
 configure                     |  27 +++
 include/block/aio.h           |  16 +-
 include/block/block.h         |   2 +
 include/block/raw-aio.h       |  12 +
 qapi/block-core.json          |   4 +-
 qemu-img-cmds.hx              |   4 +-
 qemu-img.c                    |  11 +-
 qemu-img.texi                 |   5 +-
 qemu-io.c                     |  25 +-
 qemu-nbd.c                    |  12 +-
 qemu-nbd.texi                 |   4 +-
 stubs/Makefile.objs           |   1 +
 stubs/io_uring.c              |  32 +++
 tests/qemu-iotests/028        |   2 +-
 tests/qemu-iotests/058        |   2 +-
 tests/qemu-iotests/089        |   4 +-
 tests/qemu-iotests/091        |   4 +-
 tests/qemu-iotests/109        |   2 +-
 tests/qemu-iotests/147        |   5 +-
 tests/qemu-iotests/181        |   8 +-
 tests/qemu-iotests/183        |   4 +-
 tests/qemu-iotests/185        |  10 +-
 tests/qemu-iotests/200        |   2 +-
 tests/qemu-iotests/201        |   8 +-
 tests/qemu-iotests/check      |  15 +-
 tests/qemu-iotests/common.rc  |  14 ++
 tests/qemu-iotests/iotests.py |  12 +-
 util/async.c                  |  36 +++
 35 files changed, 787 insertions(+), 72 deletions(-)
 create mode 100644 block/io_uring.c
 create mode 100644 stubs/io_uring.c

-- 
2.24.1


Re: [PATCH v4 00/15] io_uring: add Linux io_uring AIO engine
Posted by Stefan Hajnoczi 4 years, 3 months ago
On Tue, Jan 14, 2020 at 10:59:06AM +0000, Stefan Hajnoczi wrote:
> [...]

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan
Re: [PATCH v4 00/15] io_uring: add Linux io_uring AIO engine
Posted by Stefan Hajnoczi 4 years, 3 months ago
On Mon, Jan 20, 2020 at 10:35:33AM +0000, Stefan Hajnoczi wrote:
> On Tue, Jan 14, 2020 at 10:59:06AM +0000, Stefan Hajnoczi wrote:
> > [...]
> 
> Thanks, applied to my block tree:
> https://github.com/stefanha/qemu/commits/block

Kevin Wolf pointed out that BDRV_O_IO_URING isn't used by this series!
Oops, that means io_uring.c isn't being called anymore.  This bug
slipped in as part of v3.  I've sent a new revision.
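
As an illustration of this class of bug only (this is not the code from the
series, and every sketch_* name and flag value below is hypothetical): when an
option parser sets a flag that the engine-selection code never tests, the
request silently falls back to the default engine.

```c
/* Hypothetical flag bits and engine ids, for illustration only. */
#define SKETCH_O_NATIVE_AIO (1 << 0)
#define SKETCH_O_IO_URING   (1 << 1)

enum sketch_engine {
    SKETCH_ENGINE_THREADS,
    SKETCH_ENGINE_LINUX_AIO,
    SKETCH_ENGINE_IO_URING,
};

/* Buggy variant: the io_uring flag is never tested, so it has no effect. */
enum sketch_engine sketch_pick_engine_buggy(int flags)
{
    if (flags & SKETCH_O_NATIVE_AIO) {
        return SKETCH_ENGINE_LINUX_AIO;
    }
    return SKETCH_ENGINE_THREADS;
}

/* Fixed variant: the io_uring flag is checked before falling back. */
enum sketch_engine sketch_pick_engine_fixed(int flags)
{
    if (flags & SKETCH_O_IO_URING) {
        return SKETCH_ENGINE_IO_URING;
    }
    if (flags & SKETCH_O_NATIVE_AIO) {
        return SKETCH_ENGINE_LINUX_AIO;
    }
    return SKETCH_ENGINE_THREADS;
}
```

With the buggy variant a guest still boots and I/O still completes, which is
one reason a regression like this can go unnoticed until someone reads the
code.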

Stefan