[Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading

Fam Zheng posted 8 patches 6 years ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180329110914.20888-1-famz@redhat.com
Test checkpatch passed
Test docker-build@min-glib failed
Test docker-mingw@fedora passed
Test docker-quick@centos6 failed
Test s390x passed
There is a newer version of this series
block/block-backend.c          |   8 ++
block/file-posix.c             |  92 +++++++++++++++++++-
block/io.c                     | 192 +++++++++++++++++++++++++++++++++++++++++
block/qcow2.c                  | 103 +++++++++++++++-------
block/raw-format.c             |   9 ++
include/block/block.h          |  12 ++-
include/block/block_int.h      |  35 ++++++++
include/block/raw-aio.h        |  10 ++-
include/sysemu/block-backend.h |   4 +
qemu-img.c                     |  45 +++++++++-
10 files changed, 472 insertions(+), 38 deletions(-)
[Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading
Posted by Fam Zheng 6 years ago
[Posting a preview RFC for the general idea discussion and internal API review.
Libiscsi support is being worked on in the meantime.]

This series introduces block layer API for copy offloading and makes use of it
in qemu-img convert.

For now we implemented the operation in local file protocol with
copy_file_range(2).  Besides that it's possible to add similar to iscsi, nfs
and potentially more.

As far as its usage goes, in addition to qemu-img convert, we can emulate
offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror.

The new bdrv_co_map_range can also be an alternative way to implement format
drivers in the future, once we make block/io.c use it in preadv/pwritev paths.

Fam Zheng (8):
  block: Introduce bdrv_co_map_range API
  qcow2: Implement bdrv_co_map_range
  block: Introduce bdrv_co_copy_range
  file-posix: Implement bdrv_co_copy_range
  file-posix: Implement bdrv_co_map_range
  raw: Implement raw_co_map_range
  block-backend: Add blk_co_copy_range
  qemu-img: Convert with copy offloading

 block/block-backend.c          |   8 ++
 block/file-posix.c             |  92 +++++++++++++++++++-
 block/io.c                     | 192 +++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c                  | 103 +++++++++++++++-------
 block/raw-format.c             |   9 ++
 include/block/block.h          |  12 ++-
 include/block/block_int.h      |  35 ++++++++
 include/block/raw-aio.h        |  10 ++-
 include/sysemu/block-backend.h |   4 +
 qemu-img.c                     |  45 +++++++++-
 10 files changed, 472 insertions(+), 38 deletions(-)

-- 
2.14.3


Re: [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading
Posted by no-reply@patchew.org 6 years ago
Hi,

This series failed docker-build@min-glib build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 20180329110914.20888-1-famz@redhat.com
Subject: [Qemu-devel] [RFC PATCH 0/8] qemu-img convert with copy offloading

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
6f3b33fe62 qemu-img: Convert with copy offloading
b435388660 block-backend: Add blk_co_copy_range
388616cd3c raw: Implement raw_co_map_range
a4e47507b4 file-posix: Implement bdrv_co_map_range
8a6e25ecab file-posix: Implement bdrv_co_copy_range
a3a1a163e2 block: Introduce bdrv_co_copy_range
b42ec41133 qcow2: Implement bdrv_co_map_range
1eadcd710a block: Introduce bdrv_co_map_range API

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   min-glib
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-mfpmh0ne/src'
  GEN     /var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar.vroot'...
done.
Checking out files:  11% (690/6066)   
Checking out files:  12% (728/6066)   
Checking out files:  12% (746/6066)   
Checking out files:  13% (789/6066)   
Checking out files:  14% (850/6066)   
Checking out files:  15% (910/6066)   
Checking out files:  16% (971/6066)   
Checking out files:  17% (1032/6066)   
Checking out files:  18% (1092/6066)   
Checking out files:  19% (1153/6066)   
Checking out files:  20% (1214/6066)   
Checking out files:  21% (1274/6066)   
Checking out files:  22% (1335/6066)   
Checking out files:  23% (1396/6066)   
Checking out files:  24% (1456/6066)   
Checking out files:  25% (1517/6066)   
Checking out files:  26% (1578/6066)   
Checking out files:  27% (1638/6066)   
Checking out files:  28% (1699/6066)   
Checking out files:  29% (1760/6066)   
Checking out files:  30% (1820/6066)   
Checking out files:  31% (1881/6066)   
Checking out files:  32% (1942/6066)   
Checking out files:  33% (2002/6066)   
Checking out files:  34% (2063/6066)   
Checking out files:  35% (2124/6066)   
Checking out files:  36% (2184/6066)   
Checking out files:  37% (2245/6066)   
Checking out files:  38% (2306/6066)   
Checking out files:  39% (2366/6066)   
Checking out files:  40% (2427/6066)   
Checking out files:  41% (2488/6066)   
Checking out files:  42% (2548/6066)   
Checking out files:  43% (2609/6066)   
Checking out files:  44% (2670/6066)   
Checking out files:  45% (2730/6066)   
Checking out files:  46% (2791/6066)   
Checking out files:  47% (2852/6066)   
Checking out files:  48% (2912/6066)   
Checking out files:  49% (2973/6066)   
Checking out files:  50% (3033/6066)   
Checking out files:  51% (3094/6066)   
Checking out files:  52% (3155/6066)   
Checking out files:  53% (3215/6066)   
Checking out files:  54% (3276/6066)   
Checking out files:  55% (3337/6066)   
Checking out files:  56% (3397/6066)   
Checking out files:  57% (3458/6066)   
Checking out files:  58% (3519/6066)   
Checking out files:  59% (3579/6066)   
Checking out files:  60% (3640/6066)   
Checking out files:  60% (3687/6066)   
Checking out files:  61% (3701/6066)   
Checking out files:  62% (3761/6066)   
Checking out files:  63% (3822/6066)   
Checking out files:  64% (3883/6066)   
Checking out files:  65% (3943/6066)   
Checking out files:  66% (4004/6066)   
Checking out files:  67% (4065/6066)   
Checking out files:  68% (4125/6066)   
Checking out files:  69% (4186/6066)   
Checking out files:  69% (4226/6066)   
Checking out files:  70% (4247/6066)   
Checking out files:  71% (4307/6066)   
Checking out files:  72% (4368/6066)   
Checking out files:  73% (4429/6066)   
Checking out files:  74% (4489/6066)   
Checking out files:  75% (4550/6066)   
Checking out files:  76% (4611/6066)   
Checking out files:  77% (4671/6066)   
Checking out files:  78% (4732/6066)   
Checking out files:  79% (4793/6066)   
Checking out files:  80% (4853/6066)   
Checking out files:  81% (4914/6066)   
Checking out files:  82% (4975/6066)   
Checking out files:  83% (5035/6066)   
Checking out files:  84% (5096/6066)   
Checking out files:  85% (5157/6066)   
Checking out files:  86% (5217/6066)   
Checking out files:  87% (5278/6066)   
Checking out files:  88% (5339/6066)   
Checking out files:  89% (5399/6066)   
Checking out files:  90% (5460/6066)   
Checking out files:  91% (5521/6066)   
Checking out files:  92% (5581/6066)   
Checking out files:  93% (5642/6066)   
Checking out files:  94% (5703/6066)   
Checking out files:  95% (5763/6066)   
Checking out files:  96% (5824/6066)   
Checking out files:  97% (5885/6066)   
Checking out files:  98% (5945/6066)   
Checking out files:  99% (6006/6066)   
Checking out files: 100% (6066/6066)   
Checking out files: 100% (6066/6066), done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
tar: /var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136/qemu.tar: Wrote only 2048 of 10240 bytes
tar: Error is not recoverable: exiting now
failed to create tar file
  COPY    RUNNER
    RUN test-build in qemu:min-glib 
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
/var/tmp/qemu/run: line 32: prep_fail: command not found
Environment variables:
HOSTNAME=a38ba0cc2dc5
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env

/var/tmp/qemu/run: line 52: cd: /tmp/qemu-test/src/tests/docker: No such file or directory
/var/tmp/qemu/run: line 57: /test-build: No such file or directory
/var/tmp/qemu/run: line 57: exec: /test-build: cannot execute: No such file or directory
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 407, in <module>
    sys.exit(main())
  File "./tests/docker/docker.py", line 404, in main
    return args.cmdobj.run(args, argv)
  File "./tests/docker/docker.py", line 261, in run
    return Docker().run(argv, args.keep, quiet=args.quiet)
  File "./tests/docker/docker.py", line 229, in run
    quiet=quiet)
  File "./tests/docker/docker.py", line 147, in _do_check
    return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'run', '--label', 'com.qemu.instance.uuid=5970da9034bb11e8a39d52540069c830', '-u', '0', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/root/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-mfpmh0ne/src/docker-src.2018-03-31-04.12.45.21136:/var/tmp/qemu:z,ro', 'qemu:min-glib', '/var/tmp/qemu/run', 'test-build']' returned non-zero exit status 126
make[1]: *** [tests/docker/Makefile.include:129: docker-run] Error 1
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-mfpmh0ne/src'
make: *** [tests/docker/Makefile.include:163: docker-run-test-build@min-glib] Error 2

real	0m38.034s
user	0m9.304s
sys	0m6.995s
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
Posted by Stefan Hajnoczi 6 years ago
On Thu, Mar 29, 2018 at 07:09:06PM +0800, Fam Zheng wrote:
> [Posting a preview RFC for the general idea discussion and internal API review.
> Libiscsi support is being worked on in the meantime.]
> 
> This series introduces block layer API for copy offloading and makes use of it
> in qemu-img convert.
> 
> For now we implemented the operation in local file protocol with
> copy_file_range(2).  Besides that it's possible to add similar to iscsi, nfs
> and potentially more.
> 
> As far as its usage goes, in addition to qemu-img convert, we can emulate
> offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror.
> 
> The new bdrv_co_map_range can also be an alternative way to implement format
> drivers in the future, once we make block/io.c use it in preadv/pwritev paths.

I posted concerns about the bdrv_co_map_range() interface.  It would be
safer to only have a copy_range() interface without exposing how data is
mapped outside the driver where race conditions can occur and the format
driver no longer has full control over file layout.

Stefan
Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
Posted by Fam Zheng 6 years ago
On Wed, 04/04 14:23, Stefan Hajnoczi wrote:
> On Thu, Mar 29, 2018 at 07:09:06PM +0800, Fam Zheng wrote:
> > [Posting a preview RFC for the general idea discussion and internal API review.
> > Libiscsi support is being worked on in the meantime.]
> > 
> > This series introduces block layer API for copy offloading and makes use of it
> > in qemu-img convert.
> > 
> > For now we implemented the operation in local file protocol with
> > copy_file_range(2).  Besides that it's possible to add similar to iscsi, nfs
> > and potentially more.
> > 
> > As far as its usage goes, in addition to qemu-img convert, we can emulate
> > offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror.
> > 
> > The new bdrv_co_map_range can also be an alternative way to implement format
> > drivers in the future, once we make block/io.c use it in preadv/pwritev paths.
> 
> I posted concerns about the bdrv_co_map_range() interface.  It would be
> safer to only have a copy_range() interface without exposing how data is
> mapped outside the driver where race conditions can occur and the format
> driver no longer has full control over file layout.

It's a good point, but I couldn't think of a way to implement copy_range between
two format drivers: both of them need to recurse down to their bs->file and what
we eventually want is a copy_file_range() on two fds (or an iscsi equivalent):

    src[qcow2]           ->              dst[raw]
        |                                   |
        v                                   v
    src[file]            ->              dst[file]
        |                                   |
        v                                   v
       fd1               ->                 fd2
                    copy_file_range

Maybe we should add BlockDriver.bdrv_co_map_range_{prepare,commit,abort} and
call them from bdrv_co_copy_range(). This way the code path works pretty much
the same way to .bdrv_co_preadv/pwritev.

Fam

Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
Posted by Paolo Bonzini 6 years ago
On 04/04/2018 15:49, Fam Zheng wrote:
>> I posted concerns about the bdrv_co_map_range() interface.  It would be
>> safer to only have a copy_range() interface without exposing how data is
>> mapped outside the driver where race conditions can occur and the format
>> driver no longer has full control over file layout.
> It's a good point, but I couldn't think of a way to implement copy_range between
> two format drivers: both of them need to recurse down to their bs->file and what
> we eventually want is a copy_file_range() on two fds (or an iscsi equivalent):
> 
>     src[qcow2]           ->              dst[raw]
>         |                                   |
>         v                                   v
>     src[file]            ->              dst[file]
>         |                                   |
>         v                                   v
>        fd1               ->                 fd2
>                     copy_file_range
> 
> Maybe we should add BlockDriver.bdrv_co_map_range_{prepare,commit,abort} and
> call them from bdrv_co_copy_range(). This way the code path works pretty much
> the same way to .bdrv_co_preadv/pwritev.

Or you could have bdrv_co_copy_range verify the mapping and return
-EAGAIN if it is not valid anymore?  (Then bdrv_co_map_range should fill
in a BlockDriverMapping struct, or something like that).

Paolo

Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
Posted by Stefan Hajnoczi 6 years ago
On Wed, Apr 04, 2018 at 09:49:23PM +0800, Fam Zheng wrote:
> On Wed, 04/04 14:23, Stefan Hajnoczi wrote:
> > On Thu, Mar 29, 2018 at 07:09:06PM +0800, Fam Zheng wrote:
> > > [Posting a preview RFC for the general idea discussion and internal API review.
> > > Libiscsi support is being worked on in the meantime.]
> > > 
> > > This series introduces block layer API for copy offloading and makes use of it
> > > in qemu-img convert.
> > > 
> > > For now we implemented the operation in local file protocol with
> > > copy_file_range(2).  Besides that it's possible to add similar to iscsi, nfs
> > > and potentially more.
> > > 
> > > As far as its usage goes, in addition to qemu-img convert, we can emulate
> > > offloading in scsi-disk (EXTENDED COPY), and do similar to drive-mirror.
> > > 
> > > The new bdrv_co_map_range can also be an alternative way to implement format
> > > drivers in the future, once we make block/io.c use it in preadv/pwritev paths.
> > 
> > I posted concerns about the bdrv_co_map_range() interface.  It would be
> > safer to only have a copy_range() interface without exposing how data is
> > mapped outside the driver where race conditions can occur and the format
> > driver no longer has full control over file layout.
> 
> It's a good point, but I couldn't think of a way to implement copy_range between
> two format drivers: both of them need to recurse down to their bs->file and what
> we eventually want is a copy_file_range() on two fds (or an iscsi equivalent):
> 
>     src[qcow2]           ->              dst[raw]
>         |                                   |
>         v                                   v
>     src[file]            ->              dst[file]
>         |                                   |
>         v                                   v
>        fd1               ->                 fd2
>                     copy_file_range
> 
> Maybe we should add BlockDriver.bdrv_co_map_range_{prepare,commit,abort} and
> call them from bdrv_co_copy_range(). This way the code path works pretty much
> the same way to .bdrv_co_preadv/pwritev.

Regarding the recursion, there are two phases:
1. Finding the leaf BDS of the source (the left side of your diagram)
2. finding the leaf BDS of the destination (the right side)

Here is one way to represent this in BlockDriver:

  /* Map [offset, offset + nbytes) range onto a child of src_bs and
   * invoke bdrv_co_copy_file_range_src(child, ...), or invoke
   * bdrv_co_copy_file_range_dst() if the leaf child has been found.
   */
  .bdrv_co_copy_file_range_src(src_bs, dst_bs, offset, nbytes)

  /* Map [offset, offset + nbytes) range onto a child of dst_bs and
   * invoke bdrv_co_copy_file_range_src(src_bs, child, ...), or perform
   * the copy operation if the leaf child has been found.  Return
   * -ENOTSUP if src_bs and the leaf child have different BlockDrivers.
   */
  .bdrv_co_copy_file_range_dst(src_bs, dst_bs, offset, nbytes)

bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on
src[qcow2].  The qcow2 block driver will invoke
bdrv_co_copy_file_range_src() on src[file].  The file-posix driver will
invoke bdrv_co_copy_file_range_dst() on dst[raw].  The raw driver will
invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that
src_bds (src[file]) is also file-posix and then goes ahead with
copy_file_range(2).

In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI,
the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and
the block layer can fall back to a traditional copy operation.

With this approach src[qcow2] could take a lock or keep track of a
serializing request struct so that other requests cannot interfere with
the operation, and it's done in a natural way since we remain in the
qcow2 function until the entire operation completes.  There's no need
for bookkeeping structs or callbacks.

Stefan
Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
Posted by Paolo Bonzini 6 years ago
On 05/04/2018 14:55, Stefan Hajnoczi wrote:
> bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on
> src[qcow2].  The qcow2 block driver will invoke
> bdrv_co_copy_file_range_src() on src[file].  The file-posix driver will
> invoke bdrv_co_copy_file_range_dst() on dst[raw].  The raw driver will
> invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that
> src_bds (src[file]) is also file-posix and then goes ahead with
> copy_file_range(2).
> 
> In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI,
> the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and
> the block layer can fall back to a traditional copy operation.
> 
> With this approach src[qcow2] could take a lock or keep track of a
> serializing request struct so that other requests cannot interfere with
> the operation, and it's done in a natural way since we remain in the
> qcow2 function until the entire operation completes.  There's no need
> for bookkeeping structs or callbacks.

Could there be AB-BA deadlock if the guest attempts a concurrent copy
from A to B and from B to A?

Paolo

Re: [Qemu-devel] [Qemu-block] [RFC PATCH 0/8] qemu-img convert with copy offloading
Posted by Fam Zheng 6 years ago
On Fri, 04/06 13:41, Paolo Bonzini wrote:
> On 05/04/2018 14:55, Stefan Hajnoczi wrote:
> > bdrv_copy_file_range() will invoke bdrv_co_copy_file_range_src() on
> > src[qcow2].  The qcow2 block driver will invoke
> > bdrv_co_copy_file_range_src() on src[file].  The file-posix driver will
> > invoke bdrv_co_copy_file_range_dst() on dst[raw].  The raw driver will
> > invoke bdrv_co_copy_file_range_dst() on dst[file], which sees that
> > src_bds (src[file]) is also file-posix and then goes ahead with
> > copy_file_range(2).
> > 
> > In the case where src[qcow2] is on file-posix but dst[raw] is on iSCSI,
> > the iSCSI .bdrv_co_copy_file_range_dst() call fails with -ENOTSUP and
> > the block layer can fall back to a traditional copy operation.
> > 
> > With this approach src[qcow2] could take a lock or keep track of a
> > serializing request struct so that other requests cannot interfere with
> > the operation, and it's done in a natural way since we remain in the
> > qcow2 function until the entire operation completes.  There's no need
> > for bookkeeping structs or callbacks.
> 
> Could there be AB-BA deadlock if the guest attempts a concurrent copy
> from A to B and from B to A?

I don't think bs_src need to hold its locks when calling into bs_dst for mapping
write ranges. So it should be safe.

Fam