Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage with high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.
On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver. With
other cache modes the in-kernel driver is still notably
faster because it is able to report completion of the
I/O request before any encryption is done, while the
in-QEMU driver must encrypt the data before completion.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
block/crypto.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
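(Not part of the patch - just a back-of-the-envelope illustration of why
the buffer size matters: with the old 32-sector bounce buffer every guest
request is split into 16KiB sub-requests, while the new 1MiB buffer lets a
large sequential write reach the underlying storage in a single operation.)

    /* Illustrative only: sub-request counts for a 1 MiB guest write
     * with the old and new bounce buffer sizes used in this patch. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long req = 1024 * 1024;     /* 1 MiB guest write */
        unsigned long long old_buf = 32 * 512;    /* BLOCK_CRYPTO_MAX_SECTORS * 512 = 16 KiB */
        unsigned long long new_buf = 1024 * 1024; /* BLOCK_CRYPTO_MAX_IO_SIZE = 1 MiB */

        printf("old: %llu sub-requests\n", (req + old_buf - 1) / old_buf); /* prints 64 */
        printf("new: %llu sub-requests\n", (req + new_buf - 1) / new_buf); /* prints 1 */
        return 0;
    }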
diff --git a/block/crypto.c b/block/crypto.c
index 58ef6f2f52..684cabeaf8 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -379,7 +379,11 @@ static void block_crypto_close(BlockDriverState *bs)
}
-#define BLOCK_CRYPTO_MAX_SECTORS 32
+/*
+ * 1 MB bounce buffer gives good performance / memory tradeoff
+ * when using cache=none|directsync.
+ */
+#define BLOCK_CRYPTO_MAX_IO_SIZE (1024 * 1024)
static coroutine_fn int
block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
@@ -396,12 +400,11 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
qemu_iovec_init(&hd_qiov, qiov->niov);
- /* Bounce buffer so we have a linear mem region for
- * entire sector. XXX optimize so we avoid bounce
- * buffer in case that qiov->niov == 1
+ /* Bounce buffer because we don't wish to expose cipher text
+ * in qiov which points to guest memory.
*/
cipher_data =
- qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
+ qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_IO_SIZE,
qiov->size));
if (cipher_data == NULL) {
ret = -ENOMEM;
@@ -411,8 +414,8 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
while (remaining_sectors) {
cur_nr_sectors = remaining_sectors;
- if (cur_nr_sectors > BLOCK_CRYPTO_MAX_SECTORS) {
- cur_nr_sectors = BLOCK_CRYPTO_MAX_SECTORS;
+ if (cur_nr_sectors > (BLOCK_CRYPTO_MAX_IO_SIZE / 512)) {
+ cur_nr_sectors = (BLOCK_CRYPTO_MAX_IO_SIZE / 512);
}
qemu_iovec_reset(&hd_qiov);
@@ -464,12 +467,11 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num,
qemu_iovec_init(&hd_qiov, qiov->niov);
- /* Bounce buffer so we have a linear mem region for
- * entire sector. XXX optimize so we avoid bounce
- * buffer in case that qiov->niov == 1
+ /* Bounce buffer because we're not permitted to touch
+ * contents of qiov - it points to guest memory.
*/
cipher_data =
- qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
+ qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_IO_SIZE,
qiov->size));
if (cipher_data == NULL) {
ret = -ENOMEM;
@@ -479,8 +481,8 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num,
while (remaining_sectors) {
cur_nr_sectors = remaining_sectors;
- if (cur_nr_sectors > BLOCK_CRYPTO_MAX_SECTORS) {
- cur_nr_sectors = BLOCK_CRYPTO_MAX_SECTORS;
+ if (cur_nr_sectors > (BLOCK_CRYPTO_MAX_IO_SIZE / 512)) {
+ cur_nr_sectors = (BLOCK_CRYPTO_MAX_IO_SIZE / 512);
}
qemu_iovec_to_buf(qiov, bytes_done,
--
2.13.5
On 09/27/2017 07:53 AM, Daniel P. Berrange wrote:
> Using 16KB bounce buffers creates a significant performance
> penalty for I/O to encrypted volumes on storage with high
> I/O latency (rotating rust & network drives), because it
> triggers lots of fairly small I/O operations.
>
> On tests with rotating rust, and cache=none|directsync,
> write speed increased from 2MiB/s to 32MiB/s, on a par
> with that achieved by the in-kernel luks driver. With
> other cache modes the in-kernel driver is still notably
> faster because it is able to report completion of the
> I/O request before any encryption is done, while the
> in-QEMU driver must encrypt the data before completion.
>
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> ---
>  block/crypto.c | 28 +++++++++++++++-------------
>  1 file changed, 15 insertions(+), 13 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> @@ -464,12 +467,11 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num,
>
>     qemu_iovec_init(&hd_qiov, qiov->niov);
>
> -    /* Bounce buffer so we have a linear mem region for
> -     * entire sector. XXX optimize so we avoid bounce
> -     * buffer in case that qiov->niov == 1
> +    /* Bounce buffer because we're not permitted to touch
> +     * contents of qiov - it points to guest memory.
>      */
>     cipher_data =
> -        qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
> +        qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_IO_SIZE,
>                              qiov->size));

We allocate and free a bounce buffer for every write - is there anything
we can do to reduce malloc() calls by reusing a bounce buffer associated
with a coroutine (we have to be careful that parallel coroutines don't
share bounce buffers, of course). But that's independent of this patch.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
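(A hypothetical sketch of the reuse Eric mentions above, assuming the
BlockCrypto state gained a cached buffer plus a busy flag - the field and
helper names below are made up for illustration and are not part of the
patch. A request that finds the cached buffer busy, or too small, simply
falls back to a fresh allocation:)

    /* Hypothetical helpers - not from the patch.  Assumes BlockCrypto
     * gained cached_buf, cached_buf_size and cached_buf_busy fields. */
    static uint8_t *block_crypto_get_bounce_buffer(BlockDriverState *bs,
                                                   size_t size, bool *cached)
    {
        BlockCrypto *crypto = bs->opaque;

        if (!crypto->cached_buf_busy && crypto->cached_buf_size >= size) {
            /* Reuse the buffer kept in the driver state */
            crypto->cached_buf_busy = true;
            *cached = true;
            return crypto->cached_buf;
        }
        /* Cached buffer unavailable - allocate a one-off buffer as today */
        *cached = false;
        return qemu_try_blockalign(bs->file->bs, size);
    }

    static void block_crypto_put_bounce_buffer(BlockDriverState *bs,
                                               uint8_t *buf, bool cached)
    {
        BlockCrypto *crypto = bs->opaque;

        if (cached) {
            crypto->cached_buf_busy = false;
        } else {
            qemu_vfree(buf);
        }
    }

Coroutines in the same AioContext only yield at well-defined points, so the
plain flag is enough to stop two parallel requests sharing the cached
buffer; the buffer itself would be allocated lazily on first use and freed
in block_crypto_close(). Whether the extra state is worth it over a
per-request allocation would need measuring.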
On 2017-09-27 14:53, Daniel P. Berrange wrote:
> Using 16KB bounce buffers creates a significant performance
> penalty for I/O to encrypted volumes on storage with high
> I/O latency (rotating rust & network drives), because it
> triggers lots of fairly small I/O operations.
>
> On tests with rotating rust, and cache=none|directsync,
> write speed increased from 2MiB/s to 32MiB/s, on a par
> with that achieved by the in-kernel luks driver. With
> other cache modes the in-kernel driver is still notably
> faster because it is able to report completion of the
> I/O request before any encryption is done, while the
> in-QEMU driver must encrypt the data before completion.
>
> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> ---
>  block/crypto.c | 28 +++++++++++++++-------------
>  1 file changed, 15 insertions(+), 13 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>