[PATCH 0/2] support block encryption/decryption in parallel

tugy@chinatelecom.cn posted 2 patches 4 weeks ago
[PATCH 0/2] support block encryption/decryption in parallel
Posted by tugy@chinatelecom.cn 4 weeks ago
From: Guoyi Tu <tugy@chinatelecom.cn>

Currently, disk I/O encryption and decryption operations are performed sequentially
in the main thread or IOthread. When the number of I/O requests increases,
this becomes a performance bottleneck.

To address this issue, this patch uses a thread pool to perform I/O encryption
and decryption in parallel, improving overall efficiency.
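
To illustrate the idea only (this is not the code in the patches), here is a
minimal Python sketch. It assumes each sector can be encrypted independently
given its sector number, which is what allows a pool to work on several
sectors at once. The names toy_sector_cipher, crypt_serial and crypt_parallel,
and the 512-byte sector size, are hypothetical choices for this sketch, and
the XOR keystream is a stand-in for a real cipher such as AES-XTS.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

SECTOR_SIZE = 512  # assumed sector size for this sketch


def toy_sector_cipher(key, sector_num, data):
    """XOR data with a keystream derived from the key and sector number.

    Stand-in for a real per-sector cipher; NOT secure. XOR is its own
    inverse, so the same function both encrypts and decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(
            key + sector_num.to_bytes(8, "little") + counter.to_bytes(4, "little")
        ).digest()
        counter += 1
    # zip() stops at len(data), so any extra keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


def split_sectors(buf):
    """Cut a request buffer into sector-sized chunks."""
    return [buf[i:i + SECTOR_SIZE] for i in range(0, len(buf), SECTOR_SIZE)]


def crypt_serial(key, first_sector, buf):
    """Process the sectors one after another in the calling thread."""
    return b"".join(
        toy_sector_cipher(key, first_sector + i, sector)
        for i, sector in enumerate(split_sectors(buf))
    )


def crypt_parallel(key, first_sector, buf, pool):
    """Submit one task per sector to the pool, then reassemble in order."""
    futures = [
        pool.submit(toy_sector_cipher, key, first_sector + i, sector)
        for i, sector in enumerate(split_sectors(buf))
    ]
    return b"".join(f.result() for f in futures)


if __name__ == "__main__":
    key = b"k" * 32
    plaintext = bytes(range(256)) * 8  # 2048 bytes, i.e. four sectors
    with ThreadPoolExecutor(max_workers=4) as pool:
        ciphertext = crypt_parallel(key, 10, plaintext, pool)
    print(ciphertext == crypt_serial(key, 10, plaintext))
```

Note that in CPython the global interpreter lock keeps pure-Python work like
this from actually running faster in threads; the sketch only shows the
structure (split, submit, join in order), not the speedup.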

Test results show that enabling the thread pool for encryption and decryption
significantly improves the performance of virtual machine storage devices.


Test Case 1: Disk read/write performance using fio in a virtual machine

Virtual Machine: 8c16g, with a disk backed by a LUKS storage device and
                  Ceph as the storage backend.
Test Method:
fio -direct=1 -iodepth=32 -rw=xx -ioengine=libaio -bs=4k -size=10G -numjobs=x \
-runtime=1000 -group_reporting -filename=/dev/vdb -name=xxx

With the VM running on the Intel Xeon 5218 server, the test results are as follows:

|                        | Serial encryption   | Thread pool encryption |
|                        | and decryption      | and decryption         |
| fio                    |-----------|---------|-----------|------------|
|                        | BW(MiB/s) | IOPS(K) | BW(MiB/s) | IOPS(K)    |
|------------------------|-----------|---------|-----------|------------|
| rw=read numjobs=2      | 499       | 128     | 605       | 155        |
| rw=read numjobs=4      | 529       | 136     | 632       | 162        |
| rw=write numjobs=2     | 493       | 126     | 617       | 158        |
| rw=write numjobs=4     | 534       | 137     | 743       | 190        |
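
As a quick way to read the Intel numbers, the bandwidth ratios can be worked
out with a few lines of Python. The figures are copied from the table above;
the dictionary layout and the speedup helper are only illustrative. The
ratios come out at roughly 1.2x to 1.4x.

```python
# Bandwidth in MiB/s from the Intel Xeon 5218 table: (serial, thread pool).
intel_bw = {
    "rw=read numjobs=2": (499, 605),
    "rw=read numjobs=4": (529, 632),
    "rw=write numjobs=2": (493, 617),
    "rw=write numjobs=4": (534, 743),
}


def speedup(serial, pooled):
    """Ratio of thread-pool bandwidth to serial bandwidth."""
    return pooled / serial


intel_speedups = {
    case: round(speedup(serial, pooled), 2)
    for case, (serial, pooled) in intel_bw.items()
}

for case, ratio in intel_speedups.items():
    print(f"{case}: {ratio:.2f}x")
```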


With the VM running on the HiSilicon Kunpeng-920 server, the test results are as follows:

|                        | Serial encryption   | Thread pool encryption |
|                        | and decryption      | and decryption         |
| fio                    |-----------|---------|-----------|------------|
|                        | BW(MiB/s) | IOPS(K) | BW(MiB/s) | IOPS(K)    |
|------------------------|-----------|---------|-----------|------------|
| rw=read numjobs=2      | 73.2      | 18.8    | 128       | 39.2       |
| rw=read numjobs=4      | 77.9      | 19.9    | 246       | 62.9       |
| rw=write numjobs=2     | 78        | 19      | 140       | 35.8       |
| rw=write numjobs=4     | 78        | 20.2    | 270       | 69.1       |
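
Since every job uses -bs=4k, bandwidth and IOPS should track each other
(IOPS = bandwidth / block size). The sketch below cross-checks the Kunpeng
table on that basis. It assumes "K" means 1000 and that a 10% tolerance is a
reasonable allowance for rounding; both are assumptions of this example, not
something stated in the test setup. Under those assumptions, only the
thread-pool rw=read numjobs=2 cell (128 MiB/s with 39.2K IOPS) does not line
up, which may point to a transcription slip in one of the two values.

```python
BLOCK_SIZE_KIB = 4  # from -bs=4k on the fio command line


def expected_kiops(bw_mib_s):
    """IOPS (in thousands) implied by a bandwidth at a fixed 4 KiB block size."""
    return bw_mib_s * 1024 / BLOCK_SIZE_KIB / 1000


# (case, mode, BW in MiB/s, reported IOPS in K) from the Kunpeng-920 table.
kunpeng_rows = [
    ("rw=read numjobs=2", "serial", 73.2, 18.8),
    ("rw=read numjobs=4", "serial", 77.9, 19.9),
    ("rw=write numjobs=2", "serial", 78, 19),
    ("rw=write numjobs=4", "serial", 78, 20.2),
    ("rw=read numjobs=2", "pool", 128, 39.2),
    ("rw=read numjobs=4", "pool", 246, 62.9),
    ("rw=write numjobs=2", "pool", 140, 35.8),
    ("rw=write numjobs=4", "pool", 270, 69.1),
]


def inconsistent(rows, tolerance=0.10):
    """Return rows whose reported IOPS differ from the implied value by more than tolerance."""
    flagged = []
    for case, mode, bw, kiops in rows:
        want = expected_kiops(bw)
        if abs(kiops - want) / want > tolerance:
            flagged.append((case, mode, bw, kiops, round(want, 1)))
    return flagged


for row in inconsistent(kunpeng_rows):
    print(row)
```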


Test Case 2:
In addition, a performance comparison was conducted on the HiSilicon Kunpeng-920
server, converting a qcow2 image to a LUKS image using qemu-img convert.
The results show that using a thread pool for encryption and decryption
significantly improves performance.

Test Method: Create a 40GB qcow2 image and fill it with data, then convert it to a LUKS
             image using qemu-img

* Serial encryption and decryption:
time qemu-img convert -p -m 16 -W --image-opts file.filename=/home/tgy/data.qcow2 \
--object secret,id=sec,data=password -n \
--target-image-opts driver=luks,key-secret=sec,file.filename=/home/tgy/data.luks

    real    7m53.681s
    user    7m52.595s
    sys     0m11.248s


* Thread pool encryption and decryption:
time qemu-img convert -p -m 16 -W --image-opts file.filename=/home/tgy/data.qcow2 \
--object secret,id=sec,data=password -n --target-image-opts \
driver=luks,key-secret=sec,encrypt-in-parallel=on,file.filename=/home/tgy/data.luks

    real    1m43.101s
    user    10m30.239s
    sys     13m13.758s
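
The two `time` outputs above can be compared with a short calculation. The
helper names below are illustrative only. The numbers work out to roughly a
4.6x reduction in wall-clock time, at the cost of roughly 2.9x more total CPU
time (user + sys), with most of the increase showing up as sys time.

```python
import re


def to_seconds(text):
    """Convert a `time`-style value such as '7m53.681s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the two qemu-img convert runs above.
serial = {"real": "7m53.681s", "user": "7m52.595s", "sys": "0m11.248s"}
pooled = {"real": "1m43.101s", "user": "10m30.239s", "sys": "13m13.758s"}


def summarize(timing):
    """Return (wall-clock seconds, total CPU seconds) for one run."""
    real = to_seconds(timing["real"])
    cpu = to_seconds(timing["user"]) + to_seconds(timing["sys"])
    return real, cpu


serial_real, serial_cpu = summarize(serial)
pooled_real, pooled_cpu = summarize(pooled)

wall_speedup = serial_real / pooled_real   # how much sooner the job finishes
cpu_ratio = pooled_cpu / serial_cpu        # how much more CPU time it burns
busy_cpus = pooled_cpu / pooled_real       # average CPUs kept busy

print(f"wall-clock speedup: {wall_speedup:.2f}x")
print(f"CPU time ratio:     {cpu_ratio:.2f}x")
print(f"average busy CPUs:  {busy_cpus:.1f}")
```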

Guoyi Tu (2):
  crpyto: support encryt and decrypt parallelly using thread pool
  qapi/crypto: support enable encryption/decryption in parallel

 block/crypto.c       | 111 ++++++++++++++++++++++++++++++++++++++++---
 block/crypto.h       |   9 ++++
 qapi/block-core.json |   6 ++-
 qapi/crypto.json     |   6 ++-
 4 files changed, 124 insertions(+), 8 deletions(-)

-- 
2.17.1


Re: [PATCH 0/2] support block encryption/decryption in parallel
Posted by Daniel P. Berrangé 1 week, 5 days ago
On Thu, Nov 28, 2024 at 06:51:20PM +0800, tugy@chinatelecom.cn wrote:
> From: Guoyi Tu <tugy@chinatelecom.cn>
> 
> Currently, disk I/O encryption and decryption operations are performed sequentially
> in the main thread or IOthread. When the number of I/O requests increases,
> this becomes a performance bottleneck.
> 
> To address this issue, this patch uses a thread pool to perform I/O encryption
> and decryption in parallel, improving overall efficiency.

We already have support for parallel encryption through use of IO threads
since approximately this commit:

  commit af206c284e4c1b17cdfb0f17e898b288c0fc1751
  Author: Stefan Hajnoczi <stefanha@redhat.com>
  Date:   Mon May 27 11:58:50 2024 -0400

    block/crypto: create ciphers on demand
    
    Ciphers are pre-allocated by qcrypto_block_init_cipher() depending on
    the given number of threads. The -device
    virtio-blk-pci,iothread-vq-mapping= feature allows users to assign
    multiple IOThreads to a virtio-blk device, but the association between
    the virtio-blk device and the block driver happens after the block
    driver is already open.
    
    When the number of threads given to qcrypto_block_init_cipher() is
    smaller than the actual number of threads at runtime, the
    block->n_free_ciphers > 0 assertion in qcrypto_block_pop_cipher() can
    fail.
    
    Get rid of qcrypto_block_init_cipher() n_thread's argument and allocate
    ciphers on demand.


Say we have QEMU pinned to 4 host CPUs, and we've set up 4 IO threads
for the disk, then encryption can max out 4 host CPUs worth of resource.

How is this newly proposed thread pool going to do better than that in
an apples-to-apples comparison, i.e. allowing the same number of host
CPUs for both? The fundamental limit is still the AES performance
of the host CPU(s) that you allow QEMU to execute work on. If the thread
pool is allowed to use 4 host CPUs, it shouldn't be significantly different
from allowing use of 4 host CPUs for I/O threads, surely?

Having multiple different ways to support parallel encryption is not
ideal. If there's something I/O threads can't do optimally right
now, is it practical to make them work better?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 0/2] support block encryption/decryption in parallel
Posted by Guoyi Tu 1 week, 6 days ago
Hi Kevin and Hanna, could you share your thoughts on this patch?

I’d greatly appreciate your feedback.

--
Guoyi
