[PATCH v6 1/5] docs/migration: add qatzip compression feature

Yichen Wang posted 5 patches 4 months, 1 week ago
Maintainers: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Eduardo Habkost <eduardo@habkost.net>, "Marc-André Lureau" <marcandre.lureau@redhat.com>, Thomas Huth <thuth@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>, Laurent Vivier <lvivier@redhat.com>
There is a newer version of this series
[PATCH v6 1/5] docs/migration: add qatzip compression feature
Posted by Yichen Wang 4 months, 1 week ago
From: Yuan Liu <yuan1.liu@intel.com>

add Intel QATzip compression method introduction

Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Yichen Wang <yichen.wang@bytedance.com>
---
 docs/devel/migration/features.rst           |   1 +
 docs/devel/migration/qatzip-compression.rst | 251 ++++++++++++++++++++
 2 files changed, 252 insertions(+)
 create mode 100644 docs/devel/migration/qatzip-compression.rst

diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst
index 58f8fd9e16..8f431d52f9 100644
--- a/docs/devel/migration/features.rst
+++ b/docs/devel/migration/features.rst
@@ -14,3 +14,4 @@ Migration has plenty of features to support different use cases.
    CPR
    qpl-compression
    uadk-compression
+   qatzip-compression
diff --git a/docs/devel/migration/qatzip-compression.rst b/docs/devel/migration/qatzip-compression.rst
new file mode 100644
index 0000000000..72fa3e2826
--- /dev/null
+++ b/docs/devel/migration/qatzip-compression.rst
@@ -0,0 +1,251 @@
+==================
+QATzip Compression
+==================
+In scenarios with limited network bandwidth, the ``QATzip`` solution can help
+users save a lot of host CPU resources by accelerating compression and
+decompression through the Intel QuickAssist Technology(``QAT``) hardware.
+
+``QATzip`` is a user space library which builds on top of the Intel QuickAssist
+Technology user space library, to provide extended accelerated compression and
+decompression services.
+
+For more ``QATzip`` introduction, please refer to `QATzip Introduction
+<https://github.com/intel/QATzip?tab=readme-ov-file#introductionl>`_
+
+QATzip Compression Framework
+============================
+
+::
+
+  +----------------+
+  | MultiFd Thread |
+  +-------+--------+
+          |
+          | compress/decompress
+  +-------+--------+
+  | QATzip library |
+  +-------+--------+
+          |
+  +-------+--------+
+  |  QAT library   |
+  +-------+--------+
+          |         user space
+  --------+---------------------
+          |         kernel space
+   +------+-------+
+   |  QAT  Driver |
+   +------+-------+
+          |
+   +------+-------+
+   | QAT Devices  |
+   +--------------+
+
+
+QATzip Installation
+-------------------
+
+The ``QATzip`` installation package has been integrated into some Linux
+distributions and can be installed directly. For example, the Ubuntu Server
+24.04 LTS system can be installed using below command
+
+.. code-block:: shell
+
+   #apt search qatzip
+   libqatzip-dev/noble 1.2.0-0ubuntu3 amd64
+     Intel QuickAssist user space library development files
+
+   libqatzip3/noble 1.2.0-0ubuntu3 amd64
+     Intel QuickAssist user space library
+
+   qatzip/noble,now 1.2.0-0ubuntu3 amd64 [installed]
+     Compression user-space tool for Intel QuickAssist Technology
+
+   #sudo apt install libqatzip-dev libqatzip3 qatzip
+
+If your system does not support the ``QATzip`` installation package, you can
+use the source code to build and install, please refer to `QATzip source code installation
+<https://github.com/intel/QATzip?tab=readme-ov-file#build-intel-quickassist-technology-driver>`_
+
+QAT Hardware Deployment
+-----------------------
+
+``QAT`` supports physical functions(PFs) and virtual functions(VFs) for
+deployment, and users can configure ``QAT`` resources for migration according
+to actual needs. For more details about ``QAT`` deployment, please refer to
+`Intel QuickAssist Technology Documentation
+<https://intel.github.io/quickassist/index.html>`_
+
+For more ``QAT`` hardware introduction, please refer to `intel-quick-assist-technology-overview
+<https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html>`_
+
+How To Use QATzip Compression
+=============================
+
+1 - Install ``QATzip`` library
+
+2 - Build ``QEMU`` with ``--enable-qatzip`` parameter
+
+  E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-qatzip``
+
+3 - Set ``migrate_set_parameter multifd-compression qatzip``
+
+4 - Set ``migrate_set_parameter multifd-qatzip-level comp_level``, the default
+comp_level value is 1, and it supports levels from 1 to 9
+
+
+Performance Testing with QATzip
+===============================
+
+Testing environment is being set as below:
+
+VM configuration:16 vCPU, 64G memory;
+
+VM Workload: all vCPUs are idle and 54G memory is filled with Silesia data;
+
+QAT Devices: 4;
+
+Sender migration parameters:
+
+.. code-block:: shell
+
+    migrate_set_capability multifd on
+    migrate_set_parameter multifd-channels 2/4/8
+    migrate_set_parameter max-bandwidth 1G/10G
+    migrate_set_parameter multifd-compression qatzip/zstd
+
+Receiver migration parameters:
+
+.. code-block:: shell
+
+    migrate_set_capability multifd on
+    migrate_set_parameter multifd-channels 2
+    migrate_set_parameter multifd-compression qatzip/zstd
+
+max-bandwidth: 1 GBps (Gbytes/sec)
+
+.. code-block:: text
+
+    |-----------|--------|---------|----------|------|------|
+    |2 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   21607|       77|      8051|    88|   125|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   78351|       96|      2199|   204|    80|
+    |-----------|--------|---------|----------|------|------|
+
+    |-----------|--------|---------|----------|------|------|
+    |4 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   20336|       25|      8557|   110|   190|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   39324|       31|      4389|   406|   160|
+    |-----------|--------|---------|----------|------|------|
+
+    |-----------|--------|---------|----------|------|------|
+    |8 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   20208|       22|      8613|   125|   300|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   20515|       22|      8438|   800|   340|
+    |-----------|--------|---------|----------|------|------|
+
+max-bandwidth: 10 GBps (Gbytes/sec)
+
+.. code-block:: text
+
+    |-----------|--------|---------|----------|------|------|
+    |2 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   22450|       77|      7748|    80|   125|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   78339|       76|      2199|   204|    80|
+    |-----------|--------|---------|----------|------|------|
+
+    |-----------|--------|---------|----------|------|------|
+    |4 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   13017|       24|     13401|   180|   285|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   39466|       21|      4373|   406|   160|
+    |-----------|--------|---------|----------|------|------|
+
+    |-----------|--------|---------|----------|------|------|
+    |8 Channels |Total   |down     |throughput| send | recv |
+    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
+    |-----------|--------|---------|----------|------|------|
+    |qatzip     |   10255|       22|     17037|   280|   590|
+    |-----------|--------|---------|----------|------|------|
+    |zstd       |   20126|       77|      8595|   810|   340|
+    |-----------|--------|---------|----------|------|------|
+
+max-bandwidth: 1.25 GBps (Gbytes/sec)
+
+.. code-block:: text
+
+    |-----------|--------|---------|----------|----------|------|------|
+    |8 Channels |Total   |down     |throughput|pages per | send | recv |
+    |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
+    |-----------|--------|---------|----------|----------|------|------|
+    |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
+    |-----------|--------|---------|----------|----------|------|------|
+    |zstd       |   20165|       24|      8579|   2391465|   810|   340|
+    |-----------|--------|---------|----------|----------|------|------|
+    |none       |   46063|       40|     10848|    330240|    45|    85|
+    |-----------|--------|---------|----------|----------|------|------|
+
+If the user has enabled compression in live migration, using QAT can save the
+host CPU resources.
+
+When compression is enabled, the bottleneck of migration is usually the
+compression throughput on the sender side, since CPU decompression throughput
+is higher than compression, some reference data
+https://github.com/inikep/lzbench, so more CPU resources need to be allocated
+to the sender side.
+
+Summary:
+
+1. In the 1GBps case, QAT only uses 88% CPU utilization to reach 1GBps, but
+   ZSTD needs 800%.
+
+2. In the 10Gbps case, QAT uses 180% CPU utilization to reach 10GBps. but ZSTD
+   still cannot reach 10Gbps even if it uses 810%.
+
+3. The QAT decompression CPU utilization is higher than compression and ZSTD,
+   because:
+
+   a. When using QAT compression, the data needs to be copied to the QAT memory
+   (for DMA operations), and the same for decompression. However,
+   do_user_addr_fault will be triggered during decompression because the QAT
+   decompressed data is copied to the VM address space for the first time, in
+   addition, both compression and decompression are processed by QAT and do not
+   consume CPU resources, so the CPU utilization of the receiver is slightly
+   higher than the sender.
+
+   b. Since zstd decompression decompresses data directly into the VM address
+   space, there is one less memory copy than QAT, so the CPU utilization on the
+   receiver is better than QAT. For the 1GBps case, the receiver CPU
+   utilization is 125%, and the memory copy occupies ~80% of CPU utilization.
+
+How To Choose Between QATzip and QPL
+====================================
+Starting from Intel 4th Gen Intel Xeon Scalable processors, codenamed Sapphire
+Rapids processor(``SPR``), it supports multiple build-in accelerators including
+``QAT`` and ``IAA``, the former can accelerate ``QATzip``, and the latter is
+used to accelerate ``QPL``.
+
+Here are some suggestions:
+
+1 - If your live migration scenario is limited network bandwidth and ``QAT``
+hardware resources exceed ``IAA``, then use the ``QATzip`` method, which
+can save a lot of host CPU resources for compression.
+
+2 - If your system cannot support shared virtual memory(SVM) technology, please
+use ``QATzip`` method because ``QPL`` performance is not good without SVM
+support.
+
+3 - For other scenarios, please use the ``QPL`` method first.
-- 
Yichen Wang
RE: [PATCH v6 1/5] docs/migration: add qatzip compression feature
Posted by Liu, Yuan1 4 months, 1 week ago
> -----Original Message-----
> From: Yichen Wang <yichen.wang@bytedance.com>
> Sent: Tuesday, July 16, 2024 6:13 AM
> To: Peter Xu <peterx@redhat.com>; Fabiano Rosas <farosas@suse.de>; Paolo
> Bonzini <pbonzini@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>;
> Eduardo Habkost <eduardo@habkost.net>; Marc-André Lureau
> <marcandre.lureau@redhat.com>; Thomas Huth <thuth@redhat.com>; Philippe
> Mathieu-Daudé <philmd@linaro.org>; Eric Blake <eblake@redhat.com>; Markus
> Armbruster <armbru@redhat.com>; Laurent Vivier <lvivier@redhat.com>; qemu-
> devel@nongnu.org
> Cc: Hao Xiang <hao.xiang@linux.dev>; Liu, Yuan1 <yuan1.liu@intel.com>;
> Zou, Nanhai <nanhai.zou@intel.com>; Ho-Ren (Jack) Chuang
> <horenchuang@bytedance.com>; Wang, Yichen <yichen.wang@bytedance.com>
> Subject: [PATCH v6 1/5] docs/migration: add qatzip compression feature
> 
> From: Yuan Liu <yuan1.liu@intel.com>
> 
> add Intel QATzip compression method introduction
> 
> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Yichen Wang <yichen.wang@bytedance.com>
> ---
>  docs/devel/migration/features.rst           |   1 +
>  docs/devel/migration/qatzip-compression.rst | 251 ++++++++++++++++++++
>  2 files changed, 252 insertions(+)
>  create mode 100644 docs/devel/migration/qatzip-compression.rst
> 
> diff --git a/docs/devel/migration/features.rst
> b/docs/devel/migration/features.rst
> index 58f8fd9e16..8f431d52f9 100644
> --- a/docs/devel/migration/features.rst
> +++ b/docs/devel/migration/features.rst
> @@ -14,3 +14,4 @@ Migration has plenty of features to support different
> use cases.
>     CPR
>     qpl-compression
>     uadk-compression
> +   qatzip-compression
> diff --git a/docs/devel/migration/qatzip-compression.rst
> b/docs/devel/migration/qatzip-compression.rst
> new file mode 100644
> index 0000000000..72fa3e2826
> --- /dev/null
> +++ b/docs/devel/migration/qatzip-compression.rst
> @@ -0,0 +1,251 @@
> +==================
> +QATzip Compression
> +==================
> +In scenarios with limited network bandwidth, the ``QATzip`` solution can
> help
> +users save a lot of host CPU resources by accelerating compression and
> +decompression through the Intel QuickAssist Technology(``QAT``) hardware.

Hi Yichen

Thanks for adding the part of Performance Testing with QATzip, I wonder if we
can remove Performance Testing with QATzip part and directly add the following
content. 

Here, we use a typical example of limited bandwidth to illustrate the advantages
of QATzip. If the user is interested in qatzip, he still needs to verify the performance
by himself.

+The following test was conducted using 8 multifd channels and 10Gbps network
+bandwidth. The results show that, compared to zstd, ``QATzip`` significantly
+saves CPU resources on the sender and reduces migration time. Compared to the
+uncompressed solution, ``QATzip`` greatly improves the dirty page processing
+capability, indicated by the Pages per Second metric, and also reduces the
+total migration time.
+
+::
+
+   VM Configuration: 16 vCPU and 64G memory
+   VM Workload: all vCPUs are idle and 54G memory is filled with Silesia data.
+   QAT Devices: 4
+   |-----------|--------|---------|----------|----------|------|------|
+   |8 Channels |Total   |down     |throughput|pages per | send | recv |
+   |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
+   |-----------|--------|---------|----------|----------|------|------|
+   |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
+   |-----------|--------|---------|----------|----------|------|------|
+   |zstd       |   20165|       24|      8579|   2391465|   810|   340|
+   |-----------|--------|---------|----------|----------|------|------|
+   |none       |   46063|       40|     10848|    330240|    45|    85|
+   |-----------|--------|---------|----------|----------|------|------|


> +``QATzip`` is a user space library which builds on top of the Intel
> QuickAssist
> +Technology user space library, to provide extended accelerated
> compression and
> +decompression services.
> +
> +For more ``QATzip`` introduction, please refer to `QATzip Introduction
> +<https://github.com/intel/QATzip?tab=readme-ov-file#introductionl>`_
> +
> +QATzip Compression Framework
> +============================
> +
> +::
> +
> +  +----------------+
> +  | MultiFd Thread |
> +  +-------+--------+
> +          |
> +          | compress/decompress
> +  +-------+--------+
> +  | QATzip library |
> +  +-------+--------+
> +          |
> +  +-------+--------+
> +  |  QAT library   |
> +  +-------+--------+
> +          |         user space
> +  --------+---------------------
> +          |         kernel space
> +   +------+-------+
> +   |  QAT  Driver |
> +   +------+-------+
> +          |
> +   +------+-------+
> +   | QAT Devices  |
> +   +--------------+
> +
> +
> +QATzip Installation
> +-------------------
> +
> +The ``QATzip`` installation package has been integrated into some Linux
> +distributions and can be installed directly. For example, the Ubuntu
> Server
> +24.04 LTS system can be installed using below command
> +
> +.. code-block:: shell
> +
> +   #apt search qatzip
> +   libqatzip-dev/noble 1.2.0-0ubuntu3 amd64
> +     Intel QuickAssist user space library development files
> +
> +   libqatzip3/noble 1.2.0-0ubuntu3 amd64
> +     Intel QuickAssist user space library
> +
> +   qatzip/noble,now 1.2.0-0ubuntu3 amd64 [installed]
> +     Compression user-space tool for Intel QuickAssist Technology
> +
> +   #sudo apt install libqatzip-dev libqatzip3 qatzip
> +
> +If your system does not support the ``QATzip`` installation package, you
> can
> +use the source code to build and install, please refer to `QATzip source
> code installation
> +<https://github.com/intel/QATzip?tab=readme-ov-file#build-intel-
> quickassist-technology-driver>`_
> +
> +QAT Hardware Deployment
> +-----------------------
> +
> +``QAT`` supports physical functions(PFs) and virtual functions(VFs) for
> +deployment, and users can configure ``QAT`` resources for migration
> according
> +to actual needs. For more details about ``QAT`` deployment, please refer
> to
> +`Intel QuickAssist Technology Documentation
> +<https://intel.github.io/quickassist/index.html>`_
> +
> +For more ``QAT`` hardware introduction, please refer to `intel-quick-
> assist-technology-overview
> +<https://www.intel.com/content/www/us/en/architecture-and-
> technology/intel-quick-assist-technology-overview.html>`_
> +
> +How To Use QATzip Compression
> +=============================
> +
> +1 - Install ``QATzip`` library
> +
> +2 - Build ``QEMU`` with ``--enable-qatzip`` parameter
> +
> +  E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-
> qatzip``
> +
> +3 - Set ``migrate_set_parameter multifd-compression qatzip``
> +
> +4 - Set ``migrate_set_parameter multifd-qatzip-level comp_level``, the
> default
> +comp_level value is 1, and it supports levels from 1 to 9
> +
> +
> +Performance Testing with QATzip
> +===============================
> +
> +Testing environment is being set as below:
> +
> +VM configuration:16 vCPU, 64G memory;
> +
> +VM Workload: all vCPUs are idle and 54G memory is filled with Silesia
> data;
> +
> +QAT Devices: 4;
> +
> +Sender migration parameters:
> +
> +.. code-block:: shell
> +
> +    migrate_set_capability multifd on
> +    migrate_set_parameter multifd-channels 2/4/8
> +    migrate_set_parameter max-bandwidth 1G/10G
> +    migrate_set_parameter multifd-compression qatzip/zstd
> +
> +Receiver migration parameters:
> +
> +.. code-block:: shell
> +
> +    migrate_set_capability multifd on
> +    migrate_set_parameter multifd-channels 2
> +    migrate_set_parameter multifd-compression qatzip/zstd
> +
> +max-bandwidth: 1 GBps (Gbytes/sec)
> +
> +.. code-block:: text
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |2 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   21607|       77|      8051|    88|   125|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   78351|       96|      2199|   204|    80|
> +    |-----------|--------|---------|----------|------|------|
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |4 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   20336|       25|      8557|   110|   190|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   39324|       31|      4389|   406|   160|
> +    |-----------|--------|---------|----------|------|------|
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |8 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   20208|       22|      8613|   125|   300|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   20515|       22|      8438|   800|   340|
> +    |-----------|--------|---------|----------|------|------|
> +
> +max-bandwidth: 10 GBps (Gbytes/sec)
> +
> +.. code-block:: text
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |2 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   22450|       77|      7748|    80|   125|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   78339|       76|      2199|   204|    80|
> +    |-----------|--------|---------|----------|------|------|
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |4 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   13017|       24|     13401|   180|   285|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   39466|       21|      4373|   406|   160|
> +    |-----------|--------|---------|----------|------|------|
> +
> +    |-----------|--------|---------|----------|------|------|
> +    |8 Channels |Total   |down     |throughput| send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|------|------|
> +    |qatzip     |   10255|       22|     17037|   280|   590|
> +    |-----------|--------|---------|----------|------|------|
> +    |zstd       |   20126|       77|      8595|   810|   340|
> +    |-----------|--------|---------|----------|------|------|
> +
> +max-bandwidth: 1.25 GBps (Gbytes/sec)
> +
> +.. code-block:: text
> +
> +    |-----------|--------|---------|----------|----------|------|------|
> +    |8 Channels |Total   |down     |throughput|pages per | send | recv |
> +    |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
> +    |-----------|--------|---------|----------|----------|------|------|
> +    |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
> +    |-----------|--------|---------|----------|----------|------|------|
> +    |zstd       |   20165|       24|      8579|   2391465|   810|   340|
> +    |-----------|--------|---------|----------|----------|------|------|
> +    |none       |   46063|       40|     10848|    330240|    45|    85|
> +    |-----------|--------|---------|----------|----------|------|------|
> +
> +If the user has enabled compression in live migration, using QAT can save
> the
> +host CPU resources.
> +
> +When compression is enabled, the bottleneck of migration is usually the
> +compression throughput on the sender side, since CPU decompression
> throughput
> +is higher than compression, some reference data
> +https://github.com/inikep/lzbench, so more CPU resources need to be
> allocated
> +to the sender side.
> +
> +Summary:
> +
> +1. In the 1GBps case, QAT only uses 88% CPU utilization to reach 1GBps,
> but
> +   ZSTD needs 800%.
> +
> +2. In the 10Gbps case, QAT uses 180% CPU utilization to reach 10GBps. but
> ZSTD
> +   still cannot reach 10Gbps even if it uses 810%.
> +
> +3. The QAT decompression CPU utilization is higher than compression and
> ZSTD,
> +   because:
> +
> +   a. When using QAT compression, the data needs to be copied to the QAT
> memory
> +   (for DMA operations), and the same for decompression. However,
> +   do_user_addr_fault will be triggered during decompression because the
> QAT
> +   decompressed data is copied to the VM address space for the first
> time, in
> +   addition, both compression and decompression are processed by QAT and
> do not
> +   consume CPU resources, so the CPU utilization of the receiver is
> slightly
> +   higher than the sender.
> +
> +   b. Since zstd decompression decompresses data directly into the VM
> address
> +   space, there is one less memory copy than QAT, so the CPU utilization
> on the
> +   receiver is better than QAT. For the 1GBps case, the receiver CPU
> +   utilization is 125%, and the memory copy occupies ~80% of CPU
> utilization.
> +
> +How To Choose Between QATzip and QPL
> +====================================
> +Starting from Intel 4th Gen Intel Xeon Scalable processors, codenamed
> Sapphire
> +Rapids processor(``SPR``), it supports multiple build-in accelerators
> including
> +``QAT`` and ``IAA``, the former can accelerate ``QATzip``, and the latter
> is
> +used to accelerate ``QPL``.
> +
> +Here are some suggestions:
> +
> +1 - If your live migration scenario is limited network bandwidth and
> ``QAT``
> +hardware resources exceed ``IAA``, then use the ``QATzip`` method, which
> +can save a lot of host CPU resources for compression.
> +
> +2 - If your system cannot support shared virtual memory(SVM) technology,
> please
> +use ``QATzip`` method because ``QPL`` performance is not good without SVM
> +support.
> +
> +3 - For other scenarios, please use the ``QPL`` method first.
> --
> Yichen Wang
Re: [PATCH v6 1/5] docs/migration: add qatzip compression feature
Posted by Peter Xu 4 months, 1 week ago
On Tue, Jul 16, 2024 at 02:34:07AM +0000, Liu, Yuan1 wrote:
> > -----Original Message-----
> > From: Yichen Wang <yichen.wang@bytedance.com>
> > Sent: Tuesday, July 16, 2024 6:13 AM
> > To: Peter Xu <peterx@redhat.com>; Fabiano Rosas <farosas@suse.de>; Paolo
> > Bonzini <pbonzini@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>;
> > Eduardo Habkost <eduardo@habkost.net>; Marc-André Lureau
> > <marcandre.lureau@redhat.com>; Thomas Huth <thuth@redhat.com>; Philippe
> > Mathieu-Daudé <philmd@linaro.org>; Eric Blake <eblake@redhat.com>; Markus
> > Armbruster <armbru@redhat.com>; Laurent Vivier <lvivier@redhat.com>; qemu-
> > devel@nongnu.org
> > Cc: Hao Xiang <hao.xiang@linux.dev>; Liu, Yuan1 <yuan1.liu@intel.com>;
> > Zou, Nanhai <nanhai.zou@intel.com>; Ho-Ren (Jack) Chuang
> > <horenchuang@bytedance.com>; Wang, Yichen <yichen.wang@bytedance.com>
> > Subject: [PATCH v6 1/5] docs/migration: add qatzip compression feature
> > 
> > From: Yuan Liu <yuan1.liu@intel.com>
> > 
> > add Intel QATzip compression method introduction
> > 
> > Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> > Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > Reviewed-by: Yichen Wang <yichen.wang@bytedance.com>
> > ---
> >  docs/devel/migration/features.rst           |   1 +
> >  docs/devel/migration/qatzip-compression.rst | 251 ++++++++++++++++++++
> >  2 files changed, 252 insertions(+)
> >  create mode 100644 docs/devel/migration/qatzip-compression.rst
> > 
> > diff --git a/docs/devel/migration/features.rst
> > b/docs/devel/migration/features.rst
> > index 58f8fd9e16..8f431d52f9 100644
> > --- a/docs/devel/migration/features.rst
> > +++ b/docs/devel/migration/features.rst
> > @@ -14,3 +14,4 @@ Migration has plenty of features to support different
> > use cases.
> >     CPR
> >     qpl-compression
> >     uadk-compression
> > +   qatzip-compression
> > diff --git a/docs/devel/migration/qatzip-compression.rst
> > b/docs/devel/migration/qatzip-compression.rst
> > new file mode 100644
> > index 0000000000..72fa3e2826
> > --- /dev/null
> > +++ b/docs/devel/migration/qatzip-compression.rst
> > @@ -0,0 +1,251 @@
> > +==================
> > +QATzip Compression
> > +==================
> > +In scenarios with limited network bandwidth, the ``QATzip`` solution can
> > help
> > +users save a lot of host CPU resources by accelerating compression and
> > +decompression through the Intel QuickAssist Technology(``QAT``) hardware.
> 
> Hi Yichen
> 
> Thanks for adding the part of Performance Testing with QATzip, I wonder if we
> can remove Performance Testing with QATzip part and directly add the following
> content. 
> 
> Here, we use a typical example of limited bandwidth to illustrate the advantages
> of QATzip. If the user is interested in qatzip, he still needs to verify the performance
> by himself.
> 
> +The following test was conducted using 8 multifd channels and 10Gbps network
> +bandwidth. The results show that, compared to zstd, ``QATzip`` significantly
> +saves CPU resources on the sender and reduces migration time. Compared to the
> +uncompressed solution, ``QATzip`` greatly improves the dirty page processing
> +capability, indicated by the Pages per Second metric, and also reduces the
> +total migration time.
> +
> +::
> +
> +   VM Configuration: 16 vCPU and 64G memory
> +   VM Workload: all vCPUs are idle and 54G memory is filled with Silesia data.
> +   QAT Devices: 4
> +   |-----------|--------|---------|----------|----------|------|------|
> +   |8 Channels |Total   |down     |throughput|pages per | send | recv |
> +   |           |time(ms)|time(ms) |(mbps)    |second    | cpu %| cpu% |
> +   |-----------|--------|---------|----------|----------|------|------|
> +   |qatzip     |   16630|       28|     10467|   2940235|   160|   360|
> +   |-----------|--------|---------|----------|----------|------|------|
> +   |zstd       |   20165|       24|      8579|   2391465|   810|   340|
> +   |-----------|--------|---------|----------|----------|------|------|
> +   |none       |   46063|       40|     10848|    330240|    45|    85|
> +   |-----------|--------|---------|----------|----------|------|------|

Yes this looks much simpler and better.  The 10GBps test isn't that useful
at least, especially with nocomp numbers absent.  I didn't say when looking
previously, but it'll be better to clarify the numbers.

Yuan, thanks so much for reviewing all the relevant patches.  It's very
helpful to us.

-- 
Peter Xu