[PATCH 1/7] docs/migration: add uadk compression feature

Shameer Kolothum via posted 7 patches 5 months, 4 weeks ago
There is a newer version of this series
[PATCH 1/7] docs/migration: add uadk compression feature
Posted by Shameer Kolothum via 5 months, 4 weeks ago
Document UADK(User Space Accelerator Development Kit) library details
and how to use that for migration.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 docs/devel/migration/uadk-compression.rst | 144 ++++++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100644 docs/devel/migration/uadk-compression.rst

diff --git a/docs/devel/migration/uadk-compression.rst b/docs/devel/migration/uadk-compression.rst
new file mode 100644
index 0000000000..988b92631e
--- /dev/null
+++ b/docs/devel/migration/uadk-compression.rst
@@ -0,0 +1,144 @@
+=========================================================
+User Space Accelerator Development Kit (UADK) Compression
+=========================================================
+UADK is a general-purpose user space accelerator framework that uses shared
+virtual addressing (SVA) to provide a unified programming interface for
+hardware acceleration of cryptographic and compression algorithms.
+
+UADK includes the Unified/User-space-access-intended Accelerator Framework
+(UACCE), which allows SVA-capable hardware accelerators from different vendors
+to work with UADK.
+
+Currently, HiSilicon Kunpeng hardware accelerators have been registered with
+UACCE. Through the UADK framework, users can run cryptographic and compression
+algorithms using hardware accelerators instead of CPUs, freeing up CPU
+computing power and improving computing performance.
+
+For more details, see https://github.com/Linaro/uadk/tree/master/docs
+
+UADK Framework
+==============
+UADK consists of UACCE, vendors' drivers, and an algorithm layer. UADK requires
+the hardware accelerator to support SVA, and the operating system to support
+IOMMU and SVA. Hardware accelerators from different vendors are registered
+with UACCE as separate character devices by the vendors' kernel-mode drivers.
+Users access the hardware accelerators by performing user-mode operations on
+these character devices.
+
+::
+
+          +----------------------------------+
+          |                apps              |
+          +----+------------------------+----+
+               |                        |
+               |                        |
+       +-------+--------+       +-------+-------+
+       |   scheduler    |       | alg libraries |
+       +-------+--------+       +--------+------+
+               |                         |
+               |                         |
+               |                         |
+               |                +--------+------+
+               |                | vendor drivers|
+               |                +-+-------------+
+               |                  |
+               |                  |
+            +--+------------------+--+
+            |         libwd          |
+    User    +----+-------------+-----+
+    --------------------------------------------------
+    Kernel    +--+-----+   +------+
+              | uacce  |   | smmu |
+              +---+----+   +------+
+                  |
+              +---+------------------+
+              | vendor kernel driver |
+              +----------------------+
+    --------------------------------------------------
+             +----------------------+
+             |   HW Accelerators    |
+             +----------------------+
+
+UADK Installation
+-----------------
+Build UADK
+^^^^^^^^^^
+
+.. code-block:: shell
+
+    git clone https://github.com/Linaro/uadk.git
+    cd uadk
+    mkdir build
+    ./autogen.sh
+    ./configure --prefix=$PWD/build
+    make
+    make install
+
+Without ``--prefix``, the UADK libraries are installed to ``/usr/local/lib``
+by default. If the build fails with ``cannot find -lnuma``, install the
+``libnuma-dev`` package.
+
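+Run pkg-config libwd to check the environment
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ``export PKG_CONFIG_PATH=$PWD/build/lib/pkgconfig`` (from the ``uadk``
+  source directory)
+* ``pkg-config libwd --cflags --libs``
+
+  With a default ``/usr/local`` installation this prints
+  ``-I/usr/local/include -L/usr/local/lib -lwd``; with a custom prefix the
+  paths point into that prefix instead.
+
+* Exporting ``PKG_CONFIG_PATH`` is only needed when UADK is installed to a
+  custom prefix. It is not required if UADK is installed to ``/usr/local``.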
+
+UADK Host Kernel Requirements
+-----------------------------
+Make sure that the host Linux kernel supports ``UACCE``. The kernel version
+should be at least v5.9, with SVA (Shared Virtual Addressing) enabled.
+
+Kernel Configuration
+^^^^^^^^^^^^^^^^^^^^
+
+``UACCE`` can be built either as a module or into the kernel.
+
+The following example enables UACCE with the hardware accelerators on a
+HiSilicon Kunpeng platform:
+
+* ``CONFIG_IOMMU_SVA_LIB=y``
+* ``CONFIG_ARM_SMMU=y``
+* ``CONFIG_ARM_SMMU_V3=y``
+* ``CONFIG_ARM_SMMU_V3_SVA=y``
+* ``CONFIG_PCI_PASID=y``
+* ``CONFIG_UACCE=y``
+* ``CONFIG_CRYPTO_DEV_HISI_QM=y``
+* ``CONFIG_CRYPTO_DEV_HISI_ZIP=y``
+
+Make sure all of the above kernel options are enabled.
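+
+To check these options on a host, a small helper along the following lines
+may help. It is only a sketch: ``check_uadk_kconfig`` is a hypothetical name,
+it assumes the kernel configuration is available as
+``/boot/config-$(uname -r)`` (some distributions expose ``/proc/config.gz``
+instead), and it accepts both ``=y`` and ``=m`` because ``UACCE`` and the
+drivers may be built as modules. Option names may differ between kernel
+versions.
+
+.. code-block:: shell
+
+    # Hypothetical helper (not part of UADK or QEMU): report which of the
+    # kernel options listed above are enabled (=y or =m) in a config file.
+    # Returns 1 if any option is missing, 2 if the file cannot be read.
+    check_uadk_kconfig() {
+        local config missing=0 opt
+        config=${1:-/boot/config-$(uname -r)}
+        if [ ! -r "$config" ]; then
+            echo "cannot read $config" >&2
+            return 2
+        fi
+        for opt in IOMMU_SVA_LIB ARM_SMMU ARM_SMMU_V3 ARM_SMMU_V3_SVA \
+                   PCI_PASID UACCE CRYPTO_DEV_HISI_QM CRYPTO_DEV_HISI_ZIP; do
+            # The '=' right after the name keeps ARM_SMMU from matching ARM_SMMU_V3
+            if grep -q "^CONFIG_${opt}=[ym]" "$config"; then
+                echo "ok:      CONFIG_${opt}"
+            else
+                echo "missing: CONFIG_${opt}"
+                missing=1
+            fi
+        done
+        return "$missing"
+    }
+
+Call ``check_uadk_kconfig`` without arguments to check the running kernel, or
+pass the path of a kernel ``.config`` file.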
+
+Accelerator dev node permissions
+--------------------------------
+Hardware accelerators (e.g. the HiSilicon Kunpeng Zip accelerator) are
+registered with UACCE, and character devices are created under ``/dev``.
+To access the resources of these accelerator devices, the user needs write
+permission on the device nodes.
+
+.. code-block:: shell
+
+    $ sudo chmod 777 /dev/hisi_zip-*
+
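+To confirm that the device nodes exist and are writable by the user that runs
+QEMU, a check like the one below can be used. ``check_accel_nodes`` is a
+hypothetical helper written for this example; the default
+``/dev/hisi_zip-*`` pattern is the HiSilicon Zip naming used above and may
+differ for other accelerators.
+
+.. code-block:: shell
+
+    # Hypothetical helper: verify that every device node matching the glob
+    # pattern (default /dev/hisi_zip-*) is writable by the current user.
+    # Returns non-zero if no node matches or any node is not writable.
+    check_accel_nodes() {
+        local pattern dev found=0 rc=0
+        pattern=${1:-/dev/hisi_zip-*}
+        for dev in $pattern; do           # unquoted on purpose: expand the glob
+            [ -e "$dev" ] || continue     # an unmatched glob stays literal
+            found=1
+            if [ -w "$dev" ]; then
+                echo "writable:     $dev"
+            else
+                echo "not writable: $dev" >&2
+                rc=1
+            fi
+        done
+        if [ "$found" -eq 0 ]; then
+            echo "no device nodes match $pattern" >&2
+            return 1
+        fi
+        return "$rc"
+    }
+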
+How To Use UADK Compression In QEMU Migration
+---------------------------------------------
+* Make sure UADK is installed as described above.
+* Build QEMU with the ``--enable-uadk`` configure option.
+
+  E.g. ``./configure --target-list=aarch64-softmmu --enable-kvm --enable-uadk``
+
+* Enable ``UADK`` compression for multifd migration on both the source and
+  the destination.
+
+  Set ``migrate_set_parameter multifd-compression uadk``; the ``multifd``
+  migration capability must also be enabled.
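+
+As an illustrative sketch (the port number and ``<destination-ip>`` are
+placeholders), a multifd migration with UADK compression could be driven from
+the HMP monitor as follows. On the destination, started with
+``-incoming defer``::
+
+    (qemu) migrate_set_capability multifd on
+    (qemu) migrate_set_parameter multifd-compression uadk
+    (qemu) migrate_incoming tcp:0:4444
+
+On the source::
+
+    (qemu) migrate_set_capability multifd on
+    (qemu) migrate_set_parameter multifd-compression uadk
+    (qemu) migrate -d tcp:<destination-ip>:4444
+    (qemu) info migrate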
+
+Since UADK uses Shared Virtual Addressing (SVA) and the device accesses
+virtual memory directly, the SMMUv3 may encounter page faults while walking
+the I/O page tables, which can reduce performance. To mitigate this, make
+sure to specify the ``-mem-prealloc`` option on the destination VM's command
+line.
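+
+For example, a destination QEMU could be started along these lines. This is
+only a sketch: apart from ``-mem-prealloc`` and ``-incoming``, the machine
+type, CPU, memory size and disk image (``guest.qcow2``) are placeholders and
+must match the source VM configuration.
+
+.. code-block:: shell
+
+    qemu-system-aarch64 -machine virt,gic-version=3 -cpu host -enable-kvm \
+        -smp 4 -m 8G -mem-prealloc \
+        -drive file=guest.qcow2,if=virtio \
+        -nographic -incoming defer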
+
+Although both UADK and zlib are based on the deflate compression algorithm,
+UADK is not fully compatible with zlib. Hence, make sure to use ``uadk`` on
+both the source and the destination during migration.
-- 
2.17.1


Re: [PATCH 1/7] docs/migration: add uadk compression feature
Posted by Fabiano Rosas 5 months, 3 weeks ago
Shameer Kolothum via <qemu-devel@nongnu.org> writes:

> Document UADK(User Space Accelerator Development Kit) library details
> and how to use that for migration.
>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  docs/devel/migration/uadk-compression.rst | 144 ++++++++++++++++++++++

Missing an entry in the features.rst TOC.

>  1 file changed, 144 insertions(+)
>  create mode 100644 docs/devel/migration/uadk-compression.rst
>
RE: [PATCH 1/7] docs/migration: add uadk compression feature
Posted by Liu, Yuan1 5 months, 4 weeks ago
> -----Original Message-----
> From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Sent: Wednesday, May 29, 2024 5:44 PM
> To: peterx@redhat.com; farosas@suse.de; Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: qemu-devel@nongnu.org; linuxarm@huawei.com; linwenkai6@hisilicon.com;
> zhangfei.gao@linaro.org; huangchenghai2@huawei.com
> Subject: [PATCH 1/7] docs/migration: add uadk compression feature
> 
> Document UADK(User Space Accelerator Development Kit) library details
> and how to use that for migration.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>  docs/devel/migration/uadk-compression.rst | 144 ++++++++++++++++++++++
>  1 file changed, 144 insertions(+)
>  create mode 100644 docs/devel/migration/uadk-compression.rst
> 
> [...]
> 
> +Since UADK uses Shared Virtual Addressing(SVA) and device access virtual
> memory
> +directly it is possible that SMMUv3 may enounter page faults while
> walking the
> +IO page tables. This may impact the performance. In order to mitigate
> this,
> +please make sure to specify ``-mem-prealloc`` parameter to the
> destination VM
> +boot parameters.

Thank you so much for putting the IAA solution at the top and for cc'ing me.

I think migration performance will be better with the '-mem-prealloc' option,
but I am wondering whether '-mem-prealloc' should be a mandatory option. In my
experience, SVA performance drops are mainly caused by IOTLB flushes and I/O
page faults. I had some discussions with Peter Xu about the IOTLB flush issue,
and it has been improved.
https://patchew.org/QEMU/PH7PR11MB5941F04FBFB964CB2C968866A33E2@PH7PR11MB5941.namprd11.prod.outlook.com/

For I/O page faults, QPL (the IAA userspace library) can handle page fault
requests instead of the IOMMU. This means we can disable the I/O page fault
feature on the IAA device and still let the device use SVA to avoid memory
copies.

I will provide the test results in my next version. Do you have any ideas or
suggestions about this? Thanks.

> +Though both UADK and ZLIB are based on the deflate compression algorithm,
> UADK
> +is not fully compatible with ZLIB. Hence, please make sure to use
> ``uadk`` on
> +both source and destination during migration.
> --
> 2.17.1

RE: [PATCH 1/7] docs/migration: add uadk compression feature
Posted by Shameerali Kolothum Thodi via 5 months, 4 weeks ago

> -----Original Message-----
> From: Liu, Yuan1 <yuan1.liu@intel.com>
> Sent: Thursday, May 30, 2024 2:25 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> peterx@redhat.com; farosas@suse.de
> Cc: qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; linwenkai (C)
> <linwenkai6@hisilicon.com>; zhangfei.gao@linaro.org; huangchenghai
> <huangchenghai2@huawei.com>
> Subject: RE: [PATCH 1/7] docs/migration: add uadk compression feature
> 
> > -----Original Message-----
> > From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > Sent: Wednesday, May 29, 2024 5:44 PM
> > To: peterx@redhat.com; farosas@suse.de; Liu, Yuan1 <yuan1.liu@intel.com>
> > Cc: qemu-devel@nongnu.org; linuxarm@huawei.com;
> linwenkai6@hisilicon.com;
> > zhangfei.gao@linaro.org; huangchenghai2@huawei.com
> > Subject: [PATCH 1/7] docs/migration: add uadk compression feature

[...]

> > +Since UADK uses Shared Virtual Addressing(SVA) and device access virtual
> > memory
> > +directly it is possible that SMMUv3 may enounter page faults while
> > walking the
> > +IO page tables. This may impact the performance. In order to mitigate
> > this,
> > +please make sure to specify ``-mem-prealloc`` parameter to the
> > destination VM
> > +boot parameters.
> 
> Thank you so much for putting the IAA solution at the top and cc me.
> 
> I think migration performance will be better with '-mem-prealloc' option,
> but I am considering whether '-mem-prealloc' is a mandatory option, from my
> experience, SVA performance drops mainly caused by IOTLB flush and IO page
> fault,
> I had some discussions with Peter Xu about the IOTLB flush issue, and it has
> been improved.
> https://patchew.org/QEMU/PH7PR11MB5941F04FBFB964CB2C968866A33E2@
> PH7PR11MB5941.namprd11.prod.outlook.com/

Thanks for the link. Yes, I have seen that discussion, and this series is on
top of that patch for avoiding the zero page read fault.

> 
> For IO page fault, the QPL(IAA userspace library) can process page fault
> request instead of IOMMU,

Sorry, I didn't get this part completely. If a page fault happens, how can the
library handle it without the IOMMU? Or did you mean the library will prefault
the memory beforehand to avoid the page fault?

>  it means we can disable the I/O page fault feature
> on the IAA device, and let the device still use SVA technology to avoid memory
> copy.
> 
> I will provide the test results in my next version, do you have any ideas or
> suggestions about this, thanks.

I think our UADK test tool has an option to prefault the memory (write some
random data to it) to avoid the page fault penalty. I am not sure whether that
is exposed through the API. I will check with our UADK team.

Please do CC me when you post your next revision.

Thanks,
Shameer
RE: [PATCH 1/7] docs/migration: add uadk compression feature
Posted by Liu, Yuan1 5 months, 4 weeks ago
> -----Original Message-----
> From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Sent: Thursday, May 30, 2024 10:01 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>; peterx@redhat.com; farosas@suse.de
> Cc: qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; linwenkai (C)
> <linwenkai6@hisilicon.com>; zhangfei.gao@linaro.org; huangchenghai
> <huangchenghai2@huawei.com>
> Subject: RE: [PATCH 1/7] docs/migration: add uadk compression feature
> 
> 
> 
> > -----Original Message-----
> > From: Liu, Yuan1 <yuan1.liu@intel.com>
> > Sent: Thursday, May 30, 2024 2:25 PM
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> > peterx@redhat.com; farosas@suse.de
> > Cc: qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; linwenkai (C)
> > <linwenkai6@hisilicon.com>; zhangfei.gao@linaro.org; huangchenghai
> > <huangchenghai2@huawei.com>
> > Subject: RE: [PATCH 1/7] docs/migration: add uadk compression feature
> >
> > > -----Original Message-----
> > > From: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > Sent: Wednesday, May 29, 2024 5:44 PM
> > > To: peterx@redhat.com; farosas@suse.de; Liu, Yuan1
> <yuan1.liu@intel.com>
> > > Cc: qemu-devel@nongnu.org; linuxarm@huawei.com;
> > linwenkai6@hisilicon.com;
> > > zhangfei.gao@linaro.org; huangchenghai2@huawei.com
> > > Subject: [PATCH 1/7] docs/migration: add uadk compression feature
> 
> [...]
> 
> > > +Since UADK uses Shared Virtual Addressing(SVA) and device access
> virtual
> > > memory
> > > +directly it is possible that SMMUv3 may enounter page faults while
> > > walking the
> > > +IO page tables. This may impact the performance. In order to mitigate
> > > this,
> > > +please make sure to specify ``-mem-prealloc`` parameter to the
> > > destination VM
> > > +boot parameters.
> >
> > Thank you so much for putting the IAA solution at the top and cc me.
> >
> > I think migration performance will be better with '-mem-prealloc'
> option,
> > but I am considering whether '-mem-prealloc' is a mandatory option, from
> my
> > experience, SVA performance drops mainly caused by IOTLB flush and IO
> page
> > fault,
> > I had some discussions with Peter Xu about the IOTLB flush issue, and it
> has
> > been improved.
> > https://patchew.org/QEMU/PH7PR11MB5941F04FBFB964CB2C968866A33E2@
> > PH7PR11MB5941.namprd11.prod.outlook.com/
> 
> Thanks for the link. Yes I have seen that discussion and this series is on
> top of  that
> patch for avoiding the zero page read fault.
> 
> >
> > For IO page fault, the QPL(IAA userspace library) can process page fault
> > request instead of IOMMU,
> 
> Sorry I didn't get this part completely. So if the page fault happens how
> the library
> can handle it without IOMMU? Or you meant library will do memory
> perfecting before
> to avoid the page fault?

Yes. When an I/O page fault happens, the hardware returns the fault address to
QPL; QPL then populates the memory as shown below and resubmits the job to the
hardware.

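if (AD_STATUS_READ_PAGE_FAULT == completion_record_ptr->status) {
    /* Read fault: a volatile read touches the page so the kernel maps it. */
    volatile char *read_fault_address = (volatile char *)fault_address;
    (void)*read_fault_address;
} else { /* AD_STATUS_WRITE_PAGE_FAULT */
    /* Write fault: write back the byte just read, which makes the page
     * present and writable without changing its contents. */
    volatile char *write_fault_address = (volatile char *)fault_address;
    *write_fault_address = *write_fault_address;
}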

>  it means we can disable the I/O page fault feature
> > on the IAA device, and let the device still use SVA technology to avoid
> memory
> > copy.
> >
> > I will provide the test results in my next version, do you have any
> ideas or
> > suggestions about this, thanks.
> 
> I think our UADK test tool had an option to prefect the memory(write some
> random data
> to memory) to avoid page fault penalty. I am not sure that is exposed
> through the API or not.
> I will check with our UADK team.
> 
> Please do CC me when you post your next revision.

Sure

> Thanks,
> Shameer