> -----Original Message-----
> From: Liu, Yuan1 <yuan1.liu@intel.com>
> Sent: Monday, April 1, 2024 11:41 PM
> To: peterx@redhat.com; farosas@suse.de
> Cc: qemu-devel@nongnu.org; hao.xiang@bytedance.com;
> bryan.zhang@bytedance.com; Liu, Yuan1 <yuan1.liu@intel.com>; Zou, Nanhai
> <nanhai.zou@intel.com>
> Subject: [PATCH 0/1] Solve zero page causing multiple page faults
>
> 1. Description of multiple page faults for received zero pages
>    a. The -mem-prealloc feature and hugepage backend are not enabled on
>       the destination.
>    b. After receiving the zero pages, the destination first determines
>       whether the current page content is 0 via buffer_is_zero; this may
>       cause a read page fault.
>
>       perf record -e page-faults information below
>       13.75% 13.75% multifdrecv_0 qemu-system-x86_64 [.] buffer_zero_avx512
>       11.85% 11.85% multifdrecv_1 qemu-system-x86_64 [.] buffer_zero_avx512
>                     multifd_recv_thread
>                     nocomp_recv
>                     multifd_recv_zero_page_process
>                     buffer_is_zero
>                     select_accel_fn
>                     buffer_zero_avx512
>
>    c. Other page faults mainly come from write operations to normal and
>       zero pages.
>
> 2. Solution
>    a. During the multifd migration process, the received pages are
>       tracked through RAMBlock's receivedmap.
>
>    b. If a received zero page is not yet set in receivedmap, the
>       destination does not check whether the page content is 0, thus
>       avoiding a read fault.
>
>    c. If the zero page has already been set in receivedmap, the page is
>       filled with 0 directly.
>
>       There are two reasons for this:
>       1. A page that has been sent once or more is unlikely to still be
>          a zero page.
>       2. The first time the destination receives a zero page, the page
>          must already be zero, so there is no need to scan it in the
>          first round.
>
> 3. Test Result: 16 vCPUs and 64G memory VM, multifd number is 2,
>    and 100G network bandwidth
>
>    3.1 Test case: 16 vCPUs are idle and only 2G memory is used
>    +-----------+--------+--------+----------+
>    |MultiFD    | total  |downtime| Page     |
>    |Nocomp     | time   |        | Faults   |
>    |           | (ms)   | (ms)   |          |
>    +-----------+--------+--------+----------+
>    |with       |        |        |          |
>    |recvbitmap |    7335|     180|      2716|
>    +-----------+--------+--------+----------+
>    |without    |        |        |          |
>    |recvbitmap |    7771|     153|    121357|
>    +-----------+--------+--------+----------+
>
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |MultiFD    | total  |downtime| SVM    |SVM    | IOTLB  | IO PageFault|
>    |QPL        | time   |        | IO TLB |IO Page| MaxTime| MaxTime     |
>    |           | (ms)   | (ms)   | Flush  |Faults | (us)   | (us)        |
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |with       |        |        |        |       |        |             |
>    |recvbitmap |   10224|     175|     410|  27429|       1|          447|
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |without    |        |        |        |       |        |             |
>    |recvbitmap |   11253|     153|   80756|  38655|      25|        18349|
>    +-----------+--------+--------+--------+-------+--------+-------------+
>
>    3.2 Test case: 16 vCPUs are idle and 56G memory (not zero) is used
>    +-----------+--------+--------+----------+
>    |MultiFD    | total  |downtime| Page     |
>    |Nocomp     | time   |        | Faults   |
>    |           | (ms)   | (ms)   |          |
>    +-----------+--------+--------+----------+
>    |with       |        |        |          |
>    |recvbitmap |   16825|     165|     52967|
>    +-----------+--------+--------+----------+
>    |without    |        |        |          |
>    |recvbitmap |   12987|     159|   2672677|
>    +-----------+--------+--------+----------+
>
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |MultiFD    | total  |downtime| SVM    |SVM    | IOTLB  | IO PageFault|
>    |QPL        | time   |        | IO TLB |IO Page| MaxTime| MaxTime     |
>    |           | (ms)   | (ms)   | Flush  |Faults | (us)   | (us)        |
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |with       |        |        |        |       |        |             |
>    |recvbitmap |  132315|      77|     890| 937105|      60|         9581|
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |without    |        |        |        |       |        |             |
>    |recvbitmap | >138333|     N/A| 1647701| 981899|      43|        21018|
>    +-----------+--------+--------+--------+-------+--------+-------------+
>
> From the test results, both page faults and IOTLB flush operations can
> be significantly reduced. The reason is that zero page processing does
> not trigger read faults, and a large number of zero pages do not even
> trigger write faults (Test 3.1), because the content of pages never
> accessed since the destination started is assumed to be 0.
>
> I have a concern here: the RAM memory is allocated by mmap with the
> anonymous flag, and if the first received zero page is not set to 0
> explicitly, does this ensure that the received zero page's memory data
> is 0?

I got the answer here, from the mmap(2) man page:

  MAP_ANONYMOUS
         The mapping is not backed by any file; its contents are
         initialized to zero. The fd argument is ignored; however, some
         implementations require fd to be -1 if MAP_ANONYMOUS (or
         MAP_ANON) is specified, and portable applications should ensure
         this. The offset argument should be zero. The use of
         MAP_ANONYMOUS in conjunction with MAP_SHARED is supported on
         Linux only since kernel 2.4.

> In this case, the performance impact of live migration is not big
> because the destination is not the bottleneck.
>
> When using QPL (an SVM-capable device), even if IOTLB behavior is
> improved, the overall performance is still seriously degraded because a
> large number of IO page faults are still generated.
>
> Previous discussion links:
> 1. https://lore.kernel.org/all/CAAYibXib+TWnJpV22E=adncdBmwXJRqgRjJXK7X71J=bDfaxDg@mail.gmail.com/
> 2. https://lore.kernel.org/all/PH7PR11MB594123F7EEFEBFCE219AF100A33A2@PH7PR11MB5941.namprd11.prod.outlook.com/
>
> Yuan Liu (1):
>   migration/multifd: solve zero page causing multiple page faults
>
>  migration/multifd-zero-page.c | 4 +++-
>  migration/multifd-zlib.c      | 1 +
>  migration/multifd-zstd.c      | 1 +
>  migration/multifd.c           | 1 +
>  migration/ram.c               | 4 ++++
>  migration/ram.h               | 1 +
>  6 files changed, 11 insertions(+), 1 deletion(-)
>
> --
> 2.39.3
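
To make the idea in 2a-2c concrete for myself, below is a minimal
standalone sketch. It is not the actual patch code: receivedmap is
modeled here as a plain local bitmap, and recv_zero_page/recv_test/
recv_set are made-up names. It only shows why the first zero page needs
no memory access at all on an anonymous mapping, while a page that was
received before must be cleared explicitly.

/* Minimal sketch, not the QEMU implementation. */
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS with glibc */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SIZE   4096UL
#define NR_PAGES    1024UL

static uint8_t receivedmap[NR_PAGES / 8];   /* one bit per guest page */

static bool recv_test(unsigned long page)
{
    return receivedmap[page / 8] & (1u << (page % 8));
}

static void recv_set(unsigned long page)
{
    receivedmap[page / 8] |= 1u << (page % 8);
}

/* Handle one zero page announced by the source for page index 'page'. */
static void recv_zero_page(uint8_t *ram, unsigned long page)
{
    if (recv_test(page)) {
        /*
         * The page was received (written) before, so it may hold stale
         * data and must be cleared explicitly.
         */
        memset(ram + page * PAGE_SIZE, 0, PAGE_SIZE);
    } else {
        /*
         * First time this page is seen: the anonymous mapping already
         * reads as zero, so leave it untouched -- no read fault from a
         * buffer_is_zero()-style scan and no write fault either.
         */
    }
    recv_set(page);
}

int main(void)
{
    uint8_t *ram = mmap(NULL, NR_PAGES * PAGE_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    recv_zero_page(ram, 42);                       /* 1st time: page untouched */
    memset(ram + 42 * PAGE_SIZE, 0xaa, PAGE_SIZE); /* page gets dirtied ...    */
    recv_zero_page(ram, 42);                       /* ... now cleared explicitly */

    printf("page 42 first byte: %u\n", (unsigned)ram[42 * PAGE_SIZE]); /* 0 */

    munmap(ram, NR_PAGES * PAGE_SIZE);
    return 0;
}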