> -----Original Message-----
> From: Liu, Yuan1 <yuan1.liu@intel.com>
> Sent: Monday, April 1, 2024 11:41 PM
> To: peterx@redhat.com; farosas@suse.de
> Cc: qemu-devel@nongnu.org; hao.xiang@bytedance.com;
> bryan.zhang@bytedance.com; Liu, Yuan1 <yuan1.liu@intel.com>; Zou, Nanhai
> <nanhai.zou@intel.com>
> Subject: [PATCH 0/1] Solve zero page causing multiple page faults
>
> 1. Description of multiple page faults for received zero pages
>    a. The -mem-prealloc feature and hugepage backend are not enabled on
>       the destination.
>    b. After receiving the zero pages, the destination first determines
>       whether the current page content is 0 via buffer_is_zero; this may
>       cause a read page fault.
>
>       perf record -e page-faults information below
>       13.75% 13.75% multifdrecv_0 qemu-system-x86_64 [.] buffer_zero_avx512
>       11.85% 11.85% multifdrecv_1 qemu-system-x86_64 [.] buffer_zero_avx512
>                     multifd_recv_thread
>                     nocomp_recv
>                     multifd_recv_zero_page_process
>                     buffer_is_zero
>                     select_accel_fn
>                     buffer_zero_avx512
>
>    c. Other page faults mainly come from write operations to normal and
>       zero pages.
>
> 2. Solution
>    a. During the multifd migration process, the received pages are
>       tracked through RAMBlock's receivedmap.
>
>    b. If a received zero page is not yet set in receivedmap, the
>       destination does not check whether the page content is 0, thus
>       avoiding a read fault.
>
>    c. If the zero page has already been set in receivedmap, the page is
>       filled with 0 directly.
>
>       There are two reasons for this:
>       1. A page that has been sent once or more is unlikely to still be
>          a zero page.
>       2. The first time the destination receives a zero page, the page
>          must already be zero, so there is no need to scan it in the
>          first round.
>
> 3. Test Result: 16 vCPUs and 64G memory VM, multifd number is 2,
>    and 100G network bandwidth
>
>    3.1 Test case: 16 vCPUs are idle and only 2G memory is used
>    +-----------+--------+--------+----------+
>    |MultiFD    | total  |downtime| Page     |
>    |Nocomp     | time   |        | Faults   |
>    |           | (ms)   | (ms)   |          |
>    +-----------+--------+--------+----------+
>    |with       |        |        |          |
>    |recvbitmap |    7335|     180|      2716|
>    +-----------+--------+--------+----------+
>    |without    |        |        |          |
>    |recvbitmap |    7771|     153|    121357|
>    +-----------+--------+--------+----------+
>
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |MultiFD    | total  |downtime| SVM    |SVM    | IOTLB  | IO PageFault|
>    |QPL        | time   |        | IO TLB |IO Page| MaxTime| MaxTime     |
>    |           | (ms)   | (ms)   | Flush  |Faults | (us)   | (us)        |
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |with       |        |        |        |       |        |             |
>    |recvbitmap |   10224|     175|     410|  27429|       1|          447|
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |without    |        |        |        |       |        |             |
>    |recvbitmap |   11253|     153|   80756|  38655|      25|        18349|
>    +-----------+--------+--------+--------+-------+--------+-------------+
>
>    3.2 Test case: 16 vCPUs are idle and 56G memory (not zero) is used
>    +-----------+--------+--------+----------+
>    |MultiFD    | total  |downtime| Page     |
>    |Nocomp     | time   |        | Faults   |
>    |           | (ms)   | (ms)   |          |
>    +-----------+--------+--------+----------+
>    |with       |        |        |          |
>    |recvbitmap |   16825|     165|     52967|
>    +-----------+--------+--------+----------+
>    |without    |        |        |          |
>    |recvbitmap |   12987|     159|   2672677|
>    +-----------+--------+--------+----------+
>
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |MultiFD    | total  |downtime| SVM    |SVM    | IOTLB  | IO PageFault|
>    |QPL        | time   |        | IO TLB |IO Page| MaxTime| MaxTime     |
>    |           | (ms)   | (ms)   | Flush  |Faults | (us)   | (us)        |
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |with       |        |        |        |       |        |             |
>    |recvbitmap |  132315|      77|     890| 937105|      60|         9581|
>    +-----------+--------+--------+--------+-------+--------+-------------+
>    |without    |        |        |        |       |        |             |
>    |recvbitmap | >138333|     N/A| 1647701| 981899|      43|        21018|
>    +-----------+--------+--------+--------+-------+--------+-------------+
>
> From the test results, both page faults and IOTLB flush operations can
> be significantly reduced. The reason is that zero page processing does
> not trigger read faults, and a large number of zero pages do not even
> trigger write faults (Test 3.1), because the content of pages never
> accessed since the destination started is assumed to be 0.
>
> I have a concern here: the RAM memory is allocated by mmap with the
> anonymous flag, and if the first received zero page is not set to 0
> explicitly, does this ensure that the received zero page's memory data
> is 0?

I got the answer here, from the mmap(2) man page:

  MAP_ANONYMOUS
         The mapping is not backed by any file; its contents are
         initialized to zero. The fd argument is ignored; however, some
         implementations require fd to be -1 if MAP_ANONYMOUS (or
         MAP_ANON) is specified, and portable applications should ensure
         this. The offset argument should be zero. The use of
         MAP_ANONYMOUS in conjunction with MAP_SHARED is supported on
         Linux only since kernel 2.4.

> In this case, the performance impact of live migration is not big
> because the destination is not the bottleneck.
>
> When using QPL (an SVM-capable device), even if IOTLB behavior is
> improved, the overall performance is still seriously degraded because a
> large number of IO page faults are still generated.
>
> Previous discussion links:
> 1. https://lore.kernel.org/all/CAAYibXib+TWnJpV22E=adncdBmwXJRqgRjJXK7X71J=bDfaxDg@mail.gmail.com/
> 2. https://lore.kernel.org/all/PH7PR11MB594123F7EEFEBFCE219AF100A33A2@PH7PR11MB5941.namprd11.prod.outlook.com/
>
> Yuan Liu (1):
>   migration/multifd: solve zero page causing multiple page faults
>
>  migration/multifd-zero-page.c | 4 +++-
>  migration/multifd-zlib.c      | 1 +
>  migration/multifd-zstd.c      | 1 +
>  migration/multifd.c           | 1 +
>  migration/ram.c               | 4 ++++
>  migration/ram.h               | 1 +
>  6 files changed, 11 insertions(+), 1 deletion(-)
>
> --
> 2.39.3
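
To make the idea in 2a-2c concrete for myself, below is a minimal
standalone sketch. It is not the actual patch code: receivedmap is
modeled here as a plain local bitmap, and recv_zero_page/recv_test/
recv_set are made-up names. It only shows why the first zero page needs
no memory access at all on an anonymous mapping, while a page that was
received before must be cleared explicitly.

/* Minimal sketch, not the QEMU implementation. */
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS with glibc */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SIZE   4096UL
#define NR_PAGES    1024UL

static uint8_t receivedmap[NR_PAGES / 8];   /* one bit per guest page */

static bool recv_test(unsigned long page)
{
    return receivedmap[page / 8] & (1u << (page % 8));
}

static void recv_set(unsigned long page)
{
    receivedmap[page / 8] |= 1u << (page % 8);
}

/* Handle one zero page announced by the source for page index 'page'. */
static void recv_zero_page(uint8_t *ram, unsigned long page)
{
    if (recv_test(page)) {
        /*
         * The page was received (written) before, so it may hold stale
         * data and must be cleared explicitly.
         */
        memset(ram + page * PAGE_SIZE, 0, PAGE_SIZE);
    } else {
        /*
         * First time this page is seen: the anonymous mapping already
         * reads as zero, so leave it untouched -- no read fault from a
         * buffer_is_zero()-style scan and no write fault either.
         */
    }
    recv_set(page);
}

int main(void)
{
    uint8_t *ram = mmap(NULL, NR_PAGES * PAGE_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    recv_zero_page(ram, 42);                       /* 1st time: page untouched */
    memset(ram + 42 * PAGE_SIZE, 0xaa, PAGE_SIZE); /* page gets dirtied ...    */
    recv_zero_page(ram, 42);                       /* ... now cleared explicitly */

    printf("page 42 first byte: %u\n", (unsigned)ram[42 * PAGE_SIZE]); /* 0 */

    munmap(ram, NR_PAGES * PAGE_SIZE);
    return 0;
}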