This patchset is based on Juan Quintela's old series here
https://lore.kernel.org/all/20220802063907.18882-1-quintela@redhat.com/

In the multifd live migration model, there is a single migration main
thread scanning the page map, queuing the pages to multiple multifd
sender threads. The migration main thread runs zero page checking on
every page before queuing the page to the sender threads. Zero page
checking is a CPU intensive task and hence having a single thread doing
all that doesn't scale well. This change introduces a new function
to run the zero page checking on the multifd sender threads. This
patchset also lays the groundwork for future changes to offload zero
page checking task to accelerator hardware.

Use two Intel 4th generation Xeon servers for testing.

Architecture:          x86_64
CPU(s):                192
Thread(s) per core:    2
Core(s) per socket:    48
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 143
Model name:            Intel(R) Xeon(R) Platinum 8457C
Stepping:              8
CPU MHz:               2538.624
CPU max MHz:           3800.0000
CPU min MHz:           800.0000

Perform multifd live migration with below setup:
1. VM has 100GB memory. All pages in the VM are zero pages.
2. Use tcp socket for live migration.
3. Use 4 multifd channels and zero page checking on migration main thread.
4. Use 1/2/4 multifd channels and zero page checking on multifd sender
threads.
5. Record migration total time from sender QEMU console's "info migrate"
command.
6. Calculate throughput with "100GB / total time".

+--------------------+----------------+------------------+
| zero-page-checking | total-time(ms) | throughput(GB/s) |
+--------------------+----------------+------------------+
| main-thread        |           9629 |            10.38 |
+--------------------+----------------+------------------+
| multifd-1-threads  |           6182 |            16.17 |
+--------------------+----------------+------------------+
| multifd-2-threads  |           4643 |            21.53 |
+--------------------+----------------+------------------+
| multifd-4-threads  |           4143 |            24.13 |
+--------------------+----------------+------------------+

Apply this patchset on top of commit
39a6e4f87e7b75a45b08d6dc8b8b7c2954c87440

Hao Xiang (6):
  migration/multifd: Add new migration option multifd-zero-page.
  migration/multifd: Add zero pages and zero bytes counter to migration
    status interface.
  migration/multifd: Support for zero pages transmission in multifd
    format.
  migration/multifd: Zero page transmission on the multifd thread.
  migration/multifd: Enable zero page checking from multifd threads.
  migration/multifd: Add a new migration test case for legacy zero page
    checking.

 migration/migration-hmp-cmds.c |  11 ++++
 migration/multifd.c            | 106 ++++++++++++++++++++++++++++-----
 migration/multifd.h            |  22 ++++++-
 migration/options.c            |  20 +++++++
 migration/options.h            |   1 +
 migration/ram.c                |  49 ++++++++++++---
 migration/trace-events         |   8 +--
 qapi/migration.json            |  39 ++++++++++--
 tests/qtest/migration-test.c   |  26 ++++++++
 9 files changed, 249 insertions(+), 33 deletions(-)

-- 
2.30.2
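
For reference, step 6 of the setup ("100GB / total time") is a single
division. The C sketch below recomputes the throughput column from the
total times reported in the table. The function names are hypothetical
and not part of the patchset. That the reported figures match truncation
rather than rounding to two decimals is inferred from the numbers, not
stated in the cover letter.

```c
#include <assert.h>

/*
 * Throughput in GB/s for `gigabytes` of guest memory migrated in
 * `total_ms` milliseconds. Hypothetical helper, for illustration only.
 */
static double throughput_gbps(double gigabytes, double total_ms)
{
    assert(total_ms > 0.0);
    return gigabytes / (total_ms / 1000.0);
}

/*
 * Same value in hundredths of a GB/s, truncated toward zero.
 * Assumption: the table's two-decimal figures were truncated, which is
 * what the reported numbers are consistent with.
 */
static long throughput_centi(double gigabytes, double total_ms)
{
    return (long)(throughput_gbps(gigabytes, total_ms) * 100.0);
}
```

For example, throughput_gbps(100.0, 9629.0) falls between 10.38 and
10.39, in line with the main-thread row.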
On Tue, Feb 06, 2024 at 11:19:02PM +0000, Hao Xiang wrote:
> This patchset is based on Juan Quintela's old series here
> https://lore.kernel.org/all/20220802063907.18882-1-quintela@redhat.com/
>
> In the multifd live migration model, there is a single migration main
> thread scanning the page map, queuing the pages to multiple multifd
> sender threads. The migration main thread runs zero page checking on
> every page before queuing the page to the sender threads. Zero page
> checking is a CPU intensive task and hence having a single thread doing
> all that doesn't scale well. This change introduces a new function
> to run the zero page checking on the multifd sender threads. This
> patchset also lays the groundwork for future changes to offload zero
> page checking task to accelerator hardware.
>
> Use two Intel 4th generation Xeon servers for testing.
>
> Architecture:          x86_64
> CPU(s):                192
> Thread(s) per core:    2
> Core(s) per socket:    48
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 143
> Model name:            Intel(R) Xeon(R) Platinum 8457C
> Stepping:              8
> CPU MHz:               2538.624
> CPU max MHz:           3800.0000
> CPU min MHz:           800.0000
>
> Perform multifd live migration with below setup:
> 1. VM has 100GB memory. All pages in the VM are zero pages.
> 2. Use tcp socket for live migration.
> 3. Use 4 multifd channels and zero page checking on migration main thread.
> 4. Use 1/2/4 multifd channels and zero page checking on multifd sender
> threads.
> 5. Record migration total time from sender QEMU console's "info migrate"
> command.
> 6. Calculate throughput with "100GB / total time".
>
> +--------------------+----------------+------------------+
> | zero-page-checking | total-time(ms) | throughput(GB/s) |
> +--------------------+----------------+------------------+
> | main-thread        |           9629 |            10.38 |
> +--------------------+----------------+------------------+
> | multifd-1-threads  |           6182 |            16.17 |
> +--------------------+----------------+------------------+
> | multifd-2-threads  |           4643 |            21.53 |
> +--------------------+----------------+------------------+
> | multifd-4-threads  |           4143 |            24.13 |
> +--------------------+----------------+------------------+

This "throughput" is slightly confusing; I was initially surprised to see a
large throughput for idle guests.  IMHO the "total-time" column alone would
explain it.  Feel free to drop the "throughput" column if there's a repost.

Did you check why 4 channels mostly already reached the top line?  Is it
because the main thread is already spinning at 100%?

Thanks,

-- 
Peter Xu
On Tue, Feb 6, 2024 at 7:39 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Tue, Feb 06, 2024 at 11:19:02PM +0000, Hao Xiang wrote:
> > This patchset is based on Juan Quintela's old series here
> > https://lore.kernel.org/all/20220802063907.18882-1-quintela@redhat.com/
> >
> > In the multifd live migration model, there is a single migration main
> > thread scanning the page map, queuing the pages to multiple multifd
> > sender threads. The migration main thread runs zero page checking on
> > every page before queuing the page to the sender threads. Zero page
> > checking is a CPU intensive task and hence having a single thread doing
> > all that doesn't scale well. This change introduces a new function
> > to run the zero page checking on the multifd sender threads. This
> > patchset also lays the groundwork for future changes to offload zero
> > page checking task to accelerator hardware.
> >
> > Use two Intel 4th generation Xeon servers for testing.
> >
> > Architecture:          x86_64
> > CPU(s):                192
> > Thread(s) per core:    2
> > Core(s) per socket:    48
> > Socket(s):             2
> > NUMA node(s):          2
> > Vendor ID:             GenuineIntel
> > CPU family:            6
> > Model:                 143
> > Model name:            Intel(R) Xeon(R) Platinum 8457C
> > Stepping:              8
> > CPU MHz:               2538.624
> > CPU max MHz:           3800.0000
> > CPU min MHz:           800.0000
> >
> > Perform multifd live migration with below setup:
> > 1. VM has 100GB memory. All pages in the VM are zero pages.
> > 2. Use tcp socket for live migration.
> > 3. Use 4 multifd channels and zero page checking on migration main thread.
> > 4. Use 1/2/4 multifd channels and zero page checking on multifd sender
> > threads.
> > 5. Record migration total time from sender QEMU console's "info migrate"
> > command.
> > 6. Calculate throughput with "100GB / total time".
> >
> > +--------------------+----------------+------------------+
> > | zero-page-checking | total-time(ms) | throughput(GB/s) |
> > +--------------------+----------------+------------------+
> > | main-thread        |           9629 |            10.38 |
> > +--------------------+----------------+------------------+
> > | multifd-1-threads  |           6182 |            16.17 |
> > +--------------------+----------------+------------------+
> > | multifd-2-threads  |           4643 |            21.53 |
> > +--------------------+----------------+------------------+
> > | multifd-4-threads  |           4143 |            24.13 |
> > +--------------------+----------------+------------------+
>
> This "throughput" is slightly confusing; I was initially surprised to see a
> large throughput for idle guests.  IMHO the "total-time" column alone would
> explain it.  Feel free to drop the "throughput" column if there's a repost.
>
> Did you check why 4 channels mostly already reached the top line?  Is it
> because the main thread is already spinning at 100%?
>
> Thanks,
>
> --
> Peter Xu

Sure, I will drop "throughput" to avoid confusion. In my testing, 1
multifd channel already makes the main thread spin at 100%, so the
total-time is the same across 1/2/4 multifd channels as long as zero
page checking runs on the main migration thread. Of course, this
assumes that the network is not the bottleneck. One interesting
finding is that 1 multifd channel with zero page checking on the
multifd thread performs better than 1 multifd channel with zero page
checking on the main migration thread.
On Wed, Feb 07, 2024 at 04:47:27PM -0800, Hao Xiang wrote:
> Sure, I will drop "throughput" to avoid confusion. In my testing, 1
> multifd channel already makes the main thread spin at 100%, so the
> total-time is the same across 1/2/4 multifd channels as long as zero
> page checking runs on the main migration thread. Of course, this
> assumes that the network is not the bottleneck. One interesting
> finding is that 1 multifd channel with zero page checking on the
> multifd thread performs better than 1 multifd channel with zero page
> checking on the main migration thread.

It's probably because the main thread has even more work to do than
"detecting zero pages" alone.

When zero detection is done in the main thread and the guest is fully
idle, scanning those pages already consumes a major portion of the main
thread's CPU time.  With all pages zero, the multifd threads should be
fully idle, so n_channels may not matter here.

When 1 multifd thread is created with zero-page offloading, zero page
detection is fully offloaded from the main thread to the multifd thread,
even if there is only one.  The effect is similar to forking the main
thread into two threads, so the main thread can be more efficient on
other tasks (fetching/scanning dirty bits, etc.).

Thanks,

-- 
Peter Xu
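
To make the offloaded work concrete, here is a rough C sketch of what a
sender thread could do with a batch of pages it receives: scan each page
and partition the batch into non-zero pages (to be sent with data) and
zero pages (to be sent as markers only). This is an illustration under
stated assumptions, not the patchset's code. SKETCH_PAGE_SIZE,
page_is_zero() and split_zero_pages() are hypothetical names, the page
size is assumed fixed at 4 KiB, and the byte-by-byte scan stands in for
whatever optimized zero check QEMU actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size, for illustration */

/* Return true if every byte of the page starting at `page` is zero. */
static bool page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Reorder offsets[0..n) in place so that offsets of non-zero pages come
 * first and offsets of zero pages come last. Returns the number of
 * non-zero pages. Each offset is a byte offset into `ram`.
 */
static size_t split_zero_pages(const uint8_t *ram, size_t *offsets, size_t n)
{
    assert(ram != NULL && offsets != NULL);

    size_t i = 0; /* next page to classify */
    size_t j = n; /* start of the zero-page tail */

    while (i < j) {
        if (page_is_zero(ram + offsets[i])) {
            /* Move the zero page into the tail, re-examine slot i. */
            j--;
            size_t tmp = offsets[i];
            offsets[i] = offsets[j];
            offsets[j] = tmp;
        } else {
            i++;
        }
    }
    return i;
}
```

Because each batch is classified independently, this per-batch scan is
the piece of work that can run on any number of sender threads, which
leaves the main thread to the dirty-bitmap scanning mentioned above.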