1 | CI: https://gitlab.com/peterx/qemu/-/pipelines/1575970314 | 1 | CI: https://gitlab.com/peterx/qemu/-/pipelines/1577280033 |
---|---|---|---|
2 | (note: it's a pipeline of two patchsets, to save CI credits and time) | ||
2 | 3 | ||
3 | Comparing to v1, this series v2 now contains some patches that may be | 4 | v1: https://lore.kernel.org/r/20241205185303.897010-1-peterx@redhat.com |
4 | helpful for either VFIO or postcopy integration on top of multifd. | 5 | v2: https://lore.kernel.org/r/20241206005834.1050905-1-peterx@redhat.com |
5 | 6 | ||
6 | For VFIO, only patch 1 & 2 are relevant. This is the only part that can be | 7 | v3 changelog: |
7 | compared to v1, where I renamed the sync flag to LOCAL and ALL according to | 8 | - R-bs collected |
8 | Fabiano's comment. | 9 | - Update commit message of patch 1 [Fabiano] |
10 | - English updates [Fabiano] | ||
11 | - Update comment for MULTIFD_SYNC_ALL [Fabiano] | ||
12 | - In multifd_send_sync_main(), assert on req type [Fabiano] | ||
13 | - Some more comments and cleanup for RAM_SAVE_FLAG_* movement [Fabiano] | ||
14 | - Update the last document patch [Fabiano] | ||
9 | 15 | ||
10 | For postcopy, it's about patches 3-7, but it needs to be based on 1+2. | 16 | This series provides some changes that may be helpful for either VFIO or |
17 | postcopy integration on top of multifd. | ||
11 | 18 | ||
12 | I put them together because the two changes are relevant on the multifd | 19 | For VFIO, only patches 1 & 2 are relevant. |
13 | sync operation, one must depend on another in some form. So I decided to | ||
14 | send them together here. All these patches can be seen as cleanups / | ||
15 | slight optimizations on top of master branch. | ||
16 | 20 | ||
17 | This time I did more test. Besides CI, qtests, and some real-world multifd | 21 | For postcopy, it's about patches 3-7, but it needs to be based on 1+2 |
18 | tests just to monitor the sync events happen all correct, I made sure to | 22 | because of a context dependency. |
19 | cover 7.2 machine type (which uses the legacy sync) so it still works as | ||
20 | before - basically sync will be more frequent, but all thing keeps working | ||
21 | smoothly so far. | ||
22 | 23 | ||
23 | Fabiano, let me know what do you think comparing to the other patch [1] on | 24 | All these patches can be seen as cleanups / slight optimizations on top of |
24 | simplifying the flush checks. I'm open to any comments. | 25 | master branch with/without the VFIO/postcopy work. |
25 | 26 | ||
26 | Thanks, | 27 | Besides CI, qtests, and some real-world multifd tests just to monitor the |
27 | 28 | sync events happen all correct, I made sure to cover 7.2 machine type | |
28 | [1] https://lore.kernel.org/r/875xo8n4ue.fsf@suse.de | 29 | (which uses the legacy sync) so it still works as before - basically sync |
30 | will be more frequent, but all thing keeps working smoothly so far. | ||
29 | 31 | ||
30 | Thanks, | 32 | Thanks, |
31 | 33 | ||
32 | Peter Xu (7): | 34 | Peter Xu (7): |
33 | migration/multifd: Further remove the SYNC on complete | 35 | migration/multifd: Further remove the SYNC on complete |
... | ... | ||
36 | migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages | 38 | migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages |
37 | migration/multifd: Remove sync processing on postcopy | 39 | migration/multifd: Remove sync processing on postcopy |
38 | migration/multifd: Cleanup src flushes on condition check | 40 | migration/multifd: Cleanup src flushes on condition check |
39 | migration/multifd: Document the reason to sync for save_setup() | 41 | migration/multifd: Document the reason to sync for save_setup() |
40 | 42 | ||
41 | migration/multifd.h | 23 ++++++++-- | 43 | migration/multifd.h | 27 ++++++++++-- |
42 | migration/ram.h | 25 +++++++++++ | 44 | migration/ram.h | 28 ++++++++++++ |
43 | migration/rdma.h | 7 --- | 45 | migration/rdma.h | 7 --- |
44 | migration/multifd-nocomp.c | 74 ++++++++++++++++++++++++++++++- | 46 | migration/multifd-nocomp.c | 74 ++++++++++++++++++++++++++++++- |
45 | migration/multifd.c | 15 ++++--- | 47 | migration/multifd.c | 17 +++++--- |
46 | migration/ram.c | 91 +++++++++++++++++--------------------- | 48 | migration/ram.c | 89 +++++++++++++++++--------------------- |
47 | 6 files changed, 166 insertions(+), 69 deletions(-) | 49 | 6 files changed, 173 insertions(+), 69 deletions(-) |
48 | 50 | ||
49 | -- | 51 | -- |
50 | 2.47.0 | 52 | 2.47.0 | diff view generated by jsdifflib |
1 | Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in | 1 | Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in |
---|---|---|---|
2 | complete()") removed the FLUSH operation on complete() which should avoid | 2 | complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at |
3 | one global sync on destination side, because it's not needed. | 3 | ram_save_complete(), because the sync on the destination side is not |
4 | needed due to the last iteration of find_dirty_block() having already | ||
5 | done it. | ||
4 | 6 | ||
5 | However that commit overlooked multifd_ram_flush_and_sync() part of things, | 7 | However, that commit overlooked that multifd_ram_flush_and_sync() on the |
6 | as that's always used together with the FLUSH message on the main channel. | 8 | source side is also not needed at ram_save_complete(), for the same |
9 | reason. | ||
7 | 10 | ||
8 | For very old binaries (multifd_flush_after_each_section==true), that's | 11 | Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the |
9 | still needed because each EOS received on destination will enforce | 12 | multifd_ram_flush_and_sync() means that currently the recv threads will |
10 | all-channel sync once. | 13 | hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the |
14 | destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is | ||
15 | received. | ||
11 | 16 | ||
12 | For new binaries (multifd_flush_after_each_section==false), the flush and | 17 | Luckily, multifd is still all working fine because recv side cleanup |
13 | sync shouldn't be needed just like the FLUSH, with the same reasoning. | 18 | code (mostly multifd_recv_sync_main()) is smart enough to make sure even |
19 | if recv threads are stuck at SYNC it'll get kicked out. And since this | ||
20 | is the completion phase of migration, nothing else will be sent after | ||
21 | the SYNCs. | ||
14 | 22 | ||
15 | Currently, on new binaries we'll still send SYNC messages on multifd | 23 | This needs to be fixed because in the future VFIO will have data to push |
16 | channels, even if FLUSH is omitted at the end. It'll make all recv threads | 24 | after ram_save_complete() and we don't want the recv thread to be stuck |
17 | hang at SYNC message. | 25 | in the MULTIFD_FLAG_SYNC message. |
18 | 26 | ||
19 | Multifd is still all working fine because luckily recv side cleanup | 27 | Remove the unnecessary (and buggy) invocation of |
20 | code (mostly multifd_recv_sync_main()) is smart enough to make sure even if | 28 | multifd_ram_flush_and_sync(). |
21 | recv threads are stuck at SYNC it'll get kicked out. And since this is the | ||
22 | completion phase of migration, nothing else will be sent after the SYNCs. | ||
23 | 29 | ||
24 | This may be needed for VFIO in the future because VFIO can have data to | 30 | For very old binaries (multifd_flush_after_each_section==true), the |
25 | push after ram_save_complete(), hence we don't want the recv thread to be | 31 | flush_and_sync is still needed because each EOS received on destination |
26 | stuck in SYNC message. Remove this limitation will make src even slightly | 32 | will enforce all-channel sync once. |
27 | faster too to avoid some more code. | ||
28 | 33 | ||
29 | Stable branches do not need this patch, as no real bug I can think of that | 34 | Stable branches do not need this patch, as no real bug I can think of |
30 | will go wrong there.. so not attaching Fixes to be clear on the backport | 35 | that will go wrong there.. so not attaching Fixes to be clear on the |
31 | not needed. | 36 | backport not needed. |
32 | 37 | ||
38 | Reviewed-by: Fabiano Rosas <farosas@suse.de> | ||
33 | Signed-off-by: Peter Xu <peterx@redhat.com> | 39 | Signed-off-by: Peter Xu <peterx@redhat.com> |
34 | --- | 40 | --- |
35 | migration/ram.c | 13 ++++++++++--- | 41 | migration/ram.c | 13 ++++++++++--- |
36 | 1 file changed, 10 insertions(+), 3 deletions(-) | 42 | 1 file changed, 10 insertions(+), 3 deletions(-) |
37 | 43 | ||
... | ... | diff view generated by jsdifflib |
1 | Teach multifd_send_sync_main() to sync with threads only. | 1 | Teach multifd_send_sync_main() to sync with threads only. |
---|---|---|---|
2 | 2 | ||
3 | We already have such requests, which is when mapped-ram is enabled with | 3 | We already have such requests, which is when mapped-ram is enabled with |
4 | multifd. In that case, no SYNC messages will be pushed to the stream when | 4 | multifd. In that case, no SYNC messages will be pushed to the stream when |
5 | multifd syncs the sender threads because there's no destination threads | 5 | multifd syncs the sender threads because there's no destination threads |
6 | waiting for that. The whole point of the sync is to make sure all threads | 6 | waiting for that. The whole point of the sync is to make sure all threads |
7 | flushed their jobs. | 7 | finished their jobs. |
8 | 8 | ||
9 | So fundamentally we have a request to do the sync in different ways: | 9 | So fundamentally we have a request to do the sync in different ways: |
10 | 10 | ||
11 | - Either to sync the threads only, | 11 | - Either to sync the threads only, |
12 | - Or to sync the threads but also with the destination side. | 12 | - Or to sync the threads but also with the destination side. |
... | ... | ||
24 | 24 | ||
25 | No functional change intended. | 25 | No functional change intended. |
26 | 26 | ||
27 | Signed-off-by: Peter Xu <peterx@redhat.com> | 27 | Signed-off-by: Peter Xu <peterx@redhat.com> |
28 | --- | 28 | --- |
29 | migration/multifd.h | 19 ++++++++++++++++--- | 29 | migration/multifd.h | 23 ++++++++++++++++++++--- |
30 | migration/multifd-nocomp.c | 7 ++++++- | 30 | migration/multifd-nocomp.c | 7 ++++++- |
31 | migration/multifd.c | 15 +++++++++------ | 31 | migration/multifd.c | 17 +++++++++++------ |
32 | 3 files changed, 31 insertions(+), 10 deletions(-) | 32 | 3 files changed, 37 insertions(+), 10 deletions(-) |
33 | 33 | ||
34 | diff --git a/migration/multifd.h b/migration/multifd.h | 34 | diff --git a/migration/multifd.h b/migration/multifd.h |
35 | index XXXXXXX..XXXXXXX 100644 | 35 | index XXXXXXX..XXXXXXX 100644 |
36 | --- a/migration/multifd.h | 36 | --- a/migration/multifd.h |
37 | +++ b/migration/multifd.h | 37 | +++ b/migration/multifd.h |
... | ... | ||
43 | + /* No sync request */ | 43 | + /* No sync request */ |
44 | + MULTIFD_SYNC_NONE = 0, | 44 | + MULTIFD_SYNC_NONE = 0, |
45 | + /* Sync locally on the sender threads without pushing messages */ | 45 | + /* Sync locally on the sender threads without pushing messages */ |
46 | + MULTIFD_SYNC_LOCAL, | 46 | + MULTIFD_SYNC_LOCAL, |
47 | + /* | 47 | + /* |
48 | + * Sync not only on the sender threads, but also push "SYNC" message to | 48 | + * Sync not only on the sender threads, but also push MULTIFD_FLAG_SYNC |
49 | + * the wire (which is for a remote sync). | 49 | + * message to the wire for each iochannel (which is for a remote sync). |
50 | + * | ||
51 | + * When remote sync is used, need to be paired with a follow up | ||
52 | + * RAM_SAVE_FLAG_EOS / RAM_SAVE_FLAG_MULTIFD_FLUSH message on the main | ||
53 | + * channel. | ||
50 | + */ | 54 | + */ |
51 | + MULTIFD_SYNC_ALL, | 55 | + MULTIFD_SYNC_ALL, |
52 | +} MultiFDSyncReq; | 56 | +} MultiFDSyncReq; |
53 | + | 57 | + |
54 | bool multifd_send_setup(void); | 58 | bool multifd_send_setup(void); |
... | ... | ||
118 | -int multifd_send_sync_main(void) | 122 | -int multifd_send_sync_main(void) |
119 | +int multifd_send_sync_main(MultiFDSyncReq req) | 123 | +int multifd_send_sync_main(MultiFDSyncReq req) |
120 | { | 124 | { |
121 | int i; | 125 | int i; |
122 | bool flush_zero_copy; | 126 | bool flush_zero_copy; |
127 | |||
128 | + assert(req != MULTIFD_SYNC_NONE); | ||
129 | + | ||
130 | flush_zero_copy = migrate_zero_copy_send(); | ||
131 | |||
132 | for (i = 0; i < migrate_multifd_channels(); i++) { | ||
123 | @@ -XXX,XX +XXX,XX @@ int multifd_send_sync_main(void) | 133 | @@ -XXX,XX +XXX,XX @@ int multifd_send_sync_main(void) |
124 | * We should be the only user so far, so not possible to be set by | 134 | * We should be the only user so far, so not possible to be set by |
125 | * others concurrently. | 135 | * others concurrently. |
126 | */ | 136 | */ |
127 | - assert(qatomic_read(&p->pending_sync) == false); | 137 | - assert(qatomic_read(&p->pending_sync) == false); |
... | ... | diff view generated by jsdifflib |
1 | Firstly, we're going to use the multifd flag soon in multifd code, so ram.c | 1 | Firstly, we're going to use the multifd flag soon in multifd code, so ram.c |
---|---|---|---|
2 | isn't gonna work. | 2 | isn't gonna work. |
3 | 3 | ||
4 | Secondly, we have a separate RDMA flag dangling around, which is definitely | 4 | Secondly, we have a separate RDMA flag dangling around, which is definitely |
5 | not obvious. There's one comment that helps, but not too much. | 5 | not obvious. There's one comment that helps, but not too much. |
6 | 6 | ||
7 | We should just put it altogether, so nothing will get overlooked. | 7 | Put all RAM save flags altogether, so nothing will get overlooked. |
8 | 8 | ||
9 | Add a section explain why we can't use bits over 0x200. | ||
10 | |||
11 | Remove RAM_SAVE_FLAG_FULL as it's already not used in QEMU, as the comment | ||
12 | explained. | ||
13 | |||
14 | Reviewed-by: Fabiano Rosas <farosas@suse.de> | ||
9 | Signed-off-by: Peter Xu <peterx@redhat.com> | 15 | Signed-off-by: Peter Xu <peterx@redhat.com> |
10 | --- | 16 | --- |
11 | migration/ram.h | 25 +++++++++++++++++++++++++ | 17 | migration/ram.h | 28 ++++++++++++++++++++++++++++ |
12 | migration/rdma.h | 7 ------- | 18 | migration/rdma.h | 7 ------- |
13 | migration/ram.c | 21 --------------------- | 19 | migration/ram.c | 21 --------------------- |
14 | 3 files changed, 25 insertions(+), 28 deletions(-) | 20 | 3 files changed, 28 insertions(+), 28 deletions(-) |
15 | 21 | ||
16 | diff --git a/migration/ram.h b/migration/ram.h | 22 | diff --git a/migration/ram.h b/migration/ram.h |
17 | index XXXXXXX..XXXXXXX 100644 | 23 | index XXXXXXX..XXXXXXX 100644 |
18 | --- a/migration/ram.h | 24 | --- a/migration/ram.h |
19 | +++ b/migration/ram.h | 25 | +++ b/migration/ram.h |
... | ... | ||
25 | + * RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS, it | 31 | + * RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS, it |
26 | + * worked for pages that were filled with the same char. We switched | 32 | + * worked for pages that were filled with the same char. We switched |
27 | + * it to only search for the zero value. And to avoid confusion with | 33 | + * it to only search for the zero value. And to avoid confusion with |
28 | + * RAM_SAVE_FLAG_COMPRESS_PAGE just rename it. | 34 | + * RAM_SAVE_FLAG_COMPRESS_PAGE just rename it. |
29 | + * | 35 | + * |
30 | + * RAM_SAVE_FLAG_FULL was obsoleted in 2009. | 36 | + * RAM_SAVE_FLAG_FULL (0x01) was obsoleted in 2009. |
31 | + * | 37 | + * |
32 | + * RAM_SAVE_FLAG_COMPRESS_PAGE (0x100) was removed in QEMU 9.1. | 38 | + * RAM_SAVE_FLAG_COMPRESS_PAGE (0x100) was removed in QEMU 9.1. |
39 | + * | ||
40 | + * RAM_SAVE_FLAG_HOOK is only used in RDMA. Whenever this is found in the | ||
41 | + * data stream, the flags will be passed to rdma functions in the | ||
42 | + * incoming-migration side. | ||
43 | + * | ||
44 | + * We can't use any flag that is bigger than 0x200, because the flags are | ||
45 | + * always assumed to be encoded in a ramblock address offset, which is | ||
46 | + * multiple of PAGE_SIZE. Here it means QEMU supports migration with any | ||
47 | + * architecture that has PAGE_SIZE>=1K (0x400). | ||
33 | + */ | 48 | + */ |
34 | +#define RAM_SAVE_FLAG_FULL 0x01 | 49 | +#define RAM_SAVE_FLAG_ZERO 0x002 |
35 | +#define RAM_SAVE_FLAG_ZERO 0x02 | 50 | +#define RAM_SAVE_FLAG_MEM_SIZE 0x004 |
36 | +#define RAM_SAVE_FLAG_MEM_SIZE 0x04 | 51 | +#define RAM_SAVE_FLAG_PAGE 0x008 |
37 | +#define RAM_SAVE_FLAG_PAGE 0x08 | 52 | +#define RAM_SAVE_FLAG_EOS 0x010 |
38 | +#define RAM_SAVE_FLAG_EOS 0x10 | 53 | +#define RAM_SAVE_FLAG_CONTINUE 0x020 |
39 | +#define RAM_SAVE_FLAG_CONTINUE 0x20 | 54 | +#define RAM_SAVE_FLAG_XBZRLE 0x040 |
40 | +#define RAM_SAVE_FLAG_XBZRLE 0x40 | 55 | +#define RAM_SAVE_FLAG_HOOK 0x080 |
41 | +/* | 56 | +#define RAM_SAVE_FLAG_MULTIFD_FLUSH 0x200 |
42 | + * ONLY USED IN RDMA: Whenever this is found in the data stream, the flags | ||
43 | + * will be passed to rdma functions in the incoming-migration side. | ||
44 | + */ | ||
45 | +#define RAM_SAVE_FLAG_HOOK 0x80 | ||
46 | +#define RAM_SAVE_FLAG_MULTIFD_FLUSH 0x200 | ||
47 | +/* We can't use any flag that is bigger than 0x200 */ | ||
48 | + | 57 | + |
49 | extern XBZRLECacheStats xbzrle_counters; | 58 | extern XBZRLECacheStats xbzrle_counters; |
50 | 59 | ||
51 | /* Should be holding either ram_list.mutex, or the RCU lock. */ | 60 | /* Should be holding either ram_list.mutex, or the RCU lock. */ |
52 | diff --git a/migration/rdma.h b/migration/rdma.h | 61 | diff --git a/migration/rdma.h b/migration/rdma.h |
... | ... | diff view generated by jsdifflib |
1 | RAM_SAVE_FLAG_MULTIFD_FLUSH message should always be correlated to a sync | 1 | RAM_SAVE_FLAG_MULTIFD_FLUSH message should always be correlated to a sync |
---|---|---|---|
2 | request on src. Unify such message into one place, and conditionally send | 2 | request on src. Unify such message into one place, and conditionally send |
3 | the message only if necessary. | 3 | the message only if necessary. |
4 | 4 | ||
5 | Reviewed-by: Fabiano Rosas <farosas@suse.de> | ||
5 | Signed-off-by: Peter Xu <peterx@redhat.com> | 6 | Signed-off-by: Peter Xu <peterx@redhat.com> |
6 | --- | 7 | --- |
7 | migration/multifd.h | 2 +- | 8 | migration/multifd.h | 2 +- |
8 | migration/multifd-nocomp.c | 27 +++++++++++++++++++++++++-- | 9 | migration/multifd-nocomp.c | 27 +++++++++++++++++++++++++-- |
9 | migration/ram.c | 18 ++++-------------- | 10 | migration/ram.c | 18 ++++-------------- |
... | ... | diff view generated by jsdifflib |
1 | Multifd never worked with postcopy, at least yet so far. | 1 | Multifd never worked with postcopy, at least yet so far. |
---|---|---|---|
2 | 2 | ||
3 | Remove the sync processing there, because it's confusing, and they should | 3 | Remove the sync processing there, because it's confusing, and they should |
4 | never appear. Now if RAM_SAVE_FLAG_MULTIFD_FLUSH is observed, we fail hard | 4 | never appear. Now if RAM_SAVE_FLAG_MULTIFD_FLUSH is observed, we fail hard |
5 | instead of trying to invoke multifd code. | 5 | instead of trying to invoke multifd code. |
6 | 6 | ||
7 | Reviewed-by: Fabiano Rosas <farosas@suse.de> | ||
7 | Signed-off-by: Peter Xu <peterx@redhat.com> | 8 | Signed-off-by: Peter Xu <peterx@redhat.com> |
8 | --- | 9 | --- |
9 | migration/ram.c | 8 -------- | 10 | migration/ram.c | 8 -------- |
10 | 1 file changed, 8 deletions(-) | 11 | 1 file changed, 8 deletions(-) |
11 | 12 | ||
... | ... | diff view generated by jsdifflib |
1 | The src flush condition check is over complicated, and it's getting more | 1 | The src flush condition check is over complicated, and it's getting more |
---|---|---|---|
2 | out of control if postcopy will be involved. | 2 | out of control if postcopy will be involved. |
3 | 3 | ||
4 | In general, we have two modes to do the sync: legacy or modern ways. | 4 | In general, we have two modes to do the sync: legacy or modern ways. |
5 | Legacy uses per-section flush, modern uses per-round flush. | 5 | Legacy uses per-section flush, modern uses per-round flush. |
6 | 6 | ||
7 | Mapped-ram always uses the modern, which is per-round. | 7 | Mapped-ram always uses the modern, which is per-round. |
8 | 8 | ||
9 | Introduce two helpers, which can greatly simplify the code, and hopefully | 9 | Introduce two helpers, which can greatly simplify the code, and hopefully |
10 | make it readable again. | 10 | make it readable again. |
11 | 11 | ||
12 | Signed-off-by: Peter Xu <peterx@redhat.com> | 12 | Signed-off-by: Peter Xu <peterx@redhat.com> |
13 | --- | 13 | --- |
14 | migration/multifd.h | 2 ++ | 14 | migration/multifd.h | 2 ++ |
15 | migration/multifd-nocomp.c | 42 ++++++++++++++++++++++++++++++++++++++ | 15 | migration/multifd-nocomp.c | 42 ++++++++++++++++++++++++++++++++++++++ |
16 | migration/ram.c | 10 +++------ | 16 | migration/ram.c | 10 +++------ |
17 | 3 files changed, 47 insertions(+), 7 deletions(-) | 17 | 3 files changed, 47 insertions(+), 7 deletions(-) |
18 | 18 | ||
19 | diff --git a/migration/multifd.h b/migration/multifd.h | 19 | diff --git a/migration/multifd.h b/migration/multifd.h |
20 | index XXXXXXX..XXXXXXX 100644 | 20 | index XXXXXXX..XXXXXXX 100644 |
21 | --- a/migration/multifd.h | 21 | --- a/migration/multifd.h |
22 | +++ b/migration/multifd.h | 22 | +++ b/migration/multifd.h |
23 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t multifd_ram_page_count(void) | 23 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t multifd_ram_page_count(void) |
24 | void multifd_ram_save_setup(void); | 24 | void multifd_ram_save_setup(void); |
25 | void multifd_ram_save_cleanup(void); | 25 | void multifd_ram_save_cleanup(void); |
26 | int multifd_ram_flush_and_sync(QEMUFile *f); | 26 | int multifd_ram_flush_and_sync(QEMUFile *f); |
27 | +bool multifd_ram_sync_per_round(void); | 27 | +bool multifd_ram_sync_per_round(void); |
28 | +bool multifd_ram_sync_per_section(void); | 28 | +bool multifd_ram_sync_per_section(void); |
29 | size_t multifd_ram_payload_size(void); | 29 | size_t multifd_ram_payload_size(void); |
30 | void multifd_ram_fill_packet(MultiFDSendParams *p); | 30 | void multifd_ram_fill_packet(MultiFDSendParams *p); |
31 | int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp); | 31 | int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp); |
32 | diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c | 32 | diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c |
33 | index XXXXXXX..XXXXXXX 100644 | 33 | index XXXXXXX..XXXXXXX 100644 |
34 | --- a/migration/multifd-nocomp.c | 34 | --- a/migration/multifd-nocomp.c |
35 | +++ b/migration/multifd-nocomp.c | 35 | +++ b/migration/multifd-nocomp.c |
36 | @@ -XXX,XX +XXX,XX @@ retry: | 36 | @@ -XXX,XX +XXX,XX @@ retry: |
37 | return true; | 37 | return true; |
38 | } | 38 | } |
39 | 39 | ||
40 | +/* | 40 | +/* |
41 | + * We have two modes for multifd flushes: | 41 | + * We have two modes for multifd flushes: |
42 | + * | 42 | + * |
43 | + * - Per-section mode: this is the legacy way to flush, it requires one | 43 | + * - Per-section mode: this is the legacy way to flush, it requires one |
44 | + * MULTIFD_FLAG_SYNC message for each RAM_SAVE_FLAG_EOS. | 44 | + * MULTIFD_FLAG_SYNC message for each RAM_SAVE_FLAG_EOS. |
45 | + * | 45 | + * |
46 | + * - Per-round mode: this is the modern way to flush, it requires one | 46 | + * - Per-round mode: this is the modern way to flush, it requires one |
47 | + * MULTIFD_FLAG_SYNC message only for each round of RAM scan. Normally | 47 | + * MULTIFD_FLAG_SYNC message only for each round of RAM scan. Normally |
48 | + * it's paired with a new RAM_SAVE_FLAG_MULTIFD_FLUSH message in network | 48 | + * it's paired with a new RAM_SAVE_FLAG_MULTIFD_FLUSH message in network |
49 | + * based migrations. | 49 | + * based migrations. |
50 | + * | 50 | + * |
51 | + * One thing to mention is mapped-ram always use the modern way to sync. | 51 | + * One thing to mention is mapped-ram always use the modern way to sync. |
52 | + */ | 52 | + */ |
53 | + | 53 | + |
54 | +/* Do we need a per-section multifd flush (legacy way)? */ | 54 | +/* Do we need a per-section multifd flush (legacy way)? */ |
55 | +bool multifd_ram_sync_per_section(void) | 55 | +bool multifd_ram_sync_per_section(void) |
56 | +{ | 56 | +{ |
57 | + if (!migrate_multifd()) { | 57 | + if (!migrate_multifd()) { |
58 | + return false; | 58 | + return false; |
59 | + } | 59 | + } |
60 | + | 60 | + |
61 | + if (migrate_mapped_ram()) { | 61 | + if (migrate_mapped_ram()) { |
62 | + return false; | 62 | + return false; |
63 | + } | 63 | + } |
64 | + | 64 | + |
65 | + return migrate_multifd_flush_after_each_section(); | 65 | + return migrate_multifd_flush_after_each_section(); |
66 | +} | 66 | +} |
67 | + | 67 | + |
68 | +/* Do we need a per-round multifd flush (modern way)? */ | 68 | +/* Do we need a per-round multifd flush (modern way)? */ |
69 | +bool multifd_ram_sync_per_round(void) | 69 | +bool multifd_ram_sync_per_round(void) |
70 | +{ | 70 | +{ |
71 | + if (!migrate_multifd()) { | 71 | + if (!migrate_multifd()) { |
72 | + return false; | 72 | + return false; |
73 | + } | 73 | + } |
74 | + | 74 | + |
75 | + if (migrate_mapped_ram()) { | 75 | + if (migrate_mapped_ram()) { |
76 | + return true; | 76 | + return true; |
77 | + } | 77 | + } |
78 | + | 78 | + |
79 | + return !migrate_multifd_flush_after_each_section(); | 79 | + return !migrate_multifd_flush_after_each_section(); |
80 | +} | 80 | +} |
81 | + | 81 | + |
82 | int multifd_ram_flush_and_sync(QEMUFile *f) | 82 | int multifd_ram_flush_and_sync(QEMUFile *f) |
83 | { | 83 | { |
84 | MultiFDSyncReq req; | 84 | MultiFDSyncReq req; |
85 | diff --git a/migration/ram.c b/migration/ram.c | 85 | diff --git a/migration/ram.c b/migration/ram.c |
86 | index XXXXXXX..XXXXXXX 100644 | 86 | index XXXXXXX..XXXXXXX 100644 |
87 | --- a/migration/ram.c | 87 | --- a/migration/ram.c |
88 | +++ b/migration/ram.c | 88 | +++ b/migration/ram.c |
89 | @@ -XXX,XX +XXX,XX @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) | 89 | @@ -XXX,XX +XXX,XX @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) |
90 | pss->page = 0; | 90 | pss->page = 0; |
91 | pss->block = QLIST_NEXT_RCU(pss->block, next); | 91 | pss->block = QLIST_NEXT_RCU(pss->block, next); |
92 | if (!pss->block) { | 92 | if (!pss->block) { |
93 | - if (migrate_multifd() && | 93 | - if (migrate_multifd() && |
94 | - (!migrate_multifd_flush_after_each_section() || | 94 | - (!migrate_multifd_flush_after_each_section() || |
95 | - migrate_mapped_ram())) { | 95 | - migrate_mapped_ram())) { |
96 | + if (multifd_ram_sync_per_round()) { | 96 | + if (multifd_ram_sync_per_round()) { |
97 | QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; | 97 | QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; |
98 | int ret = multifd_ram_flush_and_sync(f); | 98 | int ret = multifd_ram_flush_and_sync(f); |
99 | if (ret < 0) { | 99 | if (ret < 0) { |
100 | @@ -XXX,XX +XXX,XX @@ static int ram_save_iterate(QEMUFile *f, void *opaque) | 100 | @@ -XXX,XX +XXX,XX @@ static int ram_save_iterate(QEMUFile *f, void *opaque) |
101 | 101 | ||
102 | out: | 102 | out: |
103 | if (ret >= 0 && migration_is_running()) { | 103 | if (ret >= 0 && migration_is_running()) { |
104 | - if (migrate_multifd() && migrate_multifd_flush_after_each_section() && | 104 | - if (migrate_multifd() && migrate_multifd_flush_after_each_section() && |
105 | - !migrate_mapped_ram()) { | 105 | - !migrate_mapped_ram()) { |
106 | + if (multifd_ram_sync_per_section()) { | 106 | + if (multifd_ram_sync_per_section()) { |
107 | ret = multifd_ram_flush_and_sync(f); | 107 | ret = multifd_ram_flush_and_sync(f); |
108 | if (ret < 0) { | 108 | if (ret < 0) { |
109 | return ret; | 109 | return ret; |
110 | @@ -XXX,XX +XXX,XX @@ static int ram_save_complete(QEMUFile *f, void *opaque) | 110 | @@ -XXX,XX +XXX,XX @@ static int ram_save_complete(QEMUFile *f, void *opaque) |
111 | } | 111 | } |
112 | } | 112 | } |
113 | 113 | ||
114 | - if (migrate_multifd() && | 114 | - if (migrate_multifd() && |
115 | - migrate_multifd_flush_after_each_section()) { | 115 | - migrate_multifd_flush_after_each_section()) { |
116 | + if (multifd_ram_sync_per_section()) { | 116 | + if (multifd_ram_sync_per_section()) { |
117 | /* | 117 | /* |
118 | * Only the old dest QEMU will need this sync, because each EOS | 118 | * Only the old dest QEMU will need this sync, because each EOS |
119 | * will require one SYNC message on each channel. | 119 | * will require one SYNC message on each channel. |
120 | -- | 120 | -- |
121 | 2.47.0 | 121 | 2.47.0 | diff view generated by jsdifflib |
... | ... | ||
---|---|---|---|
13 | again and do the debug one more time, or anyone confused on why this ever | 13 | again and do the debug one more time, or anyone confused on why this ever |
14 | existed. | 14 | existed. |
15 | 15 | ||
16 | Signed-off-by: Peter Xu <peterx@redhat.com> | 16 | Signed-off-by: Peter Xu <peterx@redhat.com> |
17 | --- | 17 | --- |
18 | migration/ram.c | 27 +++++++++++++++++++++++++++ | 18 | migration/ram.c | 25 +++++++++++++++++++++++++ |
19 | 1 file changed, 27 insertions(+) | 19 | 1 file changed, 25 insertions(+) |
20 | 20 | ||
21 | diff --git a/migration/ram.c b/migration/ram.c | 21 | diff --git a/migration/ram.c b/migration/ram.c |
22 | index XXXXXXX..XXXXXXX 100644 | 22 | index XXXXXXX..XXXXXXX 100644 |
23 | --- a/migration/ram.c | 23 | --- a/migration/ram.c |
24 | +++ b/migration/ram.c | 24 | +++ b/migration/ram.c |
... | ... | ||
36 | + * per-channel to work. | 36 | + * per-channel to work. |
37 | + * | 37 | + * |
38 | + * For modern QEMUs using per-round sync | 38 | + * For modern QEMUs using per-round sync |
39 | + * ===================================== | 39 | + * ===================================== |
40 | + * | 40 | + * |
41 | + * Logically this sync is not needed (because we know there's nothing | 41 | + * Logically such sync is not needed, and recv threads should not run |
42 | + * in the multifd queue yet!). However as a side effect, this makes | 42 | + * until setup ready (using things like channels_ready on src). Then |
43 | + * sure the dest side won't receive any data before it properly reaches | 43 | + * we should be all fine. |
44 | + * ram_load_precopy(). | ||
45 | + * | 44 | + * |
46 | + * Without this sync, src QEMU can send data too soon so that dest may | 45 | + * However even if we add channels_ready to recv side in new QEMUs, old |
47 | + * not have been ready to receive it (e.g., rb->receivedmap may be | 46 | + * QEMU won't have them so this sync will still be needed to make sure |
48 | + * uninitialized, for example). | 47 | + * multifd recv threads won't start processing guest pages early before |
48 | + * ram_load_setup() is properly done. | ||
49 | + * | 49 | + * |
50 | + * Logically "wait for recv setup ready" shouldn't need to involve src | 50 | + * Let's stick with this. Fortunately the overhead is low to sync |
51 | + * QEMU at all, however to be compatible with old QEMUs, let's stick | 51 | + * during setup because the VM is running, so at least it's not |
52 | + * with this. Fortunately the overhead is low to sync during setup | 52 | + * accounted as part of downtime. |
53 | + * because the VM is running, so at least it's not accounted as part of | ||
54 | + * downtime. | ||
55 | + */ | 53 | + */ |
56 | bql_unlock(); | 54 | bql_unlock(); |
57 | ret = multifd_ram_flush_and_sync(f); | 55 | ret = multifd_ram_flush_and_sync(f); |
58 | bql_lock(); | 56 | bql_lock(); |
59 | -- | 57 | -- |
60 | 2.47.0 | 58 | 2.47.0 | diff view generated by jsdifflib |