From: Prasad Pandit <pjp@fedoraproject.org>
Hello,
* This series (v9) makes minor refactoring and reordering changes, as
suggested in the review of the earlier series (v8). I also tried to
reproduce/debug a qtest hang issue, but it could not be reproduced.
From the shared stack traces it looked like the Postcopy thread was
preparing to finish before migrating all the pages.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 170.50s 81 subtests passed
===
v8: https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppandit@redhat.com/T/#t
* This series (v8) splits the earlier patch-2, which enabled the multifd
and postcopy options together, into two separate patches. The first
modifies the channel discovery in the migration_ioc_process_incoming()
function, and the second enables multifd and postcopy migration together.
It also adds the 'save_postcopy_prepare' savevm_state handler so that
different sections can take an action just before the Postcopy phase
starts. Thank you, Peter, for these patches.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 152.66s 81 subtests passed
===
v7: https://lore.kernel.org/qemu-devel/20250228121749.553184-1-ppandit@redhat.com/T/#t
* This series (v7) adds the 'MULTIFD_RECV_SYNC' migration command. It is
used to notify the destination migration thread to synchronise with the
Multifd threads. This allows the Multifd ('mig/dst/recv_x') threads on the
destination to receive all their data before they are shut down.
This series also updates the channel discovery function and the qtests as
suggested in the previous review comments.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 147.84s 81 subtests passed
===
v6: https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t
* This series (v6) shuts down the Multifd threads before starting Postcopy
migration. It helps to avoid an issue where multifd pages arrive late
at the destination during the Postcopy phase and corrupt the vCPU
state. It also reorders the qtest patches and makes some refactoring
changes as suggested in the previous review.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 161.35s 73 subtests passed
===
v5: https://lore.kernel.org/qemu-devel/20250205122712.229151-1-ppandit@redhat.com/T/#t
* This series (v5) consolidates the setting of migration capabilities into
one 'set_migration_capabilities()' function, simplifying the test sources.
It passes all migration tests.
===
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 143.66s 71 subtests passed
===
v4: https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t
* This series (v4) adds more 'multifd+postcopy' qtests, which test
Precopy migration with the 'postcopy-ram' attribute set and run
Postcopy migrations with 'multifd' channels enabled.
===
$ ../qtest/migration-test --tap -k -r '/x86_64/migration/multifd+postcopy' | grep -i 'slow test'
# slow test /x86_64/migration/multifd+postcopy/plain executed in 1.29 secs
# slow test /x86_64/migration/multifd+postcopy/recovery/tls/psk executed in 2.48 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/plain executed in 1.49 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/recovery/tls/psk executed in 2.52 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/tls/psk/match executed in 3.62 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/zstd executed in 1.34 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/cancel executed in 2.24 secs
...
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 148.41s 71 subtests passed
===
v3: https://lore.kernel.org/qemu-devel/20250121131032.1611245-1-ppandit@redhat.com/T/#t
* This series (v3) passes all existing 'tests/qtest/migration/*' tests
and adds a new one to enable multifd channels with postcopy migration.
v2: https://lore.kernel.org/qemu-devel/20241129122256.96778-1-ppandit@redhat.com/T/#u
* This series (v2) further refactors the 'ram_save_target_page'
function to make it independent of the multifd & postcopy change.
v1: https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u
* This series removes the 4-byte magic value introduced in the
previous series for the Postcopy channel.
v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u
* Currently Multifd and Postcopy migration cannot be used together;
QEMU shows a "Postcopy is not yet compatible with multifd" message.
When migrating guests with large (hundreds of GB) RAM, Multifd threads
help to accelerate migration, but the inability to use them with
Postcopy mode delays guest start-up on the destination side.
* This patch series allows enabling both Multifd and Postcopy
migration together. Precopy and Multifd threads work during the
initial guest (RAM) transfer. When migration moves to the
Postcopy phase, the Multifd threads are restrained and the Postcopy
threads start to request pages from the source side.
* This series introduces a 4-byte magic value to be sent on the
Postcopy channel. It helps to differentiate channels and properly
set up incoming connections on the destination side.
Thank you.
---
Peter Xu (2):
migration: Add save_postcopy_prepare() savevm handler
migration/ram: Implement save_postcopy_prepare()
Prasad Pandit (5):
migration/multifd: move macros to multifd header
migration: refactor channel discovery mechanism
migration: enable multifd and postcopy together
tests/qtest/migration: consolidate set capabilities
tests/qtest/migration: add postcopy tests with multifd
include/migration/register.h | 15 +++
migration/migration.c | 136 ++++++++++++----------
migration/multifd-nocomp.c | 3 +-
migration/multifd.c | 12 +-
migration/multifd.h | 5 +
migration/options.c | 5 -
migration/ram.c | 42 ++++++-
migration/savevm.c | 33 ++++++
migration/savevm.h | 1 +
tests/qtest/migration/compression-tests.c | 38 +++++-
tests/qtest/migration/cpr-tests.c | 6 +-
tests/qtest/migration/file-tests.c | 58 +++++----
tests/qtest/migration/framework.c | 76 ++++++++----
tests/qtest/migration/framework.h | 9 +-
tests/qtest/migration/misc-tests.c | 4 +-
tests/qtest/migration/postcopy-tests.c | 35 +++++-
tests/qtest/migration/precopy-tests.c | 48 +++++---
tests/qtest/migration/tls-tests.c | 70 ++++++++++-
18 files changed, 437 insertions(+), 159 deletions(-)
--
2.49.0
Prasad Pandit <ppandit@redhat.com> writes:

> From: Prasad Pandit <pjp@fedoraproject.org>
>
> Hello,
>
> * This series (v9) does minor refactoring and reordering changes as
> suggested in the review of earlier series (v8). Also tried to
> reproduce/debug a qtest hang issue, but it could not be reproduced.
> From the shared stack traces it looked like Postcopy thread was
> preparing to finish before migrating all the pages.

The issue is that a zero page is being migrated by multifd but there's
an optimization in place that skips faulting the page in on the
destination. Later during postcopy when the page is found to be missing,
postcopy (@migrate_send_rp_req_pages) believes the page is already
present due to the receivedmap for that pfn being set and thus the code
accessing the guest memory just sits there waiting for the page.

It seems your series has a logical conflict with this work that was done
a while back:

https://lore.kernel.org/all/20240401154110.2028453-1-yuan1.liu@intel.com/

The usage of receivedmap for multifd was supposed to be mutually
exclusive with postcopy. Take a look at the description of that series
and at postcopy_place_page_zero(). We need to figure out what needs to
change and how to do that compatibly. It might just be the case of
memsetting the zero page always for postcopy, but I haven't thought too
much about it.

There's also other issues with the series:

https://gitlab.com/farosas/qemu/-/pipelines/1770488059

The CI workers don't support userfaultfd so the tests need to check for
that properly. We have MigrationTestEnv::has_uffd for that.

Lastly, I have seen some weirdness with TLS channel disconnections
leading to asserts in qio_channel_shutdown() in my testing. I'll get a
better look at those tomorrow.
Fabiano Rosas <farosas@suse.de> writes:

> Prasad Pandit <ppandit@redhat.com> writes:
>
>> From: Prasad Pandit <pjp@fedoraproject.org>
>>
>> Hello,
>>
>> * This series (v9) does minor refactoring and reordering changes as
>> suggested in the review of earlier series (v8). Also tried to
>> reproduce/debug a qtest hang issue, but it could not be reproduced.
>> From the shared stack traces it looked like Postcopy thread was
>> preparing to finish before migrating all the pages.
>
> The issue is that a zero page is being migrated by multifd but there's
> an optimization in place that skips faulting the page in on the
> destination. Later during postcopy when the page is found to be missing,
> postcopy (@migrate_send_rp_req_pages) believes the page is already
> present due to the receivedmap for that pfn being set and thus the code
> accessing the guest memory just sits there waiting for the page.
>
> It seems your series has a logical conflict with this work that was done
> a while back:
>
> https://lore.kernel.org/all/20240401154110.2028453-1-yuan1.liu@intel.com/
>
> The usage of receivedmap for multifd was supposed to be mutually
> exclusive with postcopy. Take a look at the description of that series
> and at postcopy_place_page_zero(). We need to figure out what needs to
> change and how to do that compatibly. It might just be the case of
> memsetting the zero page always for postcopy, but I haven't thought too
> much about it.
>
> There's also other issues with the series:
>
> https://gitlab.com/farosas/qemu/-/pipelines/1770488059
>
> The CI workers don't support userfaultfd so the tests need to check for
> that properly. We have MigrationTestEnv::has_uffd for that.
>
> Lastly, I have seen some weirdness with TLS channel disconnections
> leading to asserts in qio_channel_shutdown() in my testing. I'll get a
> better look at those tomorrow.

Ok, you can ignore this last paragraph. I was seeing the postcopy
recovery test disconnect messages, those are benign.
Hi,
On Wed, 16 Apr 2025 at 18:29, Fabiano Rosas <farosas@suse.de> wrote:
> > The issue is that a zero page is being migrated by multifd but there's
> > an optimization in place that skips faulting the page in on the
> > destination. Later during postcopy when the page is found to be missing,
> > postcopy (@migrate_send_rp_req_pages) believes the page is already
> > present due to the receivedmap for that pfn being set and thus the code
> > accessing the guest memory just sits there waiting for the page.
> >
> > It seems your series has a logical conflict with this work that was done
> > a while back:
> >
> > https://lore.kernel.org/all/20240401154110.2028453-1-yuan1.liu@intel.com/
> >
> > The usage of receivedmap for multifd was supposed to be mutually
> > exclusive with postcopy. Take a look at the description of that series
> > and at postcopy_place_page_zero(). We need to figure out what needs to
> > change and how to do that compatibly. It might just be the case of
> > memsetting the zero page always for postcopy, but I havent't thought too
> > much about it.
===
$ grep -i avx /proc/cpuinfo
flags : avx avx2 avx512f avx512dq avx512ifma avx512cd avx512bw
avx512vl avx512vbmi avx512_vbmi2 avx512_vnni avx512_bitalg
avx512_vpopcntdq avx512_vp2intersect
$
$ ./configure --enable-kvm --enable-avx512bw --enable-avx2
--disable-docs --target-list='x86_64-softmmu'
$ make -sj10 check-qtest
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
OK 193.80s 81 subtests passed
===
* One of my machines does support 'avx*' instructions, and QEMU is
configured and built with 'avx2' and 'avx512bw' support. Still, the
migration-tests run fine, without any hang observed. Not sure
why the hang issue is not reproducing on my side. How do you generally
build QEMU to run these tests? Does this issue require some specific
h/w setup/support?
* Not sure how/why page faults happen during the Multifd phase when
the guest on the destination is not running. If 'receivedmap' says
that a page is present, code accessing guest memory should just access
whatever is available/present in that space, without waiting. I'll try
to see what zero pages do, how page faults occur during postcopy and
how they are serviced. Let's see..
* Another suggestion: maybe we should review and pull at least the
refactoring patches, so that we don't have to redo them in the next
revisions. We can hold back the "enable multifd and postcopy together"
patch that causes this guest hang issue to surface.
> > There's also other issues with the series:
> >
> > https://gitlab.com/farosas/qemu/-/pipelines/1770488059
> >
> > The CI workers don't support userfaultfd so the tests need to check for
> > that properly. We have MigrationTestEnv::has_uffd for that.
> >
> > Lastly, I have seem some weirdness with TLS channels disconnections
> > leading to asserts in qio_channel_shutdown() in my testing. I'll get a
> > better look at those tomorrow.
>
> Ok, you can ignore this last paragraph. I was seeing the postcopy
> recovery test disconnect messages, those are benign.
* i.e. ignore everything after "There's also other issues with this
series:", or just the last point about "...with TLS channels..."?
Postcopy tests are added only if the (env->has_uffd) check returns true.
Thank you.
---
- Prasad
Prasad Pandit <ppandit@redhat.com> writes:
> Hi,
>
> On Wed, 16 Apr 2025 at 18:29, Fabiano Rosas <farosas@suse.de> wrote:
>> > The issue is that a zero page is being migrated by multifd but there's
>> > an optimization in place that skips faulting the page in on the
>> > destination. Later during postcopy when the page is found to be missing,
>> > postcopy (@migrate_send_rp_req_pages) believes the page is already
>> > present due to the receivedmap for that pfn being set and thus the code
>> > accessing the guest memory just sits there waiting for the page.
>> >
>> > It seems your series has a logical conflict with this work that was done
>> > a while back:
>> >
>> > https://lore.kernel.org/all/20240401154110.2028453-1-yuan1.liu@intel.com/
>> >
>> > The usage of receivedmap for multifd was supposed to be mutually
>> > exclusive with postcopy. Take a look at the description of that series
>> > and at postcopy_place_page_zero(). We need to figure out what needs to
>> > change and how to do that compatibly. It might just be the case of
>> > memsetting the zero page always for postcopy, but I havent't thought too
>> > much about it.
>
> ===
> $ grep -i avx /proc/cpuinfo
> flags : avx avx2 avx512f avx512dq avx512ifma avx512cd avx512bw
> avx512vl avx512vbmi avx512_vbmi2 avx512_vnni avx512_bitalg
> avx512_vpopcntdq avx512_vp2intersect
> $
> $ ./configure --enable-kvm --enable-avx512bw --enable-avx2
> --disable-docs --target-list='x86_64-softmmu'
> $ make -sj10 check-qtest
> 67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
> OK 193.80s 81 subtests passed
> ===
>
> * One of my machines does seem to support 'avx*' instructions. QEMU is
> configured and built with the 'avx2' and 'avx512bw' support. Still
> migration-tests run fine, without any hang issue observed. Not sure
> why the hang issue is not reproducing on my side. How do you generally
> build QEMU to run these tests? Does this issue require some specific
> h/w setup/support?
>
There's nothing unusual here that I know of. Configure line is just
--target-list=x86_64-softmmu --enable-debug --disable-docs --disable-plugins.
> * Not sure how/why page faults happen during the Multifd phase when
> the guest on the destination is not running. If 'receivedmap' says
> that page is present, code accessing guest memory should just access
> whatever is available/present in that space, without waiting. I'll try
> to see what zero pages do, how page-faults occur during postcopy and
> how they are serviced. Let's see..
It's not that page faults happen during multifd. The page was already
sent during precopy, but multifd-recv didn't write to it, it just marked
the receivedmap. When postcopy starts, the page gets accessed and
faults. Since postcopy is on, the migration wants to request the page
from the source, but it's present in the receivedmap, so it doesn't
ask. No page ever comes and the code hangs waiting for the page fault to
be serviced (or potentially faults continuously? I'm not sure on the
details).
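
For illustration only, the interaction described above can be sketched as
a standalone program. The names loosely mirror the receivedmap and
page-request concepts discussed in this thread, but this is not the actual
QEMU code:
===
#include <stdbool.h>
#include <stdio.h>

#define NR_PAGES 8

static bool receivedmap[NR_PAGES];   /* "already received" bookkeeping */
static bool page_present[NR_PAGES];  /* whether the page is really backed */

/* Multifd recv side during precopy: a zero page "arrives". */
static void multifd_recv_zero_page(int pfn)
{
    /* Optimization: skip writing/faulting the page, only record it. */
    receivedmap[pfn] = true;
}

/* Destination fault handling once postcopy is active. */
static void postcopy_fault(int pfn)
{
    if (receivedmap[pfn]) {
        /* Believed present: no page request is sent to the source. */
        printf("pfn %d: no request sent, faulting thread waits forever\n", pfn);
        return;
    }
    page_present[pfn] = true;  /* normally: request the page and place it */
    printf("pfn %d: requested from the source and placed\n", pfn);
}

int main(void)
{
    multifd_recv_zero_page(3);  /* precopy: zero page recorded, not placed */
    postcopy_fault(3);          /* postcopy: the hang scenario */
    postcopy_fault(4);          /* postcopy: the normal path */
    return 0;
}
===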
>
> * Another suggestion is, maybe we should review and pull at least the
> refactoring patches so that in the next revisions we don't have to
> redo them. We can hold back the "enable multifd and postcopy together"
> patch that causes this guest hang issue to surface.
>
That's reasonable. But I won't be available for the next two
weeks. Peter is going to be back in the meantime, let's hear what he has
to say about this postcopy issue. I'll provide my r-bs.
>> > There's also other issues with the series:
>> >
>> > https://gitlab.com/farosas/qemu/-/pipelines/1770488059
>> >
>> > The CI workers don't support userfaultfd so the tests need to check for
>> > that properly. We have MigrationTestEnv::has_uffd for that.
>> >
>> > Lastly, I have seem some weirdness with TLS channels disconnections
>> > leading to asserts in qio_channel_shutdown() in my testing. I'll get a
>> > better look at those tomorrow.
>>
>> Ok, you can ignore this last paragraph. I was seeing the postcopy
>> recovery test disconnect messages, those are benign.
>
> * ie. ignore everything after - "There's also other issues with this
> series: " ? OR just the last one " ...with TLS channels..." ??
> Postcopy tests are added only if (env->has_uffd) check returns true.
>
Only the TLS part. The CI is failing with just this series. I didn't
change anything there. Maybe there's a bug in the userfaultfd detection?
I'll leave it to you, here's the error:
# Running /ppc64/migration/multifd+postcopy/tcp/plain/cancel
# Using machine type: pseries-10.0
# starting QEMU: exec ./qemu-system-ppc64 -qtest
# unix:/tmp/qtest-1305.sock -qtest-log /dev/null -chardev
# socket,path=/tmp/qtest-1305.qmp,id=char0 -mon
# chardev=char0,mode=control -display none -audio none -accel kvm -accel
# tcg -machine pseries-10.0,vsmt=8 -name source,debug-threads=on -m 256M
# -serial file:/tmp/migration-test-X0SO42/src_serial -nodefaults
# -machine
# cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off,
# -bios /tmp/migration-test-X0SO42/bootsect 2>/dev/null -accel qtest
# starting QEMU: exec ./qemu-system-ppc64 -qtest
# unix:/tmp/qtest-1305.sock -qtest-log /dev/null -chardev
# socket,path=/tmp/qtest-1305.qmp,id=char0 -mon
# chardev=char0,mode=control -display none -audio none -accel kvm -accel
# tcg -machine pseries-10.0,vsmt=8 -name target,debug-threads=on -m 256M
# -serial file:/tmp/migration-test-X0SO42/dest_serial -incoming defer
# -nodefaults -machine
# cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off,
# -bios /tmp/migration-test-X0SO42/bootsect 2>/dev/null -accel qtest
# {
# "error": {
# "class": "GenericError",
# "desc": "Postcopy is not supported: Userfaultfd not available: Function not implemented"
# }
# }
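"Function not implemented" is strerror(ENOSYS), i.e. the userfaultfd
syscall itself is unavailable on that worker. A standalone probe for
userfaultfd support, roughly what a has_uffd-style check captures (a
sketch, not part of the series or the test framework):
===
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    /* Create a userfaultfd; fails with ENOSYS on hosts without support. */
    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (ufd < 0) {
        perror("userfaultfd");  /* e.g. "Function not implemented" */
        return 1;
    }
    printf("userfaultfd is available (fd=%d)\n", ufd);
    close(ufd);
    return 0;
}
===
On such hosts the probe fails, and the qtests are expected to skip the
postcopy cases via the env->has_uffd check mentioned above.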
On Thu, Apr 17, 2025 at 01:05:37PM -0300, Fabiano Rosas wrote:
> It's not that page faults happen during multifd. The page was already
> sent during precopy, but multifd-recv didn't write to it, it just marked
> the receivedmap. When postcopy starts, the page gets accessed and
> faults. Since postcopy is on, the migration wants to request the page
> from the source, but it's present in the receivedmap, so it doesn't
> ask. No page ever comes and the code hangs waiting for the page fault to
> be serviced (or potentially faults continuously? I'm not sure on the
> details).

I think your previous analysis is correct on the zero pages. I am not 100%
sure if that's the issue but very likely. I tend to also agree with you
that we could skip zero page optimization in multifd code when postcopy is
enabled (maybe plus some comment right above..).

Thanks,

--
Peter Xu
Hi,
> On Thu, Apr 17, 2025 at 01:05:37PM -0300, Fabiano Rosas wrote:
> > It's not that page faults happen during multifd. The page was already
> > sent during precopy, but multifd-recv didn't write to it, it just marked
> > the receivedmap. When postcopy starts, the page gets accessed and
> > faults. Since postcopy is on, the migration wants to request the page
> > from the source, but it's present in the receivedmap, so it doesn't
> > ask. No page ever comes and the code hangs waiting for the page fault to
> > be serviced (or potentially faults continuously? I'm not sure on the
> > details).
>
> I think your previous analysis is correct on the zero pages. I am not 100%
> sure if that's the issue but very likely. I tend to also agree with you
> that we could skip zero page optimization in multifd code when postcopy is
> enabled (maybe plus some comment right above..).
migration/multifd: solve zero page causing multiple page faults
-> https://gitlab.com/qemu-project/qemu/-/commit/5ef7e26bdb7eda10d6d5e1b77121be9945e5e550
* Is this the optimization that is causing the migration hang issue?
===
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index dbc1184921..00f69ff965 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -85,7 +85,8 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
for (int i = 0; i < p->zero_num; i++) {
void *page = p->host + p->zero[i];
- if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
+ if (!migrate_postcopy() &&
+ ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
memset(page, 0, multifd_ram_page_size());
} else {
ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
===
* Would the above patch help to resolve it?
* Another way could be: when a page fault occurs during the postcopy
phase and we know (from the receivedmap) that the faulted page is a
zero page, maybe we could write it locally on the destination to
service the page fault?
On Thu, 17 Apr 2025 at 21:35, Fabiano Rosas <farosas@suse.de> wrote:
> Maybe there's a bug in the userfaultfd detection? I'll leave it to you, here's the error:
>
> # Running /ppc64/migration/multifd+postcopy/tcp/plain/cancel
> # Using machine type: pseries-10.0
> # starting QEMU: exec ./qemu-system-ppc64 -qtest
> # {
> # "error": {
> # "class": "GenericError",
> # "desc": "Postcopy is not supported: Userfaultfd not available: Function not implemented"
> # }
> # }
* It says "Function not implemented" - does the pseries machine
not support userfaultfd?
Thank you.
---
- Prasad
Prasad Pandit <ppandit@redhat.com> writes:
> Hi,
>
>> On Thu, Apr 17, 2025 at 01:05:37PM -0300, Fabiano Rosas wrote:
>> > It's not that page faults happen during multifd. The page was already
>> > sent during precopy, but multifd-recv didn't write to it, it just marked
>> > the receivedmap. When postcopy starts, the page gets accessed and
>> > faults. Since postcopy is on, the migration wants to request the page
>> > from the source, but it's present in the receivedmap, so it doesn't
>> > ask. No page ever comes and the code hangs waiting for the page fault to
>> > be serviced (or potentially faults continuously? I'm not sure on the
>> > details).
>>
>> I think your previous analysis is correct on the zero pages. I am not 100%
>> sure if that's the issue but very likely. I tend to also agree with you
>> that we could skip zero page optimization in multifd code when postcopy is
>> enabled (maybe plus some comment right above..).
>
> migration/multifd: solve zero page causing multiple page faults
> -> https://gitlab.com/qemu-project/qemu/-/commit/5ef7e26bdb7eda10d6d5e1b77121be9945e5e550
>
> * Is this the optimization that is causing the migration hang issue?
>
> ===
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index dbc1184921..00f69ff965 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -85,7 +85,8 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
> - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> + if (!migrate_postcopy() &&
> + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> memset(page, 0, multifd_ram_page_size());
> } else {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> ===
>
> * Would the above patch help to resolve it?
>
> * Another way could be when the page fault occurs during postcopy
> phase, if we know (from receivedmap) that the faulted page is a
> zero-page, maybe we could write it locally on the destination to
> service the page-fault?
>
> On Thu, 17 Apr 2025 at 21:35, Fabiano Rosas <farosas@suse.de> wrote:
>> Maybe there's a bug in the userfaultfd detection? I'll leave it to you, here's the error:
>>
>> # Running /ppc64/migration/multifd+postcopy/tcp/plain/cancel
>> # Using machine type: pseries-10.0
>> # starting QEMU: exec ./qemu-system-ppc64 -qtest
>> # {
>> # "error": {
>> # "class": "GenericError",
>> # "desc": "Postcopy is not supported: Userfaultfd not available: Function not implemented"
>> # }
>> # }
>
> * It is saying - function not implemented - does the Pseries machine
> not support userfaultfd?
>
We're missing a check on has_uffd for the multifd+postcopy tests.
> Thank you.
> ---
> - Prasad
Hi,
On Tue, 6 May 2025 at 00:34, Fabiano Rosas <farosas@suse.de> wrote:
> >> # Running /ppc64/migration/multifd+postcopy/tcp/plain/cancel
> >> # Using machine type: pseries-10.0
> >> # starting QEMU: exec ./qemu-system-ppc64 -qtest
> >> # {
> >> # "error": {
> >> # "class": "GenericError",
> >> # "desc": "Postcopy is not supported: Userfaultfd not available: Function not implemented"
> >> # }
> >> # }
> >
===
[ ~]#
...
PPC KVM module is not loaded. Try modprobe kvm_hv.
qemu-system-ppc64: -accel kvm: failed to initialize kvm: Invalid argument
qemu-system-ppc64: -accel kvm: ioctl(KVM_CREATE_VM) failed: Invalid argument
PPC KVM module is not loaded. Try modprobe kvm_hv.
qemu-system-ppc64: -accel kvm: failed to initialize kvm: Invalid argument
[ ~]#
[ ~]# modprobe kvm-hv
modprobe: ERROR: could not insert 'kvm_hv': No such device
[ ~]#
[ ~]# ls -l /dev/kvm /dev/userfaultfd
crw-rw-rw-. 1 root kvm 10, 232 May 6 07:06 /dev/kvm
crw----rw-. 1 root root 10, 123 May 6 06:30 /dev/userfaultfd
[ ~]#
===
* I tried to reproduce this issue across multiple Power9 and Power10
machines, but qtest could not run due to the above errors.
> We're missing a check on has_uffd for the multifd+postcopy tests.
* If it is about missing the 'e->has_uffd' check, does that mean
Postcopy tests are skipped on this machine because 'e->has_uffd' is
false?
Thank you.
---
- Prasad
On Tue, Apr 29, 2025 at 06:21:13PM +0530, Prasad Pandit wrote:
> Hi,
>
> > On Thu, Apr 17, 2025 at 01:05:37PM -0300, Fabiano Rosas wrote:
> > > It's not that page faults happen during multifd. The page was already
> > > sent during precopy, but multifd-recv didn't write to it, it just marked
> > > the receivedmap. When postcopy starts, the page gets accessed and
> > > faults. Since postcopy is on, the migration wants to request the page
> > > from the source, but it's present in the receivedmap, so it doesn't
> > > ask. No page ever comes and the code hangs waiting for the page fault to
> > > be serviced (or potentially faults continuously? I'm not sure on the
> > > details).
> >
> > I think your previous analysis is correct on the zero pages. I am not 100%
> > sure if that's the issue but very likely. I tend to also agree with you
> > that we could skip zero page optimization in multifd code when postcopy is
> > enabled (maybe plus some comment right above..).
>
> migration/multifd: solve zero page causing multiple page faults
> -> https://gitlab.com/qemu-project/qemu/-/commit/5ef7e26bdb7eda10d6d5e1b77121be9945e5e550
>
> * Is this the optimization that is causing the migration hang issue?
I think that's what Fabiano mentioned, but ultimately we need to verify it
on a reproducer to know.
>
> ===
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index dbc1184921..00f69ff965 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -85,7 +85,8 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
> - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> + if (!migrate_postcopy() &&
> + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> memset(page, 0, multifd_ram_page_size());
> } else {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> ===
>
> * Would the above patch help to resolve it?
Looks ok, but please add some comments explaining why postcopy needs to do
it, and especially why it is done during the precopy phase.
I'd use migrate_postcopy_ram() instead. I wish migrate_dirty_bitmaps()
had a better name, maybe migrate_postcopy_block().. I have no idea
who is using that feature, especially when postcopy-ram is off.
>
> * Another way could be when the page fault occurs during postcopy
> phase, if we know (from receivedmap) that the faulted page is a
> zero-page, maybe we could write it locally on the destination to
> service the page-fault?
I don't think we can know that - a set receivedmap bit doesn't mean it's a
zero page, it only says the page has been received before. It can also
happen that e.g. more than one thread faults on the same page; the 2nd
thread that faulted may see receivedmap set because the 1st thread's
fault was already resolved.
--
Peter Xu
On Tue, 29 Apr 2025 at 18:34, Peter Xu <peterx@redhat.com> wrote:
> I think that's what Fabiano mentioned, but ultimately we need to verify it
> on a reproducer to know.
...
> Looks ok, but please add some comments explain why postcopy needs to do it,
> and especially do it during precopy phase.
>
> I'd use migrate_postcopy_ram() instead.
* Okay. It should be '||' instead of '&&' in the first conditional, I
think; we want to write the zero page when postcopy is enabled.
===
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index dbc1184921..4d6677feab 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -85,9 +85,11 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
for (int i = 0; i < p->zero_num; i++) {
void *page = p->host + p->zero[i];
- if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
+ if (migrate_postcopy_ram() ||
+ ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
memset(page, 0, multifd_ram_page_size());
- } else {
+ }
+ if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
}
}
===
* I'll send this one if it looks okay.
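For readability, here is the loop of multifd_recv_zero_page_process()
reassembled with the quoted hunk applied; the comment is added for clarity
and is not part of the proposed patch:
===
void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
    for (int i = 0; i < p->zero_num; i++) {
        void *page = p->host + p->zero[i];

        /*
         * Clarifying comment (not in the patch): with postcopy enabled,
         * always materialise the zero page now.  Marking receivedmap for
         * a page that was never written means a later postcopy fault on
         * it would not be requested from the source and could never be
         * serviced -- the hang discussed earlier in this thread.
         */
        if (migrate_postcopy_ram() ||
            ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
            memset(page, 0, multifd_ram_page_size());
        }
        if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
            ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
        }
    }
}
===
This keeps the original behaviour of setting the receivedmap bit only when
it was not already set, while guaranteeing the page contents are placed
whenever postcopy may follow.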
> I don't think we can know that - receivedmap set doesn't mean it's a zero
> page, but only says it's been received before. It can also happen e.g. >1
> threads faulted on the same page then the 2nd thread faulted on it may see
> receivedmap set because the 1st thread got faulted already got the fault
> resolved.
* Okay.
Thank you.
---
- Prasad
Prasad Pandit <ppandit@redhat.com> writes:
> On Tue, 29 Apr 2025 at 18:34, Peter Xu <peterx@redhat.com> wrote:
>> I think that's what Fabiano mentioned, but ultimately we need to verify it
>> on a reproducer to know.
> ...
>> Looks ok, but please add some comments explain why postcopy needs to do it,
>> and especially do it during precopy phase.
>>
>> I'd use migrate_postcopy_ram() instead.
>
> * Okay. It should be '||' instead of '&&' in the first conditional I
> think, we want to write zeropage when postcopy is enabled.
> ===
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index dbc1184921..4d6677feab 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -85,9 +85,11 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
> - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> + if (migrate_postcopy_ram() ||
> + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> memset(page, 0, multifd_ram_page_size());
> - } else {
> + }
> + if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> }
> }
> ===
I applied this diff and I'm not seeing the hang anymore.
Hello Fabiano,
On Tue, 6 May 2025 at 00:31, Fabiano Rosas <farosas@suse.de> wrote:
> > +++ b/migration/multifd-zero-page.c
> > @@ -85,9 +85,11 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> > {
> > for (int i = 0; i < p->zero_num; i++) {
> > void *page = p->host + p->zero[i];
> > - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> > + if (migrate_postcopy_ram() ||
> > + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> > memset(page, 0, multifd_ram_page_size());
> > - } else {
> > + }
> > + if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> > ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> > }
> > }
> > ===
>
> I applied this diff and I'm not seeing the hang anymore.
* Great, thank you for the confirmation. I'll prepare a formal patch.
Thank you.
---
- Prasad
On Tue, Apr 29, 2025 at 06:58:29PM +0530, Prasad Pandit wrote:
> On Tue, 29 Apr 2025 at 18:34, Peter Xu <peterx@redhat.com> wrote:
> > I think that's what Fabiano mentioned, but ultimately we need to verify it
> > on a reproducer to know.
> ...
> > Looks ok, but please add some comments explain why postcopy needs to do it,
> > and especially do it during precopy phase.
> >
> > I'd use migrate_postcopy_ram() instead.
>
> * Okay. It should be '||' instead of '&&' in the first conditional I
> think, we want to write zeropage when postcopy is enabled.
> ===
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index dbc1184921..4d6677feab 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -85,9 +85,11 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
> - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> + if (migrate_postcopy_ram() ||
> + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> memset(page, 0, multifd_ram_page_size());
> - } else {
> + }
> + if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> }
> }
> ===
> * I'll send this one if it looks okay.
Please don't rush to send. Again, let's verify the issue first before
resending anything.
If you could reproduce it, that would be perfect; then we can already
verify it. Otherwise we may need help from Fabiano. Let's not send
anything if you're not yet sure whether it works.. It can confuse people
into thinking the problem is solved when it may not be yet.
--
Peter Xu
On Tue, 29 Apr 2025 at 19:18, Peter Xu <peterx@redhat.com> wrote:
> Please don't rush to send. Again, let's verify the issue first before
> resending anything.
>
> If you could reproduce it it would be perfect, then we can already verify
> it. Otherwise we may need help from Fabiano. Let's not send anything if
> you're not yet sure whether it works.. It can confuse people thinking
> problem solved, but maybe not yet.

* No, the migration hang issue is not reproducing on my side. Earlier
in this thread, Fabiano said you'll be better able to confirm the
issue. (so its possible fix as well I guess)

* You don't have access to the set-up that he uses for running tests
and merging patches? Would it be possible for you to run the same
tests? (just checking, I don't know how co-maintainers work to
test/merge patches)

* If we don't send the patch, how will Fabiano test it? Should we wait
for Fabiano to come back and then make this same patch in his set-up
and test/verify it?

Thank you.
---
- Prasad
On Tue, Apr 29, 2025 at 08:50:19PM +0530, Prasad Pandit wrote:
> On Tue, 29 Apr 2025 at 19:18, Peter Xu <peterx@redhat.com> wrote:
> > Please don't rush to send. Again, let's verify the issue first before
> > resending anything.
> >
> > If you could reproduce it it would be perfect, then we can already verify
> > it. Otherwise we may need help from Fabiano. Let's not send anything if
> > you're not yet sure whether it works.. It can confuse people thinking
> > problem solved, but maybe not yet.
>
> * No, the migration hang issue is not reproducing on my side. Earlier
> in this thread, Fabiano said you'll be better able to confirm the
> issue. (so its possible fix as well I guess)
>
> * You don't have access to the set-up that he uses for running tests
> and merging patches? Would it be possible for you to run the same
> tests? (just checking, I don't know how co-maintainers work to
> test/merge patches)

No I don't.

>
> * If we don't send the patch, how will Fabiano test it? Should we wait
> for Fabiano to come back and then make this same patch in his set-up
> and test/verify it?

I thought you'd provided a diff. That would be good enough for
verification. If you really want, you can repost, but please mention
explicitly that you haven't verified the issue, so the patchset needs to
be verified. Fabiano should come back early May.

If you want, you can try to look into how to reproduce it by looking at
why it triggered in the vapic path:

https://lore.kernel.org/all/87plhwgbu6.fsf@suse.de/#t

Thread 1 (Thread 0x7fbc4849df80 (LWP 7487) "qemu-system-x86"):
#0  __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:274
#1  0x0000560b135103aa in flatview_read_continue_step (attrs=..., buf=0x560b168a5930 "U\252\022\006\016\a1\300\271", len=9216, mr_addr=831488, l=0x7fbc465ff980, mr=0x560b166c5070) at ../system/physmem.c:3056
#2  0x0000560b1351042e in flatview_read_continue (fv=0x560b16c606a0, addr=831488, attrs=..., ptr=0x560b168a5930, len=9216, mr_addr=831488, l=9216, mr=0x560b166c5070) at ../system/physmem.c:3073
#3  0x0000560b13510533 in flatview_read (fv=0x560b16c606a0, addr=831488, attrs=..., buf=0x560b168a5930, len=9216) at ../system/physmem.c:3103
#4  0x0000560b135105be in address_space_read_full (as=0x560b14970fc0 <address_space_memory>, addr=831488, attrs=..., buf=0x560b168a5930, len=9216) at ../system/physmem.c:3116
#5  0x0000560b135106e7 in address_space_rw (as=0x560b14970fc0 <address_space_memory>, addr=831488, attrs=..., buf=0x560b168a5930, len=9216, is_write=false) at ../system/physmem.c:3144
#6  0x0000560b13510848 in cpu_physical_memory_rw (addr=831488, buf=0x560b168a5930, len=9216, is_write=false) at ../system/physmem.c:3170
#7  0x0000560b1338f5a5 in cpu_physical_memory_read (addr=831488, buf=0x560b168a5930, len=9216) at qemu/include/exec/cpu-common.h:148
#8  0x0000560b1339063c in patch_hypercalls (s=0x560b168840c0) at ../hw/i386/vapic.c:547
#9  0x0000560b1339096d in vapic_prepare (s=0x560b168840c0) at ../hw/i386/vapic.c:629
#10 0x0000560b13390e8b in vapic_post_load (opaque=0x560b168840c0, version_id=1) at ../hw/i386/vapic.c:789
#11 0x0000560b135b4924 in vmstate_load_state (f=0x560b16c53400, vmsd=0x560b147c6cc0 <vmstate_vapic>, opaque=0x560b168840c0, version_id=1) at ../migration/vmstate.c:234
#12 0x0000560b132a15b8 in vmstate_load (f=0x560b16c53400, se=0x560b16893390) at ../migration/savevm.c:972
#13 0x0000560b132a4f28 in qemu_loadvm_section_start_full (f=0x560b16c53400, type=4 '\004') at ../migration/savevm.c:2746
#14 0x0000560b132a5ae8 in qemu_loadvm_state_main (f=0x560b16c53400, mis=0x560b16877f20) at ../migration/savevm.c:3058
#15 0x0000560b132a45d0 in loadvm_handle_cmd_packaged (mis=0x560b16877f20) at ../migration/savevm.c:2451
#16 0x0000560b132a4b36 in loadvm_process_command (f=0x560b168c3b60) at ../migration/savevm.c:2614
#17 0x0000560b132a5b96 in qemu_loadvm_state_main (f=0x560b168c3b60, mis=0x560b16877f20) at ../migration/savevm.c:3073
#18 0x0000560b132a5db7 in qemu_loadvm_state (f=0x560b168c3b60) at ../migration/savevm.c:3150
#19 0x0000560b13286271 in process_incoming_migration_co (opaque=0x0) at ../migration/migration.c:892
#20 0x0000560b137cb6d4 in coroutine_trampoline (i0=377836416, i1=22027) at ../util/coroutine-ucontext.c:175
#21 0x00007fbc4786a79e in ??? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:103

So _if_ the theory is correct, vapic's patch_hypercalls() might be reading
a zero page (with GPA 831488, over len=9216, which IIUC covers three
pages). Maybe you can check when it'll be a zero page and when it won't,
then maybe you can figure out how to make it always a zero page and hence
reliably trigger a hang in post_load.

You could also try to write a program in the guest, zeroing most pages
first, trigger migrate (hence send zero pages during multifd precopy),
start postcopy, then you should be able to observe the vcpu hang at least
before postcopy completes.

However I don't think it'll hang forever, since once migration fully
completes, UFFDIO_UNREGISTER will remove the userfaultfd tracking and then
kick all the hung threads out, causing the fault to be resolved right at
the completion of postcopy. So it won't really hang forever like what
Fabiano reported here.

Meanwhile we'll always want to verify the original reproducer.. even if
you could hang it temporarily in a vcpu thread.

Thanks,

--
Peter Xu
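A guest-side reproducer along the lines Peter suggests might look roughly
like the following; this is a hypothetical, untested sketch, and the buffer
size, page size and loop cadence are arbitrary:
===
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SZ   (512UL * 1024 * 1024)   /* adjust to the guest's RAM size */
#define PAGE 4096

int main(void)
{
    unsigned char *buf = malloc(SZ);
    if (!buf) {
        return 1;
    }

    /* Dirty the region with zeros so the source sends it as zero pages
     * over multifd during the precopy phase. */
    memset(buf, 0, SZ);

    /* Keep touching every page; once postcopy starts, a page whose
     * receivedmap bit was set without being placed should fault here and
     * stall this thread (at least until postcopy completes). */
    for (;;) {
        unsigned long sum = 0;
        for (size_t i = 0; i < SZ; i += PAGE) {
            sum += buf[i];
        }
        printf("pass done, sum=%lu\n", sum);
        sleep(1);
    }
    return 0;
}
===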