From: Prasad Pandit <pjp@fedoraproject.org>
Hello,
* This series (v9) makes minor refactoring and reordering changes, as
suggested in the review of the earlier series (v8). I also tried to
reproduce/debug a qtest hang issue, but it could not be reproduced.
From the shared stack traces it looked like the Postcopy thread was
preparing to finish before migrating all the pages.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 170.50s 81 subtests passed
===
v8: https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppandit@redhat.com/T/#t
* This series (v8) splits the earlier patch-2, which enabled the multifd
and postcopy options together, into two separate patches. The first
modifies the channel discovery in the migration_ioc_process_incoming()
function, and the second enables multifd and postcopy migration together.
It also adds the 'save_postcopy_prepare' savevm_state handler so that
different sections can take an action just before the Postcopy phase
starts. Thank you, Peter, for these patches.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 152.66s 81 subtests passed
===
v7: https://lore.kernel.org/qemu-devel/20250228121749.553184-1-ppandit@redhat.com/T/#t
* This series (v7) adds the 'MULTIFD_RECV_SYNC' migration command. It is
used to notify the destination migration thread to synchronise with the
Multifd threads. This allows the Multifd ('mig/dst/recv_x') threads on the
destination to receive all their data before they are shut down.
This series also updates the channel discovery function and the qtests as
suggested in the previous review comments.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 147.84s 81 subtests passed
===
v6: https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t
* This series (v6) shuts down the Multifd threads before starting Postcopy
migration. It helps to avoid an issue where multifd pages arrive late
at the destination during the Postcopy phase and corrupt the vCPU
state. It also reorders the qtest patches and makes some refactoring
changes as suggested in the previous review.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 161.35s 73 subtests passed
===
v5: https://lore.kernel.org/qemu-devel/20250205122712.229151-1-ppandit@redhat.com/T/#t
* This series (v5) consolidates the setting of migration capabilities into
one 'set_migration_capabilities()' function, simplifying the test sources.
It passes all migration tests.
===
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 143.66s 71 subtests passed
===
v4: https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t
* This series (v4) adds more 'multifd+postcopy' qtests, which test
Precopy migration with the 'postcopy-ram' attribute set and run
Postcopy migrations with 'multifd' channels enabled.
===
$ ../qtest/migration-test --tap -k -r '/x86_64/migration/multifd+postcopy' | grep -i 'slow test'
# slow test /x86_64/migration/multifd+postcopy/plain executed in 1.29 secs
# slow test /x86_64/migration/multifd+postcopy/recovery/tls/psk executed in 2.48 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/plain executed in 1.49 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/recovery/tls/psk executed in 2.52 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/tls/psk/match executed in 3.62 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/zstd executed in 1.34 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/cancel executed in 2.24 secs
...
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 148.41s 71 subtests passed
===
v3: https://lore.kernel.org/qemu-devel/20250121131032.1611245-1-ppandit@redhat.com/T/#t
* This series (v3) passes all existing 'tests/qtest/migration/*' tests
and adds a new one to enable multifd channels with postcopy migration.
v2: https://lore.kernel.org/qemu-devel/20241129122256.96778-1-ppandit@redhat.com/T/#u
* This series (v2) further refactors the 'ram_save_target_page'
function to make it independent of the multifd & postcopy change.
v1: https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u
* This series removes the 4-byte magic value introduced in the
previous series for the Postcopy channel.
v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u
* Currently Multifd and Postcopy migration cannot be used together;
QEMU shows a "Postcopy is not yet compatible with multifd" message.
When migrating guests with large (hundreds of GB) RAM, Multifd threads
help to accelerate migration, but the inability to use them with
Postcopy mode delays guest start-up on the destination side.
* This patch series allows enabling both Multifd and Postcopy
migration together. Precopy and Multifd threads work during the
initial guest (RAM) transfer. When migration moves to the
Postcopy phase, the Multifd threads are restrained and the Postcopy
threads start to request pages from the source side.
* This series introduces a 4-byte magic value to be sent on the
Postcopy channel. It helps to differentiate channels and properly
set up incoming connections on the destination side.
Thank you.
---
Peter Xu (2):
migration: Add save_postcopy_prepare() savevm handler
migration/ram: Implement save_postcopy_prepare()
Prasad Pandit (5):
migration/multifd: move macros to multifd header
migration: refactor channel discovery mechanism
migration: enable multifd and postcopy together
tests/qtest/migration: consolidate set capabilities
tests/qtest/migration: add postcopy tests with multifd
include/migration/register.h | 15 +++
migration/migration.c | 136 ++++++++++++----------
migration/multifd-nocomp.c | 3 +-
migration/multifd.c | 12 +-
migration/multifd.h | 5 +
migration/options.c | 5 -
migration/ram.c | 42 ++++++-
migration/savevm.c | 33 ++++++
migration/savevm.h | 1 +
tests/qtest/migration/compression-tests.c | 38 +++++-
tests/qtest/migration/cpr-tests.c | 6 +-
tests/qtest/migration/file-tests.c | 58 +++++----
tests/qtest/migration/framework.c | 76 ++++++++----
tests/qtest/migration/framework.h | 9 +-
tests/qtest/migration/misc-tests.c | 4 +-
tests/qtest/migration/postcopy-tests.c | 35 +++++-
tests/qtest/migration/precopy-tests.c | 48 +++++---
tests/qtest/migration/tls-tests.c | 70 ++++++++++-
18 files changed, 437 insertions(+), 159 deletions(-)
--
2.49.0
Prasad Pandit <ppandit@redhat.com> writes:

> From: Prasad Pandit <pjp@fedoraproject.org>
>
> Hello,
>
> * This series (v9) does minor refactoring and reordering changes as
> suggested in the review of earlier series (v8). Also tried to
> reproduce/debug a qtest hang issue, but it could not be reproduced.
> From the shared stack traces it looked like Postcopy thread was
> preparing to finish before migrating all the pages.

The issue is that a zero page is being migrated by multifd but there's
an optimization in place that skips faulting the page in on the
destination. Later during postcopy when the page is found to be missing,
postcopy (@migrate_send_rp_req_pages) believes the page is already
present due to the receivedmap for that pfn being set and thus the code
accessing the guest memory just sits there waiting for the page.

It seems your series has a logical conflict with this work that was done
a while back:

https://lore.kernel.org/all/20240401154110.2028453-1-yuan1.liu@intel.com/

The usage of receivedmap for multifd was supposed to be mutually
exclusive with postcopy. Take a look at the description of that series
and at postcopy_place_page_zero(). We need to figure out what needs to
change and how to do that compatibly. It might just be the case of
memsetting the zero page always for postcopy, but I haven't thought too
much about it.

There's also other issues with the series:

https://gitlab.com/farosas/qemu/-/pipelines/1770488059

The CI workers don't support userfaultfd so the tests need to check for
that properly. We have MigrationTestEnv::has_uffd for that.

Lastly, I have seen some weirdness with TLS channel disconnections
leading to asserts in qio_channel_shutdown() in my testing. I'll get a
better look at those tomorrow.
Fabiano Rosas <farosas@suse.de> writes:

> Prasad Pandit <ppandit@redhat.com> writes:
>
>> From: Prasad Pandit <pjp@fedoraproject.org>
>>
>> Hello,
>>
>> * This series (v9) does minor refactoring and reordering changes as
>> suggested in the review of earlier series (v8). Also tried to
>> reproduce/debug a qtest hang issue, but it could not be reproduced.
>> From the shared stack traces it looked like Postcopy thread was
>> preparing to finish before migrating all the pages.
>
> The issue is that a zero page is being migrated by multifd but there's
> an optimization in place that skips faulting the page in on the
> destination. Later during postcopy when the page is found to be missing,
> postcopy (@migrate_send_rp_req_pages) believes the page is already
> present due to the receivedmap for that pfn being set and thus the code
> accessing the guest memory just sits there waiting for the page.
>
> It seems your series has a logical conflict with this work that was done
> a while back:
>
> https://lore.kernel.org/all/20240401154110.2028453-1-yuan1.liu@intel.com/
>
> The usage of receivedmap for multifd was supposed to be mutually
> exclusive with postcopy. Take a look at the description of that series
> and at postcopy_place_page_zero(). We need to figure out what needs to
> change and how to do that compatibly. It might just be the case of
> memsetting the zero page always for postcopy, but I haven't thought too
> much about it.
>
> There's also other issues with the series:
>
> https://gitlab.com/farosas/qemu/-/pipelines/1770488059
>
> The CI workers don't support userfaultfd so the tests need to check for
> that properly. We have MigrationTestEnv::has_uffd for that.
>
> Lastly, I have seen some weirdness with TLS channel disconnections
> leading to asserts in qio_channel_shutdown() in my testing. I'll get a
> better look at those tomorrow.

Ok, you can ignore this last paragraph. I was seeing the postcopy
recovery test disconnect messages, those are benign.
Hi,
On Wed, 16 Apr 2025 at 18:29, Fabiano Rosas <farosas@suse.de> wrote:
> > The issue is that a zero page is being migrated by multifd but there's
> > an optimization in place that skips faulting the page in on the
> > destination. Later during postcopy when the page is found to be missing,
> > postcopy (@migrate_send_rp_req_pages) believes the page is already
> > present due to the receivedmap for that pfn being set and thus the code
> > accessing the guest memory just sits there waiting for the page.
> >
> > It seems your series has a logical conflict with this work that was done
> > a while back:
> >
> > https://lore.kernel.org/all/20240401154110.2028453-1-yuan1.liu@intel.com/
> >
> > The usage of receivedmap for multifd was supposed to be mutually
> > exclusive with postcopy. Take a look at the description of that series
> > and at postcopy_place_page_zero(). We need to figure out what needs to
> > change and how to do that compatibly. It might just be the case of
> > memsetting the zero page always for postcopy, but I havent't thought too
> > much about it.
===
$ grep -i avx /proc/cpuinfo
flags : avx avx2 avx512f avx512dq avx512ifma avx512cd avx512bw
avx512vl avx512vbmi avx512_vbmi2 avx512_vnni avx512_bitalg
avx512_vpopcntdq avx512_vp2intersect
$
$ ./configure --enable-kvm --enable-avx512bw --enable-avx2
--disable-docs --target-list='x86_64-softmmu'
$ make -sj10 check-qtest
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
OK 193.80s 81 subtests passed
===
* One of my machines does support 'avx*' instructions, and QEMU is
configured and built with 'avx2' and 'avx512bw' support. Still, the
migration-tests run fine, without any hang observed. Not sure
why the hang issue is not reproducing on my side. How do you generally
build QEMU to run these tests? Does this issue require some specific
h/w setup/support?
* Not sure how/why page faults happen during the Multifd phase when
the guest on the destination is not running. If 'receivedmap' says
that a page is present, code accessing guest memory should just access
whatever is available/present in that space, without waiting. I'll try
to see what zero pages do, how page faults occur during postcopy and
how they are serviced. Let's see..
* Another suggestion: maybe we should review and pull at least the
refactoring patches, so that we don't have to redo them in the next
revisions. We can hold back the "enable multifd and postcopy together"
patch that causes this guest hang issue to surface.
> > There's also other issues with the series:
> >
> > https://gitlab.com/farosas/qemu/-/pipelines/1770488059
> >
> > The CI workers don't support userfaultfd so the tests need to check for
> > that properly. We have MigrationTestEnv::has_uffd for that.
> >
> > Lastly, I have seem some weirdness with TLS channels disconnections
> > leading to asserts in qio_channel_shutdown() in my testing. I'll get a
> > better look at those tomorrow.
>
> Ok, you can ignore this last paragraph. I was seeing the postcopy
> recovery test disconnect messages, those are benign.
* i.e. ignore everything after "There's also other issues with this
series:", or just the last point about "...with TLS channels..."?
Postcopy tests are added only if the (env->has_uffd) check returns true.
Thank you.
---
- Prasad
Prasad Pandit <ppandit@redhat.com> writes:
> Hi,
>
> On Wed, 16 Apr 2025 at 18:29, Fabiano Rosas <farosas@suse.de> wrote:
>> > The issue is that a zero page is being migrated by multifd but there's
>> > an optimization in place that skips faulting the page in on the
>> > destination. Later during postcopy when the page is found to be missing,
>> > postcopy (@migrate_send_rp_req_pages) believes the page is already
>> > present due to the receivedmap for that pfn being set and thus the code
>> > accessing the guest memory just sits there waiting for the page.
>> >
>> > It seems your series has a logical conflict with this work that was done
>> > a while back:
>> >
>> > https://lore.kernel.org/all/20240401154110.2028453-1-yuan1.liu@intel.com/
>> >
>> > The usage of receivedmap for multifd was supposed to be mutually
>> > exclusive with postcopy. Take a look at the description of that series
>> > and at postcopy_place_page_zero(). We need to figure out what needs to
>> > change and how to do that compatibly. It might just be the case of
>> > memsetting the zero page always for postcopy, but I havent't thought too
>> > much about it.
>
> ===
> $ grep -i avx /proc/cpuinfo
> flags : avx avx2 avx512f avx512dq avx512ifma avx512cd avx512bw
> avx512vl avx512vbmi avx512_vbmi2 avx512_vnni avx512_bitalg
> avx512_vpopcntdq avx512_vp2intersect
> $
> $ ./configure --enable-kvm --enable-avx512bw --enable-avx2
> --disable-docs --target-list='x86_64-softmmu'
> $ make -sj10 check-qtest
> 67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
> OK 193.80s 81 subtests passed
> ===
>
> * One of my machines does seem to support 'avx*' instructions. QEMU is
> configured and built with the 'avx2' and 'avx512bw' support. Still
> migration-tests run fine, without any hang issue observed. Not sure
> why the hang issue is not reproducing on my side. How do you generally
> build QEMU to run these tests? Does this issue require some specific
> h/w setup/support?
>
There's nothing unusual here that I know of. Configure line is just
--target-list=x86_64-softmmu --enable-debug --disable-docs --disable-plugins.
> * Not sure how/why page faults happen during the Multifd phase when
> the guest on the destination is not running. If 'receivedmap' says
> that page is present, code accessing guest memory should just access
> whatever is available/present in that space, without waiting. I'll try
> to see what zero pages do, how page-faults occur during postcopy and
> how they are serviced. Let's see..
It's not that page faults happen during multifd. The page was already
sent during precopy, but multifd-recv didn't write to it, it just marked
the receivedmap. When postcopy starts, the page gets accessed and
faults. Since postcopy is on, the migration wants to request the page
from the source, but it's present in the receivedmap, so it doesn't
ask. No page ever comes and the code hangs waiting for the page fault to
be serviced (or potentially faults continuously? I'm not sure on the
details).
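
For illustration only, the interaction described above can be sketched as
a standalone program. The names loosely mirror the receivedmap and
page-request concepts discussed in this thread, but this is not the actual
QEMU code:
===
#include <stdbool.h>
#include <stdio.h>

#define NR_PAGES 8

static bool receivedmap[NR_PAGES];   /* "already received" bookkeeping */
static bool page_present[NR_PAGES];  /* whether the page is really backed */

/* Multifd recv side during precopy: a zero page "arrives". */
static void multifd_recv_zero_page(int pfn)
{
    /* Optimization: skip writing/faulting the page, only record it. */
    receivedmap[pfn] = true;
}

/* Destination fault handling once postcopy is active. */
static void postcopy_fault(int pfn)
{
    if (receivedmap[pfn]) {
        /* Believed present: no page request is sent to the source. */
        printf("pfn %d: no request sent, faulting thread waits forever\n", pfn);
        return;
    }
    page_present[pfn] = true;  /* normally: request the page and place it */
    printf("pfn %d: requested from the source and placed\n", pfn);
}

int main(void)
{
    multifd_recv_zero_page(3);  /* precopy: zero page recorded, not placed */
    postcopy_fault(3);          /* postcopy: the hang scenario */
    postcopy_fault(4);          /* postcopy: the normal path */
    return 0;
}
===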
>
> * Another suggestion is, maybe we should review and pull at least the
> refactoring patches so that in the next revisions we don't have to
> redo them. We can hold back the "enable multifd and postcopy together"
> patch that causes this guest hang issue to surface.
>
That's reasonable. But I won't be available for the next two
weeks. Peter is going to be back in the meantime, let's hear what he has
to say about this postcopy issue. I'll provide my r-bs.
>> > There's also other issues with the series:
>> >
>> > https://gitlab.com/farosas/qemu/-/pipelines/1770488059
>> >
>> > The CI workers don't support userfaultfd so the tests need to check for
>> > that properly. We have MigrationTestEnv::has_uffd for that.
>> >
>> > Lastly, I have seem some weirdness with TLS channels disconnections
>> > leading to asserts in qio_channel_shutdown() in my testing. I'll get a
>> > better look at those tomorrow.
>>
>> Ok, you can ignore this last paragraph. I was seeing the postcopy
>> recovery test disconnect messages, those are benign.
>
> * ie. ignore everything after - "There's also other issues with this
> series: " ? OR just the last one " ...with TLS channels..." ??
> Postcopy tests are added only if (env->has_uffd) check returns true.
>
Only the TLS part. The CI is failing with just this series. I didn't
change anything there. Maybe there's a bug in the userfaultfd detection?
I'll leave it to you, here's the error:
# Running /ppc64/migration/multifd+postcopy/tcp/plain/cancel
# Using machine type: pseries-10.0
# starting QEMU: exec ./qemu-system-ppc64 -qtest
# unix:/tmp/qtest-1305.sock -qtest-log /dev/null -chardev
# socket,path=/tmp/qtest-1305.qmp,id=char0 -mon
# chardev=char0,mode=control -display none -audio none -accel kvm -accel
# tcg -machine pseries-10.0,vsmt=8 -name source,debug-threads=on -m 256M
# -serial file:/tmp/migration-test-X0SO42/src_serial -nodefaults
# -machine
# cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off,
# -bios /tmp/migration-test-X0SO42/bootsect 2>/dev/null -accel qtest
# starting QEMU: exec ./qemu-system-ppc64 -qtest
# unix:/tmp/qtest-1305.sock -qtest-log /dev/null -chardev
# socket,path=/tmp/qtest-1305.qmp,id=char0 -mon
# chardev=char0,mode=control -display none -audio none -accel kvm -accel
# tcg -machine pseries-10.0,vsmt=8 -name target,debug-threads=on -m 256M
# -serial file:/tmp/migration-test-X0SO42/dest_serial -incoming defer
# -nodefaults -machine
# cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off,
# -bios /tmp/migration-test-X0SO42/bootsect 2>/dev/null -accel qtest
# {
# "error": {
# "class": "GenericError",
# "desc": "Postcopy is not supported: Userfaultfd not available: Function not implemented"
# }
# }
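"Function not implemented" is strerror(ENOSYS), i.e. the userfaultfd
syscall itself is unavailable on that worker. A standalone probe for
userfaultfd support, roughly what a has_uffd-style check captures (a
sketch, not part of the series or the test framework):
===
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    /* Create a userfaultfd; fails with ENOSYS on hosts without support. */
    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (ufd < 0) {
        perror("userfaultfd");  /* e.g. "Function not implemented" */
        return 1;
    }
    printf("userfaultfd is available (fd=%d)\n", ufd);
    close(ufd);
    return 0;
}
===
On such hosts the probe fails, and the qtests are expected to skip the
postcopy cases via the env->has_uffd check mentioned above.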
On Thu, Apr 17, 2025 at 01:05:37PM -0300, Fabiano Rosas wrote:
> It's not that page faults happen during multifd. The page was already
> sent during precopy, but multifd-recv didn't write to it, it just marked
> the receivedmap. When postcopy starts, the page gets accessed and
> faults. Since postcopy is on, the migration wants to request the page
> from the source, but it's present in the receivedmap, so it doesn't
> ask. No page ever comes and the code hangs waiting for the page fault to
> be serviced (or potentially faults continuously? I'm not sure on the
> details).

I think your previous analysis is correct on the zero pages. I am not 100%
sure if that's the issue but very likely. I tend to also agree with you
that we could skip zero page optimization in multifd code when postcopy is
enabled (maybe plus some comment right above..).

Thanks,

--
Peter Xu
Hi,
> On Thu, Apr 17, 2025 at 01:05:37PM -0300, Fabiano Rosas wrote:
> > It's not that page faults happen during multifd. The page was already
> > sent during precopy, but multifd-recv didn't write to it, it just marked
> > the receivedmap. When postcopy starts, the page gets accessed and
> > faults. Since postcopy is on, the migration wants to request the page
> > from the source, but it's present in the receivedmap, so it doesn't
> > ask. No page ever comes and the code hangs waiting for the page fault to
> > be serviced (or potentially faults continuously? I'm not sure on the
> > details).
>
> I think your previous analysis is correct on the zero pages. I am not 100%
> sure if that's the issue but very likely. I tend to also agree with you
> that we could skip zero page optimization in multifd code when postcopy is
> enabled (maybe plus some comment right above..).
migration/multifd: solve zero page causing multiple page faults
-> https://gitlab.com/qemu-project/qemu/-/commit/5ef7e26bdb7eda10d6d5e1b77121be9945e5e550
* Is this the optimization that is causing the migration hang issue?
===
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index dbc1184921..00f69ff965 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -85,7 +85,8 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
for (int i = 0; i < p->zero_num; i++) {
void *page = p->host + p->zero[i];
- if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
+ if (!migrate_postcopy() &&
+ ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
memset(page, 0, multifd_ram_page_size());
} else {
ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
===
* Would the above patch help to resolve it?
* Another way could be: when a page fault occurs during the postcopy
phase and we know (from the receivedmap) that the faulted page is a
zero page, maybe we could write it locally on the destination to
service the page fault?
On Thu, 17 Apr 2025 at 21:35, Fabiano Rosas <farosas@suse.de> wrote:
> Maybe there's a bug in the userfaultfd detection? I'll leave it to you, here's the error:
>
> # Running /ppc64/migration/multifd+postcopy/tcp/plain/cancel
> # Using machine type: pseries-10.0
> # starting QEMU: exec ./qemu-system-ppc64 -qtest
> # {
> # "error": {
> # "class": "GenericError",
> # "desc": "Postcopy is not supported: Userfaultfd not available: Function not implemented"
> # }
> # }
* It says "Function not implemented" - does the pseries machine
not support userfaultfd?
Thank you.
---
- Prasad
Prasad Pandit <ppandit@redhat.com> writes:
> Hi,
>
>> On Thu, Apr 17, 2025 at 01:05:37PM -0300, Fabiano Rosas wrote:
>> > It's not that page faults happen during multifd. The page was already
>> > sent during precopy, but multifd-recv didn't write to it, it just marked
>> > the receivedmap. When postcopy starts, the page gets accessed and
>> > faults. Since postcopy is on, the migration wants to request the page
>> > from the source, but it's present in the receivedmap, so it doesn't
>> > ask. No page ever comes and the code hangs waiting for the page fault to
>> > be serviced (or potentially faults continuously? I'm not sure on the
>> > details).
>>
>> I think your previous analysis is correct on the zero pages. I am not 100%
>> sure if that's the issue but very likely. I tend to also agree with you
>> that we could skip zero page optimization in multifd code when postcopy is
>> enabled (maybe plus some comment right above..).
>
> migration/multifd: solve zero page causing multiple page faults
> -> https://gitlab.com/qemu-project/qemu/-/commit/5ef7e26bdb7eda10d6d5e1b77121be9945e5e550
>
> * Is this the optimization that is causing the migration hang issue?
>
> ===
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index dbc1184921..00f69ff965 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -85,7 +85,8 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
> - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> + if (!migrate_postcopy() &&
> + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> memset(page, 0, multifd_ram_page_size());
> } else {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> ===
>
> * Would the above patch help to resolve it?
>
> * Another way could be when the page fault occurs during postcopy
> phase, if we know (from receivedmap) that the faulted page is a
> zero-page, maybe we could write it locally on the destination to
> service the page-fault?
>
> On Thu, 17 Apr 2025 at 21:35, Fabiano Rosas <farosas@suse.de> wrote:
>> Maybe there's a bug in the userfaultfd detection? I'll leave it to you, here's the error:
>>
>> # Running /ppc64/migration/multifd+postcopy/tcp/plain/cancel
>> # Using machine type: pseries-10.0
>> # starting QEMU: exec ./qemu-system-ppc64 -qtest
>> # {
>> # "error": {
>> # "class": "GenericError",
>> # "desc": "Postcopy is not supported: Userfaultfd not available: Function not implemented"
>> # }
>> # }
>
> * It is saying - function not implemented - does the Pseries machine
> not support userfaultfd?
>
We're missing a check on has_uffd for the multifd+postcopy tests.
> Thank you.
> ---
> - Prasad
Hi,
On Tue, 6 May 2025 at 00:34, Fabiano Rosas <farosas@suse.de> wrote:
> >> # Running /ppc64/migration/multifd+postcopy/tcp/plain/cancel
> >> # Using machine type: pseries-10.0
> >> # starting QEMU: exec ./qemu-system-ppc64 -qtest
> >> # {
> >> # "error": {
> >> # "class": "GenericError",
> >> # "desc": "Postcopy is not supported: Userfaultfd not available: Function not implemented"
> >> # }
> >> # }
> >
===
[ ~]#
...
PPC KVM module is not loaded. Try modprobe kvm_hv.
qemu-system-ppc64: -accel kvm: failed to initialize kvm: Invalid argument
qemu-system-ppc64: -accel kvm: ioctl(KVM_CREATE_VM) failed: Invalid argument
PPC KVM module is not loaded. Try modprobe kvm_hv.
qemu-system-ppc64: -accel kvm: failed to initialize kvm: Invalid argument
[ ~]#
[ ~]# modprobe kvm-hv
modprobe: ERROR: could not insert 'kvm_hv': No such device
[ ~]#
[ ~]# ls -l /dev/kvm /dev/userfaultfd
crw-rw-rw-. 1 root kvm 10, 232 May 6 07:06 /dev/kvm
crw----rw-. 1 root root 10, 123 May 6 06:30 /dev/userfaultfd
[ ~]#
===
* I tried to reproduce this issue across multiple Power9 and Power10
machines, but qtest could not run due to the above errors.
> We're missing a check on has_uffd for the multifd+postcopy tests.
* If it is about missing the 'e->has_uffd' check, does that mean
Postcopy tests are skipped on this machine because 'e->has_uffd' is
false?
Thank you.
---
- Prasad
On Tue, Apr 29, 2025 at 06:21:13PM +0530, Prasad Pandit wrote:
> Hi,
>
> > On Thu, Apr 17, 2025 at 01:05:37PM -0300, Fabiano Rosas wrote:
> > > It's not that page faults happen during multifd. The page was already
> > > sent during precopy, but multifd-recv didn't write to it, it just marked
> > > the receivedmap. When postcopy starts, the page gets accessed and
> > > faults. Since postcopy is on, the migration wants to request the page
> > > from the source, but it's present in the receivedmap, so it doesn't
> > > ask. No page ever comes and the code hangs waiting for the page fault to
> > > be serviced (or potentially faults continuously? I'm not sure on the
> > > details).
> >
> > I think your previous analysis is correct on the zero pages. I am not 100%
> > sure if that's the issue but very likely. I tend to also agree with you
> > that we could skip zero page optimization in multifd code when postcopy is
> > enabled (maybe plus some comment right above..).
>
> migration/multifd: solve zero page causing multiple page faults
> -> https://gitlab.com/qemu-project/qemu/-/commit/5ef7e26bdb7eda10d6d5e1b77121be9945e5e550
>
> * Is this the optimization that is causing the migration hang issue?
I think that's what Fabiano mentioned, but ultimately we need to verify it
on a reproducer to know.
>
> ===
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index dbc1184921..00f69ff965 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -85,7 +85,8 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
> - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> + if (!migrate_postcopy() &&
> + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> memset(page, 0, multifd_ram_page_size());
> } else {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> ===
>
> * Would the above patch help to resolve it?
Looks ok, but please add some comments explaining why postcopy needs to do
it, and especially why it is done during the precopy phase.
I'd use migrate_postcopy_ram() instead. I wish migrate_dirty_bitmaps()
had a better name, maybe migrate_postcopy_block().. I have no idea
who is using that feature, especially when postcopy-ram is off.
>
> * Another way could be when the page fault occurs during postcopy
> phase, if we know (from receivedmap) that the faulted page is a
> zero-page, maybe we could write it locally on the destination to
> service the page-fault?
I don't think we can know that - a set receivedmap bit doesn't mean it's a
zero page, it only says the page has been received before. It can also
happen that e.g. more than one thread faults on the same page; the 2nd
thread that faulted may see receivedmap set because the 1st thread's
fault was already resolved.
--
Peter Xu
On Tue, 29 Apr 2025 at 18:34, Peter Xu <peterx@redhat.com> wrote:
> I think that's what Fabiano mentioned, but ultimately we need to verify it
> on a reproducer to know.
...
> Looks ok, but please add some comments explain why postcopy needs to do it,
> and especially do it during precopy phase.
>
> I'd use migrate_postcopy_ram() instead.
* Okay. It should be '||' instead of '&&' in the first conditional, I
think; we want to write the zero page when postcopy is enabled.
===
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index dbc1184921..4d6677feab 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -85,9 +85,11 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
for (int i = 0; i < p->zero_num; i++) {
void *page = p->host + p->zero[i];
- if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
+ if (migrate_postcopy_ram() ||
+ ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
memset(page, 0, multifd_ram_page_size());
- } else {
+ }
+ if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
}
}
===
* I'll send this one if it looks okay.
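For readability, here is the loop of multifd_recv_zero_page_process()
reassembled with the quoted hunk applied; the comment is added for clarity
and is not part of the proposed patch:
===
void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
    for (int i = 0; i < p->zero_num; i++) {
        void *page = p->host + p->zero[i];

        /*
         * Clarifying comment (not in the patch): with postcopy enabled,
         * always materialise the zero page now.  Marking receivedmap for
         * a page that was never written means a later postcopy fault on
         * it would not be requested from the source and could never be
         * serviced -- the hang discussed earlier in this thread.
         */
        if (migrate_postcopy_ram() ||
            ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
            memset(page, 0, multifd_ram_page_size());
        }
        if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
            ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
        }
    }
}
===
This keeps the original behaviour of setting the receivedmap bit only when
it was not already set, while guaranteeing the page contents are placed
whenever postcopy may follow.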
> I don't think we can know that - receivedmap set doesn't mean it's a zero
> page, but only says it's been received before. It can also happen e.g. >1
> threads faulted on the same page then the 2nd thread faulted on it may see
> receivedmap set because the 1st thread got faulted already got the fault
> resolved.
* Okay.
Thank you.
---
- Prasad
Prasad Pandit <ppandit@redhat.com> writes:
> On Tue, 29 Apr 2025 at 18:34, Peter Xu <peterx@redhat.com> wrote:
>> I think that's what Fabiano mentioned, but ultimately we need to verify it
>> on a reproducer to know.
> ...
>> Looks ok, but please add some comments explain why postcopy needs to do it,
>> and especially do it during precopy phase.
>>
>> I'd use migrate_postcopy_ram() instead.
>
> * Okay. It should be '||' instead of '&&' in the first conditional I
> think, we want to write zeropage when postcopy is enabled.
> ===
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index dbc1184921..4d6677feab 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -85,9 +85,11 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
> - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> + if (migrate_postcopy_ram() ||
> + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> memset(page, 0, multifd_ram_page_size());
> - } else {
> + }
> + if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> }
> }
> ===
I applied this diff and I'm not seeing the hang anymore.
Hello Fabiano,
On Tue, 6 May 2025 at 00:31, Fabiano Rosas <farosas@suse.de> wrote:
> > +++ b/migration/multifd-zero-page.c
> > @@ -85,9 +85,11 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> > {
> > for (int i = 0; i < p->zero_num; i++) {
> > void *page = p->host + p->zero[i];
> > - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> > + if (migrate_postcopy_ram() ||
> > + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> > memset(page, 0, multifd_ram_page_size());
> > - } else {
> > + }
> > + if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> > ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> > }
> > }
> > ===
>
> I applied this diff and I'm not seeing the hang anymore.
* Great, thank you for the confirmation. I'll prepare a formal patch.
Thank you.
---
- Prasad
On Tue, Apr 29, 2025 at 06:58:29PM +0530, Prasad Pandit wrote:
> On Tue, 29 Apr 2025 at 18:34, Peter Xu <peterx@redhat.com> wrote:
> > I think that's what Fabiano mentioned, but ultimately we need to verify it
> > on a reproducer to know.
> ...
> > Looks ok, but please add some comments explain why postcopy needs to do it,
> > and especially do it during precopy phase.
> >
> > I'd use migrate_postcopy_ram() instead.
>
> * Okay. It should be '||' instead of '&&' in the first conditional I
> think, we want to write zeropage when postcopy is enabled.
> ===
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index dbc1184921..4d6677feab 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -85,9 +85,11 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
> - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> + if (migrate_postcopy_ram() ||
> + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> memset(page, 0, multifd_ram_page_size());
> - } else {
> + }
> + if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> }
> }
> ===
> * I'll send this one if it looks okay.
Please don't rush to send. Again, let's verify the issue first before
resending anything.
If you could reproduce it, that would be perfect; then we can already
verify it. Otherwise we may need help from Fabiano. Let's not send
anything if you're not yet sure whether it works.. It can confuse people
into thinking the problem is solved when it may not be yet.
--
Peter Xu
On Tue, 29 Apr 2025 at 19:18, Peter Xu <peterx@redhat.com> wrote:
> Please don't rush to send. Again, let's verify the issue first before
> resending anything.
>
> If you could reproduce it it would be perfect, then we can already verify
> it. Otherwise we may need help from Fabiano. Let's not send anything if
> you're not yet sure whether it works.. It can confuse people thinking
> problem solved, but maybe not yet.

* No, the migration hang issue is not reproducing on my side. Earlier
in this thread, Fabiano said you'll be better able to confirm the
issue. (so its possible fix as well I guess)

* You don't have access to the set-up that he uses for running tests
and merging patches? Would it be possible for you to run the same
tests? (just checking, I don't know how co-maintainers work to
test/merge patches)

* If we don't send the patch, how will Fabiano test it? Should we wait
for Fabiano to come back and then make this same patch in his set-up
and test/verify it?

Thank you.
---
- Prasad
On Tue, Apr 29, 2025 at 08:50:19PM +0530, Prasad Pandit wrote:
> On Tue, 29 Apr 2025 at 19:18, Peter Xu <peterx@redhat.com> wrote:
> > Please don't rush to send. Again, let's verify the issue first before
> > resending anything.
> >
> > If you could reproduce it it would be perfect, then we can already verify
> > it. Otherwise we may need help from Fabiano. Let's not send anything if
> > you're not yet sure whether it works.. It can confuse people thinking
> > problem solved, but maybe not yet.
>
> * No, the migration hang issue is not reproducing on my side. Earlier
> in this thread, Fabiano said you'll be better able to confirm the
> issue. (so its possible fix as well I guess)
>
> * You don't have access to the set-up that he uses for running tests
> and merging patches? Would it be possible for you to run the same
> tests? (just checking, I don't know how co-maintainers work to
> test/merge patches)

No I don't.

>
> * If we don't send the patch, how will Fabiano test it? Should we wait
> for Fabiano to come back and then make this same patch in his set-up
> and test/verify it?

I thought you'd provided a diff. That would be good enough for
verification. If you really want, you can repost, but please mention
explicitly that you haven't verified the issue, so the patchset needs to
be verified. Fabiano should come back early May.

If you want, you can try to look into how to reproduce it by looking at
why it triggered in the vapic path:

https://lore.kernel.org/all/87plhwgbu6.fsf@suse.de/#t

Thread 1 (Thread 0x7fbc4849df80 (LWP 7487) "qemu-system-x86"):
#0  __memcpy_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:274
#1  0x0000560b135103aa in flatview_read_continue_step (attrs=..., buf=0x560b168a5930 "U\252\022\006\016\a1\300\271", len=9216, mr_addr=831488, l=0x7fbc465ff980, mr=0x560b166c5070) at ../system/physmem.c:3056
#2  0x0000560b1351042e in flatview_read_continue (fv=0x560b16c606a0, addr=831488, attrs=..., ptr=0x560b168a5930, len=9216, mr_addr=831488, l=9216, mr=0x560b166c5070) at ../system/physmem.c:3073
#3  0x0000560b13510533 in flatview_read (fv=0x560b16c606a0, addr=831488, attrs=..., buf=0x560b168a5930, len=9216) at ../system/physmem.c:3103
#4  0x0000560b135105be in address_space_read_full (as=0x560b14970fc0 <address_space_memory>, addr=831488, attrs=..., buf=0x560b168a5930, len=9216) at ../system/physmem.c:3116
#5  0x0000560b135106e7 in address_space_rw (as=0x560b14970fc0 <address_space_memory>, addr=831488, attrs=..., buf=0x560b168a5930, len=9216, is_write=false) at ../system/physmem.c:3144
#6  0x0000560b13510848 in cpu_physical_memory_rw (addr=831488, buf=0x560b168a5930, len=9216, is_write=false) at ../system/physmem.c:3170
#7  0x0000560b1338f5a5 in cpu_physical_memory_read (addr=831488, buf=0x560b168a5930, len=9216) at qemu/include/exec/cpu-common.h:148
#8  0x0000560b1339063c in patch_hypercalls (s=0x560b168840c0) at ../hw/i386/vapic.c:547
#9  0x0000560b1339096d in vapic_prepare (s=0x560b168840c0) at ../hw/i386/vapic.c:629
#10 0x0000560b13390e8b in vapic_post_load (opaque=0x560b168840c0, version_id=1) at ../hw/i386/vapic.c:789
#11 0x0000560b135b4924 in vmstate_load_state (f=0x560b16c53400, vmsd=0x560b147c6cc0 <vmstate_vapic>, opaque=0x560b168840c0, version_id=1) at ../migration/vmstate.c:234
#12 0x0000560b132a15b8 in vmstate_load (f=0x560b16c53400, se=0x560b16893390) at ../migration/savevm.c:972
#13 0x0000560b132a4f28 in qemu_loadvm_section_start_full (f=0x560b16c53400, type=4 '\004') at ../migration/savevm.c:2746
#14 0x0000560b132a5ae8 in qemu_loadvm_state_main (f=0x560b16c53400, mis=0x560b16877f20) at ../migration/savevm.c:3058
#15 0x0000560b132a45d0 in loadvm_handle_cmd_packaged (mis=0x560b16877f20) at ../migration/savevm.c:2451
#16 0x0000560b132a4b36 in loadvm_process_command (f=0x560b168c3b60) at ../migration/savevm.c:2614
#17 0x0000560b132a5b96 in qemu_loadvm_state_main (f=0x560b168c3b60, mis=0x560b16877f20) at ../migration/savevm.c:3073
#18 0x0000560b132a5db7 in qemu_loadvm_state (f=0x560b168c3b60) at ../migration/savevm.c:3150
#19 0x0000560b13286271 in process_incoming_migration_co (opaque=0x0) at ../migration/migration.c:892
#20 0x0000560b137cb6d4 in coroutine_trampoline (i0=377836416, i1=22027) at ../util/coroutine-ucontext.c:175
#21 0x00007fbc4786a79e in ??? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:103

So _if_ the theory is correct, vapic's patch_hypercalls() might be reading
a zero page (with GPA 831488, over len=9216, which IIUC covers three
pages). Maybe you can check when it'll be a zero page and when it won't,
then maybe you can figure out how to make it always a zero page and hence
reliably trigger a hang in post_load.

You could also try to write a program in the guest, zeroing most pages
first, trigger migrate (hence send zero pages during multifd precopy),
start postcopy, then you should be able to observe the vcpu hang at least
before postcopy completes.

However I don't think it'll hang forever, since once migration fully
completes, UFFDIO_UNREGISTER will remove the userfaultfd tracking and then
kick all the hung threads out, causing the fault to be resolved right at
the completion of postcopy. So it won't really hang forever like what
Fabiano reported here.

Meanwhile we'll always want to verify the original reproducer.. even if
you could hang it temporarily in a vcpu thread.

Thanks,

--
Peter Xu
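A guest-side reproducer along the lines Peter suggests might look roughly
like the following; this is a hypothetical, untested sketch, and the buffer
size, page size and loop cadence are arbitrary:
===
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SZ   (512UL * 1024 * 1024)   /* adjust to the guest's RAM size */
#define PAGE 4096

int main(void)
{
    unsigned char *buf = malloc(SZ);
    if (!buf) {
        return 1;
    }

    /* Dirty the region with zeros so the source sends it as zero pages
     * over multifd during the precopy phase. */
    memset(buf, 0, SZ);

    /* Keep touching every page; once postcopy starts, a page whose
     * receivedmap bit was set without being placed should fault here and
     * stall this thread (at least until postcopy completes). */
    for (;;) {
        unsigned long sum = 0;
        for (size_t i = 0; i < SZ; i += PAGE) {
            sum += buf[i];
        }
        printf("pass done, sum=%lu\n", sum);
        sleep(1);
    }
    return 0;
}
===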