[PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race

Fabiano Rosas posted 2 patches 2 months, 1 week ago
migration/migration.c | 1 +
migration/savevm.c    | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)
Posted by Fabiano Rosas 2 months, 1 week ago
v2: Keep skipping the cpu_synchronize_all_post_init() call if the
postcopy listen thread is live. Don't copy stable on the first patch.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1457418838
====
v1:
https://lore.kernel.org/r/20240913220542.18305-1-farosas@suse.de

This fixes the crash we've been seeing recently in migration-test. The
first patch is a cleanup to have only one place calling
qemu_loadvm_state_cleanup() and the second patch reorders the cleanup
calls to make multifd_recv_cleanup() run first and stop the recv
threads.
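
To illustrate the ordering issue, here is a minimal sketch, not the
actual QEMU code. LoadState, recv_worker(), recv_cleanup() and
state_cleanup() are hypothetical stand-ins for the recv threads, for
multifd_recv_cleanup() and for the code that frees rb->receivedmap.
The assumption is that a worker thread keeps writing to a shared map
until it is told to quit, so the map may only be freed after that
thread has been stopped and joined:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define NPAGES 1024

/* Hypothetical stand-in for the incoming migration state. */
typedef struct {
    unsigned char *receivedmap; /* shared map, written by the recv thread */
    atomic_bool quit;           /* asks the recv thread to exit */
    pthread_t recv_thread;
} LoadState;

/* Worker: marks pages as received until asked to quit. */
static void *recv_worker(void *opaque)
{
    LoadState *s = opaque;
    unsigned long i = 0;

    while (!atomic_load(&s->quit)) {
        s->receivedmap[i++ % NPAGES] = 1;
    }
    return NULL;
}

/* Allocate the map and start the worker. Returns 0 on success. */
int load_start(LoadState *s)
{
    s->receivedmap = calloc(NPAGES, 1);
    if (!s->receivedmap) {
        return -1;
    }
    atomic_init(&s->quit, false);
    if (pthread_create(&s->recv_thread, NULL, recv_worker, s) != 0) {
        free(s->receivedmap);
        s->receivedmap = NULL;
        return -1;
    }
    return 0;
}

/* Plays the role of multifd_recv_cleanup(): stop and join the worker. */
void recv_cleanup(LoadState *s)
{
    atomic_store(&s->quit, true);
    pthread_join(s->recv_thread, NULL);
}

/* Plays the role of the cleanup that frees the received map. */
void state_cleanup(LoadState *s)
{
    /* The worker must already be stopped, or this is a use-after-free. */
    assert(atomic_load(&s->quit));
    free(s->receivedmap);
    s->receivedmap = NULL;
}

/* Single cleanup path: stop the threads first, free the map second. */
void load_cleanup(LoadState *s)
{
    recv_cleanup(s);
    state_cleanup(s);
}
```

Calling state_cleanup() before recv_cleanup() in this sketch would let
the worker write into freed memory, which is the kind of race the
reordering is meant to close. Build with -pthread.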

Fabiano Rosas (2):
  migration/savevm: Remove extra load cleanup calls
  migration/multifd: Fix rb->receivedmap cleanup race

 migration/migration.c | 1 +
 migration/savevm.c    | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

-- 
2.35.3
Re: [PATCH 0/2] migration/multifd: Fix rb->receivedmap cleanup race
Posted by Peter Xu 2 months, 1 week ago
On Tue, Sep 17, 2024 at 03:58:00PM -0300, Fabiano Rosas wrote:
> v2: Keep skipping the cpu_synchronize_all_post_init() call if the
> postcopy listen thread is live. Don't copy stable on the first patch.
> 
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1457418838
> ====
> v1:
> https://lore.kernel.org/r/20240913220542.18305-1-farosas@suse.de
> 
> This fixes the crash we've been seeing recently in migration-test. The
> first patch is a cleanup to have only one place calling
> qemu_loadvm_state_cleanup() and the second patch reorders the cleanup
> calls to make multifd_recv_cleanup() run first and stop the recv
> threads.
> 
> Fabiano Rosas (2):
>   migration/savevm: Remove extra load cleanup calls
>   migration/multifd: Fix rb->receivedmap cleanup race

queued.

Let's see whether this can quiesce all multifd cancel test failures. If
not, we can still follow that up.

-- 
Peter Xu