[RFC PATCH v2 0/4] migration: Fix multifd qemu_mutex_destroy race

Fabiano Rosas posted 4 patches 1 year ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20231110200241.20679-1-farosas@suse.de
Maintainers: Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Leonardo Bras <leobras@redhat.com>
Changes since v1:
- dropped the Error patch
- removed p->running
- joined the TLS thread

v1:
https://lore.kernel.org/r/20231109165856.15224-1-farosas@suse.de

We're calling qemu_sem_post() from threads other than the multifd
channel threads and the migration thread. This can race with
multifd_save_cleanup(), which calls qemu_sem_destroy(). If we attempt
to destroy the semaphore's mutex while the lock is taken, the code asserts.
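
To illustrate the ordering this series is after, here is a minimal
standalone sketch using plain pthreads rather than QEMU's own wrappers.
It assumes a semaphore built from a mutex, a condition variable and a
counter. DemoSem, DemoChannel, demo_sem_post(), demo_worker() and
demo_run_channels() are hypothetical names made up for this example and
are not taken from the patches. The point is only that every thread
that might post is joined before the mutex is destroyed:

```c
#include <assert.h>
#include <pthread.h>

#define NUM_CHANNELS 4

/* Hypothetical semaphore: mutex + condition variable + counter. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned int count;
} DemoSem;

/* Hypothetical stand-in for a multifd channel, not MultiFDSendParams. */
typedef struct {
    pthread_t thread;
    DemoSem sem;
} DemoChannel;

/* Takes the lock briefly, which is what makes an early destroy unsafe. */
static void demo_sem_post(DemoSem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* Helper thread that posts to the channel's semaphore once and exits. */
static void *demo_worker(void *opaque)
{
    DemoChannel *c = opaque;

    demo_sem_post(&c->sem);
    return NULL;
}

/*
 * Start one helper thread per channel, then clean up. Returns the total
 * number of posts observed, or -1 if a thread could not be created.
 */
int demo_run_channels(void)
{
    DemoChannel ch[NUM_CHANNELS];
    int started = 0;
    int posted = 0;

    for (int i = 0; i < NUM_CHANNELS; i++) {
        pthread_mutex_init(&ch[i].sem.lock, NULL);
        pthread_cond_init(&ch[i].sem.cond, NULL);
        ch[i].sem.count = 0;
        if (pthread_create(&ch[i].thread, NULL, demo_worker, &ch[i]) != 0) {
            pthread_cond_destroy(&ch[i].sem.cond);
            pthread_mutex_destroy(&ch[i].sem.lock);
            break;
        }
        started++;
    }

    for (int i = 0; i < started; i++) {
        int rc;

        /* Join first: afterwards no thread can hold or take the lock. */
        pthread_join(ch[i].thread, NULL);
        posted += (int)ch[i].sem.count;

        /* Destroying a locked mutex is undefined behaviour in POSIX. */
        pthread_cond_destroy(&ch[i].sem.cond);
        rc = pthread_mutex_destroy(&ch[i].sem.lock);
        assert(rc == 0);
        (void)rc;
    }

    return started == NUM_CHANNELS ? posted : -1;
}
```

Built with -pthread, demo_run_channels() returns 4 when every helper
thread has posted once. Swapping the join and the destroy calls would
reintroduce the kind of race described above.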

We're hitting this on current master, and it has been reported
before:

[PATCH] migrate/multifd: fix coredump when the multifd thread cleanup
https://lore.kernel.org/r/20230621081826.3203053-1-zhangjianguo18@huawei.com

Fabiano Rosas (4):
  migration/multifd: Stop setting p->ioc before connecting
  migration/multifd: Join the TLS thread
  migration/multifd: Remove p->running
  migration/multifd: Move semaphore release into main thread

 migration/migration.c |  4 +-
 migration/multifd.c   | 87 +++++++++++++++++++++++--------------------
 migration/multifd.h   |  9 ++---
 3 files changed, 53 insertions(+), 47 deletions(-)

-- 
2.35.3