v2:
- patch 1: dropped the qmp_ prefix;
- patch 2: dropped the qemu_mutex_destroy;
           stopped moving the _remove functions (don't strictly need it
           anymore since not destroying the mutex explicitly);
           added the lock to protect the loop in
           qmp_query_migrationthreads;
           added __attribute__((constructor)).
CI run: https://gitlab.com/farosas/qemu/-/pipelines/892563231
v1:
https://lore.kernel.org/r/20230606144551.24367-1-farosas@suse.de
When doing cleanup of the multifd send threads we're calling
QLIST_REMOVE concurrently on the migration_threads list. This seems to
be the source of the crashes we've seen on the
multifd/tcp/plain/cancel tests.
I'm running the test in a loop and after a few dozen iterations I see
the crash in dmesg.
QTEST_QEMU_BINARY=./qemu-system-x86_64 \
QEMU_TEST_FLAKY_TESTS=1 \
./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/plain/cancel
multifdsend_10[11382]: segfault at 18 ip 0000564b77de1e25 sp
00007fdf767fb610 error 6 in qemu-system-x86_64[564b777b4000+e1c000]
Code: ec 10 48 89 7d f8 48 83 7d f8 00 74 58 48 8b 45 f8 48 8b 40 10
48 85 c0 74 14 48 8b 45 f8 48 8b 40 10 48 8b 55 f8 48 8b 52 18 <48> 89
50 18 48 8b 45 f8 48 8b 40 18 48 8b 55 f8 48 8b 52 10 48 89
The offending instruction is a mov dereferencing the
thread->node.le_next pointer at QLIST_REMOVE in MigrationThreadDel:
  void MigrationThreadDel(MigrationThread *thread)
  {
      if (thread) {
          QLIST_REMOVE(thread, node);
          g_free(thread);
      }
  }
where:
  #define QLIST_REMOVE(elm, field) do {                           \
          if ((elm)->field.le_next != NULL)                       \
                  (elm)->field.le_next->field.le_prev =           \  <-- HERE
                      (elm)->field.le_prev;                       \
          *(elm)->field.le_prev = (elm)->field.le_next;           \
          (elm)->field.le_next = NULL;                            \
          (elm)->field.le_prev = NULL;                            \
  } while (/*CONSTCOND*/0)
The MigrationThreadDel function is called from the multifd threads and
is not under any lock, so several calls can race when accessing the
list.
(I actually hit this first on my fixed-ram branch, which changes some
synchronization in multifd and makes the issue more frequent.)
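For illustration only, here is a minimal standalone sketch of the kind of
locking described above. It is not the code from patch 2: it uses plain
pthreads rather than QEMU's mutex wrappers, and every name in it
(DemoThread, demo_threads, demo_threads_lock, demo_thread_add,
demo_thread_del, demo_thread_count) is hypothetical. It assumes a
GCC/Clang-style __attribute__((constructor)) to initialize the mutex
before main() and, as in the v2 notes, never destroys the mutex
explicitly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for MigrationThread; not the QEMU definition. */
typedef struct DemoThread {
    int id;
    struct DemoThread *next;   /* plays the role of le_next */
    struct DemoThread **prev;  /* plays the role of le_prev */
} DemoThread;

static DemoThread *demo_threads;           /* list head */
static pthread_mutex_t demo_threads_lock;  /* protects demo_threads */

/* Runs before main(); the mutex is intentionally never destroyed. */
static void __attribute__((constructor)) demo_threads_init(void)
{
    pthread_mutex_init(&demo_threads_lock, NULL);
}

DemoThread *demo_thread_add(int id)
{
    DemoThread *t = calloc(1, sizeof(*t));

    assert(t != NULL);
    t->id = id;

    pthread_mutex_lock(&demo_threads_lock);
    /* Insert at the head, as QLIST_INSERT_HEAD would. */
    t->next = demo_threads;
    if (t->next != NULL) {
        t->next->prev = &t->next;
    }
    demo_threads = t;
    t->prev = &demo_threads;
    pthread_mutex_unlock(&demo_threads_lock);

    return t;
}

void demo_thread_del(DemoThread *t)
{
    if (!t) {
        return;
    }

    pthread_mutex_lock(&demo_threads_lock);
    /* Same pointer updates as QLIST_REMOVE, now serialized by the lock. */
    if (t->next != NULL) {
        t->next->prev = t->prev;
    }
    *t->prev = t->next;
    pthread_mutex_unlock(&demo_threads_lock);

    free(t);
}

int demo_thread_count(void)
{
    int n = 0;

    /* Readers walking the list take the same lock as writers. */
    pthread_mutex_lock(&demo_threads_lock);
    for (DemoThread *t = demo_threads; t != NULL; t = t->next) {
        n++;
    }
    pthread_mutex_unlock(&demo_threads_lock);

    return n;
}
```

With the unlink done under the lock, a concurrent removal can no longer
change le_next between the NULL check and the store that follows it,
which is the window the segfault above points at.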
CI run: https://gitlab.com/farosas/qemu/-/pipelines/891000519
Fabiano Rosas (3):
migration/multifd: Rename threadinfo.c functions
migration/multifd: Protect accesses to migration_threads
tests/qtest: Re-enable multifd cancel test
migration/migration.c | 4 ++--
migration/multifd.c | 4 ++--
migration/threadinfo.c | 19 ++++++++++++++++---
migration/threadinfo.h | 7 ++-----
tests/qtest/migration-test.c | 10 ++--------
5 files changed, 24 insertions(+), 20 deletions(-)
--
2.35.3