When a channel fails to create, the code currently just returns. This
is wrong for two reasons:

1) Channel n+1 will not get to initialize its semaphores, leading to
an assert when terminate_threads tries to post to it:

qemu-system-x86_64: ../util/qemu-thread-posix.c:92:
qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.

2) (theoretical) If channel n-1 already started creation it will
defeat the purpose of the channels_created logic which is in place
to prevent migrate_fd_cleanup() from running while channels are still
being created.

This cannot really happen today because the current failure cases
for multifd_new_send_channel_create() are all synchronous,
resulting from qio_channel_file_new_path() getting a bad
filename. This would hit all channels equally.

But I don't want to set a trap for future people, so have all
channels try to create (even if failing), and only fail after the
channels_created semaphore has been posted.

While here, remove the error_report_err call. There's one already at
migrate_fd_cleanup later on.

Cc: qemu-stable@nongnu.org
Reported-by: Jim Fehlig <jfehlig@suse.com>
Fixes: bd8b0a8f82 ("migration/multifd: Move multifd_send_setup error handling in to the function")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 0b4cbaddfe..552f9723c8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1156,7 +1156,6 @@ static bool multifd_new_send_channel_create(gpointer opaque, Error **errp)
 bool multifd_send_setup(void)
 {
     MigrationState *s = migrate_get_current();
-    Error *local_err = NULL;
     int thread_count, ret = 0;
     uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
     bool use_packets = multifd_use_packets();
@@ -1177,6 +1176,7 @@ bool multifd_send_setup(void)
 
     for (i = 0; i < thread_count; i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
+        Error *local_err = NULL;
 
         qemu_sem_init(&p->sem, 0);
         qemu_sem_init(&p->sem_sync, 0);
@@ -1196,7 +1196,8 @@ bool multifd_send_setup(void)
         p->write_flags = 0;
 
         if (!multifd_new_send_channel_create(p, &local_err)) {
-            return false;
+            migrate_set_error(s, local_err);
+            ret = -1;
         }
     }
 
@@ -1209,24 +1210,27 @@ bool multifd_send_setup(void)
         qemu_sem_wait(&multifd_send_state->channels_created);
     }
 
+    if (ret) {
+        goto err;
+    }
+
     for (i = 0; i < thread_count; i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
+        Error *local_err = NULL;
 
         ret = multifd_send_state->ops->send_setup(p, &local_err);
         if (ret) {
-            break;
+            migrate_set_error(s, local_err);
+            goto err;
         }
     }
 
-    if (ret) {
-        migrate_set_error(s, local_err);
-        error_report_err(local_err);
-        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-                          MIGRATION_STATUS_FAILED);
-        return false;
-    }
-
     return true;
+
+err:
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_FAILED);
+    return false;
 }
 
 bool multifd_recv(void)
--
2.35.3
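
For readers following the thread, the failure mode described in the
commit message can be reproduced in isolation. The sketch below is not
QEMU code: Channel, channel_create(), setup_early_return(),
setup_deferred_failure() and count_initialized() are hypothetical names
made up for illustration, and a plain bool stands in for a semaphore
having been initialized. It only contrasts returning from inside the
loop with recording the failure and bailing out after every channel has
been visited.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-channel state (not MultiFDSendParams). */
typedef struct {
    bool sem_initialized;   /* models the semaphore init having run */
    bool created;           /* models successful channel creation */
} Channel;

/* Hypothetical creation hook: fails only for the channel at fail_at. */
static bool channel_create(Channel *c, int index, int fail_at)
{
    if (index == fail_at) {
        return false;
    }
    c->created = true;
    return true;
}

/* Early return: channels after the failing one are never initialized. */
static bool setup_early_return(Channel *ch, int n, int fail_at)
{
    for (int i = 0; i < n; i++) {
        ch[i].sem_initialized = true;
        if (!channel_create(&ch[i], i, fail_at)) {
            return false;
        }
    }
    return true;
}

/* Deferred failure: remember the error, keep looping, fail afterwards. */
static bool setup_deferred_failure(Channel *ch, int n, int fail_at)
{
    int ret = 0;

    for (int i = 0; i < n; i++) {
        ch[i].sem_initialized = true;
        if (!channel_create(&ch[i], i, fail_at)) {
            ret = -1;   /* record the failure, do not return yet */
        }
    }
    return ret == 0;
}

/* Number of channels a cleanup path could touch without asserting. */
static int count_initialized(const Channel *ch, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        if (ch[i].sem_initialized) {
            count++;
        }
    }
    return count;
}
```

With four channels and a failure injected at index 1, the early-return
variant leaves channels 2 and 3 uninitialized, which is the state in
which a cleanup path that posts to every channel's semaphore would trip
an assertion. The deferred variant still reports failure, but only once
all four channels are in a state that cleanup can handle.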
On Thu, Aug 01, 2024 at 02:41:01PM -0300, Fabiano Rosas wrote:
> When a channel fails to create, the code currently just returns. This
> is wrong for two reasons:
>
> 1) Channel n+1 will not get to initialize it's semaphores, leading to
> an assert when terminate_threads tries to post to it:
>
> qemu-system-x86_64: ../util/qemu-thread-posix.c:92:
> qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
>
> 2) (theoretical) If channel n-1 already started creation it will
> defeat the purpose of the channels_created logic which is in place
> to avoid migrate_fd_cleanup() to run while channels are still being
> created.
>
> This cannot really happen today because the current failure cases
> for multifd_new_send_channel_create() are all synchronous,
> resulting from qio_channel_file_new_path() getting a bad
> filename. This would hit all channels equally.
>
> But I don't want to set a trap for future people, so have all
> channels try to create (even if failing), and only fail after the
> channels_created semaphore has been posted.
>
> While here, remove the error_report_err call. There's one already at
> migrate_fd_cleanup later on.
>
> Cc: qemu-stable@nongnu.org
> Reported-by: Jim Fehlig <jfehlig@suse.com>
> Fixes: bd8b0a8f82 ("migration/multifd: Move multifd_send_setup error handling in to the function")

Should it be this one instead?

b7b03eb614 ("migration/multifd: Add outgoing QIOChannelFile support")

> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

PS: what's your plan on your other multifd SendData series? I got a bit
overloaded on downstream stuff and I still have plenty review debts
recently (CPR one of them.. needs follow ups), so just to say I may delay a
bit on reading that one. I assume it's next-release stuff anyway, but let
me know otherwise.

Thanks,

--
Peter Xu

Peter Xu <peterx@redhat.com> writes:

> On Thu, Aug 01, 2024 at 02:41:01PM -0300, Fabiano Rosas wrote:
>> When a channel fails to create, the code currently just returns. This
>> is wrong for two reasons:
>>
>> 1) Channel n+1 will not get to initialize it's semaphores, leading to
>> an assert when terminate_threads tries to post to it:
>>
>> qemu-system-x86_64: ../util/qemu-thread-posix.c:92:
>> qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
>>
>> 2) (theoretical) If channel n-1 already started creation it will
>> defeat the purpose of the channels_created logic which is in place
>> to avoid migrate_fd_cleanup() to run while channels are still being
>> created.
>>
>> This cannot really happen today because the current failure cases
>> for multifd_new_send_channel_create() are all synchronous,
>> resulting from qio_channel_file_new_path() getting a bad
>> filename. This would hit all channels equally.
>>
>> But I don't want to set a trap for future people, so have all
>> channels try to create (even if failing), and only fail after the
>> channels_created semaphore has been posted.
>>
>> While here, remove the error_report_err call. There's one already at
>> migrate_fd_cleanup later on.
>>
>> Cc: qemu-stable@nongnu.org
>> Reported-by: Jim Fehlig <jfehlig@suse.com>
>> Fixes: bd8b0a8f82 ("migration/multifd: Move multifd_send_setup error handling in to the function")
>
> Should it be this one instead?
>
> b7b03eb614 ("migration/multifd: Add outgoing QIOChannelFile support")

Yep, thanks. I'll fix it up.

>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> PS: what's your plan on your other multifd SendData series? I got a bit
> overloaded on downstream stuff and I still have plenty review debts
> recently (CPR one of them.. needs follow ups), so just to say I may delay a
> bit on reading that one. I assume it's next-release stuff anyway, but let
> me know otherwise.

That one is pretty ready. From my side I don't intend to change anything
else, save for review comments. And it's definitely 9.2 material. I think
CPR is more important at this point because it's been lagging behind for a
while. I have a PR to send with these fixes and catch up on that virtio-net
discussion. After that I should be able to get some reviews done.

>
> Thanks,