migration/migration.c | 14 ++-- migration/multifd.c | 168 +++++++++++++++++++++++++----------------- migration/multifd.h | 11 ++- 3 files changed, 109 insertions(+), 84 deletions(-)
Based-on: 20240202102857.110210-1-peterx@redhat.com [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups https://lore.kernel.org/r/20240202102857.110210-1-peterx@redhat.com Hi, For v3 I fixed the refcounting issue spotted by Avihai. The situation there is a bit clunky due to historical reasons. The gist is that we have an assumption that channel creation never fails after p->c has been set, so when 'p->c == NULL' we have to unref and when 'p->c != NULL' the cleanup code will do the unref. CI run: https://gitlab.com/farosas/qemu/-/pipelines/1166889341 v2: https://lore.kernel.org/r/20240205194929.28963-1-farosas@suse.de In this v2 I made sure NO channel is created after the semaphores are posted. Feel free to call me out if that's not the case. Not much changes, except that now both TLS and non-TLS go through the same code, so there's a centralized place to do error handling and releasing the semaphore. CI run: https://gitlab.com/farosas/qemu/-/pipelines/1165206107 based on Peter's code: https://gitlab.com/farosas/qemu/-/pipelines/1165303276 v1: https://lore.kernel.org/r/20240202191128.1901-1-farosas@suse.de This contains 2 patches from my previous series addressing the p->running misuse and the TLS thread leak and 3 new patches to fix the cleanup-while-creating-threads race. For the p->running I'm keeping the idea from the other series to remove p->running and use a more narrow p->thread_created flag. This flag is used only inform whether the thread has been created so we can join it. For the cleanup race I have moved some code around and added a semaphore to make multifd_save_setup() only return once all channel creation tasks have started. The idea is that after multifd_save_setup() returns, no new creations are in flight and the p->thread_created flags will never change again, so they're enough to cause the cleanup code to wait for the threads to join. CI run: https://gitlab.com/farosas/qemu/-/pipelines/1162798843 @Peter: I can rebase this on top of your series once we decide about it. Fabiano Rosas (6): migration/multifd: Join the TLS thread migration/multifd: Remove p->running migration/multifd: Move multifd_send_setup error handling in to the function migration/multifd: Move multifd_send_setup into migration thread migration/multifd: Unify multifd and TLS connection paths migration/multifd: Add a synchronization point for channel creation migration/migration.c | 14 ++-- migration/multifd.c | 168 +++++++++++++++++++++++++----------------- migration/multifd.h | 11 ++- 3 files changed, 109 insertions(+), 84 deletions(-) -- 2.35.3
On Tue, Feb 06, 2024 at 06:51:12PM -0300, Fabiano Rosas wrote: > Based-on: 20240202102857.110210-1-peterx@redhat.com > [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups > https://lore.kernel.org/r/20240202102857.110210-1-peterx@redhat.com > > Hi, > > For v3 I fixed the refcounting issue spotted by Avihai. The situation > there is a bit clunky due to historical reasons. The gist is that we > have an assumption that channel creation never fails after p->c has > been set, so when 'p->c == NULL' we have to unref and when 'p->c != > NULL' the cleanup code will do the unref. > > CI run: https://gitlab.com/farosas/qemu/-/pipelines/1166889341 Apologize if I queue this too fast, but i'll disappear tomorrow, so I want to have this thread race fixed soon. I hope that's already complete from angle of all race can happen, but if otherwise we work on top. queued, thanks. -- Peter Xu
On Tue, Feb 06, 2024 at 06:51:12PM -0300, Fabiano Rosas wrote: > Based-on: 20240202102857.110210-1-peterx@redhat.com > [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups > https://lore.kernel.org/r/20240202102857.110210-1-peterx@redhat.com > > Hi, > > For v3 I fixed the refcounting issue spotted by Avihai. The situation > there is a bit clunky due to historical reasons. The gist is that we > have an assumption that channel creation never fails after p->c has > been set, so when 'p->c == NULL' we have to unref and when 'p->c != > NULL' the cleanup code will do the unref. Yes, this looks good to me. That's a good catch. I'll leave at least one more day for Avihai and/or Dan to have another look. My r-b persist as of now on patch 5. Actually I think the conditional unref is slightly tricky, but it's not its own fault, IMHO, OTOH it's more about a1af605bd5ad where p->c is slightly abused. My understanding is we can avoid that conditional unref with below patch 1 as a cleanup (on top of this series). Then patch 2 comes all alongside. We don't need to rush on these, though, we should fix the thread race first because multiple of us hit it, and all cleanups can be done later. ===== From 0830819d86e08c5175d6669505aa712a0a09717f Mon Sep 17 00:00:00 2001 From: Peter Xu <peterx@redhat.com> Date: Wed, 7 Feb 2024 10:08:35 +0800 Subject: [PATCH 1/2] migration/multifd: Cleanup TLS iochannel referencing Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") introduced a thread for TLS channels, which will resolve the issue on blocking the main thread. However in the same commit p->c is slightly abused just to be able to pass over the pointer "p" into the thread. That's the major reason we'll need to conditionally free the io channel in the fault paths. To clean it up, using a separate structure to pass over both "p" and "tioc" in the tls handshake thread. Then we can make it a rule that p->c will never be set until the channel is completely setup. With that, we can drop the tricky conditional unref of the io channel in the error path. Signed-off-by: Peter Xu <peterx@redhat.com> --- migration/multifd.c | 37 +++++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 14 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index adfe8c9a0a..4a85a6b7b3 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -873,16 +873,22 @@ out: static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque); +typedef struct { + MultiFDSendParams *p; + QIOChannelTLS *tioc; +} MultiFDTLSThreadArgs; + static void *multifd_tls_handshake_thread(void *opaque) { - MultiFDSendParams *p = opaque; - QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c); + MultiFDTLSThreadArgs *args = opaque; - qio_channel_tls_handshake(tioc, + qio_channel_tls_handshake(args->tioc, multifd_new_send_channel_async, - p, + args->p, NULL, NULL); + g_free(args); + return NULL; } @@ -892,6 +898,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, { MigrationState *s = migrate_get_current(); const char *hostname = s->hostname; + MultiFDTLSThreadArgs *args; QIOChannelTLS *tioc; tioc = migration_tls_client_create(ioc, hostname, errp); @@ -906,11 +913,14 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, object_unref(OBJECT(ioc)); trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname); qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing"); - p->c = QIO_CHANNEL(tioc); + + args = g_new0(MultiFDTLSThreadArgs, 1); + args->tioc = tioc; + args->p = p; p->tls_thread_created = true; qemu_thread_create(&p->tls_thread, "multifd-tls-handshake-worker", - multifd_tls_handshake_thread, p, + multifd_tls_handshake_thread, args, QEMU_THREAD_JOINABLE); return true; } @@ -923,6 +933,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p, migration_ioc_register_yank(ioc); p->registered_yank = true; + /* Setup p->c only if the channel is completely setup */ p->c = ioc; p->thread_created = true; @@ -976,14 +987,12 @@ out: trace_multifd_new_send_channel_async_error(p->id, local_err); multifd_send_set_error(local_err); - if (!p->c) { - /* - * If no channel has been created, drop the initial - * reference. Otherwise cleanup happens at - * multifd_send_channel_destroy() - */ - object_unref(OBJECT(ioc)); - } + /* + * For error cases (TLS or non-TLS), IO channel is always freed here + * rather than when cleanup multifd: since p->c is not set, multifd + * cleanup code doesn't even know its existance. + */ + object_unref(OBJECT(ioc)); error_free(local_err); } -- 2.43.0 ===== From 9e574c3216f6459e3a808096d905e2554d962cad Mon Sep 17 00:00:00 2001 From: Peter Xu <peterx@redhat.com> Date: Wed, 7 Feb 2024 10:24:39 +0800 Subject: [PATCH 2/2] migration/multifd: Drop registered_yank With a clear definition of p->c protocol, where we only set it up if the channel is fully established (TLS or non-TLS), registered_yank boolean will have equal meaning of "p->c != NULL". Drop registered_yank by checking p->c instead. Signed-off-by: Peter Xu <peterx@redhat.com> --- migration/multifd.h | 2 -- migration/multifd.c | 7 +++---- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 8a1cad0996..b3fe27ae93 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -78,8 +78,6 @@ typedef struct { bool tls_thread_created; /* communication channel */ QIOChannel *c; - /* is the yank function registered */ - bool registered_yank; /* packet allocated len */ uint32_t packet_len; /* guest page size */ diff --git a/migration/multifd.c b/migration/multifd.c index 4a85a6b7b3..278453cf84 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -648,11 +648,11 @@ static int multifd_send_channel_destroy(QIOChannel *send) static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) { - if (p->registered_yank) { + if (p->c) { migration_ioc_unregister_yank(p->c); + multifd_send_channel_destroy(p->c); + p->c = NULL; } - multifd_send_channel_destroy(p->c); - p->c = NULL; qemu_sem_destroy(&p->sem); qemu_sem_destroy(&p->sem_sync); g_free(p->name); @@ -932,7 +932,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p, qio_channel_set_delay(ioc, false); migration_ioc_register_yank(ioc); - p->registered_yank = true; /* Setup p->c only if the channel is completely setup */ p->c = ioc; -- 2.43.0 ==== Thanks, -- Peter Xu
Peter Xu <peterx@redhat.com> writes: > On Tue, Feb 06, 2024 at 06:51:12PM -0300, Fabiano Rosas wrote: >> Based-on: 20240202102857.110210-1-peterx@redhat.com >> [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups >> https://lore.kernel.org/r/20240202102857.110210-1-peterx@redhat.com >> >> Hi, >> >> For v3 I fixed the refcounting issue spotted by Avihai. The situation >> there is a bit clunky due to historical reasons. The gist is that we >> have an assumption that channel creation never fails after p->c has >> been set, so when 'p->c == NULL' we have to unref and when 'p->c != >> NULL' the cleanup code will do the unref. > > Yes, this looks good to me. That's a good catch. > > I'll leave at least one more day for Avihai and/or Dan to have another > look. My r-b persist as of now on patch 5. > > Actually I think the conditional unref is slightly tricky, but it's not its > own fault, IMHO, OTOH it's more about a1af605bd5ad where p->c is slightly > abused. My understanding is we can avoid that conditional unref with below > patch 1 as a cleanup (on top of this series). Then patch 2 comes all > alongside. Yes, I even wrote some code to always set p->c and leave the unref to the cleanup routine. Doing reference counting in the middle of the code like that leaves us exposed to new code relying on p->c being valid during cleanup. There's already yank and the cleanup itself which expect p->c to be valid. However, I couldn't get past the uglyness of overwriting p->c, so I kept the changes minimal for this version. I'm also wondering whether we can remove the TLS handshake thread altogether now that we moved the multifd_send_setup call into the migration thread. My (poor) understanding is that a1af605bd5ad hit the issue that the QIOTask completion would just execute after multifd_save_setup returned. Otherwise I don't see how adding a thread to an already async task would have helped. But I need to think about that a bit more. > > We don't need to rush on these, though, we should fix the thread race >first because multiple of us hit it, and all cleanups can be done >later. I said we should not merge these two changes right now, but I take that back. I'll leave it up to you, there doesn't seem to be that much impact in adding them. > > ===== > From 0830819d86e08c5175d6669505aa712a0a09717f Mon Sep 17 00:00:00 2001 > From: Peter Xu <peterx@redhat.com> > Date: Wed, 7 Feb 2024 10:08:35 +0800 > Subject: [PATCH 1/2] migration/multifd: Cleanup TLS iochannel referencing > > Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to > blocking handshake") introduced a thread for TLS channels, which will > resolve the issue on blocking the main thread. However in the same commit > p->c is slightly abused just to be able to pass over the pointer "p" into > the thread. > > That's the major reason we'll need to conditionally free the io channel in > the fault paths. > > To clean it up, using a separate structure to pass over both "p" and "tioc" > in the tls handshake thread. Then we can make it a rule that p->c will > never be set until the channel is completely setup. With that, we can drop > the tricky conditional unref of the io channel in the error path. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > migration/multifd.c | 37 +++++++++++++++++++++++-------------- > 1 file changed, 23 insertions(+), 14 deletions(-) > > diff --git a/migration/multifd.c b/migration/multifd.c > index adfe8c9a0a..4a85a6b7b3 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -873,16 +873,22 @@ out: > > static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque); > > +typedef struct { > + MultiFDSendParams *p; > + QIOChannelTLS *tioc; > +} MultiFDTLSThreadArgs; > + > static void *multifd_tls_handshake_thread(void *opaque) > { > - MultiFDSendParams *p = opaque; > - QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c); > + MultiFDTLSThreadArgs *args = opaque; > > - qio_channel_tls_handshake(tioc, > + qio_channel_tls_handshake(args->tioc, > multifd_new_send_channel_async, > - p, > + args->p, > NULL, > NULL); > + g_free(args); > + > return NULL; > } > > @@ -892,6 +898,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, > { > MigrationState *s = migrate_get_current(); > const char *hostname = s->hostname; > + MultiFDTLSThreadArgs *args; > QIOChannelTLS *tioc; > > tioc = migration_tls_client_create(ioc, hostname, errp); > @@ -906,11 +913,14 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, > object_unref(OBJECT(ioc)); > trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname); > qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing"); > - p->c = QIO_CHANNEL(tioc); This assignment also meant multifd_send_channel_destroy() would call object_unref on the tioc object. Removing it means qio_channel_tls_finalize() will never be called. It also means the socket channel (ioc) refcount will be decremented one too many times, due to the object_unref above^. So I think we should find a point where tioc is not needed anymore and unref it and remove the object_unref(ioc) above. Right? > + > + args = g_new0(MultiFDTLSThreadArgs, 1); > + args->tioc = tioc; > + args->p = p; > > p->tls_thread_created = true; > qemu_thread_create(&p->tls_thread, "multifd-tls-handshake-worker", > - multifd_tls_handshake_thread, p, > + multifd_tls_handshake_thread, args, > QEMU_THREAD_JOINABLE); > return true; > } > @@ -923,6 +933,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p, > > migration_ioc_register_yank(ioc); > p->registered_yank = true; > + /* Setup p->c only if the channel is completely setup */ > p->c = ioc; > > p->thread_created = true; > @@ -976,14 +987,12 @@ out: > > trace_multifd_new_send_channel_async_error(p->id, local_err); > multifd_send_set_error(local_err); > - if (!p->c) { > - /* > - * If no channel has been created, drop the initial > - * reference. Otherwise cleanup happens at > - * multifd_send_channel_destroy() > - */ > - object_unref(OBJECT(ioc)); > - } > + /* > + * For error cases (TLS or non-TLS), IO channel is always freed here > + * rather than when cleanup multifd: since p->c is not set, multifd > + * cleanup code doesn't even know its existance. > + */ > + object_unref(OBJECT(ioc)); > error_free(local_err); > } > > -- > 2.43.0 > ===== > From 9e574c3216f6459e3a808096d905e2554d962cad Mon Sep 17 00:00:00 2001 > From: Peter Xu <peterx@redhat.com> > Date: Wed, 7 Feb 2024 10:24:39 +0800 > Subject: [PATCH 2/2] migration/multifd: Drop registered_yank > > With a clear definition of p->c protocol, where we only set it up if the > channel is fully established (TLS or non-TLS), registered_yank boolean will > have equal meaning of "p->c != NULL". > > Drop registered_yank by checking p->c instead. > > Signed-off-by: Peter Xu <peterx@redhat.com> This one looks good. I know it depends on the previous patch, but if you plan to add it: Reviewed-by: Fabiano Rosas <farosas@suse.de> > --- > migration/multifd.h | 2 -- > migration/multifd.c | 7 +++---- > 2 files changed, 3 insertions(+), 6 deletions(-) > > diff --git a/migration/multifd.h b/migration/multifd.h > index 8a1cad0996..b3fe27ae93 100644 > --- a/migration/multifd.h > +++ b/migration/multifd.h > @@ -78,8 +78,6 @@ typedef struct { > bool tls_thread_created; > /* communication channel */ > QIOChannel *c; > - /* is the yank function registered */ > - bool registered_yank; > /* packet allocated len */ > uint32_t packet_len; > /* guest page size */ > diff --git a/migration/multifd.c b/migration/multifd.c > index 4a85a6b7b3..278453cf84 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -648,11 +648,11 @@ static int multifd_send_channel_destroy(QIOChannel *send) > > static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) > { > - if (p->registered_yank) { > + if (p->c) { > migration_ioc_unregister_yank(p->c); > + multifd_send_channel_destroy(p->c); > + p->c = NULL; > } > - multifd_send_channel_destroy(p->c); > - p->c = NULL; > qemu_sem_destroy(&p->sem); > qemu_sem_destroy(&p->sem_sync); > g_free(p->name); > @@ -932,7 +932,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p, > qio_channel_set_delay(ioc, false); > > migration_ioc_register_yank(ioc); > - p->registered_yank = true; > /* Setup p->c only if the channel is completely setup */ > p->c = ioc; > > -- > 2.43.0 > ==== > > Thanks,
On Wed, Feb 07, 2024 at 10:31:51AM -0300, Fabiano Rosas wrote: > Peter Xu <peterx@redhat.com> writes: > > > On Tue, Feb 06, 2024 at 06:51:12PM -0300, Fabiano Rosas wrote: > >> Based-on: 20240202102857.110210-1-peterx@redhat.com > >> [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups > >> https://lore.kernel.org/r/20240202102857.110210-1-peterx@redhat.com > >> > >> Hi, > >> > >> For v3 I fixed the refcounting issue spotted by Avihai. The situation > >> there is a bit clunky due to historical reasons. The gist is that we > >> have an assumption that channel creation never fails after p->c has > >> been set, so when 'p->c == NULL' we have to unref and when 'p->c != > >> NULL' the cleanup code will do the unref. > > > > Yes, this looks good to me. That's a good catch. > > > > I'll leave at least one more day for Avihai and/or Dan to have another > > look. My r-b persist as of now on patch 5. > > > > Actually I think the conditional unref is slightly tricky, but it's not its > > own fault, IMHO, OTOH it's more about a1af605bd5ad where p->c is slightly > > abused. My understanding is we can avoid that conditional unref with below > > patch 1 as a cleanup (on top of this series). Then patch 2 comes all > > alongside. > > Yes, I even wrote some code to always set p->c and leave the unref to > the cleanup routine. Doing reference counting in the middle of the code > like that leaves us exposed to new code relying on p->c being valid > during cleanup. There's already yank and the cleanup itself which expect > p->c to be valid. > > However, I couldn't get past the uglyness of overwriting p->c, so I kept > the changes minimal for this version. Yep. The good part of "only set p->c if channel fully established" is that then the migration thread fully owns the ioc as long as set, and no overwritting possible. > > I'm also wondering whether we can remove the TLS handshake thread > altogether now that we moved the multifd_send_setup call into the > migration thread. My (poor) understanding is that a1af605bd5ad hit the > issue that the QIOTask completion would just execute after > multifd_save_setup returned. Otherwise I don't see how adding a thread > to an already async task would have helped. But I need to think about > that a bit more. It could be even simpler than that, iiuc. Note that in the stack showed in that commit message, it hasn't even reached the async handling: Src: (multifd_send_0) multifd_channel_connect multifd_tls_channel_connect multifd_tls_channel_connect qio_channel_tls_handshake qio_channel_tls_handshake_task <---- async handling provided here.. qcrypto_tls_session_handshake gnutls_handshake <-------------- but we're still at sync phase.. ... recvmsg (Blocking I/O waiting for response) IMHO it'll be much easier if we can remove those threads. Please keep the adventure there, just a heads-up that the async paths seem to have a close dependency so far on gmainloop contexts, while the migration thread may not provide that anymore (and I hope we don't introduce anything either; I think it's better we stick with a full threaded model in migration rather than event based). > > > > > We don't need to rush on these, though, we should fix the thread race > >first because multiple of us hit it, and all cleanups can be done > >later. > > I said we should not merge these two changes right now, but I take that > back. I'll leave it up to you, there doesn't seem to be that much impact > in adding them. Thanks. I just sent the pull without them, as I don't yet have plan to queue them so soon; I'll be accused to abuse the maintainership. :-) If you think worthwhile, I can still repost them as formal patches later and put there on the list. If your explore on a bigger hammer works then I think we can ignore these two patches. But if you or anyone thinks we could have these as good cleanups, we can also merge them first for 9.0 and leave whatever else for later. > > > > > ===== > > From 0830819d86e08c5175d6669505aa712a0a09717f Mon Sep 17 00:00:00 2001 > > From: Peter Xu <peterx@redhat.com> > > Date: Wed, 7 Feb 2024 10:08:35 +0800 > > Subject: [PATCH 1/2] migration/multifd: Cleanup TLS iochannel referencing > > > > Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to > > blocking handshake") introduced a thread for TLS channels, which will > > resolve the issue on blocking the main thread. However in the same commit > > p->c is slightly abused just to be able to pass over the pointer "p" into > > the thread. > > > > That's the major reason we'll need to conditionally free the io channel in > > the fault paths. > > > > To clean it up, using a separate structure to pass over both "p" and "tioc" > > in the tls handshake thread. Then we can make it a rule that p->c will > > never be set until the channel is completely setup. With that, we can drop > > the tricky conditional unref of the io channel in the error path. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > migration/multifd.c | 37 +++++++++++++++++++++++-------------- > > 1 file changed, 23 insertions(+), 14 deletions(-) > > > > diff --git a/migration/multifd.c b/migration/multifd.c > > index adfe8c9a0a..4a85a6b7b3 100644 > > --- a/migration/multifd.c > > +++ b/migration/multifd.c > > @@ -873,16 +873,22 @@ out: > > > > static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque); > > > > +typedef struct { > > + MultiFDSendParams *p; > > + QIOChannelTLS *tioc; > > +} MultiFDTLSThreadArgs; > > + > > static void *multifd_tls_handshake_thread(void *opaque) > > { > > - MultiFDSendParams *p = opaque; > > - QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c); > > + MultiFDTLSThreadArgs *args = opaque; > > > > - qio_channel_tls_handshake(tioc, > > + qio_channel_tls_handshake(args->tioc, > > multifd_new_send_channel_async, > > - p, > > + args->p, > > NULL, > > NULL); > > + g_free(args); > > + > > return NULL; > > } > > > > @@ -892,6 +898,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, > > { > > MigrationState *s = migrate_get_current(); > > const char *hostname = s->hostname; > > + MultiFDTLSThreadArgs *args; > > QIOChannelTLS *tioc; > > > > tioc = migration_tls_client_create(ioc, hostname, errp); > > @@ -906,11 +913,14 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p, > > object_unref(OBJECT(ioc)); > > trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname); > > qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing"); > > - p->c = QIO_CHANNEL(tioc); > > This assignment also meant multifd_send_channel_destroy() would call > object_unref on the tioc object. Removing it means > qio_channel_tls_finalize() will never be called. I think it'll still be properly released / called? Just to make sure we're on the same page: qio_channel_tls_finalize() will be called on the last ref released. Then let's discuss error paths on how this patch affects the last unref. Before this patch, it will be called until multifd_send_channel_destroy() as you said when cleanup, because that did the last unref (while your patch has the "if" where you'll skip the cleanup even if error): multifd_send_channel_destroy socket_send_channel_destroy object_unref After this patch, that object_unref() will be called in multifd_channel_connect() directly, and the cleanup code, seeing p->c==NULL, does nothing later. > > It also means the socket channel (ioc) refcount will be decremented one > too many times, due to the object_unref above^. > > So I think we should find a point where tioc is not needed anymore and > unref it and remove the object_unref(ioc) above. > > Right? [...] > > From 9e574c3216f6459e3a808096d905e2554d962cad Mon Sep 17 00:00:00 2001 > > From: Peter Xu <peterx@redhat.com> > > Date: Wed, 7 Feb 2024 10:24:39 +0800 > > Subject: [PATCH 2/2] migration/multifd: Drop registered_yank > > > > With a clear definition of p->c protocol, where we only set it up if the > > channel is fully established (TLS or non-TLS), registered_yank boolean will > > have equal meaning of "p->c != NULL". > > > > Drop registered_yank by checking p->c instead. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > This one looks good. I know it depends on the previous patch, but if you > plan to add it: > > Reviewed-by: Fabiano Rosas <farosas@suse.de> I'll pick it up when posting the formal patches, thanks. Let's see whether above will address your concern. If not, we can move the discussion over to that thread. -- Peter Xu
© 2016 - 2024 Red Hat, Inc.