migration/channel.h | 3 +++ migration/migration.h | 2 ++ migration/tls.h | 1 - migration/channel.c | 20 ++++++++++++++++++++ migration/migration.c | 27 +++++++++++++++++++++++++-- migration/multifd.c | 40 +++++++--------------------------------- migration/tls.c | 5 ----- 7 files changed, 57 insertions(+), 41 deletions(-)
Fabiano fixed graceful shutdowns for multifd channels previously: https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/ However we likely forgot the rest channels. Do it the same for the main and postcopy channels. This fixes a warning message when running unit test /ARCH/migration/postcopy/preempt/tls/psk. Thanks, Peter Xu (3): migration/tls: Gracefully shutdown main and preempt channels migration: Make migration_has_failed() work even for CANCELLING migration/multifd: Use the new graceful termination helper migration/channel.h | 3 +++ migration/migration.h | 2 ++ migration/tls.h | 1 - migration/channel.c | 20 ++++++++++++++++++++ migration/migration.c | 27 +++++++++++++++++++++++++-- migration/multifd.c | 40 +++++++--------------------------------- migration/tls.c | 5 ----- 7 files changed, 57 insertions(+), 41 deletions(-) -- 2.50.1
On Wed, Sep 10, 2025 at 12:01:41PM -0400, Peter Xu wrote: > Fabiano fixed graceful shutdowns for multifd channels previously: > > https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/ > > However we likely forgot the rest channels. Do it the same for the main > and postcopy channels. This fixes a warning message when running unit test > /ARCH/migration/postcopy/preempt/tls/psk. > > Thanks, > > Peter Xu (3): > migration/tls: Gracefully shutdown main and preempt channels > migration: Make migration_has_failed() work even for CANCELLING > migration/multifd: Use the new graceful termination helper Please hold off the review on this one. Juraj reported the issue wasn't resolved by the changes, and I can also reproduce. I'll have a look and repost.. -- Peter Xu
Peter Xu <peterx@redhat.com> writes: > On Wed, Sep 10, 2025 at 12:01:41PM -0400, Peter Xu wrote: >> Fabiano fixed graceful shutdowns for multifd channels previously: >> >> https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/ >> >> However we likely forgot the rest channels. Do it the same for the main >> and postcopy channels. This fixes a warning message when running unit test >> /ARCH/migration/postcopy/preempt/tls/psk. >> >> Thanks, >> >> Peter Xu (3): >> migration/tls: Gracefully shutdown main and preempt channels >> migration: Make migration_has_failed() work even for CANCELLING >> migration/multifd: Use the new graceful termination helper > > Please hold off the review on this one. Juraj reported the issue wasn't > resolved by the changes, and I can also reproduce. I'll have a look and > repost.. I'm wondering if the assumption that only succeeded migrations should gracefully exit is correct. My understanding is that we need to always exit gracefully, but after failure, the channel might not be there, so we ignore failures. But that does not seem to mean a failed migration can simply not exit gracefully.
On Wed, Sep 17, 2025 at 05:56:40PM -0300, Fabiano Rosas wrote: > Peter Xu <peterx@redhat.com> writes: > > > On Wed, Sep 10, 2025 at 12:01:41PM -0400, Peter Xu wrote: > >> Fabiano fixed graceful shutdowns for multifd channels previously: > >> > >> https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/ > >> > >> However we likely forgot the rest channels. Do it the same for the main > >> and postcopy channels. This fixes a warning message when running unit test > >> /ARCH/migration/postcopy/preempt/tls/psk. > >> > >> Thanks, > >> > >> Peter Xu (3): > >> migration/tls: Gracefully shutdown main and preempt channels > >> migration: Make migration_has_failed() work even for CANCELLING > >> migration/multifd: Use the new graceful termination helper > > > > Please hold off the review on this one. Juraj reported the issue wasn't > > resolved by the changes, and I can also reproduce. I'll have a look and > > repost.. > > I'm wondering if the assumption that only succeeded migrations should > gracefully exit is correct. My understanding is that we need to always > exit gracefully, but after failure, the channel might not be there, so > we ignore failures. But that does not seem to mean a failed migration > can simply not exit gracefully. Currently tls channels will ignore premature terminations whenever there's a shutdown() on READ. So when failed / cancelled and whenever there's a shutdown(), iochannel already does the bypass no matter what migration does. Or do you mean we should remove that, and still try to do graceful shutdowns even if the channel was shutdown()? Feel free to have a look at v2 of this series, especially patch 1. v3 will come soon, but just to say, v2 is hugely different from v1, and should be fairly close to upcoming v3, at least on patch 1 which is the real fix. I should have mentioned that earlier.. :( -- Peter Xu
Peter Xu <peterx@redhat.com> writes: > On Wed, Sep 17, 2025 at 05:56:40PM -0300, Fabiano Rosas wrote: >> Peter Xu <peterx@redhat.com> writes: >> >> > On Wed, Sep 10, 2025 at 12:01:41PM -0400, Peter Xu wrote: >> >> Fabiano fixed graceful shutdowns for multifd channels previously: >> >> >> >> https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/ >> >> >> >> However we likely forgot the rest channels. Do it the same for the main >> >> and postcopy channels. This fixes a warning message when running unit test >> >> /ARCH/migration/postcopy/preempt/tls/psk. >> >> >> >> Thanks, >> >> >> >> Peter Xu (3): >> >> migration/tls: Gracefully shutdown main and preempt channels >> >> migration: Make migration_has_failed() work even for CANCELLING >> >> migration/multifd: Use the new graceful termination helper >> > >> > Please hold off the review on this one. Juraj reported the issue wasn't >> > resolved by the changes, and I can also reproduce. I'll have a look and >> > repost.. >> >> I'm wondering if the assumption that only succeeded migrations should >> gracefully exit is correct. My understanding is that we need to always >> exit gracefully, but after failure, the channel might not be there, so >> we ignore failures. But that does not seem to mean a failed migration >> can simply not exit gracefully. > > Currently tls channels will ignore premature terminations whenever there's > a shutdown() on READ. So when failed / cancelled and whenever there's a > shutdown(), iochannel already does the bypass no matter what migration does. > I'm thinking if it's possible for a premature termination to be detected by TLS before we did the shutdown(). So my suggestion was to always bye() before shutdown(), not matter the state migration is in. But maybe your way is ok, I'm not sure now. Let me read the other versions of the series... > Or do you mean we should remove that, and still try to do graceful > shutdowns even if the channel was shutdown()? > > Feel free to have a look at v2 of this series, especially patch 1. v3 will > come soon, but just to say, v2 is hugely different from v1, and should be > fairly close to upcoming v3, at least on patch 1 which is the real fix. > Ah, sorry, I didn't see the v2 on my list. But it's there. > I should have mentioned that earlier.. :(
On Thu, Sep 18, 2025 at 10:47:48AM -0300, Fabiano Rosas wrote: > I'm thinking if it's possible for a premature termination to be detected > by TLS before we did the shutdown(). So my suggestion was to always > bye() before shutdown(), not matter the state migration is in. But maybe > your way is ok, I'm not sure now. Let me read the other versions of the > series... For failing / cancelling migrations, premature termination is likely fine. IMHO we also shouldn't care too much on error reports on premature terminations because it's failing anyway. So maybe you're talking about shutdown()s when the migration is successfully completed. We can try to do that, but maybe it's not easily doable. E.g., we have the preempt thread currently only be able to be kicked out by a shutdown() from the dst main thread. While we can start to inject a bye() there before the shutdown(), it'll be: (1) a bye() sent concurrently while the preempt thread is still logically owning and operating on the preempt channel (luckily, so far read-only), and, (2) we need to double check if this works if we send bye(WR) from dest to src and whether it'll also gracefully shutdown the src side. (3) currently, a shutdown() is synchronous. bye() is yet not. We may then need similiar treatment (e.g. changing IO to block tempoararily?) when doing explicit shutdown()s. I very vaguely remember when working on this series I tried (2) and it didn't really work, but I'm not very sure. Anyway, all these will add some complexity, and we'd better justify it's worthwhile.. [...] > Ah, sorry, I didn't see the v2 on my list. But it's there. Nah, that's not your fault, maybe mine. -- Peter Xu
© 2016 - 2025 Red Hat, Inc.