[PATCH 0/3] migration/tls: Graceful shutdowns for main and postcopy channels

Peter Xu posted 3 patches 2 weeks, 4 days ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20250910160144.1762894-1-peterx@redhat.com
Maintainers: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
There is a newer version of this series
migration/channel.h   |  3 +++
migration/migration.h |  2 ++
migration/tls.h       |  1 -
migration/channel.c   | 20 ++++++++++++++++++++
migration/migration.c | 27 +++++++++++++++++++++++++--
migration/multifd.c   | 40 +++++++---------------------------------
migration/tls.c       |  5 -----
7 files changed, 57 insertions(+), 41 deletions(-)
[PATCH 0/3] migration/tls: Graceful shutdowns for main and postcopy channels
Posted by Peter Xu 2 weeks, 4 days ago
Fabiano fixed graceful shutdowns for multifd channels previously:

https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/

However we likely forgot the rest channels.  Do it the same for the main
and postcopy channels.  This fixes a warning message when running unit test
/ARCH/migration/postcopy/preempt/tls/psk.

Thanks,

Peter Xu (3):
  migration/tls: Gracefully shutdown main and preempt channels
  migration: Make migration_has_failed() work even for CANCELLING
  migration/multifd: Use the new graceful termination helper

 migration/channel.h   |  3 +++
 migration/migration.h |  2 ++
 migration/tls.h       |  1 -
 migration/channel.c   | 20 ++++++++++++++++++++
 migration/migration.c | 27 +++++++++++++++++++++++++--
 migration/multifd.c   | 40 +++++++---------------------------------
 migration/tls.c       |  5 -----
 7 files changed, 57 insertions(+), 41 deletions(-)

-- 
2.50.1
Re: [PATCH 0/3] migration/tls: Graceful shutdowns for main and postcopy channels
Posted by Peter Xu 2 weeks, 3 days ago
On Wed, Sep 10, 2025 at 12:01:41PM -0400, Peter Xu wrote:
> Fabiano fixed graceful shutdowns for multifd channels previously:
> 
> https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/
> 
> However we likely forgot the rest channels.  Do it the same for the main
> and postcopy channels.  This fixes a warning message when running unit test
> /ARCH/migration/postcopy/preempt/tls/psk.
> 
> Thanks,
> 
> Peter Xu (3):
>   migration/tls: Gracefully shutdown main and preempt channels
>   migration: Make migration_has_failed() work even for CANCELLING
>   migration/multifd: Use the new graceful termination helper

Please hold off the review on this one.  Juraj reported the issue wasn't
resolved by the changes, and I can also reproduce.  I'll have a look and
repost..

-- 
Peter Xu
Re: [PATCH 0/3] migration/tls: Graceful shutdowns for main and postcopy channels
Posted by Fabiano Rosas 1 week, 3 days ago
Peter Xu <peterx@redhat.com> writes:

> On Wed, Sep 10, 2025 at 12:01:41PM -0400, Peter Xu wrote:
>> Fabiano fixed graceful shutdowns for multifd channels previously:
>> 
>> https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/
>> 
>> However we likely forgot the rest channels.  Do it the same for the main
>> and postcopy channels.  This fixes a warning message when running unit test
>> /ARCH/migration/postcopy/preempt/tls/psk.
>> 
>> Thanks,
>> 
>> Peter Xu (3):
>>   migration/tls: Gracefully shutdown main and preempt channels
>>   migration: Make migration_has_failed() work even for CANCELLING
>>   migration/multifd: Use the new graceful termination helper
>
> Please hold off the review on this one.  Juraj reported the issue wasn't
> resolved by the changes, and I can also reproduce.  I'll have a look and
> repost..

I'm wondering if the assumption that only succeeded migrations should
gracefully exit is correct. My understanding is that we need to always
exit gracefully, but after failure, the channel might not be there, so
we ignore failures. But that does not seem to mean a failed migration
can simply not exit gracefully.
Re: [PATCH 0/3] migration/tls: Graceful shutdowns for main and postcopy channels
Posted by Peter Xu 1 week, 3 days ago
On Wed, Sep 17, 2025 at 05:56:40PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Sep 10, 2025 at 12:01:41PM -0400, Peter Xu wrote:
> >> Fabiano fixed graceful shutdowns for multifd channels previously:
> >> 
> >> https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/
> >> 
> >> However we likely forgot the rest channels.  Do it the same for the main
> >> and postcopy channels.  This fixes a warning message when running unit test
> >> /ARCH/migration/postcopy/preempt/tls/psk.
> >> 
> >> Thanks,
> >> 
> >> Peter Xu (3):
> >>   migration/tls: Gracefully shutdown main and preempt channels
> >>   migration: Make migration_has_failed() work even for CANCELLING
> >>   migration/multifd: Use the new graceful termination helper
> >
> > Please hold off the review on this one.  Juraj reported the issue wasn't
> > resolved by the changes, and I can also reproduce.  I'll have a look and
> > repost..
> 
> I'm wondering if the assumption that only succeeded migrations should
> gracefully exit is correct. My understanding is that we need to always
> exit gracefully, but after failure, the channel might not be there, so
> we ignore failures. But that does not seem to mean a failed migration
> can simply not exit gracefully.

Currently tls channels will ignore premature terminations whenever there's
a shutdown() on READ.  So when failed / cancelled and whenever there's a
shutdown(), iochannel already does the bypass no matter what migration does.

Or do you mean we should remove that, and still try to do graceful
shutdowns even if the channel was shutdown()?

Feel free to have a look at v2 of this series, especially patch 1. v3 will
come soon, but just to say, v2 is hugely different from v1, and should be
fairly close to upcoming v3, at least on patch 1 which is the real fix.

I should have mentioned that earlier.. :(

-- 
Peter Xu
Re: [PATCH 0/3] migration/tls: Graceful shutdowns for main and postcopy channels
Posted by Fabiano Rosas 1 week, 3 days ago
Peter Xu <peterx@redhat.com> writes:

> On Wed, Sep 17, 2025 at 05:56:40PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, Sep 10, 2025 at 12:01:41PM -0400, Peter Xu wrote:
>> >> Fabiano fixed graceful shutdowns for multifd channels previously:
>> >> 
>> >> https://lore.kernel.org/qemu-devel/20250206175824.22664-1-farosas@suse.de/
>> >> 
>> >> However we likely forgot the rest channels.  Do it the same for the main
>> >> and postcopy channels.  This fixes a warning message when running unit test
>> >> /ARCH/migration/postcopy/preempt/tls/psk.
>> >> 
>> >> Thanks,
>> >> 
>> >> Peter Xu (3):
>> >>   migration/tls: Gracefully shutdown main and preempt channels
>> >>   migration: Make migration_has_failed() work even for CANCELLING
>> >>   migration/multifd: Use the new graceful termination helper
>> >
>> > Please hold off the review on this one.  Juraj reported the issue wasn't
>> > resolved by the changes, and I can also reproduce.  I'll have a look and
>> > repost..
>> 
>> I'm wondering if the assumption that only succeeded migrations should
>> gracefully exit is correct. My understanding is that we need to always
>> exit gracefully, but after failure, the channel might not be there, so
>> we ignore failures. But that does not seem to mean a failed migration
>> can simply not exit gracefully.
>
> Currently tls channels will ignore premature terminations whenever there's
> a shutdown() on READ.  So when failed / cancelled and whenever there's a
> shutdown(), iochannel already does the bypass no matter what migration does.
>

I'm thinking if it's possible for a premature termination to be detected
by TLS before we did the shutdown(). So my suggestion was to always
bye() before shutdown(), not matter the state migration is in. But maybe
your way is ok, I'm not sure now. Let me read the other versions of the
series...

> Or do you mean we should remove that, and still try to do graceful
> shutdowns even if the channel was shutdown()?
>
> Feel free to have a look at v2 of this series, especially patch 1. v3 will
> come soon, but just to say, v2 is hugely different from v1, and should be
> fairly close to upcoming v3, at least on patch 1 which is the real fix.
>

Ah, sorry, I didn't see the v2 on my list. But it's there.

> I should have mentioned that earlier.. :(
Re: [PATCH 0/3] migration/tls: Graceful shutdowns for main and postcopy channels
Posted by Peter Xu 1 week, 3 days ago
On Thu, Sep 18, 2025 at 10:47:48AM -0300, Fabiano Rosas wrote:
> I'm thinking if it's possible for a premature termination to be detected
> by TLS before we did the shutdown(). So my suggestion was to always
> bye() before shutdown(), not matter the state migration is in. But maybe
> your way is ok, I'm not sure now. Let me read the other versions of the
> series...

For failing / cancelling migrations, premature termination is likely fine.
IMHO we also shouldn't care too much on error reports on premature
terminations because it's failing anyway.

So maybe you're talking about shutdown()s when the migration is
successfully completed.

We can try to do that, but maybe it's not easily doable.  E.g., we have the
preempt thread currently only be able to be kicked out by a shutdown() from
the dst main thread.  While we can start to inject a bye() there before the
shutdown(), it'll be:

  (1) a bye() sent concurrently while the preempt thread is still logically
  owning and operating on the preempt channel (luckily, so far read-only),
  and,

  (2) we need to double check if this works if we send bye(WR) from dest to
  src and whether it'll also gracefully shutdown the src side.

  (3) currently, a shutdown() is synchronous.  bye() is yet not.  We may
  then need similiar treatment (e.g. changing IO to block tempoararily?)
  when doing explicit shutdown()s.

I very vaguely remember when working on this series I tried (2) and it
didn't really work, but I'm not very sure. Anyway, all these will add some
complexity, and we'd better justify it's worthwhile..

[...]

> Ah, sorry, I didn't see the v2 on my list. But it's there.

Nah, that's not your fault, maybe mine.

-- 
Peter Xu