[PATCH 0/5] Restore vmstate on cancelled/failed migration

Vladimir Sementsov-Ogievskiy posted 5 patches 11 months, 3 weeks ago
Maintainers: Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras <leobras@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
[PATCH 0/5] Restore vmstate on cancelled/failed migration
Posted by Vladimir Sementsov-Ogievskiy 11 months, 3 weeks ago
Hi all.

The problem I want to solve is that the guest-panicked state may be lost
when migration fails (or is cancelled) after the source has stopped.

Still, I try to go further and restore all possible paused states in the
same way. The key patch is the last one; the others are refactoring and
preparation.
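
To illustrate the idea, here is a minimal sketch and not the code from the
series. All names prefixed with sketch_/Sketch are hypothetical and the state
list is heavily simplified. The point it tries to show is that a single
"was running" flag cannot represent guest-panicked, while remembering the
whole old state lets a failed or cancelled migration return to exactly where
the source was before it stopped.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the real run states. */
typedef enum {
    SKETCH_RUNNING,
    SKETCH_PAUSED,
    SKETCH_GUEST_PANICKED,
    SKETCH_FINISH_MIGRATE,
} SketchRunState;

typedef struct {
    SketchRunState current;    /* state the VM is in right now */
    SketchRunState old_state;  /* state remembered before the source stop */
} SketchVM;

/* Stop the source for the final phase, remembering the state we leave. */
static void sketch_stop_for_migration(SketchVM *vm)
{
    vm->old_state = vm->current;
    vm->current = SKETCH_FINISH_MIGRATE;
}

/*
 * On failure or cancel, go back to the remembered state rather than
 * choosing only between "running" and "paused".
 */
static void sketch_migration_failed(SketchVM *vm)
{
    assert(vm->current == SKETCH_FINISH_MIGRATE);
    vm->current = vm->old_state;
}
```

With this shape, a VM that was in SKETCH_GUEST_PANICKED before the stop is
in SKETCH_GUEST_PANICKED again after sketch_migration_failed().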

Vladimir Sementsov-Ogievskiy (5):
  runstate: add runstate_get()
  migration: never fail in global_state_store()
  runstate: drop unused runstate_store()
  migration: switch from .vm_was_running to .vm_old_state
  migration: restore vmstate on migration failure

 include/migration/global_state.h |  2 +-
 include/sysemu/runstate.h        |  2 +-
 migration/global_state.c         | 23 +++++++------
 migration/migration.c            | 56 +++++++++++++++-----------------
 migration/migration.h            |  9 +++--
 migration/savevm.c               |  6 +---
 softmmu/runstate.c               | 25 +++++++-------
 7 files changed, 59 insertions(+), 64 deletions(-)

-- 
2.34.1
Re: [PATCH 0/5] Restore vmstate on cancelled/failed migration
Posted by Juan Quintela 11 months, 3 weeks ago
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
> Hi all.
>
> The problem I want to solve is that the guest-panicked state may be lost
> when migration fails (or is cancelled) after the source has stopped.
>
> Still, I try to go further and restore all possible paused states in the
> same way. The key patch is the last one; the others are refactoring and
> preparation.

Hi

I like and agree with the spirit of the series in general.  But I think
that we need to drop the "never fail in global_state_store()".  We
shouldn't kill a guest because we found a bug on migration.

Later, Juan.
Re: [PATCH 0/5] Restore vmstate on cancelled/failed migration
Posted by Vladimir Sementsov-Ogievskiy 11 months, 3 weeks ago
On 18.05.23 14:23, Juan Quintela wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
>> Hi all.
>>
>> The problem I want to solve is that the guest-panicked state may be lost
>> when migration fails (or is cancelled) after the source has stopped.
>>
>> Still, I try to go further and restore all possible paused states in the
>> same way. The key patch is the last one; the others are refactoring and
>> preparation.
> 
> Hi
> 
> I like and agree with the spirit of the series in general.  But I think
> that we need to drop the "never fail in global_state_store()".  We
> shouldn't kill a guest because we found a bug on migration.
> 

Why is migration better in this sense than non-migration? We have a lot of places where we just assert things instead of creating unreachable error messages. I think assert/abort is always better in such cases. Really, if this assertion fails, it means that memory is corrupted, and stopping execution is the best thing to do.

(Should we consider the case where, in the future, we add a 100-character-long vmstate? I hope not.)
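
To make the two options concrete, here is a rough sketch and not the actual global_state_store() code. The sketch_ names and the buffer size of 100 are assumptions, the latter taken only from the "100 character" remark above. Variant A returns an error that every caller must handle even though it should be unreachable; variant B treats an overlong name as a programming error and asserts.

```c
#include <assert.h>
#include <string.h>

/* Assumed buffer size, for illustration only. */
#define SKETCH_STATE_BUF_LEN 100

typedef struct {
    char runstate[SKETCH_STATE_BUF_LEN];
} SketchGlobalState;

/* Variant A: report a failure that the caller has to check and handle. */
static int sketch_store_checked(SketchGlobalState *s, const char *name)
{
    size_t len = strlen(name);

    if (len >= sizeof(s->runstate)) {
        return -1;  /* name does not fit, including the trailing NUL */
    }
    memcpy(s->runstate, name, len + 1);
    return 0;
}

/* Variant B: an overlong name can only be a bug, so never fail. */
static void sketch_store_asserted(SketchGlobalState *s, const char *name)
{
    size_t len = strlen(name);

    assert(len < sizeof(s->runstate));
    memcpy(s->runstate, name, len + 1);
}
```

In variant B the callers need no error path at all, which is the simplification being argued for.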

-- 
Best regards,
Vladimir
Re: [PATCH 0/5] Restore vmstate on cancelled/failed migration
Posted by Juan Quintela 11 months, 2 weeks ago
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
> On 18.05.23 14:23, Juan Quintela wrote:
>> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
>>> Hi all.
>>>
>>> The problem I want to solve is that the guest-panicked state may be lost
>>> when migration fails (or is cancelled) after the source has stopped.
>>>
>>> Still, I try to go further and restore all possible paused states in the
>>> same way. The key patch is the last one; the others are refactoring and
>>> preparation.
>> Hi
>>
>> I like and agree with the spirit of the series in general.  But I think
>> that we need to drop the "never fail in global_state_store()".  We
>> shouldn't kill a guest because we found a bug on migration.
>>
>
> Why is migration better in this sense than non-migration? We have a
> lot of places where we just assert things instead of creating
> unreachable error messages. I think assert/abort is always better in
> such cases. Really, if this assertion fails, it means that memory
> is corrupted, and stopping execution is the best thing to do.
>
> (Should we consider the case where, in the future, we add a 100-character-long vmstate? I hope not.)

OK, I give up and will integrate the series as it is O:-)

I agree that this is a case that shouldn't happen, so assert() is not
out of the question.

What I am trying to get migration to do is to really detect errors and
be able to recover from them.  My long-term crusade is getting rid of
qemu_file_get_error() and just checking the return value of functions
that do IO.  Yes, it is a big long-term task because we need to change
the whole interface to something saner.
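
A rough sketch of the difference between the two styles, not of the real
QEMUFile interface: every sketch_ name below is hypothetical and nothing
here mirrors actual signatures.  In the sticky style a write records its
failure on the file object and the caller finds out later; in the
return-value style the caller sees the failure at the call that caused it
and can react right there.

```c
#include <assert.h>
#include <errno.h>

typedef struct {
    int last_error;   /* sticky style: first error, looked at later */
    int writes_done;  /* number of writes that succeeded */
} SketchFile;

/* Sticky style: the write returns nothing, failure is stored on the file. */
static void sketch_put_sticky(SketchFile *f, int should_fail)
{
    assert(f);
    if (f->last_error) {
        return;                 /* already failed, later writes are dropped */
    }
    if (should_fail) {
        f->last_error = -EIO;
        return;
    }
    f->writes_done++;
}

/* Sticky style: the caller has to remember to ask for the error. */
static int sketch_get_error(const SketchFile *f)
{
    assert(f);
    return f->last_error;
}

/* Return-value style: every write reports its own result. */
static int sketch_put_checked(SketchFile *f, int should_fail)
{
    assert(f);
    if (should_fail) {
        return -EIO;
    }
    f->writes_done++;
    return 0;
}

/* The caller stops at the first failing write and knows which one it was. */
static int sketch_send_all(SketchFile *f, const int *fail_plan, int n,
                           int *failed_index)
{
    for (int i = 0; i < n; i++) {
        int ret = sketch_put_checked(f, fail_plan[i]);

        if (ret < 0) {
            *failed_index = i;
            return ret;
        }
    }
    return 0;
}
```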

Later, Juan.