[Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover

Dr. David Alan Gilbert (git) posted 7 patches 6 years, 6 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20171018174013.22709-1-dgilbert@redhat.com
Test checkpatch passed
Test docker passed
Test s390x passed
There is a newer version of this series
[Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover
Posted by Dr. David Alan Gilbert (git) 6 years, 6 months ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This set attempts to make a race condition between migration and
drive-mirror (and other block users) solvable by allowing the migration
to be paused after the source qemu stops the guest but before it
releases (inactivates) the block devices and serialises the device state.

The symptom of this failure, as reported by Wangjie, is a:
   _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed

and the source qemu dying; so the problem is pretty nasty.
This has only been seen from 2.9 onwards, but the theory is that
prior to 2.9 it might have been happening anyway and we were
perhaps getting unreported corruptions (lost writes); so this
really needs fixing.

This flow came from discussions between Kevin and me, and we can't
see a way of fixing it without exposing a new state to the management
layer.

The flow is now:

(qemu) migrate_set_capability pause-before-switchover on
(qemu) migrate -d ...
(qemu) info migrate
...
Migration status: pre-switchover
...
<< issue commands to clean up any block jobs>>

(qemu) migrate_continue pre-switchover
(qemu) info migrate
...
Migration status: completed
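Over QMP, the same flow uses migrate-set-capabilities, migrate, query-migrate (the equivalent of info migrate) and migrate-continue. The JSON for the capability and continue commands appears verbatim, plus an "id" member, in the libvirt log later in this thread. The sketch below formats those two commands. The qmp_* helper names are hypothetical, and no JSON escaping is done, so the arguments are assumed to be plain identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Formats the QMP equivalent of "migrate_set_capability <cap> on|off".
 * Returns what snprintf returns (the length needed, excluding the NUL). */
static int qmp_set_capability(char *buf, size_t len, const char *cap, bool on)
{
    assert(buf != NULL && cap != NULL);
    assert(strchr(cap, '"') == NULL);   /* no escaping is done */
    return snprintf(buf, len,
                    "{\"execute\":\"migrate-set-capabilities\","
                    "\"arguments\":{\"capabilities\":"
                    "[{\"capability\":\"%s\",\"state\":%s}]}}",
                    cap, on ? "true" : "false");
}

/* Formats the QMP equivalent of "migrate_continue <state>"; the state is
 * the one the migration is expected to be paused in. */
static int qmp_migrate_continue(char *buf, size_t len, const char *state)
{
    assert(buf != NULL && state != NULL);
    assert(strchr(state, '"') == NULL); /* no escaping is done */
    return snprintf(buf, len,
                    "{\"execute\":\"migrate-continue\","
                    "\"arguments\":{\"state\":\"%s\"}}", state);
}
```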

This set has been _very_ lightly tested, and only with the normal
migration code, without drive-mirror added; so this is a first
cut.  I'd appreciate some feedback from libvirt on whether the interface
is OK, and ideally a hack to test it in a full libvirt setup to see
if we hit any other issues.

The precopy flow is:
active->pre-switchover->device->completed

The postcopy flow is:
active->pre-switchover->postcopy-active->completed

The behaviour with postcopy only gets interesting, though, once
we add something like Max's active-sync.
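Both flows, together with the unchanged behaviour when the capability is off (precopy goes straight from active to completed, postcopy via postcopy-active), can be written as one transition function. This is a sketch, not QEMU's code. MigState and next_state() are hypothetical names. In the real series the pre-switchover state is only left once migrate-continue is issued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the migration statuses named in this series. */
typedef enum {
    ST_ACTIVE,
    ST_PRE_SWITCHOVER,
    ST_DEVICE,
    ST_POSTCOPY_ACTIVE,
    ST_COMPLETED,
} MigState;

/* State the source moves to once the work for 'cur' is done.
 * 'pause_cap' models the pause-before-switchover capability and
 * 'postcopy' models having switched to postcopy. */
static MigState next_state(MigState cur, bool pause_cap, bool postcopy)
{
    assert(cur <= ST_COMPLETED);
    switch (cur) {
    case ST_ACTIVE:
        if (pause_cap) {
            return ST_PRE_SWITCHOVER;
        }
        /* Capability off: no new states are exposed to libvirt. */
        return postcopy ? ST_POSTCOPY_ACTIVE : ST_COMPLETED;
    case ST_PRE_SWITCHOVER:
        /* Left only after migrate-continue in the real code. */
        return postcopy ? ST_POSTCOPY_ACTIVE : ST_DEVICE;
    case ST_DEVICE:
    case ST_POSTCOPY_ACTIVE:
        return ST_COMPLETED;
    case ST_COMPLETED:
    default:
        return ST_COMPLETED;
    }
}
```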

Dave

--
v3
  A couple of FIXUPs that had escaped v2's merge

v2
  Pause *before* block inactivation (thanks Peter)
  Rename state and capability to Dan+KWolf's combined suggestion


Dr. David Alan Gilbert (7):
  migration: Add 'pause-before-switchover' capability
  migration: Add 'pre-switchover' and 'device' statuses
  migration: Wait for semaphore before completing migration
  migration: migrate-continue
  migrate: HMP migrate_continue
  migration: allow cancel to unpause
  migration: pause-before-switchover for postcopy
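Patches 3, 4 and 6 together amount to a pause point. The migration thread blocks on it, and either migrate-continue or a cancel releases it. Below is a rough sketch of that pattern using POSIX threads and an unnamed semaphore. FakeMigration, the M_* states and the fake_* functions are hypothetical and only loosely modelled on the series, and the state check in the continue path is simplified.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states, a subset of the statuses used in this series. */
enum { M_ACTIVE, M_PRE_SWITCHOVER, M_DEVICE, M_COMPLETED, M_CANCELLED };

typedef struct {
    sem_t pause_sem;              /* posted by continue or by cancel */
    atomic_int state;
    atomic_bool cancel_requested;
} FakeMigration;

static void fake_migration_init(FakeMigration *m)
{
    int rc = sem_init(&m->pause_sem, 0, 0);
    assert(rc == 0);
    (void)rc;
    atomic_init(&m->state, M_ACTIVE);
    atomic_init(&m->cancel_requested, false);
}

/* Migration thread: enter pre-switchover and block until woken. */
static void *fake_migration_thread(void *opaque)
{
    FakeMigration *m = opaque;

    atomic_store(&m->state, M_PRE_SWITCHOVER);
    while (sem_wait(&m->pause_sem) != 0) {
        /* sem_wait can be interrupted by a signal (EINTR); retry */
    }
    if (atomic_load(&m->cancel_requested)) {
        atomic_store(&m->state, M_CANCELLED);
        return NULL;
    }
    atomic_store(&m->state, M_DEVICE);   /* device state would be saved here */
    atomic_store(&m->state, M_COMPLETED);
    return NULL;
}

/* Like migrate-continue: only unpause if in the expected state. */
static bool fake_migrate_continue(FakeMigration *m, int expected_state)
{
    if (atomic_load(&m->state) != expected_state) {
        return false;
    }
    return sem_post(&m->pause_sem) == 0;
}

/* Like "allow cancel to unpause": a cancel must also wake the thread. */
static void fake_migrate_cancel(FakeMigration *m)
{
    atomic_store(&m->cancel_requested, true);
    sem_post(&m->pause_sem);
}

/* Test helper: spin until the migration thread reaches 'state'. */
static void fake_wait_for_state(FakeMigration *m, int state)
{
    while (atomic_load(&m->state) != state) {
        sched_yield();
    }
}
```

A semaphore rather than a flag means a continue that arrives after the thread has reached the pause point is never lost. It also lets cancel reuse the same wake-up path.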

 hmp-commands.hx       | 12 +++++++
 hmp.c                 | 13 ++++++++
 hmp.h                 |  1 +
 migration/migration.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++--
 migration/migration.h |  4 +++
 qapi/migration.json   | 30 ++++++++++++++++--
 6 files changed, 144 insertions(+), 4 deletions(-)

-- 
2.13.6


Re: [Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover
Posted by Peter Xu 6 years, 6 months ago
On Wed, Oct 18, 2017 at 06:40:06PM +0100, Dr. David Alan Gilbert (git) wrote:

[...]

> The precopy flow is:
> active->pre-switchover->device->completed
> 
> The postcopy flow is:
> active->pre-switchover->postcopy-active->completed

The naming is still slightly confusing to me:

(1) we have a capability called "pause-before-switchover", so it feels
    like there is something called "switchover" and if we enable this
    we'll pause before that point;

(2) we have a new status "pre-switchover", it feels like that's the
    point before we are in "switchover" state;

(3) we don't really have a "switchover" state, but instead it's called
    "device" which is exactly the "switchover" action.

Considering (1) and (2), I would prefer "device" state to be just
"switchover"...

Further, not sure we can unify the state transition as well (say, we
add this switchover state even without cap "pause-before-switchover"
set, although it does not make much sense itself). Then, we can also
unify the precopy/postcopy state machine into one:

active->
  [pre-switchover->]      (optional, decided by "pause-before-switchover")
    switchover->
      [postcopy-active->] (optional, decided by "postcopy-arm")
        completed

(Sorry I am discussing the naming again instead of reviewing real
 stuff!)

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover
Posted by Dr. David Alan Gilbert 6 years, 6 months ago
* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Oct 18, 2017 at 06:40:06PM +0100, Dr. David Alan Gilbert (git) wrote:
> 
> [...]
> 
> > The precopy flow is:
> > active->pre-switchover->device->completed
> > 
> > The postcopy flow is:
> > active->pre-switchover->postcopy-active->completed
> 
> The naming is still slightly confusing to me:
> 
> (1) we have a capability called "pause-before-switchover", so it feels
>     like there is something called "switchover" and if we enable this
>     we'll pause before that point;
> 
> (2) we have a new status "pre-switchover", it feels like that's the
>     point before we are in "switchover" state;
> 
> (3) we don't really have a "switchover" state, but instead it's called
>     "device" which is exactly the "switchover" action.
> 
> Considering (1) and (2), I would prefer "device" state to be just
> "switchover"...

Yes I stuck to pause-before-device and device originally; but
what we're doing during the 'device' stage is mostly saving device
state; the actual switchover occurs at the end.  So hmm.

> Further, not sure we can unify the state transition as well (say, we
> add this switchover state even without cap "pause-before-switchover"
> set, although it does not make much sense itself). Then, we can also
> unify the precopy/postcopy state machine into one:
> 
> active->
>   [pre-switchover->]      (optional, decided by "pause-before-switchover")
>     switchover->
>       [postcopy-active->] (optional, decided by "postcopy-arm")
>         completed

I didn't want to change the state transition behaviour without the
capability set, since that could upset an existing libvirt that would
get confused by the new state.

Dave

> (Sorry I am discussing the naming again instead of reviewing real
>  stuff!)
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover
Posted by Peter Xu 6 years, 6 months ago
On Thu, Oct 19, 2017 at 12:21:23PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Wed, Oct 18, 2017 at 06:40:06PM +0100, Dr. David Alan Gilbert (git) wrote:
> > 
> > [...]
> > 
> > > The precopy flow is:
> > > active->pre-switchover->device->completed
> > > 
> > > The postcopy flow is:
> > > active->pre-switchover->postcopy-active->completed
> > 
> > The naming is still slightly confusing to me:
> > 
> > (1) we have a capability called "pause-before-switchover", so it feels
> >     like there is something called "switchover" and if we enable this
> >     we'll pause before that point;
> > 
> > (2) we have a new status "pre-switchover", it feels like that's the
> >     point before we are in "switchover" state;
> > 
> > (3) we don't really have a "switchover" state, but instead it's called
> >     "device" which is exactly the "switchover" action.
> > 
> > Considering (1) and (2), I would prefer "device" state to be just
> > "switchover"...
> 
> Yes I stuck to pause-before-device and device originally; but
> what we're doing during the 'device' stage is mostly saving device
> state; the actual switchover occurs at the end.  So hmm.

That's fine to me.

> 
> > Further, not sure we can unify the state transition as well (say, we
> > add this switchover state even without cap "pause-before-switchover"
> > set, although it does not make much sense itself). Then, we can also
> > unify the precopy/postcopy state machine into one:
> > 
> > active->
> >   [pre-switchover->]      (optional, decided by "pause-before-switchover")
> >     switchover->
> >       [postcopy-active->] (optional, decided by "postcopy-arm")
> >         completed
> 
> I didn't want to change the state transition behaviour without the
> capability set, since that could upset an existing libvirt that would
> get confused by the new state.

Indeed.  However, this (and also Juan's xbzrle cache size series) makes
me wonder whether we should sometimes loosen the "compatibility" rule.

Most of the time, we pay the compatibility bill by complicating the
code logic.  In this case, to satisfy live block migration we
introduce two new state transition paths (for precopy and postcopy).
I am just afraid we will need to pay a larger bill some day.

But that's only my own concern; maybe it's unfounded.

(I provided all r-bs, so the series looks good to me after all)

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover
Posted by Dr. David Alan Gilbert 6 years, 6 months ago
* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Oct 19, 2017 at 12:21:23PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Wed, Oct 18, 2017 at 06:40:06PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > 
> > > [...]
> > > 
> > > > The precopy flow is:
> > > > active->pre-switchover->device->completed
> > > > 
> > > > The postcopy flow is:
> > > > active->pre-switchover->postcopy-active->completed
> > > 
> > > The naming is still slightly confusing to me:
> > > 
> > > (1) we have a capability called "pause-before-switchover", so it feels
> > >     like there is something called "switchover" and if we enable this
> > >     we'll pause before that point;
> > > 
> > > (2) we have a new status "pre-switchover", it feels like that's the
> > >     point before we are in "switchover" state;
> > > 
> > > (3) we don't really have a "switchover" state, but instead it's called
> > >     "device" which is exactly the "switchover" action.
> > > 
> > > Considering (1) and (2), I would prefer "device" state to be just
> > > "switchover"...
> > 
> > Yes I stuck to pause-before-device and device originally; but
> > what we're doing during the 'device' stage is mostly saving device
> > state; the actual switchover occurs at the end.  So hmm.
> 
> That's fine to me.
> 
> > 
> > > Further, not sure we can unify the state transition as well (say, we
> > > add this switchover state even without cap "pause-before-switchover"
> > > set, although it does not make much sense itself). Then, we can also
> > > unify the precopy/postcopy state machine into one:
> > > 
> > > active->
> > >   [pre-switchover->]      (optional, decided by "pause-before-switchover")
> > >     switchover->
> > >       [postcopy-active->] (optional, decided by "postcopy-arm")
> > >         completed
> > 
> > I didn't want to change the state transition behaviour without the
> > capability set, since that could upset an existing libvirt that would
> > get confused by the new state.
> 
> Indeed.  However, this (and also Juan's xbzrle cache size series) makes
> me wonder whether we should sometimes loosen the "compatibility" rule.
> 
> Most of the time, we pay the compatibility bill by complicating the
> code logic.  In this case, to satisfy live block migration we
> introduce two new state transition paths (for precopy and postcopy).
> I am just afraid we will need to pay a larger bill some day.
> 
> But that's only my own concern; maybe it's unfounded.

Yes, it's true - almost all the behaviour we have forms part of our
API that we expose to libvirt; we have to be pretty careful.

> (I provided all r-bs, so the series looks good to me after all)

Thanks! One comment fix coming up soon as spotted by Jiri.

Dave

> 
> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover
Posted by Jiri Denemark 6 years, 6 months ago
The libvirt changes which will make use of this new migration capability
can be found in the migration-pause branch of my gitlab repository:

    git fetch https://gitlab.com/jirkade/libvirt.git migration-pause

It's not properly split into patches, it has no commit message etc.,
but the functionality should be complete.

Feel free to test it and report any issues.

Thanks,

Jirka

Re: [Qemu-devel] [PATCH v3 0/7] migration: pause-before-switchover
Posted by Dr. David Alan Gilbert 6 years, 6 months ago
* Jiri Denemark (jdenemar@redhat.com) wrote:
> The libvirt changes which will make use of this new migration capability
> can be found in migration-pause branch of my gitlab repository:
> 
>     git fetch https://gitlab.com/jirkade/libvirt.git migration-pause
> 
> It's not properly split into patches, it has no commit message etc.,
> but the functionality should be complete.
> 
> Feel free to test it and report any issues.

Looks promising:

virsh migrate --live --copy-storage-all --verbose

2017-10-19 17:52:38.665+0000: 31999: debug : qemuMonitorSetMigrationCapability:3948 : capability=pause-before-switchover, state=1
2017-10-19 17:52:38.666+0000: 31999: debug : virJSONValueToString:1914 : result={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]},"id":"libvirt-1861"}
2017-10-19 17:52:38.693+0000: 31999: debug : qemuMonitorJSONCommandWithFd:298 : Send command '{"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-1865"}' for write with FD -1
2017-10-19 17:52:38.695+0000: 31998: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1508435558, "microseconds": 695732}, "event": "MIGRATION", "data": {"status": "setup"}}]
2017-10-19 17:52:38.743+0000: 31998: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1508435558, "microseconds": 743564}, "event": "MIGRATION_PASS", "data": {"pass": 1}}]
2017-10-19 17:52:38.744+0000: 31998: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1508435558, "microseconds": 743724}, "event": "MIGRATION", "data": {"status": "active"}}]
2017-10-19 17:52:43.193+0000: 31998: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1508435563, "microseconds": 192728}, "event": "MIGRATION_PASS", "data": {"pass": 2}}]
2017-10-19 17:52:43.389+0000: 31998: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1508435563, "microseconds": 388947}, "event": "STOP"}]
2017-10-19 17:52:43.862+0000: 31998: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1508435563, "microseconds": 862428}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}]
2017-10-19 17:52:43.863+0000: 31999: debug : qemuMigrationDriveMirrorReady:634 : All disk mirrors are ready
2017-10-19 17:52:43.863+0000: 31999: debug : qemuMigrationCompleted:1534 : Migration paused before switchover
2017-10-19 17:52:43.865+0000: 31998: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"return": {"expected-downtime": 300, "status": "pre-switchover", "setup-time": 47, "total-time": 5169, "ram": {"total": 4430053376, "postcopy-requests": 0, "dirty-sync-count": 2, "page-size": 4096, "remaining": 7204864, "mbps": 941.43529, "transferred": 450864646, "duplicate": 973832, "dirty-pages-rate": 243277, "skipped": 0, "normal-bytes": 441237504, "normal": 107724}}, "id": "libvirt-1876"}]
2017-10-19 17:52:43.866+0000: 31999: debug : qemuMigrationCancelDriveMirror:803 : Cancelling drive mirrors for domain debianlocalqemu
2017-10-19 17:52:43.866+0000: 31999: debug : qemuMonitorJSONCommandWithFd:298 : Send command '{"execute":"block-job-cancel","arguments":{"device":"drive-virtio-disk0"},"id":"libvirt-1877"}' for write with FD -1
2017-10-19 17:52:43.868+0000: 31999: debug : qemuMigrationDriveMirrorCancelled:715 : Waiting for 1 disk mirrors to finish
2017-10-19 17:52:43.872+0000: 31998: info : qemuMonitorIOProcess:439 : QEMU_MONITOR_IO_PROCESS: mon=0x7f4544008840 buf={"timestamp": {"seconds": 1508435563, "microseconds": 871816}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 58430259200, "offset": 58430259200, "speed": 9223372036853727232, "type": "mirror"}}
2017-10-19 17:52:43.873+0000: 31998: debug : qemuProcessHandleBlockJob:1014 : Block job for device drive-virtio-disk0 (domain: 0x7f45440254c0,debianlocalqemu) type 2 status 0
2017-10-19 17:52:43.873+0000: 31999: debug : qemuBlockJobEventProcess:106 : disk=vda, mirrorState=yes, type=2, status=0
2017-10-19 17:52:43.916+0000: 31999: debug : qemuMonitorJSONCommandWithFd:298 : Send command '{"execute":"migrate-continue","arguments":{"state":"pre-switchover"},"id":"libvirt-1880"}' for write with FD -1
2017-10-19 17:52:43.918+0000: 31998: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1508435563, "microseconds": 917872}, "event": "MIGRATION", "data": {"status": "device"}}]
2017-10-19 17:52:43.921+0000: 31998: debug : qemuMonitorJSONIOProcessLine:193 : Line [{"timestamp": {"seconds": 1508435563, "microseconds": 921194}, "event": "MIGRATION_PASS", "data": {"pass": 3}}]
2017-10-19 17:52:43.991+0000: 31998: info : qemuMonitorIOProcess:439 : QEMU_MONITOR_IO_PROCESS: mon=0x7f4544008840 buf={"timestamp": {"seconds": 1508435563, "microseconds": 991528}, "event": "MIGRATION", "data": {"status": "completed"}}
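The ordering that matters in this log is that BLOCK_JOB_COMPLETED arrives after the MIGRATION event reports pre-switchover and before it reports device. In other words, the mirror was cancelled while the migration was paused. Below is a sketch of a checker for that ordering. It assumes the events have been flattened into hypothetical strings such as "MIGRATION:pre-switchover" or a bare event name. block_jobs_done_while_paused() is not part of libvirt or QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if every BLOCK_JOB_COMPLETED lies between the
 * pre-switchover and device statuses, and at least one was seen. */
static bool block_jobs_done_while_paused(const char *const *ev, size_t n)
{
    bool paused = false, resumed = false;
    size_t jobs = 0;

    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ev[i], "MIGRATION:pre-switchover") == 0) {
            paused = true;
        } else if (strcmp(ev[i], "MIGRATION:device") == 0) {
            if (!paused) {
                return false;       /* device state saved without a pause */
            }
            resumed = true;
        } else if (strcmp(ev[i], "BLOCK_JOB_COMPLETED") == 0) {
            if (!paused || resumed) {
                return false;       /* job finished outside the window */
            }
            jobs++;
        }
        /* other events (setup, active, STOP, ...) are ignored */
    }
    return paused && resumed && jobs > 0;
}
```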

So I think libvirt is doing the right thing - thanks!

I'll post the version with your minor comment change 1st thing tomorrow.

(I'm not too convinced qemu is that happy during the drive-mirror; the
guest complained about tasks blocked for 120+ seconds - I was running a
heavy cp - and looking at iostat in the guest I could see there
were chunks of a few minutes where nothing was happening; the write
performance after migration also seems very low.  However, neither of
those is directly related to this change, since the first problem
happens before the actual migration code has started.)

Dave

> Thanks,
> 
> Jirka
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK