[Qemu-devel] [PULL 00/41] Migration queue

Juan Quintela posted 41 patches 7 years, 5 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20180509112406.6183-1-quintela@redhat.com
Test checkpatch passed
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
Test s390x passed
[Qemu-devel] [PULL 00/41] Migration queue
Posted by Juan Quintela 7 years, 5 months ago
Hi

this includes the reviewed patches for migration:
- update docs (dave)
- fixes for blocktime (text cleanups) (dave)
- migration+tls (dave)
- rdma index fix (lidong)
- Postcopy recovery (peterx)
- Reviewed parts of multifd and tests (me)

Some parts of RDMA are still missing; they will be sent after this is in.  This pull had already grown too big.

Please apply.

The following changes since commit e5cd695266c5709308aa95b1baae499e4b5d4544:

  Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging (2018-05-08 17:05:58 +0100)

are available in the Git repository at:

  git://github.com/juanquintela/qemu.git tags/migration/20180509

for you to fetch changes up to c14eb5ac63b0d2cd146ca004daaeaf56677b7ed1:

  Migration+TLS: Fix crash due to double cleanup (2018-05-09 12:17:22 +0200)

----------------------------------------------------------------
migration/next for 20180509

----------------------------------------------------------------
Dr. David Alan Gilbert (3):
      migration: update docs
      migration: Textual fixups for blocktime
      Migration+TLS: Fix crash due to double cleanup

Juan Quintela (12):
      tests: Add migration precopy test
      tests: Add migration xbzrle test
      tests: Migration ppc now inlines its program
      migration: Set error state in case of error
      migration: Introduce multifd_recv_new_channel()
      migration: terminate_* can be called for other threads
      migration: Be sure all recv channels are created
      migration: Export functions to create send channels
      migration: Create multifd channels
      migration: Delay start of migration main routines
      migration: Transmit initial package through the multifd channels
      migration: Define MultifdRecvParams sooner

Lidong Chen (1):
      migration: update index field when delete or qsort RDMALocalBlock

Peter Xu (24):
      migration: let incoming side use thread context
      migration: new postcopy-pause state
      migration: implement "postcopy-pause" src logic
      migration: allow dst vm pause on postcopy
      migration: allow src return path to pause
      migration: allow fault thread to pause
      qmp: hmp: add migrate "resume" option
      migration: rebuild channel on source
      migration: new state "postcopy-recover"
      migration: wakeup dst ram-load-thread for recover
      migration: new cmd MIG_CMD_RECV_BITMAP
      migration: new message MIG_RP_MSG_RECV_BITMAP
      migration: new cmd MIG_CMD_POSTCOPY_RESUME
      migration: new message MIG_RP_MSG_RESUME_ACK
      migration: introduce SaveVMHandlers.resume_prepare
      migration: synchronize dirty bitmap for resume
      migration: setup ramstate for resume
      migration: final handshake for the resume
      migration: init dst in migration_object_init too
      qmp/migration: new command migrate-recover
      hmp/migration: add migrate_recover command
      migration: introduce lock for to_dst_file
      migration/qmp: add command migrate-pause
      migration/hmp: add migrate_pause command

Xiao Guangrong (1):
      migration: fix saving normal page even if it's been compressed

 docs/devel/migration.rst     | 532 ++++++++++++++++++++++++++++------------
 hmp-commands.hx              |  34 ++-
 hmp.c                        |  23 +-
 hmp.h                        |   2 +
 include/migration/register.h |   2 +
 migration/channel.c          |  12 +-
 migration/exec.c             |   9 +-
 migration/fd.c               |   9 +-
 migration/migration.c        | 559 +++++++++++++++++++++++++++++++++++++++----
 migration/migration.h        |  22 ++
 migration/postcopy-ram.c     |  54 ++++-
 migration/ram.c              | 500 +++++++++++++++++++++++++++++++++++---
 migration/ram.h              |   6 +
 migration/rdma.c             |   7 +
 migration/savevm.c           | 191 ++++++++++++++-
 migration/savevm.h           |   3 +
 migration/socket.c           |  39 ++-
 migration/socket.h           |   7 +
 migration/trace-events       |  21 ++
 qapi/migration.json          |  57 ++++-
 tests/migration-test.c       | 149 +++++++++---
 21 files changed, 1928 insertions(+), 310 deletions(-)

Re: [Qemu-devel] [PULL 00/41] Migration queue
Posted by Peter Maydell 7 years, 5 months ago
On 9 May 2018 at 12:23, Juan Quintela <quintela@redhat.com> wrote:
> Hi
>
> this includes the reviewed patches for migration:
> - update docs (dave)
> - fixes for blocktime (text cleanups) (dave)
> - migration+tls (dave)
> - rdma index fix (lidong)
> - Postcopy recovery (peterx)
> - Reviewed parts of multifd and tests (me)
>
> Some parts of RDMA are still missing; they will be sent after this is in.  This pull had already grown too big.
>
> Please apply.
>
> The following changes since commit e5cd695266c5709308aa95b1baae499e4b5d4544:
>
>   Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging (2018-05-08 17:05:58 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/juanquintela/qemu.git tags/migration/20180509
>
> for you to fetch changes up to c14eb5ac63b0d2cd146ca004daaeaf56677b7ed1:
>
>   Migration+TLS: Fix crash due to double cleanup (2018-05-09 12:17:22 +0200)
>
> ----------------------------------------------------------------
> migration/next for 20180509
>

Hi. I get some test failures here:

S390x host:
TEST: tests/migration-test... (pid=57456)
  /ppc64/migration/deprecated:                                         OK
  /ppc64/migration/bad_dest:                                           OK
  /ppc64/migration/postcopy/unix:                                      OK
  /ppc64/migration/precopy/unix:                                       OK
  /ppc64/migration/xbzrle/unix:
Unexpected 32 on dest_serial serial
**
ERROR:/home/linux1/qemu/tests/migration-test.c:144:wait_for_serial:
code should not be reached
FAIL

aarch64 host:

Memory content inconsistency at 44f7000 first_byte = 7 last_byte = 6
current = 5 hit_edge = 1
ERROR:/home/peter.maydell/qemu/tests/migration-test.c:281:check_guests_ram:
'bad' should be FALSE
(this is probably for ppc64 guest; unfortunately this system doesn't
have a make that knows about --output-sync, so the make check output
is hard to interpret.)

thanks
-- PMM
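
For readers trying to decode the aarch64 message above, here is a minimal Python sketch of the kind of check that would print it. It assumes (the thread does not say this) that the guest workload increments one byte in each page in address order, so that with the guest stopped every page holds the same value except for a single step down by one at the point the incrementer had reached. The function name and the list-of-bytes input are hypothetical; the real check is check_guests_ram() in tests/migration-test.c and is written in C.

```python
def find_inconsistency(page_bytes):
    """Return the index of the first inconsistent page, or None if all is well.

    page_bytes holds one sampled byte per guest page, in address order.
    Assumption: the guest bumps each page's byte in order, so at most one
    "edge" (a drop by exactly 1, modulo 256) may appear in the sequence.
    """
    last_byte = page_bytes[0]
    hit_edge = False
    for index, current in enumerate(page_bytes[1:], start=1):
        if current == last_byte:
            continue
        if (current + 1) % 256 == last_byte and not hit_edge:
            # The single allowed transition: the incrementer had not
            # reached this page yet when the guest was stopped.
            hit_edge = True
            last_byte = current
        else:
            # A second edge, or a jump by something other than -1.
            return index
    return None
```

Under that assumption, the reported values (first_byte = 7, last_byte = 6, current = 5, hit_edge = 1) correspond to a second step down after the allowed one, for example find_inconsistency([7, 7, 6, 6, 5]) flags index 4.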

Re: [Qemu-devel] [PULL 00/41] Migration queue
Posted by Dr. David Alan Gilbert 7 years, 5 months ago
* Peter Maydell (peter.maydell@linaro.org) wrote:
> On 9 May 2018 at 12:23, Juan Quintela <quintela@redhat.com> wrote:
> > Hi
> >
> > this includes the reviewed patches for migration:
> > - update docs (dave)
> > - fixes for blocktime (text cleanups) (dave)
> > - migration+tls (dave)
> > - rdma index fix (lidong)
> > - Postcopy recovery (peterx)
> > - Reviewed parts of multifd and tests (me)
> >
> > Some parts of RDMA are still missing; they will be sent after this is in.  This pull had already grown too big.
> >
> > Please apply.
> >
> > The following changes since commit e5cd695266c5709308aa95b1baae499e4b5d4544:
> >
> >   Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging (2018-05-08 17:05:58 +0100)
> >
> > are available in the Git repository at:
> >
> >   git://github.com/juanquintela/qemu.git tags/migration/20180509
> >
> > for you to fetch changes up to c14eb5ac63b0d2cd146ca004daaeaf56677b7ed1:
> >
> >   Migration+TLS: Fix crash due to double cleanup (2018-05-09 12:17:22 +0200)
> >
> > ----------------------------------------------------------------
> > migration/next for 20180509
> >
> 
> Hi. I get some test failures here:
> 
> S390x host:
> TEST: tests/migration-test... (pid=57456)
>   /ppc64/migration/deprecated:                                         OK
>   /ppc64/migration/bad_dest:                                           OK
>   /ppc64/migration/postcopy/unix:                                      OK
>   /ppc64/migration/precopy/unix:                                       OK
>   /ppc64/migration/xbzrle/unix:
> Unexpected 32 on dest_serial serial
> **
> ERROR:/home/linux1/qemu/tests/migration-test.c:144:wait_for_serial:
> code should not be reached
> FAIL
> 
> aarch64 host:
> 
> Memory content inconsistency at 44f7000 first_byte = 7 last_byte = 6
> current = 5 hit_edge = 1
> ERROR:/home/peter.maydell/qemu/tests/migration-test.c:281:check_guests_ram:
> 'bad' should be FALSE
> (this is probably for ppc64 guest; unfortunately this system doesn't
> have a make that knows about --output-sync, so the make check output
> is hard to interpret.)

I'd be tempted to drop the 03/41 xbzrle test and see what happens;
it's a heavy CPU eater, so it might be making things worse when run in parallel.
I suspect that aarch64 case is the same one we occasionally hit
due to the page flag dirtying being wrong in TCG.

Can you do that without another pull?

Dave

> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PULL 00/41] Migration queue
Posted by Peter Maydell 7 years, 5 months ago
On 11 May 2018 at 15:20, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> I'd be tempted to drop the 03/41 xbzrle test and see what happens;
> it's a heavy CPU eater, so it might be making things worse when run in parallel.
> I suspect that aarch64 case is the same one we occasionally hit
> due to the page flag dirtying being wrong in TCG.
>
> Can you do that without another pull?

I can't, no. It has to be a signed tag.

thanks
-- PMM

Re: [Qemu-devel] [PULL 00/41] Migration queue
Posted by Peter Maydell 7 years, 5 months ago
On 11 May 2018 at 14:41, Peter Maydell <peter.maydell@linaro.org> wrote:
> Hi. I get some test failures here:
>
> S390x host:
> TEST: tests/migration-test... (pid=57456)
>   /ppc64/migration/deprecated:                                         OK
>   /ppc64/migration/bad_dest:                                           OK
>   /ppc64/migration/postcopy/unix:                                      OK
>   /ppc64/migration/precopy/unix:                                       OK
>   /ppc64/migration/xbzrle/unix:
> Unexpected 32 on dest_serial serial
> **
> ERROR:/home/linux1/qemu/tests/migration-test.c:144:wait_for_serial:
> code should not be reached
> FAIL

It turns out that this one is an intermittent failure that we've seen
before (I just hit it again this morning on an unrelated test run
on an x86-64 host, and last time it was on a ppc host).
In this mail Laurent tracked it down to an overly optimistic
setting for downtime-limit in migration-test.c:
https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg00107.html

Can we push that up to something that we won't hit even if the
test is being run on a machine that's under CPU load, please?

thanks
-- PMM
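
To make the downtime-limit point above concrete, here is a rough Python model of precopy convergence. It is a sketch under stated assumptions, not QEMU code: it assumes the source may stop the guest once the remaining dirty data fits in bandwidth times downtime-limit, and that each pass re-dirties memory in proportion to how long the pass took. The function name, parameters and units (bytes and milliseconds) are all illustrative.

```python
def passes_to_converge(ram_bytes, dirty_bytes_per_ms, bandwidth_bytes_per_ms,
                       downtime_limit_ms, max_passes=100):
    """Return how many passes a toy precopy needs, or None if it never converges.

    Simplified model (an assumption, not QEMU's implementation):
      - the final stop-and-copy is allowed once pending <= bandwidth * limit
      - otherwise a full pass is sent, during which the guest dirties more pages
    """
    pending = ram_bytes
    threshold = bandwidth_bytes_per_ms * downtime_limit_ms
    for pass_number in range(1, max_passes + 1):
        if pending <= threshold:
            return pass_number
        send_time_ms = pending / bandwidth_bytes_per_ms
        # The guest cannot dirty more memory than it has.
        pending = min(ram_bytes, dirty_bytes_per_ms * send_time_ms)
    return None
```

In this model a guest that dirties memory as fast as the link can carry it never gets under a small threshold, which is roughly what a loaded test host looks like from the test's point of view; raising downtime_limit_ms raises the threshold and lets the same run finish.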

Re: [Qemu-devel] [PULL 00/41] Migration queue
Posted by Peter Maydell 7 years, 5 months ago
On 18 May 2018 at 11:19, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 11 May 2018 at 14:41, Peter Maydell <peter.maydell@linaro.org> wrote:
>> Hi. I get some test failures here:
>>
>> S390x host:
>> TEST: tests/migration-test... (pid=57456)
>>   /ppc64/migration/deprecated:                                         OK
>>   /ppc64/migration/bad_dest:                                           OK
>>   /ppc64/migration/postcopy/unix:                                      OK
>>   /ppc64/migration/precopy/unix:                                       OK
>>   /ppc64/migration/xbzrle/unix:
>> Unexpected 32 on dest_serial serial
>> **
>> ERROR:/home/linux1/qemu/tests/migration-test.c:144:wait_for_serial:
>> code should not be reached
>> FAIL
>
> It turns out that this one is an intermittent failure that we've seen
> before (I just hit it again this morning on an unrelated test run
> on an x86-64 host, and last time it was on a ppc host).
> In this mail Laurent tracked it down to an overly optimistic
> setting for downtime-limit in migration-test.c:
> https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg00107.html
>
> Can we push that up to something that we won't hit even if the
> test is being run on a machine that's under CPU load, please?

Also, it would be nice if the test reported failure to converge
with an error including the phrase "failed to converge" rather
than obscure stuff about serial :-)

thanks
-- PMM
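
One possible shape for the clearer failure report asked for above, sketched in Python rather than in the test's own C. Everything here is hypothetical: query_status stands in for whatever the test uses to poll migration state, and the exception name is made up. The status strings "active", "completed" and "failed" are the only part borrowed from QEMU's migration status values.

```python
import time


class MigrationDidNotConverge(Exception):
    """Raised when migration is still running at the deadline."""


def wait_for_migration(query_status, timeout_s=60.0, poll_s=0.1):
    """Poll query_status() until migration completes, or fail with a clear message.

    query_status is a hypothetical callable returning a status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = query_status()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("migration failed")
        if time.monotonic() >= deadline:
            # Say what actually went wrong instead of leaving the reader
            # with an unrelated symptom such as unexpected serial output.
            raise MigrationDidNotConverge(
                f"migration failed to converge within {timeout_s} s "
                f"(last status: {status})"
            )
        time.sleep(poll_s)
```

The design choice is simply to detect the timeout where the cause is known and name it there, so that the log contains the phrase "failed to converge" instead of a downstream assertion.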