[v1] migration: Postcopy Preempt-Full

[PATCH 00/14] migration: Postcopy Preempt-Full
Posted by Peter Xu 1 year, 7 months ago
Based-on: <20220920223800.47467-1-peterx@redhat.com>
  [PATCH 0/5] migration: Bug fixes (prepare for preempt-full)

Tree is here:
  https://github.com/xzpeter/qemu/tree/preempt-full

RFC:
  https://lore.kernel.org/qemu-devel/20220829165659.96046-1-peterx@redhat.com

This patchset is the v1 formal version of preempt-full series.  The RFC tag
is removed as more tests was done, and all items mentioned in the TODO
items previously in the RFC cover letters are implemented in this patchset.

A few patches are added. Most of the patches are still the same as the RFC
ones but may have some trivial touch ups here and there.  E.g., comment
touch-ups as Dave used to suggest on bitmap_mutex.  If to have a look on
the diff stat we'll see mostly "+" is the same as "-" this time though,
because with the rp-return thread change we can logically drop a lot of
complicated preempt logic previously maintained in migration thread:

  3 files changed, 371 insertions(+), 399 deletions(-)

Feel free to have a look at patch "migration: Remove old preempt code
around state maintainance" where we dropped a lot of old code on preempt
state maintainance of migration thread (but the major part of the preempt
old code still needs to be there, e.g. channel managements) along with the
break-huge parameter (as we never need to break-huge anymore.. because we
already run in parallel).

Comparing to the recently merged preempt mode I called it "preempt-full"
because it threadifies the postcopy channels so now urgent pages can be
fully handled separately outside of the ram save loop.

The existing preempt code has reduced ramdom page req latency over 10Gbps
network from ~12ms to ~500us which has already landed.

This preempt-full series can further reduces that ~500us to ~230us per my
initial test.  More to share below.

Note that no new capability is needed, IOW it's fully compatible with the
existing preempt mode.  So the naming is actually not important but just to
identify the difference on the binaries.

The logic of the series is simple: send urgent pages in rp-return thread
rather than migration thread.  It also means rp-thread will take over the
ownership of the newly created preempt channel.  It can slow down rp-return
thread on receiving page requests, but I don't really see a major issue
with it so far but only benefits.

For detailed performance numbers, please refer to the rfc cover letter.

Please have a look, thanks.

Peter Xu (14):
  migration: Add postcopy_preempt_active()
  migration: Cleanup xbzrle zero page cache update logic
  migration: Trivial cleanup save_page_header() on same block check
  migration: Remove RAMState.f references in compression code
  migration: Yield bitmap_mutex properly when sending/sleeping
  migration: Use atomic ops properly for page accountings
  migration: Teach PSS about host page
  migration: Introduce pss_channel
  migration: Add pss_init()
  migration: Make PageSearchStatus part of RAMState
  migration: Move last_sent_block into PageSearchStatus
  migration: Send requested page directly in rp-return thread
  migration: Remove old preempt code around state maintainance
  migration: Drop rs->f

 migration/migration.c |  47 +--
 migration/multifd.c   |   2 +-
 migration/ram.c       | 721 ++++++++++++++++++++----------------------
 3 files changed, 371 insertions(+), 399 deletions(-)

-- 
2.32.0